Kraken is a convenience package that orchestrates data extraction by integrating SQL with Python, while managing safe storage and recall of sensitive connection credentials.
Developed in the NHS at University Hospital Southampton to facilitate clinical research in a complex multi-database environment, Kraken provides streamlined management of Reproducible Analytical Pipelines (RAPs) from multiple data sources by automating extraction, parsing, connection and execution of SQL files. Offering multiple levels of control, Kraken can run an entire SQL data pipeline from extraction to export with as little as one line of Python code, or provide more fine-tuned management of entire ETL flows through use of underlying connector objects. Kraken also provides standardised statistical summaries and graphing for speedy interrogation of results.
Wrapping around packages like keyring
, sqlalchemy
, pyodbc
, pandas
, matplotlib
and seaborn
, Kraken:
- Safely stores sensitive database connection credentials under an 'alias' in your operating system's credential store, and recalls them on demand and as dictated by SQL files;
- Extracts all SQL from a targeted file or folder of files (sequentially or simultaneously) - parsing and splitting them into queries before executing them in order;
- Returns DataFrame results that include provenance metadata for auditable tracking;
- Allows database upload of data with fine-tuned control;
- Integrates extraction of local data files;
- Provides fast data interrogation with single-line graphing and statistical summary functions.
pip install data-kraken
import kraken
# Set default database alias (once)
kraken.set_default_alias("TEST")
# Save test connection (once)
kraken.save_connection_MSSQL(alias="TEST", server="xxx", database="xxx", username=None, password=None, default=True)
# Execute SQL file in folder
results = kraken.run(filepaths='sql_folder')
# Examine summary statistics
for result in results:
result.examine()
save_connection()
- save database service credentials to OS's credential manager under an alias for recallsave_connection_XXX()
- convenience wrapper aroundsave_connection()
for various supported database platformsexecute()
- execute SQL querycreate_connector()
- create Connector object to reuse for SQL execution
extract_sql()
- extract & parse SQL files from provided filepath, preparing collection of queriesexecute_sql()
- execute collection of queries fromextract_sql()
export()
- export results to various filetypesrun()
- wrapper around above functions, allowing extraction, execution, and optional export results from SQL files
extract_spreadsheets()
- extract data from spreadsheets
upload()
- upload single DataFrame to database (note that this always commits)upload_results()
- upload collection of DataFrames to database (note that this always commits)
examine()
- analyse DataFrame & provide high-level statistical summarygraph()
- graph DataFrame quickly with support for multiple graph types
generate_where_clause()
- loop over a DataFrame to convert rows or columns into batches of WHERE clausesdatestamp()
- get current time or datestamp as a string with formatting optionscalculate_runtime()
- get timedelta and formatted string between two datetime valuesreadout.activate()/suppress()
- turn readout on or offencode()
/decode()
- convenient wrappers around keyring to store/recall secrets
Connector
(usecreate_connector
) - object allowing fine-tuned SQL connection & execution controlParser
- object allowing SQL parsing (as used inextract_sql()
)Progress
- heavily customisable progress bar/ticker with threaded auto-refresh and context management
These database platforms are currently supported:
- Oracle
- Microsoft SQL Server
- PostgreSQL
- MySQL
- MariaDB
- IBM Informix
- Intersystems Cache
- Intersystems Iris
What we're working on next:
- Integration with BitWarden for credential storage
- Direct import/export to Microsoft SharePoint
- Pipeline execution logging and run reports
- Expanded database platform support
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
This work is licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License.
Kraken was developed within the NHS by Cai Davis, Michael George, and Faizan Hemotra at University Hospital Southampton NHSFT, as part of the Southampton Emerging Therapies and Technology (SETT) Centre.