A data quality profiling tool built with Python and Shiny for Python, designed to assess oncology data across up to six data quality dimensions: Completeness, Validity, Uniqueness, Timeliness, Consistency, and Accuracy.
Users can upload their dataset, configure tests, and view results in an interactive dashboard.
Forked from https://github.com/christie-nhs-data-science/DQMaRC
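To give a rough idea of what a dimension such as Completeness measures, here is a minimal sketch. It assumes completeness means the percentage of non-missing values in each column, and that empty strings, "NA", "N/A" and "NULL" count as missing. The column names and the `completeness` helper are hypothetical, and this is not necessarily how the tool computes its own Completeness results.

```python
# Illustrative per-column completeness: share of values that are present.
# The missing-value markers and column names below are assumptions.
import csv
import io

MISSING = {"", "NA", "N/A", "NULL"}  # assumed missing-value markers


def completeness(rows: list[dict[str, str]]) -> dict[str, float]:
    """Return the percentage of non-missing values for each column."""
    if not rows:
        return {}
    total = len(rows)
    result = {}
    for col in rows[0].keys():
        # DictReader gives "" for empty fields; `or ""` also guards against None.
        present = sum(1 for row in rows if (row.get(col) or "").strip() not in MISSING)
        result[col] = 100.0 * present / total
    return result


sample = io.StringIO(
    "patient_id,diagnosis_date,stage\n"
    "1,2021-03-04,II\n"
    "2,,III\n"
    "3,2022-01-15,\n"
    "4,2022-06-30,NA\n"
)
rows = list(csv.DictReader(sample))
print(completeness(rows))  # {'patient_id': 100.0, 'diagnosis_date': 75.0, 'stage': 50.0}
```

Which values count as "missing" is a policy decision; a real profiling tool would typically make such markers configurable rather than hard-coded.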
The installation instructions below apply to Windows 11, macOS, and Linux. Prerequisites:
- Python 3.10 or later, available from https://www.python.org/downloads/
- python and pip must be accessible from your command line or terminal (on some macOS and Linux systems they are named python3 and pip3). You can check this with, e.g.:
pip --version
python --version
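If you prefer to check the version requirement from Python itself, a small script along these lines works. The file name you save it under (e.g. a hypothetical `check_python.py`) and the `meets_minimum` helper are illustrative only; the 3.10 minimum is the one stated above.

```python
# Check that the running interpreter meets the stated minimum (Python 3.10).
import sys

MINIMUM = (3, 10)


def meets_minimum(version=None, minimum=MINIMUM) -> bool:
    """Return True if `version` (default: the running interpreter) is >= minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) >= minimum


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if meets_minimum():
        print(f"Python {current} is OK")
    else:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"Python {current} found; 3.10 or later is required")
```

Comparing version tuples rather than version strings avoids the classic pitfall where "3.9" sorts after "3.10" as text.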
Clone the repository and change into its directory:
git clone https://github.com/UoMResearchIT/Data-Quality-Profiling-Tool.git
cd Data-Quality-Profiling-Tool
Create and activate a virtual environment.
On Windows:
python -m venv venv
venv\Scripts\activate
On macOS and Linux:
python3 -m venv venv
source venv/bin/activate
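To confirm that the virtual environment is active before installing anything, you can run a check like the one below with the same `python` you will use for installation. It relies on the documented behaviour of the standard-library venv module: inside a venv, `sys.prefix` points at the venv directory while `sys.base_prefix` still points at the base installation. The `in_virtualenv` helper name is illustrative.

```python
# Report whether this interpreter is running inside a venv-created environment.
import sys


def in_virtualenv() -> bool:
    # In a venv, sys.prefix is the venv directory; sys.base_prefix is the base install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Active virtual environment: {sys.prefix}")
    else:
        print("No virtual environment active; activate venv before installing packages.")
```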
Install the required Python packages listed in requirements.txt:
pip install -r requirements.txt
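After installation, a quick way to confirm that key packages are importable is to ask the import system whether it can find them. The sketch below only checks `shiny` (the import name of Shiny for Python) by default; the `missing_modules` helper is hypothetical, and you would extend the tuple with other packages from requirements.txt as needed.

```python
# Check that modules can be found by the current interpreter without importing them.
import importlib.util

# "shiny" is Shiny for Python's import name; other entries are up to you (assumption).
REQUIRED_MODULES = ("shiny",)


def missing_modules(modules=REQUIRED_MODULES) -> list[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing), "- rerun pip install -r requirements.txt")
    else:
        print("All checked modules are importable.")
```

If this reports a missing module even though pip succeeded, the usual cause is that pip and python belong to different environments, which is one reason to activate the virtual environment first.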
Start the Shiny application:
shiny run --reload app.py
Then open your browser and navigate to:
http://localhost:8000
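If the page does not load, it can help to check whether anything is listening on the address above at all. This minimal sketch simply attempts a TCP connection; it does not check that the listener is the Shiny app specifically. The `is_listening` helper is illustrative, and its defaults mirror the localhost:8000 address given above.

```python
# Check whether a TCP server is accepting connections on host:port.
import socket


def is_listening(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


if __name__ == "__main__":
    if is_listening():
        print("Something is listening on localhost:8000")
    else:
        print("Nothing listening on localhost:8000 yet; is shiny run still starting?")
```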
Once the app is open in your browser:
- Upload your data (as a ZIP of multiple CDM files, or as a flat CSV or Excel file) in the Data Upload tab.
- Initialise, edit, and/or upload thresholds and test parameter 'rule sheets' in the Test Parameters tab.
- Run your checks in the Dashboard tab, where the test results are displayed interactively.
- Download the results from the Dashboard tab if required.
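If your CDM tables are stored as separate CSV files in one folder, a ZIP for the Data Upload tab can be built with Python's standard zipfile module. This is a minimal sketch assuming the tool accepts CSV files placed at the root of the archive; the expected file names and layout are not specified here, so the `bundle_csvs` helper and any file names you pass in are hypothetical.

```python
# Bundle every *.csv file in a folder into one ZIP archive (flat layout assumed).
import zipfile
from pathlib import Path


def bundle_csvs(source_dir, zip_path) -> list[str]:
    """Write each *.csv in source_dir into zip_path and return the member names."""
    source_dir = Path(source_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for csv_file in sorted(source_dir.glob("*.csv")):
            # arcname drops the folder path so files sit at the archive root.
            zf.write(csv_file, arcname=csv_file.name)
            names.append(csv_file.name)
    return names
```

For example, `bundle_csvs("cdm_tables", "cdm_upload.zip")` would zip every CSV in a hypothetical `cdm_tables` folder. Building the archive this way includes only the CSV files, avoiding extra entries such as the `__MACOSX` folder that compressing a folder in the macOS Finder can add.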