The CDISC Controlled Terminology RDF/OWL Downloader is a Python automation tool that downloads and extracts official CDISC Controlled Terminology archives directly from the NCI EVS (National Cancer Institute Enterprise Vocabulary Services) FTP repository.
With a single command, you can fetch the latest CDISC RDF/OWL files for multiple standards, organize them into date-stamped folders, and keep a clean local archive for easy retrieval.
- Automatic Download of OWL ZIP archives for:
  - CDASH – Clinical Data Acquisition Standards Harmonization
  - SDTM – Study Data Tabulation Model
  - SEND – Standard for Exchange of Nonclinical Data
  - ADaM – Analysis Data Model
  - Define-XML – Define Standards for Submission
- Organized Output – Saves each release into its own date-stamped directory by standard.
- Batch or Single Standard – Download all standards in one go or target a specific one.
- ZIP Extraction – Automatically unzips archives and removes the original ZIP.
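The extraction step in the feature list can be sketched with the standard library alone. This is a minimal illustration, not the script's actual code: the function name `extract_and_remove` and its parameters are hypothetical, and the archive is assumed to be already downloaded.

```python
import zipfile
from pathlib import Path


def extract_and_remove(zip_path, target_dir):
    """Unzip zip_path into target_dir, delete the archive, return member names.

    Hypothetical helper for illustration; the real script may differ.
    """
    zip_path = Path(zip_path)
    target_dir = Path(target_dir)
    # Create e.g. CT_OWL/SDTM/2024-03-15/ if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(target_dir)
    # Reached only if extraction did not raise, so a failed
    # extraction leaves the original ZIP in place.
    zip_path.unlink()
    return names
```

Deleting the archive only after `extractall` returns means an interrupted or corrupt download is never silently discarded.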
- Python 3.x
- Required packages: beautifulsoup4, urllib3, certifi, pycurl
  Install via: pip install beautifulsoup4 urllib3 certifi pycurl
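Before running the downloader, it can help to confirm that the packages listed above are importable. The sketch below is an optional check and is not part of the original script; `missing_packages` is a hypothetical helper. Note that beautifulsoup4 is imported under the module name `bs4`.

```python
import importlib.util

# Maps each pip distribution name to the module name it installs.
REQUIRED = {
    "beautifulsoup4": "bs4",
    "urllib3": "urllib3",
    "certifi": "certifi",
    "pycurl": "pycurl",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose module cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; try: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```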
- Set Download Path: update the location variable inside the script: location = 'D:\\Data\\CT_OWL'
- Choose Standards to Download
  - Default (download all standards): std = 'ALL'
  - Single standard options: 'CDASH', 'SDTM', 'SEND', 'ADaM', 'Define-XML'
- Run the Script: python cdisc_ct_downloader.py
- Resulting Directory Structure

    CT_OWL/
    ├── SDTM/
    │   ├── 2024-03-15/
    │   │   ├── sdtm_2024-03-15.owl
    │   └── 2023-12-01/
    ├── ADaM/
    │   └── 2024-01-20/
    └── ...
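The relationship between the location and std settings and the directory layout above can be sketched as follows. The helper names `resolve_standards` and `release_dir` are hypothetical and the release date is taken from the example tree; the script's own logic may be organized differently.

```python
from pathlib import Path

# The five standards named in this README.
STANDARDS = ("CDASH", "SDTM", "SEND", "ADaM", "Define-XML")


def resolve_standards(std):
    """Expand 'ALL' to every standard, or validate a single name."""
    if std == "ALL":
        return list(STANDARDS)
    if std not in STANDARDS:
        raise ValueError(
            f"Unknown standard {std!r}; expected 'ALL' or one of {STANDARDS}"
        )
    return [std]


def release_dir(location, standard, release_date):
    """Build <location>/<standard>/<YYYY-MM-DD> without touching the disk."""
    return Path(location) / standard / release_date


# A raw string avoids doubling the backslashes; same value as 'D:\\Data\\CT_OWL'.
location = r"D:\Data\CT_OWL"
std = "ALL"

for name in resolve_standards(std):
    print(release_dir(location, name, "2024-03-15"))
```

Validating std up front turns a typo such as 'SDMT' into an immediate error instead of an empty download run.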
- This script uses web scraping to list available files; changes in the NCI EVS site structure could require updates.
- Always review and comply with the NCI EVS Terms of Use.
- Originally built as a Python learning project — functional but not fully optimized for performance or error handling.
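Because the file listing depends on scraping, one way to make a layout change visible is to fail loudly when no ZIP links are found. The sketch below uses the standard library's `html.parser` as a stand-in for BeautifulSoup so that it runs without extra packages; the sample HTML and file name are invented for illustration and do not reflect the real NCI EVS listing.

```python
from html.parser import HTMLParser


class ZipLinkParser(HTMLParser):
    """Collect href values ending in .zip from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".zip"):
            self.links.append(href)


def list_zip_links(html):
    """Return every .zip link found in an HTML directory listing."""
    parser = ZipLinkParser()
    parser.feed(html)
    return parser.links


# Invented listing page; real file names on the server may differ.
SAMPLE = """
<html><body>
<a href="../">Parent Directory</a>
<a href="example_standard_2024-03-15.OWL.zip">example_standard_2024-03-15.OWL.zip</a>
<a href="readme.txt">readme.txt</a>
</body></html>
"""

links = list_zip_links(SAMPLE)
if not links:
    # An empty result is more likely a changed page layout than "no releases".
    raise RuntimeError("No ZIP links found; the page layout may have changed.")
print(links)
```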
Author: Jimmy James
GitHub: A142763
This project is licensed under the MIT License — see the LICENSE file for details.