A Python tool for validating DCAT-US 1.1 (Data Catalog Vocabulary - United States) JSON files.
DCATUS-Validator is a command-line utility developed by NASA to ensure that data catalog files conform to the Project Open Data metadata format required for federal open data initiatives. The tool validates JSON files containing dataset metadata against the official GSA DCAT schema and provides detailed error reporting for non-compliant datasets.
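To illustrate the kind of check involved, here is a minimal sketch using only the Python standard library. It is not the tool's implementation: the real validator checks files against the full GSA schema, whereas this sketch only tests for the presence of the dataset fields that DCAT-US 1.1 marks as always required. The function name `find_missing_fields` and the error-message format are assumptions made for illustration.

```python
import json

# Dataset fields that DCAT-US 1.1 lists as always required.
# Simplified: the official schema also checks types, formats,
# and conditionally required fields, which this sketch ignores.
REQUIRED_DATASET_FIELDS = (
    "title", "description", "keyword", "modified",
    "publisher", "contactPoint", "identifier", "accessLevel",
)


def find_missing_fields(catalog):
    """Map each dataset index to a list of missing-field messages.

    Hypothetical helper: datasets with no missing fields are omitted.
    """
    problems = {}
    for index, dataset in enumerate(catalog.get("dataset", [])):
        missing = [
            f"dataset[{index}].{field}: required field is missing"
            for field in REQUIRED_DATASET_FIELDS
            if field not in dataset
        ]
        if missing:
            problems[index] = missing
    return problems


if __name__ == "__main__":
    # Hypothetical in-memory catalog with one incomplete dataset.
    sample = json.loads('{"dataset": [{"title": "Example", "identifier": "ex-1"}]}')
    for errors in find_missing_fields(sample).values():
        for error in errors:
            print(error)
```

Reporting each problem with a path such as `dataset[0].keyword` mirrors the idea of listing errors by field path, although the exact wording the tool emits may differ.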
- Python 3.12 or higher
- Dependencies managed via uv (see pyproject.toml)
- Clone the repository:
git clone https://github.com/nasa/DCATUS-Validator.git
cd DCATUS-Validator
- Install dependencies using uv:
uv sync
Validate a DCAT-US JSON file:
python validate.py path/to/your/data.json
For example, using the bundled sample file:
python validate.py test-json/dcat1-sample.json
The validator provides console output showing:
- Number of datasets being validated
- Validation completion status
If validation errors are found:
- An invalid_datasets.json file is created with detailed error information
- Each invalid dataset includes:
  - Dataset title
  - List of specific validation errors with field paths
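The report can also be inspected programmatically. The sketch below assumes a layout that this README does not specify: a JSON array of objects, each with a `title` string and an `errors` list of strings. Both key names and the helper `summarize_invalid` are hypothetical, so check them against a real invalid_datasets.json before relying on this.

```python
import json
from pathlib import Path


def summarize_invalid(entries):
    """Return printable summary lines for a list of invalid-dataset entries.

    Assumes each entry is a dict with "title" and "errors" keys
    (an assumed layout, not confirmed by the tool's documentation).
    """
    lines = []
    for entry in entries:
        title = entry.get("title", "<untitled>")
        errors = entry.get("errors", [])
        lines.append(f"{title}: {len(errors)} error(s)")
        lines.extend(f"  - {error}" for error in errors)
    return lines


if __name__ == "__main__":
    report = Path("invalid_datasets.json")
    if report.exists():
        entries = json.loads(report.read_text(encoding="utf-8"))
        print("\n".join(summarize_invalid(entries)))
    else:
        print("No invalid_datasets.json found; nothing to summarize.")
```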
The test-json/ directory contains a sample dataset for testing:
- dcat1-sample.json - Basic valid dataset example