Transform biomedical schemas to LinkML format
LinkMorph is a modular, extensible framework for converting various biomedical schema formats (FHIR, OMOP, graph schemas, JSON Schema, etc.) into LinkML format. It's designed specifically for biomedical data integration workflows.
- Pluggable Architecture: Strategy pattern with processors for different schema types
- Biomedical-Aware: Built-in patterns for genomic, clinical, and research data
- Configurable: YAML-based configuration for domain-specific customization
- Graph Schema Support: Native support for vertex/edge graph schemas
- FHIR Ready: Handles FHIR resources and extensions
- Multi-Format: Supports JSON Schema, custom YAML schemas, and more
- Research-Friendly: Patterns for cohorts, studies, omics data, and clinical trials
- Schema-Centric Transformation: Declarative field alignment using schema logic, not hardcoded scripts
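The pluggable, strategy-based design described above can be sketched roughly as follows. This is an illustrative sketch, not LinkMorph's actual internals: the names `SchemaProcessor`, `JsonSchemaProcessor`, `GenericProcessor`, `convert`, and `PROCESSORS` are all hypothetical, and the JSON Schema to LinkML type mapping is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Assumed mapping from JSON Schema primitive types to LinkML built-in types.
JSON_TO_LINKML = {
    "string": "string",
    "integer": "integer",
    "number": "float",
    "boolean": "boolean",
}


class SchemaProcessor(ABC):
    """Hypothetical strategy interface; not LinkMorph's actual base class."""

    @abstractmethod
    def can_process(self, schema: Dict[str, Any]) -> bool:
        """Return True if this processor recognizes the schema."""

    @abstractmethod
    def to_class(self, schema: Dict[str, Any]) -> Dict[str, Any]:
        """Return a LinkML-style class definition as a plain dict."""


class JsonSchemaProcessor(SchemaProcessor):
    """Handles object-typed JSON Schema documents."""

    def can_process(self, schema: Dict[str, Any]) -> bool:
        return schema.get("type") == "object" and "properties" in schema

    def to_class(self, schema: Dict[str, Any]) -> Dict[str, Any]:
        required = set(schema.get("required", []))
        attributes: Dict[str, Any] = {}
        for field, spec in schema["properties"].items():
            json_type = spec.get("type")
            # Union types (lists) and unknown types fall back to string.
            if isinstance(json_type, str):
                slot = {"range": JSON_TO_LINKML.get(json_type, "string")}
            else:
                slot = {"range": "string"}
            if field in required:
                slot["required"] = True
            attributes[field] = slot
        return {"attributes": attributes}


class GenericProcessor(SchemaProcessor):
    """Fallback: treats every top-level key as a string-valued attribute."""

    def can_process(self, schema: Dict[str, Any]) -> bool:
        return True

    def to_class(self, schema: Dict[str, Any]) -> Dict[str, Any]:
        return {"attributes": {key: {"range": "string"} for key in schema}}


def convert(name: str, schema: Dict[str, Any],
            processors: List[SchemaProcessor]) -> Dict[str, Any]:
    """Dispatch to the first processor that accepts the schema."""
    for processor in processors:
        if processor.can_process(schema):
            return {name: processor.to_class(schema)}
    raise ValueError(f"No processor accepts schema {name!r}")


# Order matters: specific processors first, the catch-all last.
PROCESSORS: List[SchemaProcessor] = [JsonSchemaProcessor(), GenericProcessor()]
```

With this layout, supporting a new schema family means adding one processor class and registering it ahead of the fallback; the dispatch code stays unchanged.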
git clone https://github.com/bmeg/linkmorph.git
cd linkmorph
pip install -e .
linkmorph /path/to/iceberg/schemas/graph \
--output-dir ./output \
--project-name bmeg_iceberg \
--verbose
linkmorph ./schemas \
--config ./bmeg_config.yaml \
--output-dir ./linkml_schemas
This generates:
- <project_name>_linkml_schema.yaml: main LinkML schema
- <project_name>_conversion_mapping.yaml: conversion documentation
import logging
from linkmorph import LinkMorphConverter, ConversionConfig
logger = logging.getLogger(__name__)
config = ConversionConfig.default_biomedical("my_project")
converter = LinkMorphConverter(config, logger)
output_path = converter.convert_directory("./input_schemas", "./output")
Format | Description | Example Use Cases |
---|---|---|
Graph Schemas | Vertex/edge definitions | BMEG, knowledge graphs |
FHIR Resources | HL7 FHIR format | Clinical data, EHRs |
JSON Schema | Standard JSON Schema | APIs, data validation |
Generic YAML | Custom YAML structures | Research data models |
Create a linkmorph_config.yaml file to customize behavior:
project_name: "my_biomedical_project"
type_mappings:
gene_expression_value: "float"
p_value: "float"
field_patterns:
r'.*_cohort$': 'research_group'
r'.*_pvalue$': 'statistical_significance'
r'.*_chromosome$': 'genomic_location'
processing:
auto_enum_threshold: 10
ignore_patterns:
- r'^_.*'
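To make the configuration keys above more concrete, here is a minimal sketch of how such settings might be interpreted. It assumes that `field_patterns` maps regular expressions on field names to semantic categories, that `ignore_patterns` excludes matching fields, and that `auto_enum_threshold` is the largest number of distinct values a field may have and still be emitted as an enum. These semantics are inferred from the key names, not confirmed against LinkMorph's source, and the helpers `is_ignored`, `categorize`, and `should_be_enum` are hypothetical.

```python
import re
from typing import Iterable, Optional

# Values copied from the example config; the Python raw strings stand in
# for the patterns listed under field_patterns and ignore_patterns.
FIELD_PATTERNS = {
    r".*_cohort$": "research_group",
    r".*_pvalue$": "statistical_significance",
    r".*_chromosome$": "genomic_location",
}
IGNORE_PATTERNS = [r"^_.*"]
AUTO_ENUM_THRESHOLD = 10


def is_ignored(field_name: str) -> bool:
    """True if the field name matches any ignore pattern."""
    return any(re.match(pattern, field_name) for pattern in IGNORE_PATTERNS)


def categorize(field_name: str) -> Optional[str]:
    """Return the category of the first matching pattern, else None."""
    for pattern, category in FIELD_PATTERNS.items():
        if re.match(pattern, field_name):
            return category
    return None


def should_be_enum(values: Iterable[object]) -> bool:
    """Assumed rule: few enough distinct values means the field is an enum."""
    return len(set(values)) <= AUTO_ENUM_THRESHOLD


for name in ["_internal_id", "discovery_cohort", "tp53_pvalue", "sample_id"]:
    if is_ignored(name):
        print(f"{name}: ignored")
    else:
        print(f"{name}: {categorize(name) or 'no category'}")
```

Under these assumptions, a field such as `discovery_cohort` is tagged `research_group`, while a field with a leading underscore is skipped before any pattern matching takes place.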