This project uses Synthea™ to generate realistic synthetic patient data for medical notes.
from mednotegen.generator import NoteGenerator
gen = NoteGenerator.from_config("config.yaml")
gen.generate_notes(10, "output_dir")
# Or specify Synthea CSV directory directly:
gen = NoteGenerator(synthea_csv_dir="/path/to/synthea/output/csv")
gen.generate_notes(10, "output_dir")
You can specify the Synthea CSV directory directly in your config file. Add the following line to your config.yaml
:
Example config.yaml
:
count: 10
output_dir: output_dir
synthea_csv_dir: /path/to/synthea/output/csv
Then generate notes using:
from mednotegen.generator import NoteGenerator
gen = NoteGenerator.from_config("config.yaml")
gen.generate_notes(10, "output_dir")
This project requires Synthea™, an open-source synthetic patient generator, as an external dependency. You must clone and build Synthea yourself before using mednotegen
.
To set up Synthea:
- Clone Synthea
git clone https://github.com/synthetichealth/synthea.git
- Build the Synthea JAR
Ensure
cd synthea ./gradlew build check test cp build/libs/synthea-with-dependencies.jar . cd ..
synthea-with-dependencies.jar
is in thesynthea/
directory at the root of your project.
You can customize patient generation and report output using a config.yaml
file. Example options:
count: 10 # Number of reports to generate
output_dir: output_dir # Output directory for PDFs
use_llm: false # Use LLM for report generation
synthea_csv_dir: /path/to/synthea/output/csv # Path to Synthea-generated CSV files
seed: 1234 # Random seed for reproducibility
reference_date: "20250628" # Reference date for data generation (YYYYMMDD)
clinician_seed: 5678 # Optional: separate seed for clinician assignment
gender: female # male, female, or any
min_age: 30 # Minimum patient age
max_age: 60 # Maximum patient age
state: New York # Synthea state parameter
modules:
- cardiovascular-disease
- diabetes
- hypertension
- asthma
local_config: custom_synthea.properties # Custom Synthea config file
local_modules: ./synthea_modules # Directory for custom modules
- count: Number of reports to generate
- output_dir: Directory to save generated PDFs
- use_llm: If true, uses OpenAI LLM for report text
- seed: Random seed for reproducibility
- reference_date: Reference date for age calculations (YYYYMMDD)
- clinician_seed: Optional, separate seed for clinician assignment
- gender: Gender filter for patients (
male
,female
, orany
) - min_age, max_age: Age range for patients
- state: US state for Synthea simulation
- modules: Synthea disease modules to enable
- local_config: Path to a custom Synthea config file
- local_modules: Directory for custom Synthea modules
For an up-to-date and complete list of available modules, see the official Synthea modules directory.
If you see errors about missing patients.csv
, medications.csv
, or conditions.csv
, make sure you have generated Synthea data and that the path you provide (via synthea_csv_dir
, CLI, or config) points to the correct directory containing those files.
If you installed mednotegen
via pip, the default location is inside the package directory. For custom or system-wide Synthea runs, always specify the output CSV directory explicitly.
- No CSV files generated:
- Make sure you edited the correct
synthea.properties
and used the-c
flag when running Synthea. - Ensure
exporter.csv.export = true
is set and not overridden elsewhere in the file.
- Make sure you edited the correct
- FileNotFoundError for CSVs:
- Confirm the CSV files exist in the path specified by
synthea_csv_dir
or in the expected package location.
- Confirm the CSV files exist in the path specified by
- ValueError: No patients found matching the specified filters:
- Check your age/gender filters in
config.yaml
. Try relaxing them if you have too few patients.
- Check your age/gender filters in
Edit src/main/resources/synthea.properties
in your Synthea directory:
exporter.csv.export = true
(Ensure any exporter.csv.export = false
lines are removed or commented out.)
From your Synthea directory, clean any old output and generate new data:
rm -rf output/
java -jar synthea-with-dependencies.jar -c src/main/resources/synthea.properties -p 1000
- The
-p 1000
flag generates 1000 patients. - After running, check for CSV files in
output/csv/
.
See README_SYNTHEA_NOTICE.md
and LICENSE-APACHE-2.0
for license and attribution requirements.