Generating “STD‑Only” Synthetic Data — Best Practice? #1596
Unanswered
CtrlAltAryan asked this question in Q&A
Replies: 0 comments
Hi Synthea team,
I’m using Synthea in a university project to build a probabilistic AI that predicts sexually‑transmitted infections (STIs). To keep the dataset focused (and small), I’d like to generate only the records relevant to STDs and strip everything else (claims, payers, unrelated chronic‑disease modules, etc.).
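If no built-in filter covers this, one fallback is to generate a normal population and post-filter the CSV export. The sketch below keeps only patients who have at least one condition whose `DESCRIPTION` matches a keyword. It then copies just the six files from the goals list, so claims, payers and every other file are dropped. Several pieces are assumptions about Synthea's default CSV layout, not confirmed details: the keyword list, the helper names (`sti_patient_ids`, `filter_csv`, `build_sti_subset`), and the column names (`Id` in patients.csv, `PATIENT` and `DESCRIPTION` elsewhere). Check all of them against your own output.

```python
import csv
from pathlib import Path

# Hypothetical keywords: replace with the STI conditions (or codes) your run actually emits.
STI_KEYWORDS = ("chlamydia", "gonorrhea", "syphilis", "hiv")

# Assumed Synthea CSV layout: patients.csv is keyed by "Id", the other files by "PATIENT".
KEEP_FILES = {
    "patients.csv": "Id",
    "conditions.csv": "PATIENT",
    "observations.csv": "PATIENT",
    "encounters.csv": "PATIENT",
    "procedures.csv": "PATIENT",
    "medications.csv": "PATIENT",
}


def sti_patient_ids(conditions_csv, keywords=STI_KEYWORDS):
    """Collect PATIENT ids whose condition DESCRIPTION contains any keyword (case-insensitive)."""
    ids = set()
    with open(conditions_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            description = (row.get("DESCRIPTION") or "").lower()
            if any(k in description for k in keywords):
                ids.add(row["PATIENT"])
    return ids


def filter_csv(src, dst, ids, id_column):
    """Copy only the rows of src whose id_column value is in ids; return the number kept."""
    kept = 0
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames or [])
        writer.writeheader()
        for row in reader:
            if row.get(id_column) in ids:
                writer.writerow(row)
                kept += 1
    return kept


def build_sti_subset(csv_dir, out_dir, keywords=STI_KEYWORDS):
    """Write an STI-only copy of the wanted files; files not in KEEP_FILES are never copied."""
    csv_dir, out_dir = Path(csv_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    ids = sti_patient_ids(csv_dir / "conditions.csv", keywords)
    counts = {}
    for name, id_column in KEEP_FILES.items():
        src = csv_dir / name
        if src.exists():  # tolerate runs that did not export every file
            counts[name] = filter_csv(src, out_dir / name, ids, id_column)
    return ids, counts
```

Substring matching on descriptions is crude and will miss differently worded diagnoses. Once you know which condition codes your modules produce, filtering on the `CODE` column would likely be more robust.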
What I’m hoping to do
Generate patients who either
Export only diagnostic/clinical data: patients.csv, conditions.csv, observations.csv, encounters.csv, procedures.csv, medications.csv
Avoid manual surgery on the modules folder if there’s a built‑in whitelist feature.
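On the export and module side, the following is a sketch of a `run_synthea` invocation built from options believed to exist, not a confirmed recipe. `-p` sets the population size. `-m` limits which module files are loaded. Any `synthea.properties` key can be overridden on the command line as `--key value`. Verify the following against `run_synthea -h` and your copy of `synthea.properties`: the `exporter.csv.included_files` key, the `:` separator for several `-m` patterns (Synthea is assumed to split on the platform path separator, so `;` on Windows), and the helper name `synthea_command`.

```python
import shlex

# The six files listed in the question.
WANTED_FILES = ("patients.csv", "conditions.csv", "observations.csv",
                "encounters.csv", "procedures.csv", "medications.csv")


def synthea_command(population, module_patterns=(), wanted_files=WANTED_FILES):
    """Build a run_synthea argument list (flag names assumed; confirm with your Synthea version)."""
    cmd = ["./run_synthea", "-p", str(population)]
    if module_patterns:
        # -m takes module file-name wildcards; several are assumed to be joined with ':' on Unix.
        cmd += ["-m", ":".join(module_patterns)]
    # Override synthea.properties keys as "--key value".
    cmd += ["--exporter.csv.export", "true",
            "--exporter.fhir.export", "false",
            # Assumed key: limits which CSV files the exporter writes.
            "--exporter.csv.included_files", ",".join(wanted_files)]
    return cmd


if __name__ == "__main__":
    # "*sexual*" is a hypothetical module pattern, not a known Synthea module name.
    print(shlex.join(synthea_command(100, ["*sexual*"])))
```

The helper only builds the command. To run it, pass the list to `subprocess.run` from the Synthea checkout. A module filter changes which diseases are simulated, not which patients are kept. For an STI-only cohort it would probably need to be paired with a patient filter: either a keep-module via `-k`, if your version supports it, or a post-filter on the CSV output.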
Any guidance (or “don’t do that, do this instead!”) would be hugely appreciated.
Thanks for maintaining such a fantastic open‑source project!
Best regards,
Aryan
1st‑year student — IIoT & AI Enthusiast