@@ -13,27 +13,33 @@ Cumulus ETL wants data, and lots of it.
 It's happy to ingest data that you've gathered elsewhere (as a separate export),
 but it's also happy to download the data itself as needed during the ETL (as an on-the-fly export).

-## Separate Exports
+## Export Options
+
+### External Exports

 1. If you have an existing process to export health data, you can do that bulk export externally,
 and then just feed the resulting files to Cumulus ETL.
+(Though note that you will need to provide some export information manually,
+with the `--export-group` and `--export-timestamp` options. See `--help` for more info.)

 2. Cumulus ETL has an `export` command to perform just a bulk export without an ETL step.
    Run it like so: `cumulus-etl export FHIR_URL ./output` (see `--help` for more options).
-   You can use all sorts of
+   - You can use all sorts of
    [interesting FHIR options](https://hl7.org/fhir/uv/bulkdata/export.html#query-parameters)
    like `_typeFilter` or `_since` in the URL.
+   - This workflow will generate an export log file, from which Cumulus ETL can pull
+   some export metadata like the Group name and export timestamp.

 3. Or you may need more advanced options than our internal exporter supports.
    The [SMART Bulk Data Client](https://github.com/smart-on-fhir/bulk-data-client)
-   is a great tool with lots of features.
+   is a great tool with lots of features (and also generates an export log file).

 In any case, it's simple to feed that data to the ETL:
 1. Pass Cumulus ETL the folder that holds the downloaded data as the input path.
 1. Pass `--fhir-url=` pointing at your FHIR server so that externally referenced document notes
    and medications can still be downloaded as needed.

-## On-The-Fly Exports
+### On-The-Fly Exports

 If it's easier to just do it all in one step,
 you can also start an ETL run with your FHIR URL as the input path.
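
The hunk above notes that FHIR options like `_typeFilter` or `_since` can be placed in the export URL. As a minimal sketch of what such a URL might look like, the snippet below assembles a Group-level `$export` request with those query parameters. The server address, the Group ID, and the `build_export_url` helper are all hypothetical and are not part of Cumulus ETL; the resulting string is the kind of value that could be passed as `FHIR_URL` to `cumulus-etl export`.

```python
from urllib.parse import urlencode


def build_export_url(base_url, group, resource_types, since=None, type_filters=()):
    """Assemble a bulk export URL for one FHIR Group (hypothetical helper)."""
    # _type takes a comma-separated list of resource types.
    params = [("_type", ",".join(resource_types))]
    if since:
        # _since limits the export to resources updated after this instant.
        params.append(("_since", since))
    for type_filter in type_filters:
        # Each _typeFilter is a search query scoped to one resource type.
        params.append(("_typeFilter", type_filter))
    # urlencode percent-encodes the values (e.g. the "?" inside a _typeFilter).
    return f"{base_url.rstrip('/')}/Group/{group}/$export?{urlencode(params)}"


# Example values are made up for illustration.
url = build_export_url(
    "https://ehr.example.com/fhir",
    "Group1234",
    ["Observation"],
    since="2024-01-01T00:00:00Z",
    type_filters=["Observation?date=gt2024-01-01"],
)
print(url)
```

The example uses only date-oriented filters, since the second hunk below advises against slicing a Group's resources in other ways.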
@@ -44,6 +50,60 @@ You can save the exported files for archiving after the fact with `--export-to=P
 However, bulk exports tend to be brittle and slow for many EHRs at the time of this writing.
 It might be wiser to separately export, make sure the data is all there and good, and then ETL it.

+## Cumulus Assumptions
+
+Cumulus ETL makes some specific assumptions about the data you feed it and the order you feed it in.
+
+This is because Cumulus tracks which resources were exported from which FHIR Groups and when.
+It only allows Encounters that have had all their data fully imported to be queried by SQL,
+to prevent an in-progress ETL workflow from affecting queries against the database.
+(i.e. to prevent an Encounter that hasn't yet had Conditions loaded in from looking like an
+Encounter that doesn't _have_ any Conditions)
+
+Of course, even in the normal course of events, resources may show up weeks after an Encounter
+(like lab results).
+So an Encounter can never knowingly be truly _complete_,
+but Cumulus ETL makes an effort to keep a consistent view of the world at least for a given
+point in time.
+
+### Encounters First
+
+**Please export Encounters along with or before you export other Encounter-linked resources.**
+(Patients can be exported beforehand, since they don't depend on Encounters.)
+
+To prevent incomplete Encounters, Cumulus only looks at Encounters that have an export
+timestamp at the same time or before linked resources like Condition.
+(As a result, there may be extra Conditions that point to not-yet-loaded Encounters.
+But that's fine, they will also be ignored until their Encounters do get loaded.)
+
+If you do export Encounters last, you may not see any of those Encounters in the `core` study
+tables once you run Cumulus Library on the data.
+(Your Encounter data is safe and sound,
+just temporarily ignored by the Library until later exports come through.)
+
+### No Partial Group Exports
+
+**Please don't slice and dice your Group resources when exporting.**
+Cumulus ETL assumes that when you feed it an input folder of export files,
+that everything in the Group is available (at least, for the exported resources).
+You can export one resource from the Group at a time, just don't slice that resource further.
+
+This is because when you run ETL on say, Conditions exported from Group `Group1234`,
+it will mark Conditions in `Group1234` as completely loaded (up to the export timestamp).
+
+Using `_since` or a date-oriented `_typeFilter` is still fine, to grab new data for an export.
+The concern is more about an incomplete view of the data at a given point in time.
+
+For example, if you sliced Conditions according to category when exporting
+(e.g. `_typeFilter=Condition?category=problem-list-item`),
+Cumulus will have an incorrect view of the world
+(thinking it got all Conditions when it only got problem list items).
+
+You can still do this if you are careful!
+For example, maybe exporting Observations is too slow unless you slice by category.
+Just make sure that after you export all the Observations separately,
+you then combine them again into one big Observation folder before running Cumulus ETL.
+
 ## Archiving Exports

 Exports can take a long time, and it's often convenient to archive the results.
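
Returning to the "No Partial Group Exports" text added in this hunk: the advice to combine separately exported Observation slices into one folder before running the ETL could be sketched roughly as follows. This is an illustration under assumptions, not something Cumulus ETL provides. The `combine_slices` helper, the folder names, and the `sliceNNN.` filename prefix are hypothetical, and it is assumed that each slice folder holds `*.ndjson` files. Export log files are deliberately not copied, so the Group name and timestamp would presumably need to be supplied with `--export-group` and `--export-timestamp`.

```python
import shutil
from pathlib import Path


def combine_slices(slice_dirs, combined_dir):
    """Copy every NDJSON file from each slice folder into one combined folder."""
    combined = Path(combined_dir)
    combined.mkdir(parents=True, exist_ok=True)
    copied = []
    for index, slice_dir in enumerate(slice_dirs):
        for source in sorted(Path(slice_dir).glob("*.ndjson")):
            # Prefix with the slice index so same-named files from different
            # slices (e.g. Observation.000.ndjson) do not overwrite each other.
            target = combined / f"slice{index:03d}.{source.name}"
            shutil.copyfile(source, target)
            copied.append(target)
    return copied


# Hypothetical usage: one folder per Observation category slice.
# combine_slices(["./obs-laboratory", "./obs-vital-signs"], "./obs-combined")
```

Whether the renamed files are picked up as expected depends on how your version of the ETL discovers input files, so it is worth checking a small run first.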