Commit 32aa23c

Improve FHIR Search Performance with Subject References
Closes: #2161
1 parent 042e347 commit 32aa23c

8 files changed: +463 -11 lines changed

docs/performance/fhir-search.md

Lines changed: 124 additions & 7 deletions
@@ -8,13 +8,14 @@ Under ideal conditions, Blaze can execute a FHIR Search query for a single code
 
 The following systems with rising resources were used for performance evaluation:
 
-| System | Provider | CPU        | Cores |     RAM |    SSD | Heap Mem ¹ | Block Cache ² | Resource Cache ³ |
-|--------|----------|------------|------:|--------:|-------:|-----------:|--------------:|-----------------:|
-| LEA25  | on-prem  | EPYC 7543P |     4 |  32 GiB |   2 TB |      8 GiB |         8 GiB |            2.5 M |
-| LEA36  | on-prem  | EPYC 7543P |     8 |  64 GiB |   2 TB |     16 GiB |        16 GiB |              5 M |
-| LEA47  | on-prem  | EPYC 7543P |    16 | 128 GiB |   2 TB |     32 GiB |        32 GiB |             10 M |
-| LEA58  | on-prem  | EPYC 7543P |    32 | 256 GiB |   2 TB |     64 GiB |        64 GiB |             20 M |
-| CCX42  | Hetzner  | EPYC 7763  |    16 |  64 GiB | 360 GB |     16 GiB |         8 GiB |              5 M |
+| System | Provider | CPU         | Cores |     RAM |    SSD | Heap Mem ¹ | Block Cache ² | Resource Cache ³ |
+|--------|----------|-------------|------:|--------:|-------:|-----------:|--------------:|-----------------:|
+| LEA25  | on-prem  | EPYC 7543P  |     4 |  32 GiB |   2 TB |      8 GiB |         8 GiB |            2.5 M |
+| LEA36  | on-prem  | EPYC 7543P  |     8 |  64 GiB |   2 TB |     16 GiB |        16 GiB |              5 M |
+| LEA47  | on-prem  | EPYC 7543P  |    16 | 128 GiB |   2 TB |     32 GiB |        32 GiB |             10 M |
+| LEA58  | on-prem  | EPYC 7543P  |    32 | 256 GiB |   2 TB |     64 GiB |        64 GiB |             20 M |
+| CCX42  | Hetzner  | EPYC 7763   |    16 |  64 GiB | 360 GB |     16 GiB |         8 GiB |              5 M |
+| A5N46  | on-prem  | Ryzen 9900X |    24 |  96 GiB |   2 TB |     24 GiB |        24 GiB |             10 M |
 
 ¹ Size of the Java Heap (`JAVA_TOOL_OPTIONS`)
 ² Size of the block cache (`DB_BLOCK_CACHE_SIZE`)
@@ -216,6 +217,122 @@ blazectl download --server http://localhost:8080/fhir Observation -q "code=http:
 
 ¹ time in seconds per 1 million resources, ² block cache hit ratio is near zero
 
+## Code and Patient Search
+
+In this section, FHIR Search for selecting Observation resources with a certain code and 100 Patients is used.
+
+### Counting
+
+Counting is done using the following `curl` command:
+
+```sh
+curl -s "http://localhost:8080/fhir/Observation?code=http://loinc.org|$CODE&subject=$PATIENT_REFS&_summary=count"
+```
+
+| System | Dataset | Code    | # Hits | Time (s) | StdDev | T/1M ¹ |
+|--------|---------|---------|-------:|---------:|-------:|-------:|
+| A5N46  | 1M      | 8310-5  |    109 |     0.00 |  0.001 |  36.46 |
+| A5N46  | 1M      | 55758-7 |    1 k |     0.01 |  0.001 |   4.55 |
+| A5N46  | 1M      | 72514-3 |    4 k |     0.01 |  0.001 |   1.79 |
+
+¹ time in seconds per 1 million resources
+
+### Download of Resources
+
+Most measurements are done after Blaze is in a steady state with all resources to download in its resource cache in order to cancel out resource load times from disk or file system cache.
+
+Download is done using the following `blazectl` command:
+
+```sh
+blazectl download --server http://localhost:8080/fhir Observation -q "code=http://loinc.org|$CODE&subject=$PATIENT_REFS&_count=1000" > /dev/null
+```
+
+| System | Dataset | Code    | # Hits | Time (s) | StdDev | T/1M ¹ |
+|--------|---------|---------|-------:|---------:|-------:|-------:|
+| A5N46  | 1M      | 8310-5  |    109 |     0.00 |  0.005 |  30.57 |
+| A5N46  | 1M      | 55758-7 |    1 k |     0.05 |  0.005 |  39.95 |
+| A5N46  | 1M      | 72514-3 |    4 k |     0.09 |  0.005 |  20.88 |
+
+¹ time in seconds per 1 million resources
+
+### Download of Resources with Subsetting
+
+In case only a subset of information of a resource is needed, the special [_elements][1] search parameter can be used to retrieve only certain properties of a resource. Here `_elements=subject` was used.
+
+Most measurements are done after Blaze is in a steady state with all resources to download in its resource cache in order to cancel out resource load times from disk or file system cache.
+
+Download is done using the following `blazectl` command:
+
+```sh
+blazectl download --server http://localhost:8080/fhir Observation -q "code=http://loinc.org|$CODE&subject=$PATIENT_REFS&_elements=subject&_count=1000" > /dev/null
+```
+
+| System | Dataset | Code    | # Hits | Time (s) | StdDev | T/1M ¹ |
+|--------|---------|---------|-------:|---------:|-------:|-------:|
+| A5N46  | 1M      | 8310-5  |    109 |     0.00 |  0.000 | 0.00 ² |
+| A5N46  | 1M      | 55758-7 |    1 k |     0.03 |  0.000 |  25.68 |
+| A5N46  | 1M      | 72514-3 |    4 k |     0.06 |  0.000 |  14.46 |
+
+¹ time in seconds per 1 million resources, ² too fast to be measured
+
+## Code, Date and Patient Search
+
+In this section, FHIR Search for selecting Observation resources with a certain code, a certain date and 100 Patients is used.
+
+### Counting
+
+Counting is done using the following `curl` command:
+
+```sh
+curl -s "http://localhost:8080/fhir/Observation?code=http://loinc.org|$CODE&date=2020&subject=$PATIENT_REFS&_summary=count"
+```
+
+| System | Dataset | Code    | # Hits | Time (s) | StdDev | T/1M ¹ |
+|--------|---------|---------|-------:|---------:|-------:|-------:|
+| A5N46  | 1M      | 8310-5  |     29 |     0.00 |  0.001 | 126.55 |
+| A5N46  | 1M      | 55758-7 |    121 |     0.01 |  0.001 |  63.52 |
+| A5N46  | 1M      | 72514-3 |    420 |     0.01 |  0.001 |  33.34 |
+
+¹ time in seconds per 1 million resources
+
+### Download of Resources
+
+Most measurements are done after Blaze is in a steady state with all resources to download in its resource cache in order to cancel out resource load times from disk or file system cache.
+
+Download is done using the following `blazectl` command:
+
+```sh
+blazectl download --server http://localhost:8080/fhir Observation -q "code=http://loinc.org|$CODE&date=2020&subject=$PATIENT_REFS&_count=1000" > /dev/null
+```
+
+| System | Dataset | Code    | # Hits | Time (s) | StdDev | T/1M ¹ |
+|--------|---------|---------|-------:|---------:|-------:|-------:|
+| A5N46  | 1M      | 8310-5  |     29 |     0.00 |  0.000 | 0.00 ² |
+| A5N46  | 1M      | 55758-7 |    121 |     0.01 |  0.000 |  82.64 |
+| A5N46  | 1M      | 72514-3 |    420 |     0.02 |  0.000 |  47.61 |
+
+¹ time in seconds per 1 million resources, ² too fast to be measured
+
+### Download of Resources with Subsetting
+
+In case only a subset of information of a resource is needed, the special [_elements][1] search parameter can be used to retrieve only certain properties of a resource. Here `_elements=subject` was used.
+
+Most measurements are done after Blaze is in a steady state with all resources to download in its resource cache in order to cancel out resource load times from disk or file system cache.
+
+Download is done using the following `blazectl` command:
+
+```sh
+blazectl download --server http://localhost:8080/fhir Observation -q "code=http://loinc.org|$CODE&date=2020&subject=$PATIENT_REFS&_elements=subject&_count=1000" > /dev/null
+```
+
+| System | Dataset | Code    | # Hits | Time (s) | StdDev | T/1M ¹ |
+|--------|---------|---------|-------:|---------:|-------:|-------:|
+| A5N46  | 1M      | 8310-5  |     29 |     0.00 |  0.000 | 0.00 ² |
+| A5N46  | 1M      | 55758-7 |    121 |     0.00 |  0.000 | 0.00 ² |
+| A5N46  | 1M      | 72514-3 |    420 |     0.02 |  0.000 |  47.61 |
+
+¹ time in seconds per 1 million resources, ² too fast to be measured
+
 ## Simple Date Search
 
 In this section, FHIR Search for selecting Observation resources with a certain effective year is used.
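The T/1M column in the tables of this file is described only as "time in seconds per 1 million resources". A minimal sketch of that normalization follows, assuming it is the average time divided by the number of hits and scaled to one million. The function name `time_per_million` is hypothetical and is not part of `util.sh`.

```sh
# Hypothetical helper, not part of the benchmark scripts:
# normalizes a duration to "seconds per 1 million resources".
#   $1 = average time in seconds
#   $2 = number of hits
time_per_million() {
  awk -v t="$1" -v n="$2" 'BEGIN {
    if (n == 0) { print "n/a"; exit }   # avoid a division by zero
    printf "%.2f\n", t / n * 1000000
  }'
}

time_per_million 0.004 109
```

The tables print Time (s) rounded to two decimals, so recomputing T/1M from the printed times will generally not reproduce the printed T/1M values; those were presumably derived from the unrounded averages.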
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+#!/bin/bash -e
+
+SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
+. "$SCRIPT_DIR/util.sh"
+
+COMPOSE_FILE="$1"
+BASE="${2:-http://localhost:8080/fhir}"
+START_EPOCH="$(date +"%s")"
+PATIENT_COUNT=100
+PATIENT_REFS="$(curl -sf "$BASE/Patient?_count=$PATIENT_COUNT&_elements=id" | jq -r '.entry[].resource.id | "Patient/" + .' | tr '\n' ',' | sed 's/,$//')"
+
+count-resources() {
+  CODE="$1"
+
+  echo "Counting Observations with code $CODE, date 2020 and $PATIENT_COUNT Patients..."
+  count-resources-raw "$BASE" "Observation" "code=http://loinc.org|$CODE&date=2020&subject=$PATIENT_REFS" "$START_EPOCH-count-$CODE.times"
+}
+
+download-resources() {
+  CODE="$1"
+
+  echo "Downloading Observations with code $CODE, date 2020 and $PATIENT_COUNT Patients..."
+  download-resources-raw "$BASE" "Observation" "code=http://loinc.org|$CODE&date=2020&subject=$PATIENT_REFS" "$START_EPOCH-download-$CODE.times"
+}
+
+download-resources-elements-subject() {
+  CODE="$1"
+
+  echo "Downloading Observations with code $CODE, date 2020, $PATIENT_COUNT Patients and _elements=subject..."
+  download-resources-raw "$BASE" "Observation" "code=http://loinc.org|$CODE&date=2020&subject=$PATIENT_REFS&_elements=subject" "$START_EPOCH-download-subject-$CODE.times"
+}
+
+restart "$COMPOSE_FILE"
+count-resources "8310-5"
+download-resources "8310-5"
+download-resources-elements-subject "8310-5"
+
+restart "$COMPOSE_FILE"
+count-resources "55758-7"
+download-resources "55758-7"
+download-resources-elements-subject "55758-7"
+
+restart "$COMPOSE_FILE"
+count-resources "72514-3"
+download-resources "72514-3"
+download-resources-elements-subject "72514-3"
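
The `PATIENT_REFS` pipeline in the script above turns one Patient id per line into a single comma-separated list of literal references. The joining step can be sketched without `jq` or a running server. The function name `join_patient_refs` and the sample ids are made up for illustration.

```sh
# Hypothetical helper: reads one Patient id per line on stdin and prints
# "Patient/<id>,Patient/<id>,..." on a single line.
join_patient_refs() {
  sed 's|^|Patient/|' | paste -sd, -
}

printf '%s\n' id-1 id-2 id-3 | join_patient_refs
```

Every element of the list already carries the `Patient/` prefix, so the list can be used as the value of the `subject` parameter without prepending anything.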
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+#!/bin/bash -e
+
+SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
+. "$SCRIPT_DIR/util.sh"
+
+COMPOSE_FILE="$1"
+BASE="${2:-http://localhost:8080/fhir}"
+START_EPOCH="$(date +"%s")"
+PATIENT_COUNT=100
+PATIENT_REFS="$(curl -sf "$BASE/Patient?_count=$PATIENT_COUNT&_elements=id" | jq -r '.entry[].resource.id | "Patient/" + .' | tr '\n' ',' | sed 's/,$//')"
+
+count-resources() {
+  CODE="$1"
+
+  echo "Counting Observations with code $CODE and $PATIENT_COUNT Patients..."
+  count-resources-raw "$BASE" "Observation" "code=http://loinc.org|$CODE&subject=$PATIENT_REFS" "$START_EPOCH-count-$CODE.times"
+}
+
+download-resources() {
+  CODE="$1"
+
+  echo "Downloading Observations with code $CODE and $PATIENT_COUNT Patients..."
+  download-resources-raw "$BASE" "Observation" "code=http://loinc.org|$CODE&subject=$PATIENT_REFS" "$START_EPOCH-download-$CODE.times"
+}
+
+download-resources-elements-subject() {
+  CODE="$1"
+
+  echo "Downloading Observations with code $CODE, $PATIENT_COUNT Patients and _elements=subject..."
+  download-resources-raw "$BASE" "Observation" "code=http://loinc.org|$CODE&subject=$PATIENT_REFS&_elements=subject" "$START_EPOCH-download-subject-$CODE.times"
+}
+
+restart "$COMPOSE_FILE"
+count-resources "8310-5"
+download-resources "8310-5"
+download-resources-elements-subject "8310-5"
+
+restart "$COMPOSE_FILE"
+count-resources "55758-7"
+download-resources "55758-7"
+download-resources-elements-subject "55758-7"
+
+restart "$COMPOSE_FILE"
+count-resources "72514-3"
+download-resources "72514-3"
+download-resources-elements-subject "72514-3"
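
The three restart/count/download blocks at the end of the script differ only in the LOINC code. One possible way to express them as a loop is sketched below. The stub functions use underscore names and only echo their arguments so that the sketch runs on its own; they stand in for the script's hyphenated functions and for `restart` from `util.sh`. The name `run_benchmarks` and the file name `docker-compose.yml` are hypothetical.

```sh
# Stubs for illustration only; the real functions are defined in the
# script above and in util.sh.
restart() { echo "restart $1"; }
count_resources() { echo "count $1"; }
download_resources() { echo "download $1"; }
download_resources_elements_subject() { echo "download-subject $1"; }

# Hypothetical wrapper: runs all three measurements once per code and
# restarts Blaze before each code, as the original script does.
run_benchmarks() {
  compose_file="$1"
  shift
  for code in "$@"; do
    restart "$compose_file"
    count_resources "$code"
    download_resources "$code"
    download_resources_elements_subject "$code"
  done
}

run_benchmarks "docker-compose.yml" "8310-5" "55758-7" "72514-3"
```

The explicit blocks in the committed script are easier to comment out one at a time, which may be why they were left unrolled.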

docs/performance/fhir-search/util.sh

Lines changed: 3 additions & 1 deletion
@@ -31,9 +31,11 @@ calc-print-stats() {
   if (( $(echo "$COUNT > 1000000" | bc) )); then
     COUNT=$(echo "scale=2; $COUNT / 1000000" | bc)
     COUNT_FORMAT="%4.1f M"
-  else
+  elif (( $(echo "$COUNT > 1000" | bc) )); then
     COUNT=$(echo "scale=2; $COUNT / 1000" | bc)
     COUNT_FORMAT="%4.0f k"
+  else
+    COUNT_FORMAT="%6.0f"
   fi
 
   printf "| $COUNT_FORMAT | %8.2f | %6.3f | %6.2f |\n" "$COUNT" "$AVG" "$(echo "$STATS" | jq .stddev)" "$AVG_1M"
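
The change above adds a third branch so that counts up to 1000 are printed as plain numbers instead of being divided by 1000. A stand-alone sketch of the same three-way formatting follows, assuming `awk` in place of `bc` so that it runs without `util.sh`. The function name `format_count` is hypothetical.

```sh
# Hypothetical stand-alone version of the count formatting. util.sh itself
# uses bc and sets COUNT and COUNT_FORMAT instead of printing directly.
format_count() {
  awk -v c="$1" 'BEGIN {
    if (c > 1000000) {
      printf "%4.1f M\n", c / 1000000   # millions, one decimal
    } else if (c > 1000) {
      printf "%4.0f k\n", c / 1000      # thousands, no decimals
    } else {
      printf "%6.0f\n", c               # small counts, printed as-is
    }
  }'
}

format_count 109
format_count 4000
format_count 2500000
```

All three formats are six characters wide, which keeps the `# Hits` column aligned. Note that `bc` with `scale=2` truncates while `awk` uses floating point, so the two can differ in the last printed digit for some inputs.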

modules/db/src/blaze/db/impl/batch_db.clj

Lines changed: 25 additions & 0 deletions
@@ -299,6 +299,31 @@
   (-query-clauses [_]
     (decode-clauses clauses)))
 
+(def ^:private ^:const ^long patient-c-hash (codec/c-hash "Patient"))
+
+(defrecord PatientTypeQuery [tid patient-ids compartment-clause clauses]
+  p/Query
+  (-count [query batch-db]
+    (ac/completed-future (count (p/-execute query batch-db))))
+  (-execute [_ batch-db]
+    (coll/eduction
+     (mapcat
+      #(index/compartment-query
+        batch-db [patient-c-hash (codec/id-byte-string %)] tid
+        clauses))
+     patient-ids))
+  (-execute [_ batch-db start-id]
+    (coll/eduction
+     (comp
+      (mapcat
+       #(index/compartment-query
+         batch-db [patient-c-hash (codec/id-byte-string %)] tid
+         clauses))
+      (drop-while #(not= start-id (rh/id %))))
+     patient-ids))
+  (-query-clauses [_]
+    (decode-clauses (into [compartment-clause] clauses))))
+
 (defrecord EmptyTypeQuery [tid]
   p/Query
   (-count [_ batch-db]
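
The three-argument `-execute` of `PatientTypeQuery` above resumes a paged result by concatenating the per-patient compartment results and then dropping everything before the resource whose id equals `start-id`. The dropping step can be illustrated outside Clojure with a small shell filter. This is only an analogy, not Blaze code: `drop_until` is a hypothetical name, and plain ids on separate lines stand in for resource handles.

```sh
# Prints the first line equal to $1 and every line after it, mirroring
# (drop-while #(not= start-id (rh/id %))) on a stream of ids.
drop_until() {
  # Appending "" forces a string comparison even for numeric-looking ids.
  awk -v start="$1" 'found || ($0 "") == (start "") { found = 1; print }'
}

printf '%s\n' obs-1 obs-2 obs-3 obs-4 | drop_until obs-3
```

As in the Clojure version, the skipped elements are still produced before they are discarded, so the cost of resuming presumably grows with the position of `start-id` in the combined result.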

modules/db/src/blaze/db/node.clj

Lines changed: 35 additions & 2 deletions
@@ -29,17 +29,20 @@
    [blaze.db.node.validation :as validation]
    [blaze.db.node.version :as version]
    [blaze.db.resource-store :as rs]
-   [blaze.db.search-param-registry]
+   [blaze.db.search-param-registry :as sr]
    [blaze.db.search-param-registry.spec]
    [blaze.db.tx-log :as tx-log]
    [blaze.executors :as ex]
    [blaze.fhir.spec :as fhir-spec]
+   [blaze.fhir.spec.references :as fsr]
    [blaze.fhir.spec.type :as type]
    [blaze.module :as m :refer [reg-collector]]
    [blaze.scheduler :as sched]
    [blaze.spec]
    [blaze.util :refer [conj-vec]]
+   [clojure.set :as set]
    [clojure.spec.alpha :as s]
+   [clojure.string :as str]
    [cognitect.anomalies :as anom]
    [integrant.core :as ig]
    [java-time.api :as time]
@@ -216,12 +219,42 @@
     (ac/completed-future (deleted-resource resource-handle))
     (rs/get resource-store (rh/hash resource-handle))))
 
+(defn- clause-with-code-fn? [codes]
+  (fn [[search-param]]
+    (codes (:code search-param))))
+
+(defn- has-system? [value]
+  (let [[system code] (str/split value #"\|")]
+    (not (or (str/blank? system) (str/blank? code)))))
+
+(defn- token-clause? [[search-param _ values]]
+  (and (= "token" (:type search-param))
+       (every? has-system? values)))
+
+(defn- compartment-clause-patient-ids [[_ _ values]]
+  (map second (keep fsr/split-literal-ref values)))
+
+(defn- compile-patient-type-query [search-param-registry type clauses]
+  (let [codes (set/intersection
+               #{"subject" "patient"}
+               (set (sr/compartment-resources search-param-registry "Patient" type)))]
+    (when (seq codes)
+      (let [[compartment-clause & more] (filter (clause-with-code-fn? codes) clauses)
+            [token-clause] (filter token-clause? clauses)]
+        (when (and compartment-clause (empty? more) token-clause)
+          (batch-db/->PatientTypeQuery
+           (codec/tid type)
+           (compartment-clause-patient-ids compartment-clause)
+           compartment-clause
+           (into [token-clause] (remove (clause-with-code-fn? (conj codes (:code (first token-clause))))) clauses)))))))
+
 (defn- compile-type-query [search-param-registry type clauses lenient?]
   (when-ok [clauses (index/resolve-search-params search-param-registry type clauses
                                                  lenient?)]
     (if (empty? clauses)
       (batch-db/->EmptyTypeQuery (codec/tid type))
-      (batch-db/->TypeQuery (codec/tid type) clauses))))
+      (or (compile-patient-type-query search-param-registry type clauses)
+          (batch-db/->TypeQuery (codec/tid type) clauses)))))
 
 (defn- compile-compartment-query
   [search-param-registry code type clauses lenient?]
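
The `has-system?` check in this hunk lets a token value take part in the optimized patient query only when it has the form `system|code` with both parts present, as in `http://loinc.org|8310-5`. The same rule is sketched in shell below to make the accepted and rejected shapes concrete. `has_system` is a hypothetical stand-in and is not part of Blaze.

```sh
# Succeeds (exit status 0) if $1 has the form system|code with both parts
# non-blank; fails otherwise.
has_system() {
  case "$1" in
    *"|"*) ;;            # contains a separator, keep checking
    *) return 1 ;;       # bare code without a system
  esac
  hs_system="${1%%|*}"         # text before the first "|"
  hs_rest="${1#*|}"            # text after the first "|"
  hs_code="${hs_rest%%|*}"     # text up to the next "|", if any
  [ -n "$(printf '%s' "$hs_system" | tr -d '[:space:]')" ] &&
    [ -n "$(printf '%s' "$hs_code" | tr -d '[:space:]')" ]
}

for value in 'http://loinc.org|8310-5' '8310-5' '|8310-5' 'http://loinc.org|'; do
  if has_system "$value"; then
    echo "accepted: $value"
  else
    echo "rejected: $value"
  fi
done
```

Only the first value is accepted. Queries whose token values lack a system do not match `token-clause?`, so `compile-type-query` falls back to the regular `TypeQuery` for them.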
