Commit 90874b3

huddlej authored and victorlin committed
Demo prefilter rule for Nextstrain GISAID build
Adds a prefilter rule to reduce the size of the input metadata for the GISAID build before running the whole workflow.
1 parent 4ce71eb commit 90874b3

2 files changed: 28 additions, 1 deletion

nextstrain_profiles/nextstrain-gisaid/builds.yaml

Lines changed: 2 additions & 1 deletion
@@ -2,6 +2,7 @@ auspice_json_prefix: ncov_gisaid
 
 # Define custom rules for pre- or post-standard workflow processing of data.
 custom_rules:
+  - workflow/snakemake_rules/prefilter.smk
   - workflow/snakemake_rules/export_for_nextstrain.smk
 
 # These parameters are only used by the `export_for_nextstrain` rule and shouldn't need to be modified.
@@ -25,7 +26,7 @@ files:
 
 inputs:
   - name: gisaid
-    metadata: "s3://nextstrain-ncov-private/metadata.tsv.zst"
+    metadata: "data/prefiltered_metadata.tsv"
     aligned: "s3://nextstrain-ncov-private/aligned.fasta.zst"
     skip_sanitize_metadata: true
 
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+rule download_metadata:
+    params:
+        metadata_url="s3://nextstrain-ncov-private/metadata.tsv.zst",
+    output:
+        metadata="data/metadata.tsv.zst",
+    shell:
+        """
+        aws s3 cp {params.metadata_url} {output.metadata}
+        """
+
+rule filter_metadata:
+    input:
+        metadata="data/metadata.tsv.zst",
+    output:
+        metadata="data/prefiltered_metadata.tsv",
+    params:
+        max_sequences=500000,
+        group_by="division year month",
+    shell:
+        """
+        augur filter \
+            --metadata {input.metadata} \
+            --subsample-max-sequences {params.max_sequences} \
+            --group-by {params.group_by} \
+            --output-metadata {output.metadata}
+        """
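
To make the effect of the `filter_metadata` rule easier to picture, here is a rough Python sketch of grouped subsampling with an overall cap. It is not augur's implementation: the function `subsample_by_group`, the equal per-group cap, and the toy records (which carry explicit `year` and `month` fields) are all assumptions made for illustration. It only shows the general idea behind combining `--group-by` with `--subsample-max-sequences`.

```python
import random
from collections import defaultdict


def subsample_by_group(records, group_by, max_sequences, seed=0):
    """Keep at most `max_sequences` records, spread evenly across groups.

    Hypothetical helper for illustration; augur's real algorithm differs.
    """
    rng = random.Random(seed)

    # Bucket records by the values of the grouping columns.
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[column] for column in group_by)
        groups[key].append(record)

    if not groups:
        return []

    # Simplifying assumption: every group gets the same cap (at least 1).
    per_group = max(1, max_sequences // len(groups))

    kept = []
    for key in sorted(groups):
        members = groups[key]
        kept.extend(rng.sample(members, min(per_group, len(members))))

    # With more groups than max_sequences, trim down to the overall limit.
    if len(kept) > max_sequences:
        kept = rng.sample(kept, max_sequences)
    return kept


# Toy metadata: one heavily sampled division and one sparse division.
records = [
    {"strain": f"a{i}", "division": "Washington", "year": "2021", "month": "01"}
    for i in range(100)
] + [
    {"strain": f"b{i}", "division": "Oregon", "year": "2021", "month": "01"}
    for i in range(3)
]

subset = subsample_by_group(records, ["division", "year", "month"], max_sequences=10)
print(len(subset))
```

In this toy case the heavily sampled division is cut down to its cap while the sparse one is kept whole, which is the kind of size reduction the prefilter aims for before the main workflow runs.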
