d. Module organize
jaclew edited this page Mar 31, 2025 · 1 revision
The organize module selects datasets by metadata and groups them into separate output folders.
- Select Francisella and Burkholderia datasets and group them by genus:
organize.py --metadata_file metadata.tsv --output organized_output_genus --select_column genus --select_values Francisella,Burkholderia --group_by genus
The organized output can be inspected with the command tree organized_output_genus:
organized_output_genus/
├── Burkholderia
│   ├── GCA_000292915.1.fasta
│   ├── GCA_009911875.1.fasta
│   ├── ..
│   ├── ..
│   ├── ..
│   ├── GCF_902833225.1.fasta
│   └── GCF_905232215.1.fasta
└── Francisella
    ├── GCA_000018925.1.fasta
    ├── GCA_000153845.1.fasta
    ├── ..
    ├── ..
    ├── ..
    ├── GCF_009823375.1.fasta
    └── GCF_012224145.1.fasta
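When the output holds too many files to read comfortably in a tree listing, a per-group file count can be easier to check. The following is a minimal sketch and is not part of organize.py: the helper name count_files_per_group is hypothetical, and it assumes the layout shown above, with one folder per group directly under the output directory and files ending in .fasta.

```python
from pathlib import Path


def count_files_per_group(output_dir, suffix=".fasta"):
    """Return {group folder name: number of files ending in `suffix`}.

    Assumes one sub-folder per group directly under `output_dir`,
    as in the tree listing above. Hypothetical helper, not part of organize.py.
    """
    counts = {}
    for group_dir in Path(output_dir).iterdir():
        if not group_dir.is_dir():
            continue  # ignore stray files at the top level
        counts[group_dir.name] = sum(
            1
            for entry in group_dir.iterdir()
            if entry.is_file() and entry.name.endswith(suffix)
        )
    return counts
```

Calling count_files_per_group("organized_output_genus") on the example above would return a dictionary with one entry each for Burkholderia and Francisella.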
- An alternative example: suppose the database has a column ABR that records which antibiotics each sample dataset is resistant to. Francisella samples can then be selected and organized into folders by antibiotic resistance with:
organize.py --metadata_file metadata.tsv --output organized_output_genus --select_column genus --select_values Francisella --group_by ABR
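The select-then-group behaviour described on this page can be illustrated with a short Python sketch. This is not the implementation of organize.py. It assumes a tab-separated metadata file with a header row, and the identifier column name accession is a hypothetical choice for illustration. How organize.py treats empty values or several antibiotics in one ABR cell is not stated here, so the sketch simply uses each cell value as a group name.

```python
import csv
from collections import defaultdict


def select_and_group(metadata_path, select_column, select_values, group_by,
                     id_column="accession"):
    """Group dataset identifiers by `group_by` after filtering on `select_column`.

    `id_column` is an assumed column name; adjust it to the real metadata file.
    Returns {group value: [identifiers]}.
    """
    wanted = set(select_values)
    groups = defaultdict(list)
    with open(metadata_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Keep only rows whose selection column matches a requested value.
            if row[select_column] in wanted:
                groups[row[group_by]].append(row[id_column])
    return dict(groups)
```

Under these assumptions, select_and_group("metadata.tsv", "genus", ["Francisella"], "ABR") mirrors the second command above: each key of the returned dictionary corresponds to one output folder.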