
d. Module organize

jaclew edited this page Mar 31, 2025 · 1 revision

The organize module selects datasets by their metadata and groups them into separate output folders.

  • Select Francisella and Burkholderia datasets and group them by genus:
organize.py --metadata_file metadata.tsv --output organized_output_genus --select_column genus --select_values Francisella,Burkholderia --group_by genus

The organized output can be inspected with the command tree organized_output_genus:

organized_output_genus/
├── Burkholderia
│   ├── GCA_000292915.1.fasta
│   ├── GCA_009911875.1.fasta
│   ├── ..
│   ├── ..
│   ├── ..
│   ├── GCF_902833225.1.fasta
│   └── GCF_905232215.1.fasta
└── Francisella
    ├── GCA_000018925.1.fasta
    ├── GCA_000153845.1.fasta
    ├── ..
    ├── ..
    ├── ..
    ├── GCF_009823375.1.fasta
    └── GCF_012224145.1.fasta
  • An alternative example: suppose the metadata contains a column named ABR that records which antibiotics each dataset is resistant to. Antibiotic-resistant Francisella datasets can then be selected and organized into folders by antibiotic resistance with:
organize.py --metadata_file metadata.tsv --output organized_output_genus --select_column genus --select_values Francisella --group_by ABR
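The select-then-group behaviour of the two commands above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of organize.py: the function name select_and_group, the accession column, the sample identifiers, and the ABR values are all hypothetical, and the sketch assumes each row carries a single value in the grouping column.

```python
import csv
import io
from collections import defaultdict


def select_and_group(metadata_tsv, select_column, select_values, group_by):
    """Keep rows whose select_column is in select_values,
    then collect their identifiers per value of group_by."""
    wanted = set(select_values)
    groups = defaultdict(list)
    for row in csv.DictReader(metadata_tsv, delimiter="\t"):
        if row[select_column] in wanted:
            # "accession" is an assumed identifier column, not a documented one.
            groups[row[group_by]].append(row["accession"])
    return dict(groups)


# Made-up metadata for illustration only.
EXAMPLE_TSV = (
    "accession\tgenus\tABR\n"
    "sample_1\tFrancisella\tciprofloxacin\n"
    "sample_2\tFrancisella\terythromycin\n"
    "sample_3\tBurkholderia\tciprofloxacin\n"
    "sample_4\tBrucella\terythromycin\n"
)

# Mirrors: --select_column genus --select_values Francisella,Burkholderia --group_by genus
by_genus = select_and_group(
    io.StringIO(EXAMPLE_TSV), "genus", ["Francisella", "Burkholderia"], "genus"
)

# Mirrors: --select_column genus --select_values Francisella --group_by ABR
by_abr = select_and_group(
    io.StringIO(EXAMPLE_TSV), "genus", ["Francisella"], "ABR"
)

for group, members in by_abr.items():
    print(group, members)
```

Each key of the returned dictionary corresponds to one output folder, and each list holds the datasets that would be placed in it. Rows that do not match the selection (sample_4 here) appear in no group.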