Prepare data for wastewater materials

We have a subset of 60 samples from [Karthikeyan et al.](https://doi.org/10.1038/s41586-022-05049-6) that show a transition between the Delta and Omicron variants.

To run the analysis live in a workshop, the pipeline needs to finish in under 1h.
If that is not possible, we may instead need to provide a `preprocessed` folder with results from the 60 samples and have participants process only 5-10 samples during the workshop.

- [x] Test how long it takes to run 30 samples. Locally, the run had not finished after more than 2h, so we killed the process.
- [x] Test running only 5 samples with default options (see [comment below](https://github.com/cambiotraining/sars-cov-2-genomics/issues/43#issuecomment-1878404105)).
- [x] Test `--freyja_repeats 0` with viralrecon on a small number of samples, to see whether this step can be skipped. If this throws an error, use 1 and see if that works. The aim is to reduce the pipeline's run time.
- [x] Find primer locations for the kit used in the publication: Swift Normalase Amplicon Panels (SNAP) kit (PN: SN-5X296 (core), COVG1V2-96 (amplicon primers); Integrated DNA Technologies)
  - Hugo contacted [idtdna](https://eu.idtdna.com/pages/products/next-generation-sequencing/workflow/xgen-ngs-amplicon-sequencing/predesigned-amplicon-panels/sars-cov-2-amp-panel#product-details) who now commercialise this product.
  - Bajuna will open an issue on the [C-VIEW](https://github.com/ucsd-ccbb/C-VIEW/issues) repo to ask what they did in the publication.
- [x] Run 3 samples, passing the [SWIFT BED file](https://github.com/ucsd-ccbb/C-VIEW/blob/main/reference_files/sarscov2_v2_primers.bed) directly with the `--primer_bed` option. Also download the reference [FASTA](https://github.com/nf-core/test-datasets/raw/viralrecon/genome/MN908947.3/GCA_009858895.3_ASM985889v3_genomic.200409.fna.gz) and [GFF](https://github.com/nf-core/test-datasets/raw/viralrecon/genome/MN908947.3/GCA_009858895.3_ASM985889v3_genomic.200409.gff.gz) files and pass them directly with `--fasta` and `--gff`.
- [ ] Prepare participant data directories with the files needed for the workshop. There is a folder on the HPC under `sars-wastewater/participants` for this. It includes:
  - `data/reads` - FASTQ files for the 5 samples to be processed
  - `resources` - reference genome FASTA and GFF annotation (may be useful for some analyses)
  - `preprocessed` - results from 30 samples
  - `scripts` - shell scripts that participants will fix in the exercises
  - `utilities` - Python scripts we provide, e.g. to prepare the samplesheet or tidy Freyja output files
  - `sample_info.csv` - metadata table with columns "sample,date,country,location,latitude,longitude"
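
As a rough illustration of what a samplesheet helper in `utilities` might look like, here is a minimal Python sketch. It pairs FASTQ files in `data/reads` into the `sample,fastq_1,fastq_2` samplesheet layout that viralrecon reads, and assembles a command line using the `--primer_bed`, `--fasta`, `--gff` and `--freyja_repeats` options listed in the tasks above. The function names, the `_1.fastq.gz`/`_2.fastq.gz` file naming, the `singularity` profile and the Illumina amplicon settings are assumptions made for this example, not details confirmed in this issue.

```python
import csv
from pathlib import Path


def samplesheet_rows(reads_dir):
    """Pair read files by sample name.

    Assumes paired-end files named <sample>_1.fastq.gz and <sample>_2.fastq.gz
    (hypothetical naming; adjust to the real files in data/reads).
    """
    suffix = "_1.fastq.gz"
    rows = []
    for r1 in sorted(Path(reads_dir).glob(f"*{suffix}")):
        sample = r1.name[: -len(suffix)]
        r2 = r1.with_name(f"{sample}_2.fastq.gz")
        if not r2.exists():
            raise FileNotFoundError(f"missing read 2 file for sample {sample}")
        rows.append({"sample": sample, "fastq_1": str(r1), "fastq_2": str(r2)})
    return rows


def write_samplesheet(rows, out_csv):
    """Write the rows as a CSV with a sample,fastq_1,fastq_2 header."""
    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["sample", "fastq_1", "fastq_2"])
        writer.writeheader()
        writer.writerows(rows)


def viralrecon_command(samplesheet, outdir, primer_bed, fasta, gff, freyja_repeats=None):
    """Build the argument list for a viralrecon run (not executed here)."""
    cmd = [
        "nextflow", "run", "nf-core/viralrecon",
        "-profile", "singularity",   # assumption: container profile used on the HPC
        "--input", str(samplesheet),
        "--outdir", str(outdir),
        "--platform", "illumina",    # assumption: Illumina amplicon data
        "--protocol", "amplicon",
        "--primer_bed", str(primer_bed),
        "--fasta", str(fasta),
        "--gff", str(gff),
    ]
    # Only add the option when a value is given, so 0 is passed through explicitly.
    if freyja_repeats is not None:
        cmd += ["--freyja_repeats", str(freyja_repeats)]
    return cmd
```

A participant script could then call `write_samplesheet(samplesheet_rows("data/reads"), "samplesheet.csv")` and print `" ".join(viralrecon_command(...))` with the BED, FASTA and GFF paths from `resources`. Returning the command as a list keeps it easy to inspect, or to hand to `subprocess.run` once the options have been checked.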
