This pipeline is designed to take either raw reads alone, or assemblies plus raw reads as input. If only reads are provided, they will be assembled with unicycler.
nextflow run BCCDC-PHL/plasmid-screen \
  --fastq_input </path/to/fastqs> \
  --mob_db </path/to/mob-suite-db> \
  --outdir </path/to/outdir> 
If assemblies are already available, they can be provided by adding the --pre_assembled flag, and supplying the assemblies to the --assembly_input flag.
nextflow run BCCDC-PHL/plasmid-screen \
  --pre_assembled \
  --assembly_input </path/to/assemblies> \
  --fastq_input </path/to/fastqs> \
  --mob_db </path/to/mob-suite-db> \
  --outdir </path/to/outdir> 
Alternatively, a 'samplesheet.csv' file may be provided with fields ID, R1, R2:
ID,R1,R2
sample-01,/path/to/sample-01_R1.fastq.gz,/path/to/sample-01_R2.fastq.gz
sample-02,/path/to/sample-02_R1.fastq.gz,/path/to/sample-02_R2.fastq.gz
...
nextflow run BCCDC-PHL/plasmid-screen \
  --samplesheet_input </path/to/samplesheet.csv> \
  --mob_db </path/to/mob-suite-db> \
  --outdir </path/to/outdir> 
...or if assemblies are available, the samplesheet.csv file may also include the field ASSEMBLY:
ID,R1,R2,ASSEMBLY
sample-01,/path/to/sample-01_R1.fastq.gz,/path/to/sample-01_R2.fastq.gz,/path/to/sample-01.fa
sample-02,/path/to/sample-02_R1.fastq.gz,/path/to/sample-02_R2.fastq.gz,/path/to/sample-01.fa
...
nextflow run BCCDC-PHL/plasmid-screen \
  --pre_assembled \
  --samplesheet_input </path/to/samplesheet.csv> \
  --mob_db </path/to/mob-suite-db> \
  --outdir </path/to/outdir> 
The main output of the pipeline is the 'Resistance gene report', which summarizes where the resistance gene was located (contig and position), the quality of the resitance gene match (% identity and % coverage) and a characterization of the plasmid reconstruction. The report includes the following fields:
sample_id
assembly_file
resistance_gene_contig_id
resistance_gene_contig_size
resistance_gene_id
resistance_gene_contig_position_start
resistance_gene_contig_position_end
percent_resistance_gene_coverage
percent_resistance_gene_identity
num_contigs_in_plasmid_reconstruction
plasmid_reconstruction_size
replicon_types
mob_suite_primary_cluster_id
mob_suite_secondary_cluster_id
mash_nearest_neighbor
mash_neighbor_distance
alignment_ref_plasmid
depth_coverage_threshold
percent_ref_plasmid_coverage_above_depth_threshold
num_snps_vs_ref_plasmid
For each sample, the following output files are created:
sample-01/
├── sample-01_20211207163723_provenance.yml
├── sample-01_abricate_ncbi.tsv
├── sample-01_abricate_plasmidfinder.tsv
├── sample-01_chromosome.fasta
├── sample-01_fastp.csv
├── sample-01_mash_screen.tsv
├── sample-01_mobtyper_contig_report.tsv
├── sample-01_mobtyper_plasmid_report.tsv
├── sample-01_resistance_gene_report.tsv
├── sample-01_NC_019152.1.snps.vcf
├── sample-01_NC_019152.1.sorted.bam
├── sample-01_NC_019152.1.sorted.bam.bai
├── sample-01_plasmid_AA023.fasta
├── sample-01_plasmid_AA026.fasta
├── sample-01_quast.csv
└── NC_019152.1.fa
| filename suffix | Generated by | Description | 
|---|---|---|
| _abricate_ncbi.tsv | abricate | All resistance genes found in the entire assembly | 
| _abricate_plasmidfinder.tsv | abricate | All replicon genes found in the entire assembly | 
| _chromosome.fasta | mob_recon | The set of contigs determined by mob_recon to belong to the chromosome (non-plasmid) | 
| _plasmid_<cluster_id>_.fasta | mob_recon | Plasmid reconstructions. Groups of contigs that were determined to be part of the same plasmid | 
| _fastp.csv | fastp | Read QC info | 
| _quast.csv | quast | Assembly QC info | 
| _mash_screen.tsv | mash | Containment of reference plasmids in reads | 
| _mobtyper_contig_report.csv | mob_typer | MOB Typer results for all contigs in the assembly (both chromosome and plasmid) | 
| _mobtyper_plasmid_report.csv | mob_typer | MOB Typer results for all plasmid reconstructions (including those that do not have resistance genes | 
| _<plasmid_id>.sorted.bam{.bai} | bwa | Alignment of reads against a reference plasmid | 
| .snps.vcf | freebayes | SNPs found in alignment of reads against a reference plasmid | 
| <plasmid_id>.fa | seqkit | Reference plasmid used for alignments | 
Each analysis will create a provenance.yml file for each sample. The filename of the provenance.yml file includes a timestamp with format YYYYMMDDHHMMSS to ensure
that a unique file will be produced if a sample is re-analyzed and outputs are stored to the same directory.
Example provenance output:
- pipeline_name: BCCDC-PHL/plasmid-screen
  pipeline_version: 0.2.3
  nextflow_session_id: c0cc6250-c767-4bfe-9254-0b49ff6dab91
  nextflow_run_name: mighty_panini
  timestamp_analysis_start: 2024-06-18T16:09:15.659426-07:00
- input_filename: sample-01_R1.fastq.gz
  input_path: /path/to/sample-01_R1.fastq.gz
  sha256: 497c99c5665bd0b89666c5fa625ae966f2ffaf218186db0e1ae95a15dac3ac76
- input_filename: sample-01_R2.fastq.gz
  input_path: /path/to/sample-01_R2.fastq.gz
  sha256: 46ec4c473b613d36c7ce109808c4510a10b205aaebcfe837eb542999fdbdf11f
- input_filename: sample-01_unicycler_short.fa
  input_path: /path/to/sample-01_unicycler_short.fa
  sha256: b0d012b23057095b305cf57a687d90406e7383051d2c845717f6e99fdb4d4ad7
- process_name: trim_reads
  tools:
    - tool_name: fastp
      tool_version: 0.22.0
      parameters:
        - parameter: cut_tail
          value: true
- process_name: quast
  tools:
    - tool_name: quast
      tool_version: 5.0.2
- process_name: mash_screen
  tools:
    - tool_name: mash
      tool_version: 2.3
      parameters
        - parameter: threshold
          value: 0.996
- process_name: mob_recon
  tools:
    - tool_name: mob_recon
      tool_version: 3.0.3
      parameters
        - parameter: database_directory
          value: /path/to/mob-suite/db
        - parameter: filter_db
          value: /path/to/mob-suite/chromosomes/2019-11-NCBI-Enterobacteriacea-Chromosomes.fasta
        - parameter: min_con_cov
          value: 95
- process_name: abricate
  tools:
    - tool_name: abricate
      tool_version: 1.0.1
      parameters:
        - parameter: db
          value: ncbi
- process_name: abricate
  tools:
    - tool_name: abricate
      tool_version: 1.0.1
      parameters:
        - parameter: db
          value: plasmidfinder
- process_name: align_reads_to_reference_plasmid
  process_tags:
    ref_plasmid_id: NZ_CP023897.1
    resistance_gene: blaOXA-181
  tools:
    - tool_name: bwa
      tool_version: 0.7.17-r1188
      subcommand: mem
      parameters:
        - parameter: output_all_alignments
          value: true
        - parameter: use_soft_clipping_for_supplementary_alignments
          value: true
        - parameter: mark_shorter_split_hits_as_secondary
          value: true
    - tool_name: samtools
      tool_version: 1.13
      subcommand: view
      parameters:
        - parameter: exclude_flags
          value: 1540
- process_name: call_snps
  process_tags:
    ref_plasmid_id: NZ_CP023897.1
    resistance_gene: blaOXA-181
  tools:
    - tool_name: freebayes
      tool_version: 1.3.5
      parameters:
        - parameter: ploidy
          value: 1
        - parameter: min_base_quality
          value: 20
        - parameter: min_mapping_quality
          value: 60
        - parameter: min_coverage
          value: 10
        - parameter: min_alternate_fraction
          value: 0.8
        - parameter: min_repeat_entropy
          value: 1.0
    - tool_name: bcftools
      tool_version: 1.20
      subcommand: view
      parameters:
        - parameter: include
          value: INFO/TYPE=snp