The first step is to create the project folder and the subfolders that we need.
mkdir project_A
cd project_A
mkdir rawdata
mkdir scripts
mkdir logs
mkdir output
The second step is to remove the adapters/primers from the sequences (cutadapt).
sbatch scripts/cutadapt.sh
SMART, Chaetoceros:
PRIMER_F="GCAGTTAAAAAGCTCGTAGT"   # EK-565F
PRIMER_R="TTTAAGTTTCAGCCTTGCG"    # 18S-EUK-1134R-UnonMet

PRIMER_F="CCAGCASCYGCGGTAATTCC"   # TAReuk454FWD1
PRIMER_R="ACTTTCGTTCTTGATYRATGA"  # TAReukREV3_modi
Input folder: rawdata/
Files:
- XX_R[1|2]*.fastq --> the raw data from the sequencer.
Output folder: output/cutadapt
Files:
- XX__trimmed_R[1|2].fastq --> same number of files as the input, sequences without adapters.
- seqkit_stats_trimmed.tsv --> stats of the output (trimmed) files.
- seqkit_stats_untrimmed.tsv --> stats of the input (untrimmed) files.
Pre-processing step: study the Q-score distribution (a minimal R sketch is given at the end of this step).
sbatch scripts/00_qscore.sh
Change lines 43 & 43: revise that the paths are correct for the cutadapt (input) and output.
Input folder: output/cutadapt
Files: the output files of step 2.
Output folder: output/00_qprofiles
Files:
- forward.pdf --> q_score graphs for forward files.
- reverse.pdf --> q_score graphs for reverse files.
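For orientation, this is a minimal sketch of the R code that 00_qscore.sh is assumed to wrap (DADA2's plotQualityProfile); the paths, file patterns and the aggregate option are illustrative, not taken from the actual script.

# Minimal sketch; adjust paths and patterns to the real script
library(dada2)
path <- "output/cutadapt"                                   # trimmed reads from step 2 (assumed location)
fwd_files <- sort(list.files(path, pattern = "_R1.fastq", full.names = TRUE))
rev_files <- sort(list.files(path, pattern = "_R2.fastq", full.names = TRUE))
pdf("output/00_qprofiles/forward.pdf")
print(plotQualityProfile(fwd_files, aggregate = TRUE))      # aggregated Q-score profile, forward reads
dev.off()
pdf("output/00_qprofiles/reverse.pdf")
print(plotQualityProfile(rev_files, aggregate = TRUE))      # aggregated Q-score profile, reverse reads
dev.off()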
The next step performs the filtering, denoising and merging.
sbatch scripts/01_all.sh
Change lines L8-11: the project name (project_A) and the run version, if needed.
path_principal <- "/home/ntimoneda/ntimoneda/projects/project_A"
path           <- "/home/ntimoneda/ntimoneda/projects/project_A/cutadapt"
output         <- "/home/ntimoneda/ntimoneda/projects/project_A/output"
name.run       <- "v1"
Change lines L28-30: parameters for filtering.
truncLen=c(280,270) --> length at which to truncate the forward and reverse reads (low-quality tails); see the graphs from step 2.5.
maxEE=c(4,6) --> maximum number of expected errors allowed in the filtering (forward and reverse).
Change line L69: parameter for merging (read-pair overlap).
minOverlap=10 --> minimum number of overlapping bases required to merge the read pairs. Overlap = (truncLen_fwd + truncLen_rev) - expected amplicon length; for example, with truncLen=c(280,270) and an expected amplicon of ~420 bp, the overlap is 280 + 270 - 420 = 130 bases. A minimal sketch of these calls is given after the output file list below.
Input folder: output/cutadapt
Files: the output files of step 2.
Output folder: output/filtered
Files:
- XX_[F|R]_filt.fastq --> same number of files as the input, filtered sequences.
- errors_name.run_[fwd|rev].pdf --> plots of the learned error rates.
- v1_seqtab.rds --> the merged sequence table.
- v1_track_analysis.v3.tsv --> stats of the output.
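The following is a minimal sketch of the DADA2 calls that 01_all.sh is assumed to wrap, reusing the path/output/name.run variables from the block above and the parameters discussed in this step; the real script's structure, file patterns and extra options may differ.

# Minimal sketch, assuming path/output/name.run as defined above
library(dada2)

fnFs <- sort(list.files(path, pattern = "_R1.fastq", full.names = TRUE))
fnRs <- sort(list.files(path, pattern = "_R2.fastq", full.names = TRUE))
sample.names <- sapply(strsplit(basename(fnFs), "_"), `[`, 1)

filtFs <- file.path(output, "filtered", paste0(sample.names, "_F_filt.fastq"))
filtRs <- file.path(output, "filtered", paste0(sample.names, "_R_filt.fastq"))

# Filtering (parameters edited at L28-30 of the script)
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs,
                     truncLen = c(280, 270), maxEE = c(4, 6),
                     truncQ = 2, rm.phix = TRUE, multithread = TRUE)

# Denoising: learn the error rates, then run the DADA2 algorithm
errF <- learnErrors(filtFs, multithread = TRUE)
errR <- learnErrors(filtRs, multithread = TRUE)
dadaFs <- dada(filtFs, err = errF, multithread = TRUE)
dadaRs <- dada(filtRs, err = errR, multithread = TRUE)

# Merging (minOverlap edited at L69 of the script) and sequence table
mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, minOverlap = 10)
seqtab  <- makeSequenceTable(mergers)
saveRDS(seqtab, file.path(output, "filtered", paste0(name.run, "_seqtab.rds")))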
The last DADA2 step is to remove chimeras and trim the final sequences.
sbatch scripts/02_chimeras.sh
Change lines L38-41 in 02_chimeras.sh:
- Make sure the paths and file names are correct.
- L41: the minimum and maximum length of the final sequences; this depends on the amplicon length and the previous trimming options (see the sketch after the output list below).
Input: v1_seqtab.rds (from the previous step)
Output files:
- XX_track_analysis_final.tsv --> loss of reads at each step.
- XX_seqtab_final.rds --> final ASV table.
- XX_seqtab_final.fasta --> FASTA file with the final ASVs.
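A hedged sketch of the R code that 02_chimeras.sh is assumed to run; the length window below is only an illustration, set it at L41 of the script according to your amplicon.

# Minimal sketch, assuming output/name.run as above; length limits are illustrative
library(dada2)

seqtab <- readRDS(file.path(output, "filtered", paste0(name.run, "_seqtab.rds")))

# Remove chimeric sequences
seqtab.nochim <- removeBimeraDenovo(seqtab, method = "consensus", multithread = TRUE)

# Keep only ASVs within the expected length range (example: 350-450 bp)
seqlen <- nchar(colnames(seqtab.nochim))
seqtab.final <- seqtab.nochim[, seqlen >= 350 & seqlen <= 450, drop = FALSE]

saveRDS(seqtab.final, paste0(name.run, "_seqtab_final.rds"))
uniquesToFasta(getUniques(seqtab.final), fout = paste0(name.run, "_seqtab_final.fasta"))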
The aim of this step is to assign taxonomy to the sequences; it is also used to generate the ASV table for phyloseq.
sbatch scripts/03_tax.sh
Change lines L42-46 in 03_tax.sh:
- Make sure the paths and file names are correct.
- L46: the identity cut-off for the assignment (see the sketch after the output list below).
Input: XX_seqtab_final.rds (from the previous step)
Output files:
- XX_tax_assignation.rds --> the taxonomic assignments of the ASVs.
- XX_merged_table.txt --> taxonomy and counts for each ASV.
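A minimal sketch of what 03_tax.sh might do with DADA2's assignTaxonomy; the reference database file, the minBoot value and the exact column layout of the merged table are assumptions, and the real script may use a different assigner or cut-off (see the VSEARCH alternative below).

# Minimal sketch; reference DB and cut-off are placeholders
library(dada2)

seqtab.final <- readRDS(paste0(name.run, "_seqtab_final.rds"))

tax <- assignTaxonomy(seqtab.final,
                      "pr2_dada2_reference.fasta.gz",   # hypothetical reference database file
                      minBoot = 80,                     # cut-off edited at L46 (assumed value)
                      multithread = TRUE)
saveRDS(tax, paste0(name.run, "_tax_assignation.rds"))

# Merge counts and taxonomy into one table (the real merged table's layout may differ)
merged <- cbind(t(seqtab.final), tax)
write.table(merged, paste0(name.run, "_merged_table.txt"),
            sep = "\t", quote = FALSE, col.names = NA)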
This step performs taxonomic assignment with VSEARCH. It can be done with global alignment, where you choose a minimum identity (similar to a BLAST search), or with SINTAX, where you specify a bootstrap cut-off.
sbatch tax_vsearch.sh
- usearch_global (BLAST-like):
  vsearch --usearch_global INPUT.fa --db DDBB.fa --id CUTOFF --alnout ALIGNOUTPUT.aln --blast6out OUTPUT_tax.out
- SINTAX (bootstrap):
  vsearch --sintax INPUT.fa --db DDBB.fa --sintax_cutoff CUTOFF --tabbedout OUTPUT.tbl
- XX_seqtab_final.fasta --> the output FASTA from step 4 (used as INPUT.fa here).
At this point, all of the analysis on the server is done. However, we need to reformat some files before moving to R and continuing with the analysis.
# Delete the taxonomy and the sequence of each ASV
# Calculate the number of columns
awk -F'\t' '{print NF; exit}' merged_table.txt
# Replace the number 32 below with: the previous result - 7
# (i.e. adjust 32 according to the number of samples in the project)
cut -f 1-32 merged_table.txt | awk '!($2="")' > clean_ASV_table.txt
# Keep only the taxonomy ranks with a bootstrap >= 0.6. Prepare the file for phyloseq.
sed 's/d.*+//' output.tbl | sed 's/;size=[0-9]*//' > pr2_custom.tbl
perl ~/Escriptori/ICM/perl/parse_tax_vsearch.v1.pl pr2_custom.tbl 8 > tax_table.tbl
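Once clean_ASV_table.txt and tax_table.tbl exist, they can be loaded into phyloseq in R roughly as below; the read.table() options depend on how the headers and row names ended up in your files, so treat this as a sketch rather than the exact downstream code.

# Minimal sketch of the phyloseq import; adjust read.table() to the actual file layout
library(phyloseq)

asv <- as.matrix(read.table("clean_ASV_table.txt", header = TRUE, row.names = 1))
tax <- as.matrix(read.table("tax_table.tbl", header = TRUE, row.names = 1, sep = "\t"))

ps <- phyloseq(otu_table(asv, taxa_are_rows = TRUE),
               tax_table(tax))
ps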