This pipeline provides several tools for the analysis of immune repertoire sequencing data. It uses unique molecular identifiers (UMIs), short nucleotide tags, to filter experimental errors from the resulting sequences. The tags are attached to molecules before sequencing library preparation and make it possible to trace each read back to the original sequence of its molecule. The pipeline is applicable to Illumina MiSeq and HiSeq 2500 reads. It should be used with sequencing libraries that target the CDR3 locus of immune receptor genes and have high over-sequencing, i.e. at least 10 reads (optimally 30+ reads) per starting molecule.
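To make the over-sequencing requirement concrete, here is a minimal Python sketch of the general idea: reads are grouped by UMI, UMIs with too few reads are dropped, and a per-position majority vote yields one consensus per molecule. This is an illustration under stated assumptions, not the pipeline's actual implementation. The function names, the `(umi, sequence)` input format and the plain majority vote are all hypothetical; only the threshold of 10 reads comes from the text.

```python
from collections import Counter, defaultdict

# Lower bound quoted in the text; 30+ reads per molecule is described as optimal.
MIN_READS_PER_UMI = 10


def group_by_umi(tagged_reads):
    """Group read sequences by UMI.

    `tagged_reads` is an iterable of (umi, sequence) pairs (hypothetical format).
    """
    groups = defaultdict(list)
    for umi, sequence in tagged_reads:
        groups[umi].append(sequence)
    return groups


def majority_consensus(sequences):
    """Per-position majority vote, truncated to the shortest read (a simplification)."""
    length = min(len(s) for s in sequences)
    return "".join(
        Counter(s[i] for s in sequences).most_common(1)[0][0]
        for i in range(length)
    )


def consensus_per_molecule(tagged_reads, min_reads=MIN_READS_PER_UMI):
    """Return {umi: consensus}, dropping UMIs covered by fewer than `min_reads` reads."""
    return {
        umi: majority_consensus(seqs)
        for umi, seqs in group_by_umi(tagged_reads).items()
        if len(seqs) >= min_reads
    }
```

With nine error-free reads and one read carrying a single substitution under the same UMI, the vote recovers the error-free sequence, while a UMI seen only a few times is discarded rather than trusted.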
@@ -18,20 +17,20 @@ or simply download a standalone jar and execute
 
 >$java -cp migec.jar Checkout
 
-NOTE: The data from 454 platform should be used with caution, as it contains homopolymer errors which (in present framework) result in reads dropped during consensus assembly. The 454 platform has a relatively low read yield, so additional read dropping could result in over-sequencing level below required threshold. If you still wish to give it a try, we would recommend filtering off all short reads and repairing indels with Coral (http://www.cs.helsinki.fi/u/lmsalmel/coral/), the latter should be run with options ```-mr 2 -mm 1000 -g 3```.
+NOTE: The data from 454 platform should be used with caution, as it contains homopolymer errors which (in present framework) result in reads dropped during consensus assembly. The 454 platform has a relatively low read yield, so additional read dropping could result in over-sequencing level below required threshold. If you still wish to give it a try, we would recommend filtering off all short reads and repairing indels with Coral (<http://www.cs.helsinki.fi/u/lmsalmel/coral/>), the latter should be run with options ```-mr 2 -mm 1000 -g 3```.
 
 STANDARD PIPELINE
-=================
+-----------------
+
+### 1. Checkout
 
-1. Checkout
-==============================
 Description: A script to perform de-multiplexing and UMI tag extraction
@@ -45,6 +44,14 @@ will search for AAGGTT seed exact match, then for the remaining adapter sequence
 
 Additional parameters:
 
+```-c``` compressed output (gzip compression).
+
+```-t``` trim adapter sequence from output.
+
+```-e``` also remove trails of template-switching (poly-G) for the case when UMI-containing adapter is added using reverse-transcription (cDNA libraries).
+
+Barcode search parameters:
+
 ```-o``` could speed up if reads are oriented (i.e. master adapter should be in R1).
 
 ```-r``` will apply a custom RC mask. By default it assumes Illumina reads with mates on different strands, so it reverse-complements read with slave adapter so that output reads will be on master strand.
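The default behaviour described for ```-r``` can be sketched in Python as below. This is only an illustration: the function names and the boolean-tuple mask are hypothetical and do not reflect the tool's actual mask syntax, and the sketch assumes oriented reads, with the master adapter in R1 and the slave adapter in R2.

```python
# Translation table for nucleotide complements (N stays N); case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a nucleotide sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def apply_rc_mask(r1, r2, rc_mask=(False, True)):
    """Reverse-complement each mate whose mask entry is True.

    The default mask (hypothetical representation) flips only R2, the mate
    assumed to carry the slave adapter, so both reads end up on the master strand.
    """
    return tuple(
        reverse_complement(read) if flip else read
        for read, flip in zip((r1, r2), rc_mask)
    )
```

Passing `rc_mask=(False, False)` leaves both mates untouched, which corresponds to the case where the mates are already on the same strand.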
@@ -53,9 +60,8 @@ Additional parameters:
 
 
 
+### 2. Histogram
 
-2. Histogram
-==============================
 Description: A script to generate over-sequencing statistics
 
 Standard usage:
@@ -66,8 +72,8 @@ Will generate several files, the one important for basic data processing is ./ch
 
 
 
-3. Assemble
-==============================
+### 3. Assemble
+
 Description: A script to perform UMI-guided assembly
 
 Standard usage:
@@ -98,8 +104,8 @@ To inspect the effect of such single-mismatch erroneous UMI sub-variants see "co
 
 
 
-4. CdrBlast
-===============================
+### 4. CdrBlast
+
 Description: A script to extract CDR3 sequences
 
 Standard usage (assuming library contains T-cell Receptor Alpha Chain sequences)
@@ -115,15 +121,15 @@ For raw data:
 
 NOTE:
 
-1) NCBI-BLAST+ package required. Could be directly installed on Linux using a command like $sudo apt-get ncbi-blast+ or downloaded and installed from here: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
+1) NCBI-BLAST+ package required. Could be directly installed on Linux using a command like $sudo apt-get ncbi-blast+ or downloaded and installed directly from here: <ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/>
 
 2) Both raw and assembled data should be processed to apply the last step of filtration.
 
 
 
 
-5. FilterCdrBlastResults
-============================================
+### 5. FilterCdrBlastResults
+
 Description: A script to filter erroneous CDR3 sequences produced due to hot-spot PCR and NGS errors