Commit ffee180

Readme update
1 parent 4aea8e3 commit ffee180

1 file changed: +6 -3 lines changed

README.md

Lines changed: 6 additions & 3 deletions
@@ -4,8 +4,6 @@
This pipeline provides several useful tools for analysis of immune repertoire sequencing data. The pipeline utilizes unique nucleotide tags (UMIs) in order to filter experimental errors from resulting sequences. Those tags are attached to molecules before sequencing library preparation and make it possible to trace back the original sequence of each molecule. This pipeline is applicable to Illumina MiSeq and HiSeq 2500 reads. Use sequencing libraries that target the CDR3 locus of immune receptor genes and have high over-sequencing, i.e. at least 10 reads (optimally 30+ reads) per starting molecule.

- Data from the 454 platform should be used with caution, as it contains homopolymer errors which (in the present framework) result in reads being dropped during consensus assembly. The 454 platform has a relatively low read yield, so additional read dropping could push the over-sequencing level below the required threshold. If you still wish to give it a try, we recommend filtering out all short reads and repairing indels with Coral (http://www.cs.helsinki.fi/u/lmsalmel/coral/); the latter should be run with options ```-mr 2 -mm 1000 -g 3```.
-
Features:
- Flexible de-multiplexing of NGS data and extraction of UMI sequence
- Assembly of consensuses of original molecules
@@ -20,6 +18,7 @@ or simply download a standalone jar and execute

>$java -cp migec.jar Checkout

+ NOTE: Data from the 454 platform should be used with caution, as it contains homopolymer errors which (in the present framework) result in reads being dropped during consensus assembly. The 454 platform has a relatively low read yield, so additional read dropping could push the over-sequencing level below the required threshold. If you still wish to give it a try, we recommend filtering out all short reads and repairing indels with Coral (http://www.cs.helsinki.fi/u/lmsalmel/coral/); the latter should be run with options ```-mr 2 -mm 1000 -g 3```.

STANDARD PIPELINE
=================
@@ -90,7 +89,11 @@ In case of library with overlapping reads, the script can try to overlap them pr

which will generate ./assembly/S1_RO.fastq.gz, containing assembly results _only_ for overlapping reads.

- The ```--min-count``` option sets the minimum number of reads in a MIG.
+ The ```--min-count``` option sets the minimum number of reads in a MIG. It should be set according to the Histogram script output so as to separate two peaks: over-sequenced MIGs and erroneous MIGs that cluster around a MIG size of 1.
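
As a rough illustration of how the two peaks might be separated, the sketch below scans a MIG size histogram for the first valley after the peak at size 1. The input dictionary (MIG size mapped to number of MIGs), the function name `suggest_min_count` and the toy numbers are assumptions made for this example; they do not reflect the actual output format of the Histogram script, and the result is only a starting point for manual inspection.

```python
def suggest_min_count(histogram):
    """Return the MIG size at the first valley of a {mig_size: n_migs} histogram.

    Hypothetical helper: walks from the smallest MIG size upwards and stops
    at the last size before the counts start rising again. Returns None if
    the counts never rise, i.e. no second (over-sequenced) peak is visible.
    """
    sizes = sorted(histogram)
    for prev, cur in zip(sizes, sizes[1:]):
        if histogram[cur] > histogram[prev]:
            return prev  # counts rise after `prev`, so `prev` is the valley
    return None


# Toy histogram (made-up numbers): erroneous MIGs pile up at size 1,
# over-sequenced MIGs peak around size 16.
toy = {1: 5000, 2: 1200, 3: 300, 4: 150, 5: 200, 8: 900, 16: 2500, 32: 1800}
print(suggest_min_count(toy))  # -> 4
```

Real histograms are noisier than this toy example, so a first-valley rule can stop too early; smoothing or simply looking at the plot may be more reliable.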
+
+ Those erroneous MIGs can arise as experimental artifacts, but the most common reason for their presence is an error in the UMI sequence itself. Note that the latter holds only when the number of distinct UMIs is far lower than the theoretically possible UMI diversity (e.g. 4^12 for 12-letter UMI regions)!
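
To get a feeling for when this condition holds, the following sketch computes the theoretical UMI diversity and the expected number of other real UMIs lying within one mismatch of a given UMI. It assumes that the distinct UMIs are a uniformly random subset of all possible sequences, which real libraries only approximate; the function names and the example sample size are illustrative.

```python
def umi_diversity(length):
    """Number of possible UMI sequences of the given length (4 bases per position)."""
    return 4 ** length


def expected_real_neighbours(n_distinct, length):
    """Expected number of other real UMIs one mismatch away from a given UMI.

    Assumption: the n_distinct observed UMIs are a uniformly random subset
    of all 4**length sequences. Each UMI has 3 * length single-mismatch
    neighbours, and each neighbour is a real UMI with probability
    (n_distinct - 1) / (4**length - 1).
    """
    neighbours = 3 * length
    return neighbours * (n_distinct - 1) / (umi_diversity(length) - 1)


print(umi_diversity(12))  # -> 16777216
# Hypothetical sample with 100,000 distinct 12-letter UMIs
print(expected_real_neighbours(100_000, 12))
```

When this expectation is far below 1, a small MIG sitting one mismatch away from a large one is much more likely to be a UMI error than a second real molecule.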
+
+ To inspect the effect of such single-mismatch erroneous UMI sub-variants, see the "collisions" output of the Histogram script. Such collision events can interfere with real MIGs when over-sequencing is relatively low. In this case collisions can be filtered during MIG consensus assembly using the ```-f``` option. The ```--collision-ratio``` option can be changed to prevent filtering of real collisions that occur because the theoretically possible UMI diversity is finite.
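
One way to picture collision filtering is the sketch below. It is a simplified model, not the pipeline's implementation: the rule used here (drop a MIG if some UMI one mismatch away has at least `ratio` times more reads), the function names and the ratio values are all assumptions chosen for illustration.

```python
def one_mismatch_neighbours(umi):
    """Yield every sequence that differs from `umi` at exactly one position."""
    for i, base in enumerate(umi):
        for alt in "ACGT":
            if alt != base:
                yield umi[:i] + alt + umi[i + 1:]


def filter_collisions(mig_sizes, ratio):
    """Keep only UMIs without a much larger single-mismatch neighbour.

    mig_sizes: {umi: number_of_reads}, UMIs given as upper-case ACGT strings.
    ratio: hypothetical threshold; a UMI is dropped when a neighbour has
           at least ratio * size reads.
    """
    kept = {}
    for umi, size in mig_sizes.items():
        is_collision = any(
            mig_sizes.get(neighbour, 0) >= ratio * size
            for neighbour in one_mismatch_neighbours(umi)
        )
        if not is_collision:
            kept[umi] = size
    return kept


# Toy data: ACGA looks like an erroneous copy of ACGT, whereas TTTT and TTTA
# have similar sizes and may be two real molecules with similar UMIs.
migs = {"ACGT": 100, "ACGA": 2, "TTTT": 30, "TTTA": 25}
print(filter_collisions(migs, ratio=10))  # keeps ACGT, TTTT and TTTA
print(filter_collisions(migs, ratio=1))   # also drops TTTA
```

In this model, raising the ratio makes the filter more conservative, so pairs of real MIGs whose UMIs happen to differ by one base are less likely to be removed.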