* adding figure and its caption to UMI resolution
* adding .key to _static
* adding changelog
* Update changelog.d/363.added.md
Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
* changed color scheme and improved caption
* removed *.pdf from .gitignore and added UMI.pdf
* changed format to PNG and added reference to figure in the text
* remove typos
* added book.pdf to gitignore
---------
Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
File: jupyter-book/introduction/raw_data_processing.md (13 additions & 2 deletions)
@@ -469,7 +469,7 @@ Several common strategies are used for cell barcode identification and correctio
After cell barcode (CB) correction, reads have either been discarded or assigned to a corrected CB.
Subsequently, we wish to quantify the abundance of each gene within each corrected CB.
-Because of the {term}`amplification bias` as discussed in {ref}`exp-data:transcript-quantification`, reads must be deduplicated, based upon their UMI, to assess the true count of sampled molecules. Additionally, several other complicating factors present challenges when attempting to perform this estimation.
+Because of the {term}`amplification bias` as discussed in {ref}`exp-data:transcript-quantification`, reads must be deduplicated, based upon their UMI, to assess the true count of sampled molecules ({numref}`umi-figure`). Additionally, several other complicating factors present challenges when attempting to perform this estimation.
The UMI deduplication step aims to identify the set of reads and UMIs derived from each original, pre-PCR molecule in each cell captured and sequenced in the experiment.
The result of this process is to allocate a molecule count to each gene in each cell, which is subsequently used in the downstream analysis as the raw expression estimate for this gene.
@@ -482,6 +482,17 @@ A read can be tagged by only one UMI but may belong to multiple references if it
Additionally, since molecule barcoding in scRNA-seq is typically isolated and independent for each cell (aside from the previously discussed challenges in resolving cell barcodes), _UMI resolution_ will be explained for a single cell without loss of generality.
This same procedure is generally applied to all cells independently.
+UMIs reduce PCR amplification bias by tracking original molecules, but can be affected by different types of errors (blue boxes).
+Nucleotide substitutions in UMI tags may occur during amplification or sequencing.
+Multimapping can arise when reads sharing the same UMI are mapped to different genes (blue and red), when a single read maps to multiple genes (gray), or both.
+```
+
(raw-proc:need-for-umi-resolution)=
### The need for UMI resolution
@@ -490,7 +501,7 @@ In the ideal case, where the correct (unaltered) UMIs tag reads, the reads of ea
Consequently, the UMI deduplication procedure is conceptually straightforward: the reads of a UMI are the PCR duplicates from a single pre-PCR molecule.
The number of captured and sequenced molecules of each gene is the number of distinct UMIs observed for this gene.
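
The ideal-case rule above can be illustrated with a minimal Python sketch. The read records, gene names, and the helper `count_molecules` are hypothetical and chosen only for illustration; the sketch assumes error-free UMIs and reads that each map to exactly one gene, which is exactly the simplification that the following text argues does not hold in practice.

```python
from collections import defaultdict

# Hypothetical (gene, UMI) pairs for the reads of a single cell.
reads = [
    ("GeneA", "ACGT"),
    ("GeneA", "ACGT"),  # PCR duplicate of the read above
    ("GeneA", "TTGC"),
    ("GeneB", "ACGT"),  # same UMI sequence, but tagging a different gene
]


def count_molecules(reads):
    """Return the number of distinct UMIs observed for each gene."""
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


counts = count_molecules(reads)
print(counts)  # {'GeneA': 2, 'GeneB': 1}
```

Using a set per gene collapses PCR duplicates automatically, so the count reflects distinct UMIs rather than reads.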
-However, the problems encountered in practice make the simple rules described above insufficient for identifying the gene origin of UMIs in general and necessitate the development of more sophisticated models:
+However, the problems encountered in practice make the simple rules described above insufficient for identifying the gene origin of UMIs in general and necessitate the development of more sophisticated models ({numref}`umi-figure`):
- **Errors in UMIs**:
These occur when the sequenced UMI tag of reads contains errors introduced during PCR or the sequencing process.
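
To make the effect of such errors concrete, the following sketch flags UMIs that differ by a single nucleotide substitution from a more abundant UMI of the same gene. This is only an illustrative heuristic under stated assumptions: the UMI sequences, read counts, the distance threshold of 1, and the helper names `hamming` and `likely_erroneous` are hypothetical and are not taken from the text or from any particular tool.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("UMIs must have the same length")
    return sum(x != y for x, y in zip(a, b))


def likely_erroneous(umi_counts, max_dist=1):
    """Map each low-count UMI to a more abundant UMI within max_dist of it.

    Assumption for this sketch: a UMI that sits close to a more abundant
    UMI is treated as a PCR or sequencing error of that UMI.
    """
    flagged = {}
    for umi, n in umi_counts.items():
        for other, m in umi_counts.items():
            if other != umi and m > n and hamming(umi, other) <= max_dist:
                flagged[umi] = other
                break
    return flagged


# Hypothetical read counts per UMI for one gene in one cell.
umi_counts = {"ACGTAC": 50, "ACGTAA": 2, "TTGCAT": 30}
flagged = likely_erroneous(umi_counts)
print(flagged)  # {'ACGTAA': 'ACGTAC'}
```

Under the naive distinct-UMI rule this gene would receive a count of 3, whereas treating `ACGTAA` as an erroneous copy of `ACGTAC` yields 2.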