Commit 20b4397

Add UMI bias figure (#363)
* adding figure and its caption to UMI resolution
* adding .key to _static
* adding changelog
* Update changelog.d/363.added.md

  Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
* changed color scheme and improved caption
* removed *.pdf from .gitignore and added UMI.pdf
* changed format to PNG and added reference to figure in the text
* remove typos
* added book.pdf to gitignore

---------

Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
1 parent d72812e commit 20b4397

4 files changed, with 15 additions and 3 deletions

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ doc/_build
 jupyter-book/data
 *.h5ad
 *.h5mu
-*.pdf
+book.pdf
 
 **/data/*
 **/figures/*

changelog.d/363.added.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Adding UMIs explanation figure ([#363](https://github.com/theislab/single-cell-best-practices/pull/363)) <sub>@LuisHeinzlmeier</sub>

Binary file (254 KB), not shown.

jupyter-book/introduction/raw_data_processing.md

Lines changed: 13 additions & 2 deletions
@@ -469,7 +469,7 @@ Several common strategies are used for cell barcode identification and correctio
 After cell barcode (CB) correction, reads have either been discarded or assigned to a corrected CB.
 Subsequently, we wish to quantify the abundance of each gene within each corrected CB.
 
-Because of the {term}`amplification bias` as discussed in {ref}`exp-data:transcript-quantification`, reads must be deduplicated, based upon their UMI, to assess the true count of sampled molecules. Additionally, several other complicating factors present challenges when attempting to perform this estimation.
+Because of the {term}`amplification bias` as discussed in {ref}`exp-data:transcript-quantification`, reads must be deduplicated, based upon their UMI, to assess the true count of sampled molecules ({numref}`umi-figure`). Additionally, several other complicating factors present challenges when attempting to perform this estimation.
 
 The UMI deduplication step aims to identify the set of reads and UMIs derived from each original, pre-PCR molecule in each cell captured and sequenced in the experiment.
 The result of this process is to allocate a molecule count to each gene in each cell, which is subsequently used in the downstream analysis as the raw expression estimate for this gene.
@@ -482,6 +482,17 @@ A read can be tagged by only one UMI but may belong to multiple references if it
 Additionally, since molecule barcoding in scRNA-seq is typically isolated and independent for each cell (aside from the previously discussed challenges in resolving cell barcodes), _UMI resolution_ will be explained for a single cell without loss of generality.
 This same procedure is generally applied to all cells independently.
 
+```{figure} ../_static/images/raw_data_processing/UMI.png
+:name: umi-figure
+:alt: Figure UMIs
+:with: 100%
+
+
+UMIs reduce PCR amplification bias by tracking original molecules, but can be affected by different types of errors (blue boxes).
+Nucleotide substitutions in UMI tags may occur during amplification or sequencing.
+Multimapping can arise when reads sharing the same UMI are mapped to different genes (blue and red), when a single read maps to multiple genes (gray), or both.
+```
+
 (raw-proc:need-for-umi-resolution)=
 
 ### The need for UMI resolution
@@ -490,7 +501,7 @@ In the ideal case, where the correct (unaltered) UMIs tag reads, the reads of ea
 Consequently, the UMI deduplication procedure is conceptually straightforward: the reads of a UMI are the PCR duplicates from a single pre-PCR molecule.
 The number of captured and sequenced molecules of each gene is the number of distinct UMIs observed for this gene.
 
-However, the problems encountered in practice make the simple rules described above insufficient for identifying the gene origin of UMIs in general and necessitate the development of more sophisticated models:
+However, the problems encountered in practice make the simple rules described above insufficient for identifying the gene origin of UMIs in general and necessitate the development of more sophisticated models ({numref}`umi-figure`):
 
 - **Errors in UMIs**:
 These occur when the sequenced UMI tag of reads contains errors introduced during PCR or the sequencing process.
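
The ideal-case rule quoted in the diff above (the molecule count of a gene is the number of distinct UMIs observed for it) can be illustrated with a minimal Python sketch for a single cell. The function name `count_molecules`, the `(umi, gene)` read representation, and the toy sequences are assumptions made for illustration only; they are not taken from the book or from any particular tool. The second half shows how one substitution error in a UMI inflates the count, which is the first problem the changed text refers to.

```python
from collections import defaultdict


def count_molecules(reads):
    """Naive UMI deduplication for one cell (hypothetical helper).

    `reads` is an iterable of (umi, gene) pairs, one per uniquely mapped read.
    Returns a dict mapping each gene to its number of distinct UMIs.
    """
    umis_per_gene = defaultdict(set)
    for umi, gene in reads:
        # PCR duplicates share a UMI, so they collapse into one set entry.
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


# Ideal case: every read carries the unaltered UMI of its source molecule.
ideal_reads = [
    ("AACG", "geneA"), ("AACG", "geneA"), ("AACG", "geneA"),
    ("TTGC", "geneA"),
    ("GGAT", "geneB"), ("GGAT", "geneB"),
]
ideal_counts = count_molecules(ideal_reads)

# One extra duplicate of the AACG molecule picks up a substitution
# (AACG -> AACT), so the naive rule sees a spurious third molecule.
noisy_reads = ideal_reads + [("AACT", "geneA")]
noisy_counts = count_molecules(noisy_reads)

print(ideal_counts)
print(noisy_counts)
```

In this toy input, `geneA` has two true molecules, but the naive count rises to three once the erroneous UMI appears. This is why the text calls for more sophisticated models than plain distinct-UMI counting.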
