README.md (25 additions & 9 deletions)
@@ -1,4 +1,4 @@
-A tool to GENerate COnsensus REads.
+A fast tool to remove sequencing duplications and eliminate sequencing errors by generating consensus reads.
 *[What's gencore](#whats-gencore)
 *[A quick example](#a-quick-example)
 *[Download, compile and install](#get-gencore)
@@ -10,24 +10,29 @@ A tool to GENerate COnsensus REads.
 *[Read/cite gencore paper](#citation)

 # what's gencore?
-`gencore` is a tool to generate consensus reads from next-generation sequencing (NGS) data. It groups the reads derived from the same original DNA template, merges them and generates a consensus read, which contains much less errors than the original reads.
+`gencore` is a tool for fast and powerful deduplication of next-generation sequencing (NGS) data. It is much faster and uses much less memory than Picard and other tools. It generates very informative reports in both HTML and JSON formats. It's based on an algorithm for `generating consensus reads`, and that's why it's named `gencore`.

-This tool groups the reads of same origin by their mapping positions and unique molecular identifiers (UMI). It can run with or without UMI. If your FASTQ data has UMI integrated, you can use [fastp](https://github.com/OpenGene/fastp) to shift the UMI to read query names, and use `gencore` to generate consensus reads.
+Basically, `gencore` groups the reads derived from the same original DNA template and merges them into a consensus read, which contains far fewer errors than the original reads.
+
+`gencore` supports data with unique molecular identifiers (UMI). If your FASTQ data has UMI integrated, you can use [fastp](https://github.com/OpenGene/fastp) to shift the UMI to read query names, and use `gencore` to generate consensus reads.

 This tool can eliminate the errors introduced by library preparation and sequencing processes, and consequently reduce the false positives for downstream variant calling. This tool can also be used to remove duplicated reads. Since it generates consensus reads from duplicated reads, it outputs much cleaner data than conventional duplicate removers. ***Due to these advantages, it is especially useful for processing ultra-deep sequencing data for cancer samples.***

 `gencore` accepts a sorted BAM/SAM with its corresponding reference fasta as input, and outputs an unsorted BAM/SAM.

-# Take a quick glance of the informative report
+# take a quick glance of the informative report
 * Sample HTML report: http://opengene.org/gencore/gencore.html
 * After the processing is finished, check the `gencore.html` and `gencore.json` in the working directory. The option `--coverage_sampling=50000` overrides the default setting (`coverage_sampling=10000`) to generate smaller report files by reducing the coverage sampling rate.

 # quick examples
 The simplest way
@@ -38,7 +43,7 @@ With a BED file to specify the capturing regions
@@ -79,6 +84,17 @@ As described above, gencore can eliminate the errors introduced by library prepa

 ***This is the image showing the result of gencore processed BAM. It becomes much cleaner. Cheers!***

+# QC result reported by gencore
+gencore also performs some quality control while deduplicating and generating consensus reads. It reports the mapping rate, duplication rate, mismatch rate and some statistical results. In particular, gencore reports the coverage statistics of the input BAM file at genome scale and in capturing regions (if a BED file is specified).
+
+gencore reports the results in both HTML and JSON formats for manual checking and downstream analysis. See the examples of the interactive [HTML](http://opengene.org/gencore/gencore.html) report and [JSON](http://opengene.org/gencore/gencore.html) report.
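
The README text in this diff describes grouping reads that come from the same original DNA template (by mapping position and UMI) and merging each group into a consensus read. The toy sketch below illustrates that idea only. It is not gencore's actual algorithm: the dictionary fields (`chrom`, `pos`, `umi`, `seq`), the function names, and the plain per-base majority vote with `N` on ties are all assumptions made for illustration, and it ignores base qualities, the reference sequence, and read pairing.

```python
from collections import Counter, defaultdict


def group_reads(reads):
    """Group read sequences by (chrom, pos, umi).

    Each read is a dict with hypothetical keys: chrom, pos, umi (optional), seq.
    """
    groups = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["pos"], read.get("umi", ""))
        groups[key].append(read["seq"])
    return groups


def consensus(seqs):
    """Per-column majority vote; assumes all sequences have equal length.

    A column whose two most common bases are tied is reported as 'N'.
    """
    bases = []
    for column in zip(*seqs):
        counts = Counter(column).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            bases.append("N")
        else:
            bases.append(counts[0][0])
    return "".join(bases)


def consensus_reads(reads):
    """Return one consensus sequence per (chrom, pos, umi) group."""
    return {key: consensus(seqs) for key, seqs in group_reads(reads).items()}
```

In this sketch, three reads sharing a position and UMI where one carries a single mismatching base collapse into one consensus read without that mismatch, which is the error-removal effect the README describes.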