Commit cc4fa1a

name changed
1 parent 3fb6411 commit cc4fa1a

12 files changed: +131 -50 lines changed

README.md

Lines changed: 10 additions & 6 deletions
@@ -1,4 +1,4 @@
-![](artworks/twistmethnext.png)
+![](artworks/twistmethylflow.png)

 [![DOI](https://zenodo.org/badge/490592846.svg)](https://doi.org/10.5281/zenodo.14204261)
 [![GitBook Docs](https://img.shields.io/badge/docs-GitBook-blue?logo=gitbook)](https://jyotirmoys-organization.gitbook.io/TwistMethylFlow)
@@ -33,7 +33,7 @@ This Nextflow pipeline is designed for the analysis of Twist NGS Methylation dat


 ## Pipeline Schema
-![](artworks/TN.png)
+![](artworks/TNF.png)

 ## Requirements

@@ -48,7 +48,7 @@ This Nextflow pipeline is designed for the analysis of Twist NGS Methylation dat

 3. User can also use `--skip_diff_meth` to avoid the differential methylation analysis.

-### `--run_both_methods`
+### `--run_both_methods` (default: true)

 ```
 # when using the reference genome indexing, --genome_fasta
@@ -125,16 +125,20 @@ nextflow run JD2112/TwistMethylFlow \
 --gtf_file /data/Homo_sapiens.GRCh38.104.gtf \
 --outdir Results/TwistMethylFlow_methylKit
 ```
+
+> [!TIP] "demo data check"
+> Demo data runs with the `hg19` reference genome. Remember to update the GTF/RefSeq file accordingly.
+
 ## Options:

 | options | Description |
 |--------|-----------------------------------------------------------|
-| `--sample_sheet` | Path to the sample sheet CSV file (required) |
+| `--sample_sheet` | Path to the sample sheet CSV file (**required**) |
 | `--bismark_index` | Path to the Bismark index directory (required unless `--genome` or `--aligned_bams` is provided) |
 | `--genome` | Path to the reference genome FASTA file (required if `--bismark_index` not provided)|
 | `--aligned_bams` | Path to aligned BAM files (use this to start from aligned BAM files instead of FASTQ files) |
-| `--refseq_file` | Path to RefSeq file for annotation (reuired to run `both` or `methylkit`) |
-| `--gtf_file` | Path to GTF file for annotation (reuired to run `both` or `edger`) |
+| `--refseq_file` | Path to RefSeq file for annotation (**required** to run `both` or `methylkit`) |
+| `--gtf_file` | Path to GTF file for annotation (**required** to run `both` or `edger`) |
 | `--outdir` | Output directory (default: ./results) |
 | `--diff_meth_method` | Differential methylation method to use: 'edger' or 'methylkit' (default: edger) |
 | `--run_both_methods` | Run both edgeR and methylkit for differential methylation analysis (default: false) |
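
The "required" rules in the options table interact, so a small pre-flight check can make them easier to reason about. The following is a minimal Python sketch, not part of the pipeline: the function name `missing_options` and the plain dict of option values are hypothetical, and the sketch assumes `--run_both_methods` is off when absent (the README states both `true` and `false` as the default).

```python
def missing_options(opts):
    """Return the options that the table marks as required but that are absent.

    `opts` is a hypothetical dict mapping option names (without the leading
    dashes) to their values; a missing key or an empty value counts as absent.
    """
    missing = []

    # --sample_sheet is always required.
    if not opts.get("sample_sheet"):
        missing.append("--sample_sheet")

    # --bismark_index is required unless --genome or --aligned_bams is given.
    if not (opts.get("bismark_index") or opts.get("genome") or opts.get("aligned_bams")):
        missing.append("--bismark_index")

    # Assumption: absent --run_both_methods is treated as false here.
    run_both = bool(opts.get("run_both_methods"))
    method = opts.get("diff_meth_method", "edger")  # table default: edger

    # RefSeq annotation is needed for methylKit (or when running both methods).
    if (run_both or method == "methylkit") and not opts.get("refseq_file"):
        missing.append("--refseq_file")

    # GTF annotation is needed for edgeR (or when running both methods).
    if (run_both or method == "edger") and not opts.get("gtf_file"):
        missing.append("--gtf_file")

    return missing
```

Calling `missing_options({"sample_sheet": "s.csv", "genome": "g.fa", "gtf_file": "a.gtf"})` returns an empty list, because the default `edger` method only needs the GTF file.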

artworks/TNF.png

1.17 MB
File renamed without changes.

artworks/twistmethylflow.png

249 KB

data/Sample_sheet_twist.csv

Lines changed: 12 additions & 8 deletions
@@ -1,9 +1,13 @@
 sample_id,group,read1,read2
-SN_1,Healthy,/data/SN1_S9_R1_001.fastq.gz,/data//SN1_S9_R2_001.fastq.gz
-SN_2,Disease,/data/SN2_S10_R1_001.fastq.gz,/data/SN2_S10_R2_001.fastq.gz
-SN_3,Healthy,/data/SN3_S11_R1_001.fastq.gz,/data/SN3_S11_R2_001.fastq.gz
-SN_0012,Disease,/data/SN4_S12_R1_001.fastq.gz,/data/SN4_S12_R2_001.fastq.gz
-SN_0013,Healthy,/data/SN5_S13_R1_001.fastq.gz,/data/SN5_S13_R2_001.fastq.gz
-SN_0014,Disease,/data/SN6_S14_R1_001.fastq.gz,/data/SN6_S14_R2_001.fastq.gz
-SN_0015,Healthy,/data/SN7_S15_R1_001.fastq.gz,/data/SN7_S15_R2_001.fastq.gz
-SN_0016,Disease,/data/SN8_S16_R1_001.fastq.gz,/data/SN8_S16_R2_001.fastq.gz
+12A,VD,FASTQ/12A_S9_L001_R1_001.fastq.gz,FASTQ/12A_S9_L001_R2_001.fastq.gz
+13A,CS,FASTQ/13A_S11_L001_R1_001.fastq.gz,FASTQ/13A_S11_L001_R2_001.fastq.gz
+1A,CS,FASTQ/1A_S1_L001_R1_001.fastq.gz,FASTQ/1A_S1_L001_R2_001.fastq.gz
+20A,VD,FASTQ/20A_S13_L001_R1_001.fastq.gz,FASTQ/20A_S13_L001_R2_001.fastq.gz
+21A,VD,FASTQ/21A_S15_L001_R1_001.fastq.gz,FASTQ/21A_S15_L001_R2_001.fastq.gz
+22A,VD,FASTQ/22A_S17_L001_R1_001.fastq.gz,FASTQ/22A_S17_L001_R2_001.fastq.gz
+23A,VD,FASTQ/23A_S19_L001_R1_001.fastq.gz,FASTQ/23A_S19_L001_R2_001.fastq.gz
+25A,CS,FASTQ/25A_S21_L001_R1_001.fastq.gz,FASTQ/25A_S21_L001_R2_001.fastq.gz
+26A,VD,FASTQ/26A_S23_L001_R1_001.fastq.gz,FASTQ/26A_S23_L001_R2_001.fastq.gz
+2A,CS,FASTQ/2A_S3_L001_R1_001.fastq.gz,FASTQ/2A_S3_L001_R2_001.fastq.gz
+3A,VD,FASTQ/3A_S5_L001_R1_001.fastq.gz,FASTQ/3A_S5_L001_R2_001.fastq.gz
+5A,CS,FASTQ/5A_S7_L001_R1_001.fastq.gz,FASTQ/5A_S7_L001_R2_001.fastq.gz
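
Since the sample sheet is the only always-required input, it can be worth sanity-checking before a long run. The sketch below is a hypothetical helper, not code from the repository; it assumes the four-column header shown in this file and, as a further assumption, that a differential analysis needs at least two distinct `group` values.

```python
import csv
import io

# Header taken from data/Sample_sheet_twist.csv in this commit.
EXPECTED_COLUMNS = ["sample_id", "group", "read1", "read2"]


def check_sample_sheet(text):
    """Parse sample-sheet CSV text and return (rows, problems).

    `problems` is a list of human-readable messages; an empty list means
    none of the checks in this sketch failed.
    """
    reader = csv.DictReader(io.StringIO(text))
    problems = []
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append(f"unexpected header: {reader.fieldnames}")
        return [], problems

    rows = list(reader)
    seen_ids = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(rows, start=2):
        # Short rows yield None for missing fields, so guard with `or ""`.
        if any(not (row[col] or "").strip() for col in EXPECTED_COLUMNS):
            problems.append(f"line {line_no}: empty field")
        if row["sample_id"] in seen_ids:
            problems.append(f"line {line_no}: duplicate sample_id {row['sample_id']}")
        seen_ids.add(row["sample_id"])

    # Assumption: a two-group comparison needs at least two distinct groups.
    if len({row["group"] for row in rows}) < 2:
        problems.append("fewer than two groups found")
    return rows, problems
```

The helper only inspects the text of the sheet; it does not check that the FASTQ paths exist, which depends on where the pipeline is launched.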

docs/faqs.md

Lines changed: 29 additions & 0 deletions
@@ -99,3 +99,32 @@
 - [Hello Nextflow Training](https://training.nextflow.io/2.0/hello_nextflow/)


+---
+## Performance
+### Runtime, Memory, and Storage
+??? question "What are the typical runtime, memory, and storage requirements?"
+- **Runtime**: Varies based on dataset size and computational resources. For 24 paired-end samples, it can take several hours to days.
+- **Memory**: Ranges from 6 GB to 200 GB depending on the step. Alignment and differential methylation analysis are more memory-intensive.
+- **Storage**: Approximately 2 TB for 24 paired-end samples, including intermediate files and results.
+
+| Process | Average Process Time | Average Wall Clock Time | Max Peak Memory | Total I/O (Read + Written) |
+|-------------------------------|----------------------|-------------------------|-----------------|----------------------------|
+| FastQC | 7m 3s | 7m 2s | 628.4 MB | 5.3 GB |
+| trim galore | 22m 1s | 22m 0s | 348.2 MB | 165.2 GB |
+| Bismark Genome Preparation | 2h 24m 38s | 2h 24m 37s | 12.8 GB | 47.8 GB |
+| Bismark Alignment | 14h 27m 57s | 14h 27m 56s | 10.2 GB | 229.8 GB |
+| Bismark Deduplication | 29m 3s | 29m 1s | 9.8 GB | 95.8 GB |
+| Samtools sort | 8m 32s | 8m 31s | 3.3 GB | 14.8 GB |
+| Samtools index | 1m 2s | 1m 1s | 21.1 MB | 4.8 GB |
+| Bismark Methylation extractor | 2h 56m 23s | 2h 56m 22s | 2 GB | 258.4 GB |
+| Qualimap | 6m 29s | 6m 29s | 12.1 GB | 4.7 GB |
+| Bismark report | 2.9s | 2.5s | 161.1 MB | 7.1 MB |
+| Multiqc | 14.1s | 13.3s | 80.5 MB | 3.0 GB |
+| EdgeR analysis | 46m 37s | 46m 36s | 44.9 GB | 8.9 GB |
+| Annotate results | 5m 23s | 5m 23s | 6.8 GB | 1.8 GB |
+| GO analysis EdgeR | 4m 15s | 4m 15s | 6.4 GB | 1.8 GB |
+| Post processing EdgeR | 18m 40s | 18m 40s | 9.3 GB | 7.3 GB |
+| MethylKit analysis | 2h 59m 19s | 2h 59m 19s | 37.6 GB | 9.0 GB |
+| GO analysis methylKit | 1m 44s | 1m 44s | 3.9 GB | 1.3 GB |
+| Post processing methylKit | 8m 4s | 8m 3s | 4.4 GB | 3.5 GB |
+
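
The durations in the table use a compact `2h 24m 38s` notation. To compare steps numerically, for example when choosing per-process time limits, they can be converted to seconds. This is a minimal sketch with a hypothetical helper name, `duration_to_seconds`; it assumes only the `h`, `m` and `s` units that appear in the table.

```python
import re

# Seconds per unit for the units used in the performance table.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0}


def duration_to_seconds(text):
    """Convert a duration such as '2h 24m 38s' or '2.9s' to seconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)\s*([hms])", text)
    if not parts:
        raise ValueError(f"no duration found in {text!r}")
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)
```

For instance, `duration_to_seconds("14h 27m 57s")` gives 52077.0 seconds for the Bismark alignment row. The table reports averages per process, so summing the rows does not necessarily equal the end-to-end pipeline runtime.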
File renamed without changes.

docs/images/TNF.png

1.17 MB

docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -2,6 +2,6 @@

 Welcome to the TwistMethylFlow documentation. This Nextflow pipeline is designed for the analysis of Twist NGS Methylation data, encompassing quality control, alignment, methylation calling, differential methylation analysis, and post-processing.

-![Pipeline Schematic](./images/Figure%201.png)
+![Pipeline Schematic](./images/TNF.png)

 For a detailed overview, please refer to the [Overview](overview.md) section.

docs/overview.md

Lines changed: 16 additions & 16 deletions
@@ -42,14 +42,14 @@ TwistMethylFlow integrates various tools and custom scripts to provide a compreh
 * generates a **Chord diagram** for top 10 results from the GO analysis.


-# Read processing
+### 1. Read processing

 Read processing subworkflow includes - 

 * FASTQC - for Quality check of samples
 * TRIM Galore - adapter trimming

-## FastQC
+#### 1.1 FastQC

 **FASTQC** is a widely used tool for assessing the quality of raw and processed sequencing data. It provides a comprehensive quality check, including metrics like per-base quality scores, GC content, and adapter contamination.

@@ -83,7 +83,7 @@ fastqc $args --threads $task.cpus $reads
 * Identifies frequently occurring sequences (e.g., adapters or contaminants).


-## Trim Galore
+#### 1.2 Trim Galore

 **Trim Galore** is a versatile tool for trimming sequencing reads and removing adapter sequences. It’s particularly useful for preparing raw sequencing data for downstream applications like alignment or differential expression/methylation analysis. Trim Galore combines the functionalities of **Cutadapt** and **FastQC** for quality control and trimming.

@@ -100,9 +100,9 @@ trim_galore --paired --cores $task.cpus $args $reads
 * `--fastqc`: Run **FastQC** before and after trimming.
 * `--cores <number>`: Use multiple cores for faster processing.

-# Bismark Analysis
+### 2. Bismark Analysis

-## Reference Genome Preparation
+#### 2.1 Reference Genome Preparation

 Bismark needs to prepare the bisulfite index for the genome.

@@ -137,7 +137,7 @@ bismark_genome_preparation --bowtie2 --parallel 4 <genome.fasta>
 ...
 ```

-## Bismark Alignment
+#### 2.2 Bismark Alignment

 This step aligns bisulfite-treated sequencing reads to a reference genome.

@@ -155,7 +155,7 @@ bismark --genome <path_to_reference_genome> -1 <reads_R1.fq> -2 <reads_R2.fq> -o
 * Produces `.report.txt` and
 * `unmapped_reads.fq.gz` file.

-## Bismark Deduplication
+#### 2.3 Bismark Deduplication
 This step removes duplicate reads to avoid overestimating methylation levels.

 ```
@@ -172,7 +172,7 @@ deduplicate_bismark ${paired_end} $args --bam $bam
 * Produces `deduplicated_report.txt` file.


-## Bismark Methylation Extractor
+#### 2.4 Bismark Methylation Extractor

 Extract methylation data from deduplicated BAM files.

@@ -192,7 +192,7 @@ bismark_methylation_extractor \
 * Also generates `splitting_report.txt` file.


-## Bismark Report
+#### 2.5 Bismark Report
 Generate a summary report of alignment and methylation statistics.

 **Command:**
@@ -207,7 +207,7 @@ bismark2report
 * Duplicate rates.
 * Methylation levels (CpG, CHG, CHH contexts).

-# Alignment Quality Mapping
+### 3. Alignment Quality Mapping

 The main module for assessing alignment quality is `qualimap bamqc`.

@@ -242,7 +242,7 @@ qualimap bamqc \
 * Distribution of mapping quality scores.


-# QC Reporting
+### 4. QC Reporting

 _MultiQC_ is used for the QC reporting combining all results from the _FastQC, Trim galore, Bismark Alignment, Bismark Deduplication, Bismark summary report,_ and _Qualimap results._

@@ -251,13 +251,13 @@ _MultiQC_ is used for the QC reporting combining all results from the _FastQC, T
 * Generates an interactive HTML report (`multiqc_report.html`) and a data file (`multiqc_data.json`).
 * Output includes summary statistics, plots, and tool-specific metrics.

-# Differential Methylation Analysis
+### 5. Differential Methylation Analysis
 To calculate the differential methylation from the input samples, two different methods can be used:

 * [EdgeR](#edger) (Default) or
 * [MethylKit](#methylkit)

-## EdgeR
+#### 5.1 EdgeR

 **edgeR** is a Bioconductor package primarily used for RNA-seq differential expression analysis but can also handle differential methylation analysis when paired with bisulfite sequencing data. This requires pre-processed methylation data, such as counts of methylated (`M`) and unmethylated (`U`) reads at each cytosine position or region of interest.
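
To make the `M` and `U` counts concrete, the descriptive quantity they encode is the methylation level at a site, M / (M + U). The Python sketch below only illustrates that ratio; it is not edgeR's statistical model and not code from `edgeR_analysis.R`, and the function name is hypothetical.

```python
def methylation_level(methylated, unmethylated):
    """Return M / (M + U) for one cytosine or region, or None if uncovered.

    `methylated` (M) and `unmethylated` (U) are read counts. A site with no
    reads has no defined methylation level, so None is returned instead of
    raising ZeroDivisionError.
    """
    coverage = methylated + unmethylated
    if coverage == 0:
        return None
    return methylated / coverage
```

A site with 15 methylated and 5 unmethylated reads has a level of 0.75. Keeping both counts, rather than only the ratio, preserves the coverage information that a count-based method can use.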

@@ -280,7 +280,7 @@ Rscript $baseDir/bin/edgeR_analysis.R \
 * Generates `EdgeR_group_<compare_str>.csv`.


-## MethylKit
+#### 5.2 MethylKit
 **MethylKit** is an R package designed for analyzing bisulfite sequencing data, particularly for differential methylation analysis. It supports genome-wide methylation data and is ideal for CpG, CHH, and CHG methylation studies.

 ```
@@ -301,7 +301,7 @@ Rscript $baseDir/bin/run_methylkit.R \
 ??? note "MethylKit Results"
 * Generates `Methylkit_group_<compare_str>.csv`

-# Post-processing
+### 6. Post-processing

 Generates A) **Volcano Plot,** B) **MA Plot** and C) **Summary Statistics** from the Differential Methylation results.

@@ -313,7 +313,7 @@ Generates A) **Volcano Plot,** B) **MA Plot** and C) **Summary Statistics** from
 ![MA plot](./images/ma_plot.png)


-# Gene Ontology Analysis
+### 7. Gene Ontology Analysis
 The pipeline also has a module that performs Gene Ontology analysis on the top `n` genes from the differential methylation results (EdgeR/MethylKit) using the _clusterProfiler_ package.

 The results include a full table of all _Biological Processes_ and a _Chord diagram_ of the top 10 functions identified in the analysis.
