JD2112
diff --git a/‎README.md‎
Lines changed: 10 additions & 6 deletions b/‎README.md‎
diff --git a/‎artworks/TNF.png‎
1.17 MB b/‎artworks/TNF.png‎
diff --git a/artworks/TN.png renamed to artworks/TN_old.png
diff --git a/‎artworks/twistmethylflow.png‎
249 KB b/‎artworks/twistmethylflow.png‎
diff --git a/‎data/Sample_sheet_twist.csv‎
Lines changed: 12 additions & 8 deletions b/‎data/Sample_sheet_twist.csv‎
diff --git a/‎docs/faqs.md‎
Lines changed: 29 additions & 0 deletions b/‎docs/faqs.md‎
diff --git a/docs/images/TN.png renamed to docs/images/TMN_old.png
diff --git a/‎docs/images/TNF.png‎
1.17 MB b/‎docs/images/TNF.png‎
diff --git a/‎docs/index.md‎
Lines changed: 1 addition & 1 deletion b/‎docs/index.md‎
diff --git a/‎docs/overview.md‎
Lines changed: 16 additions & 16 deletions b/‎docs/overview.md‎
@@ -1,4 +1,4 @@
-![](artworks/twistmethnext.png)
+![](artworks/twistmethylflow.png)
 
 [![DOI](https://zenodo.org/badge/490592846.svg)](https://doi.org/10.5281/zenodo.14204261)
 [![GitBook Docs](https://img.shields.io/badge/docs-GitBook-blue?logo=gitbook)](https://jyotirmoys-organization.gitbook.io/TwistMethylFlow)
@@ -33,7 +33,7 @@ This Nextflow pipeline is designed for the analysis of Twist NGS Methylation dat
 
 
 ## Pipeline Schema
-![](artworks/TN.png)
+![](artworks/TNF.png)
 
 ## Requirements
 
@@ -48,7 +48,7 @@ This Nextflow pipeline is designed for the analysis of Twist NGS Methylation dat
 
 3. User can also use `--skip_diff_meth` to avoid the differential methylation analysis.
 
-### `--run_both_methods`
+### `--run_both_methods` (default: true)
 
 ```
 # when using the reference genome indexing, --genome_fasta
@@ -125,16 +125,20 @@ nextflow run JD2112/TwistMethylFlow \
     --gtf_file /data/Homo_sapiens.GRCh38.104.gtf \
     --outdir Results/TwistMethylFlow_methylKit 
 ```
+
+> [!TIP] "demo data check"
+> The demo data runs with the `hg19` reference genome. Remember to update the GTF/RefSeq file accordingly.
+
 ## Options:
 
 | options | Description |
 |--------|-----------------------------------------------------------|
-| `--sample_sheet`       | Path to the sample sheet CSV file (required) |                                           
+| `--sample_sheet`       | Path to the sample sheet CSV file (**required**) |                                           
 | `--bismark_index`      | Path to the Bismark index directory (required unless `--genome` or `--aligned_bams` is provided) |
 | `--genome`             | Path to the reference genome FASTA file (required if `--bismark_index` not provided)| 
 | `--aligned_bams`       | Path to aligned BAM files (use this to start from aligned BAM files instead of FASTQ files) |
-| `--refseq_file`        | Path to RefSeq file for annotation (reuired to run `both` or `methylkit`)  |
-| `--gtf_file`           | Path to GTF file for annotation (reuired to run `both` or `edger`)  |
+| `--refseq_file`        | Path to RefSeq file for annotation (**required** to run `both` or `methylkit`)  |
+| `--gtf_file`           | Path to GTF file for annotation (**required** to run `both` or `edger`)  |
 | `--outdir`             | Output directory (default: ./results) |
 | `--diff_meth_method`   | Differential methylation method to use: 'edger' or 'methylkit' (default: edger) | 
 | `--run_both_methods`   | Run both edgeR and methylkit for differential methylation analysis (default: false) | 
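
The dependencies between the options in the table above can be sketched as a small pre-flight check. This is an illustration only, not part of the pipeline: `missing_options` and the `params` dictionary are hypothetical names, and the option keys are taken from the table. When `run_both_methods` is absent the sketch treats it as off, following the table; pass it explicitly if your version defaults to `true`.

```python
def missing_options(params):
    """Return the options that the table marks as required but are absent.

    `params` is a hypothetical dict of option name -> value (no leading dashes).
    """
    missing = []
    if not params.get("sample_sheet"):
        missing.append("--sample_sheet")

    # A Bismark index is needed unless a genome FASTA or aligned BAMs are given.
    has_alignment_input = any(
        params.get(key) for key in ("bismark_index", "genome", "aligned_bams")
    )
    if not has_alignment_input:
        missing.append("--bismark_index (or --genome / --aligned_bams)")

    # Work out which differential methylation methods will run.
    if params.get("run_both_methods"):
        methods = {"edger", "methylkit"}
    else:
        methods = {params.get("diff_meth_method", "edger")}

    # methylKit annotation uses the RefSeq file; edgeR annotation uses the GTF.
    if "methylkit" in methods and not params.get("refseq_file"):
        missing.append("--refseq_file")
    if "edger" in methods and not params.get("gtf_file"):
        missing.append("--gtf_file")
    return missing


if __name__ == "__main__":
    example = {"sample_sheet": "samples.csv", "genome": "genome.fa",
               "run_both_methods": True, "gtf_file": "genes.gtf"}
    print(missing_options(example))
```

In the usage example both methods are requested but only a GTF file is supplied, so the check reports that `--refseq_file` is still needed.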
 
@@ -1,9 +1,13 @@
 sample_id,group,read1,read2
-SN_1,Healthy,/data/SN1_S9_R1_001.fastq.gz,/data//SN1_S9_R2_001.fastq.gz
-SN_2,Disease,/data/SN2_S10_R1_001.fastq.gz,/data/SN2_S10_R2_001.fastq.gz
-SN_3,Healthy,/data/SN3_S11_R1_001.fastq.gz,/data/SN3_S11_R2_001.fastq.gz
-SN_0012,Disease,/data/SN4_S12_R1_001.fastq.gz,/data/SN4_S12_R2_001.fastq.gz
-SN_0013,Healthy,/data/SN5_S13_R1_001.fastq.gz,/data/SN5_S13_R2_001.fastq.gz
-SN_0014,Disease,/data/SN6_S14_R1_001.fastq.gz,/data/SN6_S14_R2_001.fastq.gz
-SN_0015,Healthy,/data/SN7_S15_R1_001.fastq.gz,/data/SN7_S15_R2_001.fastq.gz
-SN_0016,Disease,/data/SN8_S16_R1_001.fastq.gz,/data/SN8_S16_R2_001.fastq.gz
+12A,VD,FASTQ/12A_S9_L001_R1_001.fastq.gz,FASTQ/12A_S9_L001_R2_001.fastq.gz
+13A,CS,FASTQ/13A_S11_L001_R1_001.fastq.gz,FASTQ/13A_S11_L001_R2_001.fastq.gz
+1A,CS,FASTQ/1A_S1_L001_R1_001.fastq.gz,FASTQ/1A_S1_L001_R2_001.fastq.gz
+20A,VD,FASTQ/20A_S13_L001_R1_001.fastq.gz,FASTQ/20A_S13_L001_R2_001.fastq.gz
+21A,VD,FASTQ/21A_S15_L001_R1_001.fastq.gz,FASTQ/21A_S15_L001_R2_001.fastq.gz
+22A,VD,FASTQ/22A_S17_L001_R1_001.fastq.gz,FASTQ/22A_S17_L001_R2_001.fastq.gz
+23A,VD,FASTQ/23A_S19_L001_R1_001.fastq.gz,FASTQ/23A_S19_L001_R2_001.fastq.gz
+25A,CS,FASTQ/25A_S21_L001_R1_001.fastq.gz,FASTQ/25A_S21_L001_R2_001.fastq.gz
+26A,VD,FASTQ/26A_S23_L001_R1_001.fastq.gz,FASTQ/26A_S23_L001_R2_001.fastq.gz
+2A,CS,FASTQ/2A_S3_L001_R1_001.fastq.gz,FASTQ/2A_S3_L001_R2_001.fastq.gz
+3A,VD,FASTQ/3A_S5_L001_R1_001.fastq.gz,FASTQ/3A_S5_L001_R2_001.fastq.gz
+5A,CS,FASTQ/5A_S7_L001_R1_001.fastq.gz,FASTQ/5A_S7_L001_R2_001.fastq.gz
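
The updated sample sheet keeps the four-column layout `sample_id,group,read1,read2`. A minimal sketch of a sanity check for that layout follows; `check_sample_sheet` and `group_sizes` are hypothetical helpers written for illustration, not functions shipped with the pipeline, and the specific checks (non-empty fields, distinct read files, unique sample IDs) are assumptions about what a well-formed sheet looks like.

```python
import csv
import io
from collections import Counter

EXPECTED_COLUMNS = ["sample_id", "group", "read1", "read2"]


def check_sample_sheet(text):
    """Return a list of problems found in sample sheet CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]

    problems = []
    rows = list(reader)
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for column in EXPECTED_COLUMNS:
            # Short rows yield None for missing fields, so guard with `or ""`.
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty {column}")
        if row["read1"] and row["read1"] == row["read2"]:
            problems.append(f"line {line_no}: read1 and read2 are the same file")

    # Each sample_id should appear only once.
    counts = Counter(row["sample_id"] for row in rows)
    for sample_id, n in counts.items():
        if n > 1:
            problems.append(f"duplicate sample_id: {sample_id}")
    return problems


def group_sizes(text):
    """Count how many samples fall into each group."""
    reader = csv.DictReader(io.StringIO(text))
    return Counter(row["group"] for row in reader)


if __name__ == "__main__":
    sheet = (
        "sample_id,group,read1,read2\n"
        "12A,VD,FASTQ/12A_S9_L001_R1_001.fastq.gz,FASTQ/12A_S9_L001_R2_001.fastq.gz\n"
        "13A,CS,FASTQ/13A_S11_L001_R1_001.fastq.gz,FASTQ/13A_S11_L001_R2_001.fastq.gz\n"
    )
    print(check_sample_sheet(sheet))
    print(dict(group_sizes(sheet)))
```

The check works on text rather than on file paths, so it does not confirm that the FASTQ files exist; the relative `FASTQ/...` paths in the new sheet are resolved wherever the pipeline is launched.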
@@ -99,3 +99,32 @@
     - [Hello Nextflow Training](https://training.nextflow.io/2.0/hello_nextflow/)
 
 
+---
+## Performance
+### Runtime, Memory, and Storage
+??? question "What are the typical runtime, memory, and storage requirements?"
+    - **Runtime**: Varies based on dataset size and computational resources. For 24 paired-end samples, it can take several hours to days.
+    - **Memory**: Ranges from 6 GB to 200 GB depending on the step. Alignment and differential methylation analysis are more memory-intensive.
+    - **Storage**: Approximately 2 TB for 24 paired-end samples, including intermediate files and results.
+    
+    | Process                       | Average Process Time | Average Wall Clock Time | Max Peak Memory | Total I/O (Read + Written) |
+    |-------------------------------|----------------------|-------------------------|-----------------|----------------------------|
+    | FastQC                        | 7m 3s                | 7m 2s                   | 628.4 MB        | 5.3 GB                     |
+    | Trim Galore                   | 22m 1s               | 22m 0s                  | 348.2 MB        | 165.2 GB                   |
+    | Bismark Genome Preparation    | 2h 24m 38s           | 2h 24m 37s              | 12.8 GB         | 47.8 GB                    |
+    | Bismark Alignment             | 14h 27m 57s          | 14h 27m 56s             | 10.2 GB         | 229.8 GB                   |
+    | Bismark Deduplication         | 29m 3s               | 29m 1s                  | 9.8 GB          | 95.8 GB                    |
+    | Samtools sort                 | 8m 32s               | 8m 31s                  | 3.3 GB          | 14.8 GB                    |
+    | Samtools index                | 1m 2s                | 1m 1s                   | 21.1 MB         | 4.8 GB                     |
+    | Bismark Methylation extractor | 2h 56m 23s           | 2h 56m 22s              | 2 GB            | 258.4 GB                   |
+    | Qualimap                      | 6m 29s               | 6m 29s                  | 12.1 GB         | 4.7 GB                     |
+    | Bismark report                | 2.9s                 | 2.5s                    | 161.1 MB        | 7.1 MB                     |
+    | MultiQC                       | 14.1s                | 13.3s                   | 80.5 MB         | 3.0 GB                     |
+    | EdgeR analysis                | 46m 37s              | 46m 36s                 | 44.9 GB         | 8.9 GB                     |
+    | Annotate results              | 5m 23s               | 5m 23s                  | 6.8 GB          | 1.8 GB                     |
+    | GO analysis EdgeR             | 4m 15s               | 4m 15s                  | 6.4 GB          | 1.8 GB                     |
+    | Post processing EdgeR         | 18m 40s              | 18m 40s                 | 9.3 GB          | 7.3 GB                     |
+    | MethylKit analysis            | 2h 59m 19s           | 2h 59m 19s              | 37.6 GB         | 9.0 GB                     |
+    | GO analysis methylKit         | 1m 44s               | 1m 44s                  | 3.9 GB          | 1.3 GB                     |
+    | Post processing methylKit     | 8m 4s                | 8m 3s                   | 4.4 GB          | 3.5 GB                     |
+
@@ -2,6 +2,6 @@
 
 Welcome to the TwistMethylFlow documentation. This Nextflow pipeline is designed for the analysis of Twist NGS Methylation data, encompassing quality control, alignment, methylation calling, differential methylation analysis, and post-processing.
 
-![Pipeline Schematic](./images/Figure%201.png)
+![Pipeline Schematic](./images/TNF.png)
 
 For a detailed overview, please refer to the [Overview](overview.md) section.
@@ -42,14 +42,14 @@ TwistMethylFlow integrates various tools and custom scripts to provide a compreh
   * generates a **Chord diagram** for top 10 results from the GO analysis.
 
 
-# Read processing
+### 1. Read processing
 
 The read processing subworkflow includes:
 
 * FASTQC - for Quality check of samples
 * TRIM Galore - adapter trimming
 
-## FastQC
+#### 1.1 FastQC
 
 **FASTQC** is a widely used tool for assessing the quality of raw and processed sequencing data. It provides a comprehensive quality check, including metrics like per-base quality scores, GC content, and adapter contamination.
 
@@ -83,7 +83,7 @@ fastqc $args --threads $task.cpus $reads
     * Identifies frequently occurring sequences (e.g., adapters or contaminants).
 
 
-## Trim Galore
+#### 1.2 Trim Galore
 
 **Trim Galore** is a versatile tool for trimming sequencing reads and removing adapter sequences. It’s particularly useful for preparing raw sequencing data for downstream applications like alignment or differential expression/methylation analysis. Trim Galore combines the functionalities of **Cutadapt** and **FastQC** for quality control and trimming.
 
@@ -100,9 +100,9 @@ trim_galore --paired --cores $task.cpus $args $reads
 * `--fastqc`: Run **FastQC** before and after trimming.
 * `--cores <number>`: Use multiple cores for faster processing.
 
-# Bismark Analysis
+### 2. Bismark Analysis
 
-## Reference Genome Preparation
+#### 2.1 Reference Genome Preparation
 
 Bismark needs to prepare the bisulfite index for the genome.
 
@@ -137,7 +137,7 @@ bismark_genome_preparation --bowtie2 --parallel 4 <genome.fasta>
             ...
     ```
 
-## Bismark Alignment
+#### 2.2 Bismark Alignment
 
 This step aligns bisulfite-treated sequencing reads to a reference genome.
 
@@ -155,7 +155,7 @@ bismark --genome <path_to_reference_genome> -1 <reads_R1.fq> -2 <reads_R2.fq> -o
   * Produces `.report.txt`  and
   * `unmapped_reads.fq.gz` file.
 
-## Bismark Deduplication
+#### 2.3 Bismark Deduplication
 This step removes duplicate reads to avoid overestimating methylation levels.
 
 ```
@@ -172,7 +172,7 @@ deduplicate_bismark ${paired_end} $args --bam $bam
   * Produces `deduplicated_report.txt` file.
 
 
-## Bismark Methylation Extractor
+#### 2.4 Bismark Methylation Extractor
 
 Extract methylation data from deduplicated BAM files.
 
@@ -192,7 +192,7 @@ bismark_methylation_extractor \
   * Also generates `splitting_report.txt` file.
 
 
-## Bismark Report
+#### 2.5 Bismark Report
 Generate a summary report of alignment and methylation statistics.
 
 **Command:**
@@ -207,7 +207,7 @@ bismark2report
         * Duplicate rates.
         * Methylation levels (CpG, CHG, CHH contexts).
 
-# Alignment Quality Mapping
+### 3. Alignment Quality Mapping
 
 The main module for assessing alignment quality is `qualimap bamqc`.
 
@@ -242,7 +242,7 @@ qualimap bamqc \
        * Distribution of mapping quality scores.
 
 
-# QC Reporting
+### 4. QC Reporting
 
 _MultiQC_ is used for the QC reporting combining all results from the _FastQC, Trim galore, Bismark Alignment, Bismark Deduplication, Bismark summary report,_ and _Qualimap results._
 
@@ -251,13 +251,13 @@ _MultiQC_ is used for the QC reporting combining all results from the _FastQC, T
 * Generates an interactive HTML report (`multiqc_report.html`) and a data file (`multiqc_data.json`).
 * Output includes summary statistics, plots, and tool-specific metrics.
 
-# Differential Methylation Analysis
+### 5. Differential Methylation Analysis
 To calculate differential methylation from the input samples, one of two methods can be used:
 
 * [EdgeR](#edger) (Default) or
 * [MethylKit](#methylkit)
 
-## EdgeR
+#### 5.1 EdgeR
 
 **edgeR** is a Bioconductor package primarily used for RNA-seq differential expression analysis but can also handle differential methylation analysis when paired with bisulfite sequencing data. This requires pre-processed methylation data, such as counts of methylated (`M`) and unmethylated (`U`) reads at each cytosine position or region of interest.
 
@@ -280,7 +280,7 @@ Rscript $baseDir/bin/edgeR_analysis.R \
     * Generates `EdgeR_group_<compare_str>.csv`.
 
 
-## MethylKit
+#### 5.2 MethylKit
 **MethylKit** is an R package designed for analyzing bisulfite sequencing data, particularly for differential methylation analysis. It supports genome-wide methylation data and is ideal for CpG, CHH, and CHG methylation studies.
 
 ```
@@ -301,7 +301,7 @@ Rscript $baseDir/bin/run_methylkit.R \
 ??? note "MethylKit Results"
     * Generates `Methylkit_group_<compare_str>.csv` 
 
-# Post-processing
+### 6. Post-processing
 
 Generates A) **Volcano Plot,** B) **MA Plot** and C) **Summary Statistics** from the Differential Methylation results.
 
@@ -313,7 +313,7 @@ Generates A) **Volcano Plot,** B) **MA Plot** and C) **Summary Statistics** from
     ![MA plot](./images/ma_plot.png)
 
 
-# Gene Ontology Analysis
+### 7. Gene Ontology Analysis
 The pipeline also has a module that performs Gene Ontology analysis on the top `n` genes from the differential methylation results (EdgeR/MethylKit) using the _clusterProfiler_ package.
 
 The results include a full table of all _Biological Processes_ and a _Chord diagram_ of the top 10 functions identified in the analysis.