|
1 |
| -# General settings |
2 |
| -To configure this workflow, modify ``config/config.yaml`` according to your needs, following the explanations provided in the file. |
| 1 | +# General configuration |
3 | 2 |
|
4 |
| -# Sample and unit sheet |
| 3 | +To configure this workflow, modify `config/config.yaml` according to your needs, following the explanations provided in the file. |
5 | 4 |
|
6 |
| -* Add samples to `config/samples.tsv`. For each sample, the columns `sample_name`, and `condition` have to be defined. The `condition` (healthy/tumor, before Treatment / after Treatment) will be used as contrast for the DEG analysis in DESeq2. To include other relevant variables such as batches, add a new column to the sheet. |
7 |
| -* For each sample, add one or more sequencing units (runs, lanes or replicates) to the unit sheet `config/units.tsv`. By activating or deactivating `mergeReads` in the `config/config.yaml`, you can decide wether to merge replicates or run them individually. For each unit, define adapters, and either one (column `fq1`) or two (columns `fq1`, `fq2`) FASTQ files (these can point to anywhere in your system). Alternatively, you can define an SRA (sequence read archive) accession (starting with e.g. ERR or SRR) by using a column `sra`. In the latter case, the pipeline will automatically download the corresponding paired end reads from SRA. If both local files and SRA accession are available, the local files will be preferred. |
8 |
| -To choose the correct geneCounts produced by STAR, you can define the strandedness of a unit. STAR produces counts for unstranded ('None' - default), forward oriented ('yes') and reverse oriented ('reverse') protocols. |
| 5 | +## `DESeq2` differential expression analysis configuration |
9 | 6 |
|
| 7 | +To successfully run the differential expression analysis, you will need to tell DESeq2 which sample annotations to use (annotations are columns in the `samples.tsv` file described below). |
| 8 | +This is done in the `config.yaml` file with the entries under `diffexp:`. |
| 9 | +The comments on these entries should provide all the necessary information and links. |
| 10 | +If in doubt, please also consult the [`DESeq2` manual](https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html). |
| 11 | + |
| 12 | +# Sample and unit setup |
| 13 | + |
| 14 | +The sample and unit setup is specified via tab-separated tabular files (`.tsv`). |
10 | 15 | Missing values can be specified by empty columns or by writing `NA`.
|
11 | 16 |
|
12 |
| -# DESeq scenario |
| 17 | +## sample sheet |
| 18 | + |
| 19 | +The default sample sheet is `config/samples.tsv` (as configured in `config/config.yaml`). |
| 20 | +Each sample refers to an actual physical sample, and replicates (both biological and technical) may be specified as separate samples. |
| 21 | +For each sample, you will always have to specify a `sample_name`. |
| 22 | +In addition, all `variables_of_interest` and `batch_effects` specified under the `diffexp:` entry in `config/config.yaml` need corresponding columns in `config/samples.tsv`. |
| 23 | +Finally, the sample sheet can contain any number of additional columns. |
| 24 | +If you are unsure whether you might need some metadata you already have at hand, put it into the sample sheet now. Your future self will thank you. |
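
The column requirement for the sample sheet can be checked before running the workflow. The following is a minimal sketch, not part of the workflow itself: the function name `missing_sample_columns` and the example sheet are hypothetical, and `variables_of_interest` and `batch_effects` are assumed to be passed as plain lists of column names.

```python
import csv
import io


def missing_sample_columns(sheet_text, variables_of_interest, batch_effects):
    """Return the required column names that are absent from a tab-separated sample sheet."""
    reader = csv.DictReader(io.StringIO(sheet_text), delimiter="\t")
    present = set(reader.fieldnames or [])
    # `sample_name` is always required; the rest depends on the `diffexp:` settings.
    required = ["sample_name", *variables_of_interest, *batch_effects]
    return [name for name in required if name not in present]


# Hypothetical sample sheet with one extra metadata column (`notes`).
example_sheet = (
    "sample_name\tcondition\tbatch\tnotes\n"
    "A1\ttreated\tb1\t\n"
    "B1\tuntreated\tb2\tNA\n"
)

print(missing_sample_columns(example_sheet, ["condition"], ["batch"]))
print(missing_sample_columns(example_sheet, ["condition"], ["sequencing_run"]))
```

The first call reports no missing columns; the second reports `sequencing_run`, because the example sheet has no such column. Extra columns such as `notes` are ignored by the check.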
| 25 | + |
| 26 | +## unit sheet |
| 27 | + |
| 28 | +The default unit sheet is `config/units.tsv` (as configured in `config/config.yaml`). |
| 29 | +For each sample, add one or more sequencing units (for example if you have several runs or lanes per sample). |
| 30 | + |
| 31 | +### `.fastq` file source |
| 32 | + |
| 33 | +For each unit, you will have to define a source for your `.fastq` files. |
| 34 | +This is done via the columns `fq1`, `fq2` and `sra`, in one of the following ways: |
| 35 | +1. A single `.fastq` file for single-end reads (`fq1` column only; `fq2` and `sra` columns present, but empty). |
| 36 | + The entry can be any path on your system, but we suggest something like a `raw/` data directory within your analysis directory. |
| 37 | +2. Two `.fastq` files for paired-end reads (columns `fq1` and `fq2`; column `sra` present, but empty). |
| 38 | + As for the `fq1` column, the `fq2` column can also point to anywhere on your system. |
| 39 | +3. A sequence read archive (SRA) accession number (`sra` column only; `fq1` and `fq2` columns present, but empty). |
| 40 | + The workflow will automatically download the corresponding `.fastq` data (currently assumed to be paired-end). |
| 41 | + The accession numbers usually start with SRR or ERR and you can find accession numbers for studies of interest with the [SRA Run Selector](https://trace.ncbi.nlm.nih.gov/Traces/study/). |
| 42 | +If both local files and an SRA accession are specified for the same unit, the local files will be used. |
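
The three options above, together with the rule that local files win over an SRA accession, can be summarized in a few lines. This is a minimal sketch under stated assumptions: the helper names `is_missing` and `read_source` are hypothetical, a unit is represented as a plain dictionary of column values, and the workflow's actual implementation may differ.

```python
def is_missing(value):
    """Treat absent values, empty strings and `NA` as missing."""
    return value is None or value.strip() in ("", "NA")


def read_source(unit):
    """Classify where the reads of one unit (a row of the unit sheet) come from."""
    has_fq1 = not is_missing(unit.get("fq1"))
    has_fq2 = not is_missing(unit.get("fq2"))
    has_sra = not is_missing(unit.get("sra"))
    # Local files take precedence over an SRA accession.
    if has_fq1 and has_fq2:
        return "local paired-end"
    if has_fq1:
        return "local single-end"
    if has_sra:
        return "sra"
    raise ValueError("unit defines neither local FASTQ files nor an SRA accession")


# Hypothetical units; paths and the accession are placeholders.
print(read_source({"fq1": "raw/a_R1.fastq.gz", "fq2": "", "sra": ""}))
print(read_source({"fq1": "raw/a_R1.fastq.gz", "fq2": "raw/a_R2.fastq.gz", "sra": "SRR0000000"}))
print(read_source({"fq1": "NA", "fq2": "NA", "sra": "SRR0000000"}))
```

The second unit specifies both local files and an accession and is classified as local paired-end, matching the precedence rule stated above.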
| 43 | + |
| 44 | +### adapter trimming |
| 45 | + |
| 46 | +If you set `trimming: activate:` in the `config/config.yaml` to `True`, you will have to provide at least one `cutadapt` adapter argument for each unit in the `adapters` column of the `units.tsv` file. |
| 47 | +You will need to find out which adapters were used in the sequencing protocol that generated a unit: ask your sequencing provider or, for published data, check the study's metadata (or ask its authors). |
| 48 | +Then, enter the adapter sequences into the `adapters` column of that unit, preceded by the [correct `cutadapt` adapter argument](https://cutadapt.readthedocs.io/en/stable/guide.html#adapter-types). |
13 | 49 |
|
14 |
| -To initialize the DEG analysis, you need to define a model in the `config/config.yaml`. The model can include all variables introduced as columns in `config/samples.tsv`. |
15 |
| -* The standard model is `~condition` - to include a batch variable, write `~batch + condition`. |
| 50 | +### strandedness of library preparation protocol |
16 | 51 |
|
| 52 | +To get the correct `geneCounts` from `STAR` output, you can provide information on the strandedness of the library preparation protocol used for a unit. |
| 53 | +`STAR` can produce counts for unstranded (`none` - this is the default), forward oriented (`yes`) and reverse oriented (`reverse`) protocols. |
| 54 | +Enter the respective value into a `strandedness` column in the `units.tsv` file. |
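
To illustrate what the `strandedness` value selects: with `--quantMode GeneCounts`, `STAR` writes a `ReadsPerGene.out.tab` file with four columns (gene ID, unstranded counts, counts for reads on the forward strand, counts for reads on the reverse strand). The sketch below maps a `strandedness` entry to the matching column. The function name `gene_counts_column` is hypothetical, treating an empty or `NA` entry as the default is an assumption, and the workflow's own code may handle this differently.

```python
def gene_counts_column(strandedness):
    """Return the 0-based column of STAR's ReadsPerGene.out.tab for a strandedness value."""
    value = "none" if strandedness is None else strandedness.strip().lower()
    if value in ("", "na", "none"):
        return 1  # unstranded counts (assumed default when nothing is specified)
    if value == "yes":
        return 2  # forward-oriented protocols
    if value == "reverse":
        return 3  # reverse-oriented protocols
    raise ValueError(f"unknown strandedness value: {strandedness!r}")


print(gene_counts_column("none"))
print(gene_counts_column("yes"))
print(gene_counts_column("reverse"))
```

Column 0 holds the gene ID, so the three count columns are 1, 2 and 3.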