- > Note: Rather than running the command below to create the runsheet needed for processing, the runsheet may also be created manually by following the [file specification](../Workflow_Documentation/NF_RCP-F/examples/runsheet/README.md).
+ > Note: Rather than running the command below to create the runsheet needed for processing, the runsheet may also be created manually by following the [file specification](../Workflow_Documentation/NF_RCP/examples/runsheet/README.md).

  ```bash
- ### Download the *ISA.zip file from the GeneLab Repository ###
+ ### Download the *ISA.zip file from the Open Science Data Repository ###
- -*ISA.zip (compressed ISA directory containing Investigation, Study, and Assay (ISA) metadata files for the respective GLDS dataset, used to define sample groups - the *ISA.zip file is located in the [OSDR repository]([https://genelab-data.ndc.nasa.gov/genelab/projects](https://osdr.nasa.gov/bio/repo/)) under 'Files' -> 'Study Metadata Files')
+ -*ISA.zip (compressed ISA directory containing Investigation, Study, and Assay (ISA) metadata files for the respective GLDS dataset, used to define sample groups - the *ISA.zip file is located in the [OSDR repository](https://osdr.nasa.gov/bio/repo/) under 'Files' -> 'Study Metadata Files')

  -**{GLDS-Accession-ID}_bulkRNASeq_v{version}_runsheet.csv** (table containing metadata required for processing, version denotes the dp_tools schema used to specify the metadata to extract from the ISA archive)
RNAseq/Workflow_Documentation/NF_RCP/README.md: 25 additions, 28 deletions
@@ -144,23 +144,23 @@ While in the location containing the `NF_RCP-G_2.0.0` directory that was downloa
  ```bash
  nextflow run NF_RCP-G_2.0.0/main.nf \
  -profile singularity \
- --gldsAccession OSD-194
+ --accession OSD-194
  ```

  <br>

  #### 4b. Approach 2: Run the workflow on a GeneLab RNAseq dataset using local reference fasta and gtf files

- > Note: The `--ref_source` and `--ensemblVersion` parameters should match the reference source and version number of the local reference fasta and gtf files used
+ > Note: The `--reference_source` and `--reference_version` parameters should match the reference source and version number of the local reference fasta and gtf files used

  ```bash
  nextflow run NF_RCP-G_2.0.0/main.nf \
  -profile singularity \
- --gldsAccession OSD-194 \
- --ensemblVersion 107 \
- --ref_source ensembl \
- --ref_fasta </path/to/fasta> \
- --ref_gtf </path/to/gtf>
+ --accession OSD-194 \
+ --reference_version 107 \
+ --reference_source ensembl \
+ --reference_fasta </path/to/fasta> \
+ --reference_gtf </path/to/gtf>
  ```

  <br>
@@ -172,8 +172,8 @@ nextflow run NF_RCP-G_2.0.0/main.nf \
  ```bash
  nextflow run NF_RCP-G_2.0.0/main.nf \
  -profile singularity \
- --gldsAccession output_directory \
- --runsheetPath </path/to/runsheet>
+ --accession output_directory \
+ --runsheet_path </path/to/runsheet>
  ```

  <br>
@@ -184,43 +184,39 @@ nextflow run NF_RCP-G_2.0.0/main.nf \

  * `-profile` - Specifies the configuration profile(s) to load, `singularity` instructs Nextflow to setup and use singularity for all software called in the workflow

- * `--gldsAccession OSD-###` – specifies the OSD dataset to process through the RCP workflow (replace ### with the OSD number)
- > Note: The primary output directory will be titled "OSD-###"
-
- * `--gldsAccession output_directory` – specifies the output directory name to use when processing a non-OSD dataset, as indicated in [Approach 3 above](#4c-approach-3-run-the-workflow-on-a-non-glds-dataset-using-a-user-created-runsheet)
+ * `--accession [OSD-###|GLDS-###]` – specifies the OSDR dataset to process through the RCP workflow (replace ### with the OSD or GLDS number)
+ > Note: The primary output directory will be named after the accession input, e.g. "OSD-194" or "GLDS-194"


  <br>

  **Additional Required Parameters For [Approach 2](#4b-approach-2-run-the-workflow-on-a-genelab-rnaseq-dataset-using-local-ensembl-reference-fasta-and-gtf-files):**

- * `--ensemblVersion` - specifies the Ensembl version to use for the reference genome (Ensembl release `107` is used in this example)
+ * `--reference_version` - specifies the Ensembl version to use for the reference genome (Ensembl release `107` is used in this example)

- * `--ref_source` - specifies the source of the reference files used (the source indicated in the Approach 2 example is `ensembl`)
+ * `--reference_source` - specifies the source of the reference files used (the source indicated in the Approach 2 example is `ensembl`)

- * `--ref_fasta` - specifices the path to a local fasta file
+ * `--reference_fasta` - specifices the path to a local fasta file

- * `--ref_gtf` - specifices the path to a local gtf file
+ * `--reference_gtf` - specifices the path to a local gtf file

- > Note: If the local reference files specified are different than the Ensembl reference files used to create the [GeneLab annotations table](https://github.com/nasa/GeneLab_Data_Processing/blob/master/GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110/GL-DPPD-7110_annotations.csv), additional gene annotations associated with any Ensembl/TAIR IDs from the specified files that are not shared in the GeneLab annotations will not be added to the DGE output table(s).
+ > Note: If the local reference files specified are different than the reference files used to create the [GeneLab annotations table](https://github.com/nasa/GeneLab_Data_Processing/blob/master/GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110/GL-DPPD-7110_annotations.csv), additional gene annotations associated with any gene IDs from the specified files that are not shared in the GeneLab annotations will not be added to the DGE output table(s).

  <br>

  **Optional Parameters:**

- * `--skipVV` - skip the automated V&V processes (Default: the automated V&V processes are active)
+ * `--skip_vv` - skip the automated V&V processes (Default: the automated V&V processes are active)

- * `--outputDir` - specifies the directory to save the raw and processed data files (Default: files are saved in the launch directory)
+ * `--outdir` - specifies the directory to save the raw and processed data files (Default: files are saved in a folder named `results` created in the launch directory)

  * `--force_single_end` - forces the analysis to use single end processing; for paired end datasets, this means only R1 is used; for single end datasets, this should have no effect

- * `--stageLocal TRUE|FALSE` - TRUE = download the raw reads files for the OSD dataset indicated, FALSE = disable raw reads download and processing (Default: TRUE)
-
- * `--referenceStorePath` - specifies the directory to store the Ensembl fasta and gtf files (Default: within the directory structure created by default in the launch directory)
+ * `--reference_store_path` - specifies the directory to store the Ensembl fasta and gtf files (Default: within the directory structure created by default in the launch directory)

- * `--derivedStorePath` - specifies the directory to store the tool-specific indices created during processing (Default: within the directory structure created by default in the launch directory)
+ * `--derived_store_path` - specifies the directory to store the tool-specific indices created during processing (Default: within the directory structure created by default in the launch directory)

- * `--runsheetPath` - specifies the path to a local runsheet (Default: a runsheet is automatically generated using the metadata on the GeneLab Repository for the OSD dataset being processed)
+ * `--runsheet_path` - specifies the path to a local runsheet (Default: a runsheet is automatically generated using the metadata on the GeneLab Repository for the OSD dataset being processed)
  > This is required when prcessing a non-OSD dataset as indicated in [Approach 3 above](#4c-approach-3-run-the-workflow-on-a-non-glds-dataset-using-a-user-created-runsheet)

  <br>
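The parameter renames in the hunks above amount to a fixed old-to-new lookup. As a hedged sketch (not part of NF_RCP), the hypothetical Bash function `migrate_rcp_args` below rewrites an argument list written for the old names into the new ones. It assumes each flag and its value are passed as separate words (`--flag value`, not `--flag=value`), and it drops `--stageLocal` together with its value, since this commit removes that option without naming a replacement.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of NF_RCP): translate legacy NF_RCP parameter
# names to the names introduced in this commit. The mapping mirrors the diff.
# Assumes each flag and its value are separate words (--flag value).
migrate_rcp_args() {
  local out=() skip_next=0 arg
  for arg in "$@"; do
    # Discard the value that followed a removed flag
    if (( skip_next )); then
      skip_next=0
      continue
    fi
    case "$arg" in
      --gldsAccession)      out+=(--accession) ;;
      --ensemblVersion)     out+=(--reference_version) ;;
      --ref_source)         out+=(--reference_source) ;;
      --ref_fasta)          out+=(--reference_fasta) ;;
      --ref_gtf)            out+=(--reference_gtf) ;;
      --runsheetPath)       out+=(--runsheet_path) ;;
      --skipVV)             out+=(--skip_vv) ;;
      --outputDir)          out+=(--outdir) ;;
      --referenceStorePath) out+=(--reference_store_path) ;;
      --derivedStorePath)   out+=(--derived_store_path) ;;
      --stageLocal)
        # Removed in this commit with no replacement: drop the flag and its value
        echo "note: dropping --stageLocal (no longer supported)" >&2
        skip_next=1
        ;;
      *) out+=("$arg") ;;  # profiles, values, and unchanged flags pass through
    esac
  done
  # Print the translated arguments joined by single spaces
  printf '%s\n' "${out[*]}"
}

# Example: translate the old Approach 2 arguments
migrate_rcp_args -profile singularity --gldsAccession OSD-194 --ensemblVersion 107 --ref_source ensembl
```

The example prints `-profile singularity --accession OSD-194 --reference_version 107 --reference_source ensembl`. Because the result is a single space-joined string, values that contain spaces lose their quoting, so the output is meant to be read and pasted by hand rather than passed back to `nextflow run` unquoted.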
@@ -272,8 +268,9 @@ Standard Nextflow resource usage logs are also produced as follows:
  **Nextflow Resource Usage Logs**

  - Output:
- - Resource_Usage/execution_report_{timestamp}.html (an html report that includes metrics about the workflow execution including computational resources and exact workflow process commands)
- - Resource_Usage/execution_timeline_{timestamp}.html (an html timeline forall processes executedin the workflow)
- - Resource_Usage/execution_trace_{timestamp}.txt (an execution tracing file that contains information about each process executed in the workflow, including: submission time, start time, completion time, cpu and memory used, machine-readable output)
+ - nextflow_logs/execution_report_{timestamp}.html (an html report that includes metrics about the workflow execution including computational resources and exact workflow process commands)
+ - nextflow_logs/execution_timeline_{timestamp}.html (an html timeline forall processes executedin the workflow)
+ - nextflow_logs/execution_trace_{timestamp}.txt (an execution tracing file that contains information about each process executed in the workflow, including: submission time, start time, completion time, cpu and memory used, machine-readable output)
+ - nextflow_info/pipeline_dag_{timestamp}.html (a visualization of the workflow process DAG)
RNAseq/Workflow_Documentation/NF_RCP/examples/runsheet/README.md: 1 addition, 1 deletion
@@ -18,7 +18,7 @@
  | Sample Name | string | Sample Name, added as a prefix to sample-specific processed data output files. Should not include spaces or weird characters. | Mmus_BAL-TAL_LRTN_BSL_Rep1_B7 |
  | has_ERCC | bool | Set to True if ERCC spike-ins are included in the samples. This ensures ERCC normalized DGE is performed in addition to standard DGE. | True |
  | paired_end | bool | Set to True if the samples were sequenced as paired-end. If set to False, samples are assumed to be single-end. | False |
- | organism | string | Species name used to map to the appropriate gene annotations file. Supported species can be found in the `species` column of the [GL-DPPD-7110_annotations.csv](../../../../../GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110/GL-DPPD-7110_annotations.csv) file. | Mus musculus |
+ | organism | string | Species name used to map to the appropriate gene annotations file. Supported species can be found in the `species` column of the [GL-DPPD-7110-A_annotations.csv](../../../../../GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv) file. | Mus musculus |
  | read1_path | string (url or local path) | Location of the raw reads file. For paired-end data, this specifies the forward reads fastq.gz file. | /my/data/sample_1.fastq.gz |
  | read2_path | string (url or local path) | Location of the raw reads file. For paired-end data, this specifies the reverse reads fastq.gz file. For single-end data, this column should be omitted. | /my/data/sample_2.fastq.gz |
  | Factor Value[<name, e.g. Spaceflight>]| string | A set of one or more columns specifying the experimental group the sample belongs to. In the simplest form, a column named 'Factor Value[group]' is sufficient. | Space Flight |
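A runsheet following this specification is what `--runsheet_path` points to when processing a non-OSD dataset. As a rough pre-flight check, the hypothetical Bash function `check_runsheet_header` below confirms that a runsheet's header row contains the columns shown in this table excerpt. It treats every listed column except `read2_path` as required and expects at least one `Factor Value[...]` column. Both rules are assumptions, since the full specification may list further columns above line 18. It also assumes a plain comma-delimited header with no quoted fields.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of NF_RCP): check a runsheet's header row for
# the columns shown in the specification table excerpt. Assumes a
# comma-delimited header with no quoted or embedded commas in column names.
check_runsheet_header() {
  local runsheet="$1" header col required found status=0
  local -a cols

  IFS= read -r header < "$runsheet"
  if [[ -z "$header" ]]; then
    echo "empty or unreadable runsheet: $runsheet" >&2
    return 2
  fi
  header="${header%$'\r'}"              # tolerate CRLF line endings
  IFS=, read -r -a cols <<< "$header"   # split the header on commas only

  # Columns assumed required for every runsheet (read2_path is paired-end only)
  for required in "Sample Name" has_ERCC paired_end organism read1_path; do
    found=0
    for col in "${cols[@]}"; do
      [[ "$col" == "$required" ]] && found=1
    done
    if (( ! found )); then
      echo "missing column: $required" >&2
      status=1
    fi
  done

  # At least one 'Factor Value[...]' column must define the sample groups
  found=0
  for col in "${cols[@]}"; do
    [[ "$col" == "Factor Value["*"]" ]] && found=1
  done
  if (( ! found )); then
    echo "missing column: Factor Value[...]" >&2
    status=1
  fi

  return "$status"
}
```

For example, `check_runsheet_header my_runsheet.csv && echo "header looks complete"` reports each missing column on stderr and returns non-zero if any are absent. The sketch does not inspect row values, so it cannot confirm that `paired_end` agrees with the presence of `read2_path` or that `organism` appears in the `species` column of the annotations table.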