-**For a detailed tutorial on how to use this set of workflows refer to the [Documentation](https://barski-lab.github.io/scRNA-Seq-Analysis/) page.**
+**For detailed documentation on how to use this set of workflows in SciDAP refer to the [Tutorials](https://barski-lab.github.io/scRNA-Seq-Analysis/) page.**

 This repository contains CWL pipelines for scRNA-Seq data analysis. Each of the command line tools used was wrapped into CWL format and combined into workflows.

@@ -15,4 +15,38 @@ This repository contains CWL pipelines for scRNA-Seq data analysis. Each of the
 The workflows and tools are compatible with any CWL runner (see [CWL official page](https://www.commonwl.org/#Implementations) for a list of available runners).
+**Running from command line**
+
+All CWL files from this repository are compatible with any workflow management system or runner that implements the CWL v1.0 standard (see the list [here](https://www.commonwl.org/#Implementations)). As an example, we will use [cwltool](https://github.com/common-workflow-language/cwltool) (the reference implementation) to show how to list the input parameters of any CWL file in order to run it from the command line. We will also show how to generate a template job definition file as an alternative way of setting workflow input parameters. Please note that, for better portability and reproducibility, all the tools used in our scRNA-Seq workflows are wrapped into Docker containers, so a properly configured [Docker](https://www.docker.com/) installation is recommended.
+
+1. To get a list of all input parameters for the CWL workflow file, run the following command.
+
+```
+cwltool cellranger-mkref.cwl --help
+
+Cell Ranger Build Reference Indices
+Builds reference genome indices for Cell Ranger Gene Expression and Cell Ranger Multiome ATAC + Gene
+Expression experiments.
+
+positional arguments:
+  job_order             Job input json file
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --annotation_gtf_file ANNOTATION_GTF_FILE
+                        Reference genome GTF annotation file that includes refGene and mitochondrial DNA annotations
+  --genome_fasta_file GENOME_FASTA_FILE
+                        Reference genome FASTA file that includes all chromosomes
+  --memory_limit MEMORY_LIMIT
+                        Maximum memory used (GB). The same will be applied to virtual memory
+  --threads THREADS     Number of threads for those steps that support multithreading
+```
+2. To create a template for the job definition file, run the following command.
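The `job_order` argument in the help output above accepts a job input JSON file. As a minimal sketch (it is not the template-generating command of step 2), the Python snippet below writes such a file and assembles the corresponding `cwltool` call without executing it. The input names come from the help text; the file paths, the values, the `job.json` name, and the integer types of `memory_limit` and `threads` are assumptions.

```python
import json
import shlex

# Input names are taken from the `--help` output above.
# Paths and numeric values are placeholders, not recommended settings.
job = {
    "genome_fasta_file": {"class": "File", "path": "genome.fa"},
    "annotation_gtf_file": {"class": "File", "path": "annotation.gtf"},
    "memory_limit": 32,  # assumed to be an integer number of GB
    "threads": 4,        # assumed to be an integer
}


def write_job(job_inputs, path="job.json"):
    """Serialize the job inputs to a JSON file and return its path."""
    with open(path, "w") as handle:
        json.dump(job_inputs, handle, indent=2)
    return path


def build_command(workflow, job_path):
    """Return the argument list: runner, workflow file, then job file."""
    return ["cwltool", workflow, job_path]


if __name__ == "__main__":
    job_path = write_job(job)
    # Print the command instead of running it, so the sketch has no side effects
    # beyond writing the job file.
    print(shlex.join(build_command("cellranger-mkref.cwl", job_path)))
```

File inputs are written as CWL `File` objects (`class` plus `path`), so the runner can tell them apart from plain strings.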
 | Number of highly variable genes to detect (used for dataset integration and dimensional reduction) | 3000 | 3000 |
 | Number of principal components to use in UMAP projection and clustering (from 1 to 50) | 10 | 20<sup>3</sup> |
+| The effective scale of embedded points on UMAP. In combination with the parameter below determines how clustered/clumped the embedded points are. | 1 | 1 |
+| Controls how tightly the embedding is allowed to compress points together on UMAP. Larger values ensure embedded points are more evenly distributed, while smaller values allow the algorithm to optimise more accurately with regard to local structure. Sensible values are in the range 0.001 to 0.5. | 0.3 | 0.3 |
+| Determines the number of neighboring points used in UMAP. Larger values will result in more global structure being preserved at the loss of detailed local structure. In general this parameter should be in the range 5 to 50. | 30 | 30 |
 | Regress cell cycle as a confounding source of variation | False | False |
 | Regress mitochondrial gene expression as a confounding source of variation | False | False |
 | Clustering resolution | 0.1 | 0.5<sup>4</sup> |
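The ranges stated in the rows above can be checked before a run. The sketch below is one hedged way to do that in Python: the parameter names are hypothetical (the table gives descriptions only), and the values are copied from the first value column. The 1 to 50 limit for principal components is treated as a hard limit, while the UMAP ranges, which the table calls sensible or typical, only produce warnings.

```python
# Hypothetical parameter names; values copied from the first value column above.
params = {
    "highly_variable_genes": 3000,
    "principal_components": 10,
    "umap_spread": 1,
    "umap_min_dist": 0.3,
    "umap_neighbors": 30,
    "regress_cell_cycle": False,
    "regress_mitochondrial": False,
    "clustering_resolution": 0.1,
}


def check_params(p):
    """Return (errors, warnings) for the ranges mentioned in the table."""
    errors, warnings = [], []
    # Hard limit: the table states "from 1 to 50".
    if not 1 <= p["principal_components"] <= 50:
        errors.append("principal_components must be from 1 to 50")
    # Soft limits: the table only describes these ranges as sensible or typical.
    if not 0.001 <= p["umap_min_dist"] <= 0.5:
        warnings.append("umap_min_dist is outside the sensible range 0.001 to 0.5")
    if not 5 <= p["umap_neighbors"] <= 50:
        warnings.append("umap_neighbors is outside the usual range 5 to 50")
    return errors, warnings


if __name__ == "__main__":
    errors, warnings = check_params(params)
    print("errors:", errors)
    print("warnings:", warnings)
```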
@@ -155,9 +158,9 @@ The joint analysis of multiple scRNA-Seq datasets with [Seurat](https://satijala
 ***Figure 7. QC metrics for unfiltered merged datasets***

-*Genes per cell density distribution plot (C) is split into KPPC and KPPCN groups. The zoomed-in section of the density plot (D) displays all 5 datasets within the selected boundaries. Cell rank plot (E) displays cells sorted by genes per cell counts within each dataset. The lower and upper limits for genes per cell values are shown as red and green lines respectively. On the genes per cell over UMIs per cell correlation plot (F) a vertical red line indicates the minimum threshold for UMIs per cell values. All the cells with the percentage of transcripts mapped to mitochondrial genes below 5% are marked in blue.*
+*Genes per cell density distribution plot (C) is split into KPPC and KPPCN groups. The zoomed-in section of the density plot (D) displays all 5 datasets within the selected boundaries. Cell rank plot (E) displays cells sorted by genes per cell counts within each dataset. The lower and upper limits for genes per cell values are shown for each dataset separately. On the genes per cell over UMIs per cell correlation plot (F) the vertical lines indicate the minimum thresholds for UMIs per cell values. All the cells with the percentage of transcripts mapped to the mitochondrial genes below 5% are marked in blue.*
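The thresholds described in this caption (per-dataset lower and upper limits for genes per cell, a per-dataset minimum for UMIs per cell, and a 5% cutoff for mitochondrial transcripts) can be sketched as a simple filter. This is an illustration only: the dataset names, threshold values, and cell records are invented, and treating the gene and UMI limits as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    min_genes: int  # lower limit for genes per cell
    max_genes: int  # upper limit for genes per cell
    min_umis: int   # minimum UMIs per cell


MAX_MITO_PERCENT = 5.0  # cells below this percentage pass, as in the caption

# Placeholder thresholds, one entry per (hypothetical) dataset.
thresholds = {
    "dataset_1": Thresholds(min_genes=250, max_genes=5000, min_umis=500),
    "dataset_2": Thresholds(min_genes=300, max_genes=6000, min_umis=600),
}

# Placeholder cells with the three QC metrics used by the filters.
cells = [
    {"barcode": "cell_1", "dataset": "dataset_1", "genes": 1200, "umis": 4000, "mito_percent": 2.1},
    {"barcode": "cell_2", "dataset": "dataset_1", "genes": 100, "umis": 4000, "mito_percent": 2.1},
    {"barcode": "cell_3", "dataset": "dataset_2", "genes": 1500, "umis": 5000, "mito_percent": 7.5},
    {"barcode": "cell_4", "dataset": "dataset_2", "genes": 1500, "umis": 200, "mito_percent": 1.0},
]


def passes_qc(cell, per_dataset):
    """Apply the thresholds of the cell's own dataset plus the mitochondrial cutoff."""
    t = per_dataset[cell["dataset"]]
    return (
        t.min_genes <= cell["genes"] <= t.max_genes
        and cell["umis"] >= t.min_umis
        and cell["mito_percent"] < MAX_MITO_PERCENT
    )


def filter_cells(all_cells, per_dataset):
    """Keep only the cells that pass every QC filter."""
    return [cell for cell in all_cells if passes_qc(cell, per_dataset)]


if __name__ == "__main__":
    for cell in filter_cells(cells, thresholds):
        print(cell["barcode"])
```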

 - The combined effect of filtering by UMI counts, gene counts, and the percentage of mitochondrial reads is shown on the genes per cell over UMIs per cell correlation plot (Figure 8A). The plot displays the cells that remain after all QC filters have been applied.
 - The Elbow plot (Figure 8B) is used to evaluate the dimensionality of the filtered integrated datasets by selecting only those principal components that capture the majority of the data variation. Typically, it is defined by the principal component after which the plot starts to plateau.
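Reading the plateau off an Elbow plot is usually done by eye. As a rough sketch of the same idea, the function below returns the first principal component after which every remaining step down is small relative to the total drop. The rule, the `tolerance` value, and the example standard deviations are assumptions made for illustration; this is not the workflow's own selection method.

```python
def elbow_point(stdevs, tolerance=0.05):
    """Return the 1-based index of the PC after which the curve plateaus.

    A step counts as flat when it is smaller than `tolerance` times the
    total drop from the first to the last principal component.
    """
    total_drop = stdevs[0] - stdevs[-1]
    if total_drop <= 0:
        return 1  # the curve never decreases, so the first PC is the plateau
    for k in range(len(stdevs) - 1):
        remaining_steps = [stdevs[i] - stdevs[i + 1] for i in range(k, len(stdevs) - 1)]
        if all(step < tolerance * total_drop for step in remaining_steps):
            return k + 1
    return len(stdevs)


if __name__ == "__main__":
    # Invented standard deviations for eight principal components.
    example = [10.0, 6.0, 4.0, 3.0, 2.8, 2.7, 2.65, 2.6]
    print(elbow_point(example))
```

A stricter `tolerance` pushes the selected component further to the right, which mirrors the common advice to err on the side of keeping more components.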
 *Depending on the option selected on the Annotation tab, UCSC Cell Browser highlights identified clusters (A), groups datasets by specified condition (B), colors cells based on the percentage of mitochondrial genes expressed (C), and generates a barcodes list for a selected group of cells (D).*

-- On the **Putative gene markers** tab (Figure 11A) an interactive table includes gene markers for each cluster. The column names correspond to the output of the [FindAllMarkers](https://www.rdocumentation.org/packages/Seurat/versions/4.0.3/topics/FindAllMarkers) function of the Seurat 4.0.1 R package. On the **Files** tab (Figure 11B) the list of all generated files is available for download. Among these files, **seurat_clst_data_rds.rds** (Figure 11C) contains Seurat clustering data in a format compatible with RStudio.
+- On the **Putative gene markers** tab (Figure 11A) an interactive table includes gene markers for each cluster. The column names correspond to the output of the [FindAllMarkers](https://www.rdocumentation.org/packages/Seurat/versions/4.0.3/topics/FindAllMarkers) function of the Seurat 4.0.3 R package. On the **Files** tab (Figure 11B) the list of all generated files is available for download. Among these files, **seurat_clst_data_rds.rds** (Figure 11C) contains Seurat clustering data in a format compatible with RStudio.

 ***Figure 11. Gene markers identification and direct download of workflow execution results***
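Because the marker table follows the FindAllMarkers column layout (`p_val`, `avg_log2FC`, `pct.1`, `pct.2`, `p_val_adj`, `cluster`, `gene`), it can be post-processed outside the browser. The sketch below assumes the table is available as tab-separated text; the export format and file name are not stated here, so the example reads from an in-memory string, and the rows, gene names, and the 0.05 cutoff are invented for illustration.

```python
import csv
import io

COLUMNS = ["p_val", "avg_log2FC", "pct.1", "pct.2", "p_val_adj", "cluster", "gene"]

# Invented example rows in the FindAllMarkers column order.
EXAMPLE_ROWS = [
    ["1e-30", "2.5", "0.9", "0.1", "1e-26", "0", "GeneA"],
    ["1e-10", "1.2", "0.7", "0.2", "1e-6", "0", "GeneB"],
    ["0.01", "3.0", "0.5", "0.4", "0.9", "0", "GeneC"],
    ["1e-20", "1.8", "0.8", "0.1", "1e-16", "1", "GeneD"],
]
EXAMPLE_TSV = "\n".join("\t".join(row) for row in [COLUMNS] + EXAMPLE_ROWS)


def read_markers(text):
    """Parse tab-separated marker text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def top_markers(rows, n=2, max_padj=0.05):
    """Return up to n genes per cluster, ranked by avg_log2FC, after a p_val_adj filter."""
    by_cluster = {}
    for row in rows:
        if float(row["p_val_adj"]) <= max_padj:
            by_cluster.setdefault(row["cluster"], []).append(row)
    return {
        cluster: [
            r["gene"]
            for r in sorted(members, key=lambda r: float(r["avg_log2FC"]), reverse=True)[:n]
        ]
        for cluster, members in by_cluster.items()
    }


if __name__ == "__main__":
    print(top_markers(read_markers(EXAMPLE_TSV), n=1))
```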