Barski-lab
diff --git a/README.md b/README.md
Lines changed: 36 additions & 2 deletions
diff --git a/docs/images/tutorial/figure_7.jpg b/docs/images/tutorial/figure_7.jpg
-13.3 KB
diff --git a/docs/images/tutorial/figure_8.jpg b/docs/images/tutorial/figure_8.jpg
-39.5 KB
diff --git a/docs/images/tutorial/figure_9.jpg b/docs/images/tutorial/figure_9.jpg
6.98 KB
diff --git a/docs/index.md b/docs/index.md
Lines changed: 6 additions & 3 deletions
@@ -2,7 +2,7 @@
 [![Python 3.8](https://img.shields.io/badge/python-3.8-green.svg)](https://www.python.org/downloads/release/python-38/)
 # scRNA-Seq-Analysis
 
-**For detailed tutorial on how to use this set of workflows refer to the [Documenation](https://barski-lab.github.io/scRNA-Seq-Analysis/) page.**
+**For detailed documentation on how to use this set of workflows in SciDAP, refer to the [Tutorials](https://barski-lab.github.io/scRNA-Seq-Analysis/) page.**
 
 This repository contains CWL pipelines for scRNA-Seq data analysis. Each of the used command line tools was wrapped into CWL format and combined into the workflows.
 
@@ -15,4 +15,38 @@ This repository contains CWL pipelines for scRNA-Seq data analysis. Each of the
 - [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) 0.11.5
 - [UCSC Cell Browser](https://github.com/maximilianh/cellBrowser) 1.0.1
 
-The workflows and tools are compatible with any CWL runner (see [CWL official page](https://www.commonwl.org/#Implementations) for a list of available runners).
+**Running from command line**
+
+All CWL files from this repository are compatible with any workflow management system or runner that implements the CWL v1.0 standard (see the list [here](https://www.commonwl.org/#Implementations)). As an example, we will use [cwltool](https://github.com/common-workflow-language/cwltool) (the reference implementation) to show how to get the list of input parameters for any CWL file in order to run it from the command line. Additionally, we will show how to generate a template job definition file as an alternative way of setting workflow input parameters. Please note that, for better portability and reproducibility, all the tools used in our scRNA-Seq workflows are wrapped into Docker containers, so a properly configured [Docker](https://www.docker.com/) installation is recommended.
+
+1. To get a list of all input parameters for a CWL workflow file, run the following command.
+
+    ```
+    cwltool cellranger-mkref.cwl --help
+
+    Cell Ranger Build Reference Indices
+    Builds reference genome indices for Cell Ranger Gene Expression and Cell Ranger Multiome ATAC + Gene
+    Expression experiments.
+
+    positional arguments:
+    job_order             Job input json file
+
+    optional arguments:
+    -h, --help            show this help message and exit
+    --annotation_gtf_file ANNOTATION_GTF_FILE
+                            Reference genome GTF annotation file that includes refGene and mitochondrial DNA annotations
+    --genome_fasta_file GENOME_FASTA_FILE
+                            Reference genome FASTA file that includes all chromosomes
+    --memory_limit MEMORY_LIMIT
+                            Maximum memory used (GB). The same will be applied to virtual memory
+    --threads THREADS     Number of threads for those steps that support multithreading
+    ```
+2. To create a template for the job definition file, run the following command.
+
+    ```
+    cwltool --make-template seurat-cluster.cwl > seurat-cluster-job.yml
+    ```
+    Open `seurat-cluster-job.yml` in a text editor, update input parameters, and run as follows.
+    ```
+    cwltool seurat-cluster.cwl seurat-cluster-job.yml
+    ```
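
As an alternative to editing the generated template by hand, the job file can also be written programmatically. The following is a minimal Python sketch, not part of this repository: it relies on cwltool accepting JSON job files (the help output above describes `job_order` as a "Job input json file") and uses the input names listed by `--help` for `cellranger-mkref.cwl`. The file paths, the numeric values, the job file name, and the assumed input types (`File` for the two reference files, integers for the limits) are placeholders for illustration.

```python
import json
import shlex

# Input names are taken from the `cwltool cellranger-mkref.cwl --help` output.
# Paths and numbers are placeholders; the input types (File objects for the
# reference files, integers for memory and threads) are assumptions.
job = {
    "genome_fasta_file": {"class": "File", "path": "genome.fa"},
    "annotation_gtf_file": {"class": "File", "path": "annotation.gtf"},
    "memory_limit": 32,
    "threads": 4,
}

# Write the job definition as JSON next to the workflow file.
job_path = "cellranger-mkref-job.json"
with open(job_path, "w") as handle:
    json.dump(job, handle, indent=2)

# Build the command that would launch the workflow with this job file.
# It is only printed here; run it in a shell once the paths are real.
command = ["cwltool", "cellranger-mkref.cwl", job_path]
print(shlex.join(command))
```

Writing the job file from a script can be convenient when the same workflow has to be launched for several reference genomes, since only the dictionary values change between runs.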
@@ -127,6 +127,9 @@ The joint analysis of multiple scRNA-Seq datasets with [Seurat](https://satijala
 | Pattern to identify mitochondrial genes | ^Mt- | ^mt-<sup>2</sup> |
 | Number of highly variable genes to detect (used for dataset integration and dimensional reduction) | 3000 | 3000 |
 | Number of principal components to use in UMAP projection and clustering (from 1 to 50) | 10 | 20<sup>3</sup> |
+| The effective scale of embedded points on UMAP. In combination with the parameter below, it determines how clustered/clumped the embedded points are. | 1 | 1 |
+| Controls how tightly the embedding is allowed to compress points together on UMAP. Larger values ensure embedded points are more evenly distributed, while smaller values allow the algorithm to optimise more accurately with regard to local structure. Sensible values are in the range 0.001 to 0.5. | 0.3 | 0.3 |
+| Determines the number of neighboring points used in UMAP. Larger values result in more global structure being preserved at the loss of detailed local structure. In general, this parameter should be in the range 5 to 50. | 30 | 30 |
 | Regress cell cycle as a confounding source of variation | False | False |
 | Regress mitochondrial gene expression as a confounding source of variation | False | False |
 | Clustering resolution | 0.1 | 0.5<sup>4</sup> |
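
The suggested ranges for the UMAP parameters in the table can be checked before a run is submitted. The sketch below is a hypothetical helper, not part of the workflows: the names `min_dist` and `n_neighbors` are assumptions modeled on the corresponding Seurat `RunUMAP` arguments (`min.dist`, `n.neighbors`), and the ranges are the "sensible" values quoted in the table rather than hard limits enforced by the pipeline.

```python
def check_umap_parameters(min_dist, n_neighbors):
    """Return warnings for values outside the ranges suggested in the table.

    Hypothetical helper: the ranges are recommendations, not hard limits.
    """
    warnings = []
    # Suggested range for the minimum distance between embedded points.
    if not 0.001 <= min_dist <= 0.5:
        warnings.append(
            f"min_dist={min_dist} is outside the suggested range 0.001 to 0.5"
        )
    # Suggested range for the number of neighboring points.
    if not 5 <= n_neighbors <= 50:
        warnings.append(
            f"n_neighbors={n_neighbors} is outside the suggested range 5 to 50"
        )
    return warnings


# The values used in the table (0.3 and 30) produce no warnings.
for message in check_umap_parameters(min_dist=0.3, n_neighbors=30):
    print(message)
```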
@@ -155,9 +158,9 @@ The joint analysis of multiple scRNA-Seq datasets with [Seurat](https://satijala
 |---|---|
 | KPPC 1 SRR12450154 | KPPC |
 | KPPC 2 SRR12450155 | KPPC |
+| KPPCN 2 SRR12450158 | KPPCN |
 | KPPC 3 SRR12450156 | KPPC |
 | KPPCN 1 SRR12450157 | KPPCN |
-| KPPCN 2 SRR12450158 | KPPCN |
 
 ## **Step 6.** Explore clustering results
 
@@ -171,7 +174,7 @@ Cell Ranger Count Gene Expression pipeline uses advanced cell-calling algorithm
 ![](./images/tutorial/figure_7.jpg)
 ***Figure 7. QC metrics for not filtered merged datasets***
 
-*Genes per cell density distribution plot (C) is split into KPPC and KPPCN groups. Zoomed in section of the density plot (D) displays all 5 datasets within the selected boundaries. Cell rank plot (E) displays cells sorted by gene per cell counts within each dataset. The lower and upper limits for genes per cell values are shown as red and green lines correspondingly. On the genes per cell over UMIs per cell correlation plot (F) a vertical red line indicates the minimum threshold for UMIs per cell values. All the cells with the percentage of transcripts mapped to mitochondrial genes below 5% are marked as blue.*
+*Genes per cell density distribution plot (C) is split into KPPC and KPPCN groups. The zoomed-in section of the density plot (D) displays all 5 datasets within the selected boundaries. Cell rank plot (E) displays cells sorted by genes per cell counts within each dataset. The lower and upper limits for genes per cell values are shown for each dataset separately. On the genes per cell over UMIs per cell correlation plot (F) the vertical lines indicate the minimum thresholds for UMIs per cell values. All the cells with the percentage of transcripts mapped to mitochondrial genes below 5% are marked in blue.*
 
 - A combined effect of filtering by UMI counts, gene counts, and by the percentage of mitochondrial reads is shown on the genes per cell over UMIs per cell correlation plot (Figure 8A). The plot displays the remaining cells after all QC filters have been applied.
 - The Elbow plot (Figure 8B) is used to evaluate the dimensionality of the filtered integrated datasets by selecting only those principal components that capture the majority of the data variation. Typically, it is defined by the principal component after which the plot starts to plateau.
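
Reading the plateau off the Elbow plot is usually done by eye, but the idea can be approximated numerically. The sketch below is one possible heuristic, not the method used by the workflow: it converts per-component standard deviations to percentages of variance and returns the component that follows the last drop of at least `min_drop` percentage points. The function name, the threshold, and the input values are all assumptions made for illustration; the standard deviations are synthetic, not taken from the tutorial datasets.

```python
def choose_dimensionality(stdevs, min_drop=0.1):
    """Return the 1-based index of the PC where the elbow plot starts to plateau.

    Heuristic sketch: find the last pair of consecutive PCs whose share of
    variance drops by at least `min_drop` percentage points and return the
    second PC of that pair. Returns 1 if no such drop exists.
    """
    variances = [s * s for s in stdevs]
    total = sum(variances)
    percent = [100.0 * v / total for v in variances]

    chosen = 1
    for i in range(len(percent) - 1):
        if percent[i] - percent[i + 1] >= min_drop:
            chosen = i + 2  # 1-based index of the PC after the drop
    return chosen


# Synthetic standard deviations: a steep decline followed by a flat tail.
example_stdevs = [10, 6, 4, 3, 1, 1, 1, 1]
print(choose_dimensionality(example_stdevs))
```

A value obtained this way is only a starting point; the number of principal components is still best confirmed against the plot itself.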
@@ -198,7 +201,7 @@ Cell Ranger Count Gene Expression pipeline uses advanced cell-calling algorithm
 
 *Depending on the option selected on the Annotation tab, UCSC Cell Browser highlights identified clusters (A), groups datasets by specified condition (B), colors cells based on the percentage of mitochondrial genes expressed (C), and generates a barcodes list for a selected group of cells (D).*
 
-- On the **Putative gene markers** tab (Figure 11A) an interactive table includes gene markers for each cluster. The column names correspond to the output of [FindAllMarkers](https://www.rdocumentation.org/packages/Seurat/versions/4.0.3/topics/FindAllMarkers) function Seurat 4.0.1 R package. On the **Files** tab (Figure 11B) the list of all generated files is available for download. Among these files the **seurat_clst_data_rds.rds** (Figure 11C) includes Seurat clustering data in a format compatible with RStudio.
+- On the **Putative gene markers** tab (Figure 11A) an interactive table includes gene markers for each cluster. The column names correspond to the output of the [FindAllMarkers](https://www.rdocumentation.org/packages/Seurat/versions/4.0.3/topics/FindAllMarkers) function of the Seurat 4.0.3 R package. On the **Files** tab (Figure 11B) the list of all generated files is available for download. Among these files the **seurat_clst_data_rds.rds** (Figure 11C) includes Seurat clustering data in a format compatible with RStudio.
 
 ![](./images/tutorial/figure_11.jpg)
 ***Figure 11. Gene markers identification and direct download of workflow execution results***