# GL_RefAnnotTable-A Workflow Information and Usage Instructions<!-- omit in toc -->

## General Workflow Info <!-- omit in toc -->

### Implementation Tools <!-- omit in toc -->

The current GeneLab Reference Annotation Table (GL_RefAnnotTable-A) pipeline is implemented as an R workflow and uses [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) to run all tools in a containerized environment. The workflow is run from the command line interface (CLI) on any Unix-based system.

<br>

---
## Utilizing the Workflow

1. [Install Singularity](#1-install-singularity)
2. [Download the Workflow Files](#2-download-the-workflow-files)
3. [Fetch Singularity Images](#3-fetch-singularity-images)
4. [Run the Workflow](#4-run-the-workflow)
5. [Run the annotations database creation function as a stand-alone script](#5-run-the-annotations-database-creation-function-as-a-stand-alone-script)

<br>

---
### 1. Install Singularity
Singularity is a container platform for running containerized software. It allows the GL_RefAnnotTable-A workflow to retrieve and use all software required for processing without installing that software directly on the user's system.

We recommend installing Singularity system-wide, following the associated [documentation](https://docs.sylabs.io/guides/3.10/admin-guide/admin_quickstart.html).
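Whichever installation route you use, it can help to confirm that a `singularity` command is on your `PATH` before continuing. The bash sketch below is a minimal check under that assumption. `have_singularity` is a hypothetical helper name and is not part of the GL_RefAnnotTable-A workflow. It only looks for an executable named `singularity`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of GL_RefAnnotTable-A): succeeds (exit status 0)
# when an executable named `singularity` can be found on PATH.
have_singularity() {
  command -v singularity >/dev/null 2>&1
}

if have_singularity; then
  # Print the installed version for your records
  singularity --version
else
  echo "singularity was not found on PATH; see the installation documentation" >&2
fi
```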
> Note: Singularity is also available through [Anaconda](https://anaconda.org/conda-forge/singularity).

<br>

---
### 2. Download the Workflow Files
All files required for utilizing the GL_RefAnnotTable-A workflow for generating reference annotation tables are in the [workflow_code](workflow_code) directory. To get a copy of the latest GL_RefAnnotTable-A version onto your system, run the following commands:

<br>

---
### 3. Fetch Singularity Images
Although Singularity can fetch images from a URL, doing so may cause issues, as described in [this Nextflow issue](https://github.com/nextflow-io/nextflow/issues/1210).

To avoid this issue, run the following command to fetch the Singularity images before running the GL_RefAnnotTable-A workflow:

> Note: This command should be run in the location containing the `GL_RefAnnotTable-A_1.1.0` directory that was downloaded in [step 2](#2-download-the-workflow-files) above. Depending on your network speed, fetching the images can take approximately 20 minutes.

Once the download is complete, a `singularity` folder containing the Singularity images will be created. Run the following command to export this folder as a Singularity configuration environment variable:

```bash
export SINGULARITY_CACHEDIR="$(pwd)/singularity"
```
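The export command uses `$(pwd)`, so it must be run from the same location that contains the `singularity` folder. The variable also lasts only for the current shell session.

The bash sketch below checks that the folder exists before exporting it, so a missing folder is reported instead of exported. `set_singularity_cachedir` is a hypothetical helper and is not part of the workflow.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of GL_RefAnnotTable-A): point SINGULARITY_CACHEDIR
# at the folder of fetched images, refusing to do so if the folder does not exist.
set_singularity_cachedir() {
  # Default to $(pwd)/singularity, as in the export command above
  local dir="${1:-$(pwd)/singularity}"
  if [ ! -d "$dir" ]; then
    echo "Singularity image folder not found: $dir" >&2
    return 1
  fi
  export SINGULARITY_CACHEDIR="$dir"
  echo "SINGULARITY_CACHEDIR=$SINGULARITY_CACHEDIR"
}
```

Define the function in your current shell, for example by pasting it or loading it with `source`. Then run `set_singularity_cachedir` from the location containing the `singularity` folder. Running it inside a separate `bash` script would not change the variable in your interactive shell.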
<br>

---
### 4. Run the Workflow
While in the location containing the `GL_RefAnnotTable-A_1.1.0` directory that was downloaded in [step 2](#2-download-the-workflow-files), you can now run the workflow. Below is an example of how to run the workflow to build an annotation table for Mus musculus (mouse):

- No input files are required. Specify the target organism using a positional command line argument. `Mus musculus` is used in the example above. To see a list of all available organisms, run the command without positional arguments. The correct argument for each organism can also be found in the 'species' column of the [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv).

- Optional: a reference table CSV can be supplied as a second positional argument instead of using the default [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv).

**Output data:**

- \*-GL-annotations.tsv (tab-delimited table of gene annotations)
- \*-GL-build-info.txt (text file containing information used to create the annotation table, including tools, tool versions, and date of creation)
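Two optional checks may help around a run. The bash sketch below defines three hypothetical helpers that are not part of the workflow:

- `list_species` prints the 'species' column of a local copy of a reference table CSV.
- `is_valid_species` tests whether an organism name matches one of those values exactly.
- `check_outputs` confirms after a run that both output files exist and reports the number of annotation rows.

The sketch makes these assumptions:

- The CSV has a header row and no commas inside quoted fields.
- The TSV has a single header row and ends with a newline.

An organism such as `Mus musculus` contains a space, so it must be quoted (for example `'Mus musculus'`) to be passed as a single shell argument.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of GL_RefAnnotTable-A).
# Assumes a comma-separated reference table with a header row containing a
# species column, and no commas inside quoted fields.

# Print every value in the species column of the CSV given as $1.
list_species() {
  awk -F',' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        h = $i; gsub(/["\r]/, "", h)   # drop quotes and Windows line endings
        if (h == "species") col = i
      }
      next
    }
    col { v = $col; gsub(/["\r]/, "", v); print v }
    END { if (!col) exit 1 }           # non-zero status if no species column
  ' "$1"
}

# Succeed only if $2 exactly matches one species listed in CSV $1.
is_valid_species() {
  list_species "$1" | grep -Fxq -- "$2"
}

# After a run: confirm both outputs exist in directory $1 (default: .) and
# print the number of annotation rows, excluding the header line.
check_outputs() {
  local dir="${1:-.}" tsv info
  tsv=$(find "$dir" -maxdepth 1 -name '*-GL-annotations.tsv' | head -n 1)
  info=$(find "$dir" -maxdepth 1 -name '*-GL-build-info.txt' | head -n 1)
  if [ -z "$tsv" ] || [ -z "$info" ]; then
    echo "expected *-GL-annotations.tsv and *-GL-build-info.txt in $dir" >&2
    return 1
  fi
  # wc -l counts newline-terminated lines
  echo $(( $(wc -l < "$tsv") - 1 ))
}
```

For example, the following commands run the checks from the directory where the CSV and output files are located:

- `list_species GL-DPPD-7110-A_annotations.csv` lists the available species.
- `is_valid_species GL-DPPD-7110-A_annotations.csv 'Mus musculus' && echo ok` checks a name before a run.
- `check_outputs .` checks the outputs afterwards.

If the reference table has quoted fields that contain commas, use a CSV-aware tool instead.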
<br>

---
### 5. Run the annotations database creation function as a stand-alone script
When the workflow is run and the reference table does not specify an annotations database for the `target_organism` in the `annotations` column, the `install_annotations` function, defined in the `install-org-db.R` script, will be executed. This script locally creates and installs an annotations database R package using AnnotationForge. This function can also be run as a stand-alone script from the command line:

**Output data:**

- org.\*.eg.db/ (species-specific annotation database, as a local R package)
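After running the function stand-alone, you may want to confirm that the package directory was created. The bash sketch below lists `org.*.eg.db` directories directly inside a folder you choose. `find_org_db` is a hypothetical helper. It assumes the package is written to that folder, which may not match where `install-org-db.R` places it on your system.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of GL_RefAnnotTable-A): print the org.*.eg.db
# package directories found directly inside directory $1 (default: current
# directory), one per line, sorted.
find_org_db() {
  find "${1:-.}" -mindepth 1 -maxdepth 1 -type d -name 'org.*.eg.db' | sort
}
```

Run `find_org_db` from the directory where the function was executed. If nothing is printed, the package may have been written to a different location.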