- [5. Run the Annotations Database Creation Function as a Stand-Alone Script](#5-run-the-annotations-database-creation-function-as-a-stand-alone-script)
12
+
- [Approach 2: Using a Local R Environment](#approach-2-using-a-local-r-environment)
13
+
- [1. Install R and Required R Packages](#1-install-r-and-required-r-packages)
14
+
- [2. Download the Workflow Files](#2-download-the-workflow-files-1)
15
+
- [3. Set Execution Permissions for Workflow Scripts](#3-set-execution-permissions-for-workflow-scripts)
16
+
- [4. Run the Workflow](#4-run-the-workflow-1)
17
+
- [5. Run the Annotations Database Creation Function as a Stand-Alone Script](#5-run-the-annotations-database-creation-function-as-a-stand-alone-script-1)
4
18
5
-
### Implementation Tools <!-- omit in toc -->
19
+
<br>
20
+
21
+
---
6
22
7
-
The current GeneLab Reference Annotation Table (GL_RefAnnotTable-A) pipeline is implemented as an R workflow and utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) to run all tools in a containerized environment. This workflow is run using the command line interface (CLI) of any unix-based system.
23
+
## General Workflow Info
24
+
25
+
The current GeneLab Reference Annotation Table (GL_RefAnnotTable-A) pipeline is implemented as an R workflow that can be run from a command line interface (CLI) using bash. The workflow can be executed using either an Apptainer (formerly Singularity) container or a local R environment. The workflow can be used even if you are unfamiliar with R, but if you want to learn more about R, visit the [R-project about page here](https://www.r-project.org/about.html). Additionally, an introduction to R along with installation help and information about using R for bioinformatics can be found [here at Happy Belly Bioinformatics](https://astrobiomike.github.io/R/basics).
8
26
9
27
<br>
10
28
11
29
---
30
+
12
31
## Utilizing the Workflow
13
32
14
-
1. [Install Singularity](#1-install-singularity)
15
-
2. [Download the Workflow Files](#2-download-the-workflow-files)
2. **[Using a local R environment](#approach-2-using-a-local-r-environment)**.
38
+
39
+
Please follow the instructions for the approach that best matches your setup and preferences. Each method is explained in the sections below.
19
40
20
41
<br>
21
42
22
43
---
23
44
24
-
### 1. Install Singularity
45
+
### Approach 1: Using Apptainer
46
+
47
+
This approach allows you to run the workflow within a containerized environment, ensuring consistency and reproducibility.
48
+
49
+
<br>
50
+
51
+
---
25
52
26
-
Singularity is a container platform that allows usage of containerized software. This enables the GL_RefAnnotTable-A workflow to retrieve and use all software required for processing without the need to install the software directly on the user's system.
53
+
#### 1. Install Apptainer
27
54
28
-
We recommend installing Singularity on a system wide level as per the associated [documentation](https://docs.sylabs.io/guides/3.10/admin-guide/admin_quickstart.html).
55
+
Apptainer can be installed either through [Anaconda](https://anaconda.org/conda-forge/singularity) or as documented on the [Apptainer documentation page](https://apptainer.org/docs/admin/main/installation.html).
29
56
30
-
> Note: Singularity is also available through [Anaconda](https://anaconda.org/conda-forge/singularity).
57
+
> **Note**: If you prefer to use Anaconda, we recommend installing Miniconda for your system, as instructed by [Happy Belly Bioinformatics](https://astrobiomike.github.io/unix/conda-intro#getting-and-installing-conda).
58
+
>
59
+
> Once conda is installed on your system, you can install Apptainer by running:
60
+
>
61
+
> ```bash
62
+
> conda install -c conda-forge apptainer
63
+
> ```
31
64
32
65
<br>
33
66
34
67
---
35
68
36
-
### 2. Download the Workflow Files
69
+
#### 2. Download the Workflow Files
37
70
38
-
All files required for utilizing the GL_RefAnnotTable-A workflow for generating reference annotation tables are in the [workflow_code](workflow_code) directory. To get a copy of latest GL_RefAnnotTable-A version on to your system, run the following commands:
71
+
Download the latest version of the GL_RefAnnotTable-A workflow:
> Note: This command should be run in the directory containing the GL_RefAnnotTable-A_1.1.0 folder downloaded in [step 2](#2-download-the-workflow-files). Depending on your network speed, this may take approximately 20 minutes.
91
+
92
+
Once complete, an `apptainer` folder containing the Apptainer images will be created. Export this folder as an Apptainer configuration environment variable:
93
+
94
+
```bash
95
+
export APPTAINER_CACHEDIR=$(pwd)/apptainer
96
+
```
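Before launching the workflow, it can help to confirm that the exported variable points at a real directory. The following is a minimal sketch, assuming it is run from the directory that contains the `apptainer` folder; `check_cachedir` is a hypothetical helper name used only for illustration and is not part of the workflow:

```bash
# Hypothetical helper (not part of the workflow): report whether a directory
# intended for APPTAINER_CACHEDIR exists.
check_cachedir() {
  local dir="$1"
  if [ -d "$dir" ]; then
    echo "ok: $dir"
  else
    echo "missing: $dir" >&2
    return 1
  fi
}

# Check the exported value, falling back to ./apptainer if the variable is unset.
check_cachedir "${APPTAINER_CACHEDIR:-$(pwd)/apptainer}" || true
```

If the helper reports `missing`, the pre-pull step probably ran in a different directory than the one the variable now refers to.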
97
+
98
+
<br>
99
+
100
+
---
101
+
102
+
#### 4. Run the Workflow
103
+
104
+
While in the `GL_RefAnnotTable-A_1.1.0` directory, you can now run the workflow. Below is an example for generating an annotation table for Mus musculus (mouse):
- Specify the target organism using a positional command line argument. `Mus musculus` is used in the example above.
116
+
- To see a list of all available organisms, run `Rscript GL-DPPD-7110-A_build-genome-annots-tab.R` without positional arguments. The correct argument for each organism can also be found in the 'species' column of the [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
117
+
- Optional: a reference table CSV can be supplied as a second positional argument instead of using the default [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
118
+
119
+
**Output data:**
120
+
121
+
- *-GL-annotations.tsv (Tab-delimited table of gene annotations)
122
+
- *-GL-build-info.txt (Text file containing information used to create the annotation table, including tools, tool versions, and date of creation)
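A quick sanity check on the generated table can be sketched as follows. This assumes the first line of the TSV is a header row (an assumption, not stated in this document), and `summarize_tsv` is a hypothetical helper name rather than anything shipped with the workflow:

```bash
# Hypothetical helper: print the column count (taken from the first line)
# and the number of data rows (all lines after the first) of a tab-delimited file.
summarize_tsv() {
  awk -F'\t' 'NR == 1 { cols = NF } END { printf "columns=%d rows=%d\n", cols, (NR > 0 ? NR - 1 : 0) }' "$1"
}

# Summarize every annotation table found in the current directory.
for f in ./*-GL-annotations.tsv; do
  [ -e "$f" ] || continue   # skip when the glob matches nothing
  echo "$f"
  summarize_tsv "$f"
done
```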
123
+
124
+
<br>
125
+
126
+
---
127
+
128
+
#### 5. Run the Annotations Database Creation Function as a Stand-Alone Script
129
+
130
+
If the reference table does not specify an annotations database for the target organism in the annotations column, the `install_annotations` function (defined in `install-org-db.R`) will be executed. This function can also be run as a stand-alone script:
- The target organism must be specified as the first positional command line argument. `Bacillus subtilis` is used in the example above. The correct argument for each organism can be found in the 'species' column of [GL-DPPD-7110-A_annotations.csv](https://raw.githubusercontent.com/nasa/GeneLab_Data_Processing/master/GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
141
+
- Optional: A local reference table can be supplied as a second positional argument. If not provided, the script will download the current version of GL-DPPD-7110-A_annotations.csv from GitHub by default.
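The fallback described in the last bullet (use the local table when one is given, otherwise the hosted CSV) can be illustrated in shell. This is only a sketch of the described behavior, not the R implementation in `install-org-db.R`; `resolve_ref_table` and `DEFAULT_REF_TABLE` are hypothetical names, and the URL is the one linked above:

```bash
# Default reference table URL, copied from the link in the text above.
DEFAULT_REF_TABLE="https://raw.githubusercontent.com/nasa/GeneLab_Data_Processing/master/GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv"

# Hypothetical helper: echo the reference table to use, preferring the
# first argument and falling back to the default URL when it is absent or empty.
resolve_ref_table() {
  echo "${1:-$DEFAULT_REF_TABLE}"
}

resolve_ref_table                      # no argument: prints the default URL
resolve_ref_table my_annotations.csv   # local table: prints my_annotations.csv
```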
142
+
143
+
**Output data:**
144
+
145
+
- org.*.eg.db/ (Species-specific annotation database, as a local R package)
146
+
45
147
<br>
46
148
47
149
---
48
150
49
-
### 3. Fetch Singularity Images
151
+
### Approach 2: Using a Local R Environment
152
+
153
+
This approach allows you to run the workflow directly in your local R environment without using Apptainer containers.
154
+
155
+
<br>
156
+
157
+
---
158
+
159
+
#### 1. Install R and Required R Packages
160
+
161
+
We recommend installing R via the [Comprehensive R Archive Network (CRAN)](https://cran.r-project.org/):
50
162
51
-
Although Singularity can fetch images from a url, doing so may cause issues as detailed [here](https://github.com/nextflow-io/nextflow/issues/1210).
163
+
1. Select the [CRAN Mirror](https://cran.r-project.org/mirrors.html) closest to your location.
164
+
2. Navigate to the download page for your operating system.
165
+
3. Download and install R (e.g., R-4.4.0).
52
166
53
-
To avoid this issue, run the following command to fetch the Singularity images prior to running the GL_RefAnnotTable-A workflow:
54
-
> Note: This command should be run in the location containing the `GL_RefAnnotTable-A_1.1.0` directory that was downloaded in [step 2](#2-download-the-workflow-files) above. Depending on your network speed, fetching the images will take ~20 minutes.
Once complete, a `singularity` folder containing the Singularity images will be created. Run the following command to export this folder as a Singularity configuration environment variable:
173
+
Within an active R environment, run the following commands to install the required R packages:
174
+
175
+
```R
176
+
install.packages("tidyverse")
177
+
178
+
install.packages("BiocManager")
179
+
180
+
BiocManager::install("STRINGdb")
181
+
BiocManager::install("PANTHER.db")
182
+
BiocManager::install("rtracklayer")
183
+
BiocManager::install("AnnotationForge")
184
+
BiocManager::install("biomaRt")
185
+
BiocManager::install("GO.db")
186
+
```
187
+
188
+
<br>
189
+
190
+
---
191
+
192
+
#### 2. Download the Workflow Files
193
+
194
+
All files required for utilizing the GL_RefAnnotTable-A workflow for generating reference annotation tables are in the [workflow_code](workflow_code) directory. To get a copy of the latest GL_RefAnnotTable-A version onto your system, run the following command:
#### 3. Set Execution Permissions for Workflow Scripts
205
+
206
+
Once you've downloaded the GL_RefAnnotTable-A workflow directory as a zip file, unzip the workflow then `cd` into the GL_RefAnnotTable-A_1.1.0 directory on the CLI. Next, run the following command to set the execution permissions for the R script:
61
207
62
208
```bash
63
-
export SINGULARITY_CACHEDIR=$(pwd)/singularity
209
+
unzip GL_RefAnnotTable-A_1.1.0.zip
210
+
cd GL_RefAnnotTable-A_1.1.0
211
+
chmod -R u+x *R
64
212
```
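To confirm that the permissions took effect, the scripts that are still not user-executable can be listed. A minimal sketch, assuming GNU or BSD `find` (for `-maxdepth`); `list_non_executable` is a hypothetical helper name:

```bash
# Hypothetical helper: list regular files whose names end in "R" in the given
# directory that do not have the user-execute bit set.
list_non_executable() {
  find "$1" -maxdepth 1 -type f -name '*R' ! -perm -u+x
}

# Run from inside the workflow directory; empty output means every script is executable.
list_non_executable .
```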
65
213
66
214
<br>
67
215
68
216
---
69
217
70
-
### 4. Run the Workflow
218
+
#### 4. Run the Workflow
71
219
72
-
While in the location containing the `GL_RefAnnotTable-A_1.1.0` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below is an example of how to run the workflow to build an annotation table for Mus musculus (mouse):
220
+
While in the GL_RefAnnotTable workflow directory, you are now able to run the workflow. Below is an example of how to run the workflow to build an annotation table for Mus musculus (mouse):
- No input files are required. Specify the target organism using a positional command line argument. `Mus musculus` is used in the example above. To see a list of all available organisms, run the command without positional arguments. The correct argument for each organism can also be found in the 'species' column of the [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
228
+
- No input files are required. Specify the target organism using a positional command line argument. `Mus musculus` is used in the example above. To see a list of all available organisms, run `Rscript GL-DPPD-7110-A_build-genome-annots-tab.R` without positional arguments. The correct argument for each organism can also be found in the 'species' column of [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
83
229
84
230
- Optional: a reference table CSV can be supplied as a second positional argument instead of using the default [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
### 5. Run the annotations database creation function as a stand-alone script
241
+
#### 5. Run the Annotations Database Creation Function as a Stand-Alone Script
96
242
97
-
When the workflow is run, if the reference table does not specify an annotations database for the target_organism in the `annotations` column, the `install_annotations` function, defined in the `install-org-db.R` script, will be executed. This script will locally create and install an annotations database R package using AnnotationForge. This function can also be run as a stand-alone script from the command line:
243
+
If the reference table does not specify an annotations database for the target organism in the 'annotations' column, the `install_annotations` function (defined in `install-org-db.R`) will be executed. This function can also be run as a stand-alone script:
- The target organism must be specified as the first positional command line argument, `Bacillus subtilis` is used in the example above. The correct argument for each organism can be found in the 'species' column of the [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
108
-
109
-
- The path to a local reference table must also be supplied as the second positional argument
251
+
- The target organism must be specified as the first positional command line argument. `Bacillus subtilis` is used in the example above. The correct argument for each organism can be found in the 'species' column of [GL-DPPD-7110-A_annotations.csv](https://raw.githubusercontent.com/nasa/GeneLab_Data_Processing/master/GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
252
+
- Optional: A local reference table can be supplied as a second positional argument. If not provided, the script will download the current version of GL-DPPD-7110-A_annotations.csv from GitHub by default.
110
253
111
254
**Output data:**
112
255
113
256
- org.*.eg.db/ (species-specific annotation database, as a local R package)
GeneLab_Reference_Annotations/Workflow_Documentation/GL_RefAnnotTable-A/workflow_code/bin/prepull_apptainer.sh
GeneLab_Reference_Annotations/Workflow_Documentation/GL_RefAnnotTable-A/workflow_code/install-org-db.R
+13 -5 (13 additions & 5 deletions)
@@ -3,12 +3,20 @@
3
3
# Function: Get annotations db from ref table. If no annotations db is defined, create the package name from genus, species, (and strain for microbes),
4
4
# Try to Bioconductor install annotations db. If fail then build the package using AnnotationForge, install it into the current directory.