Commit 4f181bf

Refactor instructions for singularity use
1 parent 40e3652 commit 4f181bf

2 files changed

+84
-92
lines changed

Lines changed: 52 additions & 92 deletions
@@ -1,83 +1,85 @@
1-
# GL_RefAnnotTable Workflow Information and Usage Instructions
1+
# GL_RefAnnotTable-A Workflow Information and Usage Instructions <!-- omit in toc -->
22

3-
## General workflow info
4-
The current GeneLab Reference Annotation Table (GL_RefAnnotTable-A) pipeline is implemented as an R workflow that can be run from a command line interface (CLI) using bash. The workflow can be used even if you are unfamiliar with R, but if you want to learn more about R, visit the [R-project about page here](https://www.r-project.org/about.html). Additionally, an introduction to R along with installation help and information about using R for bioinformatics can be found [here at Happy Belly Bioinformatics](https://astrobiomike.github.io/R/basics).
3+
## General Workflow Info <!-- omit in toc -->
54

6-
## Utilizing the workflow
5+
### Implementation Tools <!-- omit in toc -->
6+
7+
The current GeneLab Reference Annotation Table (GL_RefAnnotTable-A) pipeline is implemented as an R workflow and uses [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) to run all tools in a containerized environment. The workflow is run from the command line interface (CLI) of any Unix-based system.
78

8-
1. [Install R and R packages](#1-install-r-and-r-packages)
9-
2. [Download the workflow files](#2-download-the-workflow-files)
10-
3. [Setup Execution Permission for Workflow Scripts](#3-setup-execution-permission-for-workflow-scripts)
11-
4. [Run the workflow](#4-run-the-workflow)
12-
5. [Run the annotations database creation function as a stand-alone script](#5-run-the-annotations-database-creation-function-as-a-stand-alone-script)
13-
6. [Run the Workflow Using Docker or Singularity](#6-run-the-workflow-using-docker-or-singularity)
149
<br>
1510

16-
### 1. Install R and R packages
11+
---
12+
## Utilizing the Workflow
1713

18-
We recommend installing R via the [Comprehensive R Archive Network (CRAN)](https://cran.r-project.org/) as follows:
14+
1. [Install Singularity](#1-install-singularity)
15+
2. [Download the Workflow Files](#2-download-the-workflow-files)
16+
3. [Fetch Singularity Images](#3-fetch-singularity-images)
17+
4. [Run the Workflow](#4-run-the-workflow)
18+
5. [Run the annotations database creation function as a stand-alone script](#5-run-the-annotations-database-creation-function-as-a-stand-alone-script)
1919

20-
1. Select the [CRAN Mirror](https://cran.r-project.org/mirrors.html) closest to your location.
21-
2. Click the link under the "Download and Install R" section that's consistent with your machine.
22-
3. Click on the R-4.4.0 package consistent with your machine to download.
23-
4. Double click on the R-4.4.0.pkg downloaded in step 3 and follow the installation instructions.
20+
<br>
2421

25-
Once R is installed, open a CLI terminal and run the following command to activate R:
22+
---
2623

27-
```bash
28-
R
29-
```
30-
`
31-
Within an active R environment, run the following commands to install the required R packages:
24+
### 1. Install Singularity
3225

33-
```R
34-
install.packages("tidyverse")
26+
Singularity is a container platform that allows usage of containerized software. This enables the GL_RefAnnotTable-A workflow to retrieve and use all software required for processing without the need to install the software directly on the user's system.
3527

36-
install.packages("BiocManager")
28+
We recommend installing Singularity system-wide, as described in the associated [documentation](https://docs.sylabs.io/guides/3.10/admin-guide/admin_quickstart.html).
3729

38-
BiocManager::install("STRINGdb")
39-
BiocManager::install("PANTHER.db")
40-
BiocManager::install("rtracklayer")
41-
BiocManager::install("AnnotationForge")
42-
BiocManager::install("biomaRt")
43-
BiocManager::install("GO.db")
44-
```
30+
> Note: Singularity is also available through [Anaconda](https://anaconda.org/conda-forge/singularity).
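
After installation, it can help to confirm that the `singularity` executable is visible on your `PATH` before continuing. The following is a minimal sketch rather than part of the workflow itself; `check_tool` is a hypothetical helper name introduced here for illustration.

```bash
# Hypothetical helper (not part of the workflow): report whether a
# command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}

# Print the installed version only when Singularity is available.
if check_tool singularity; then
    singularity --version
fi
```

If the check reports `not found`, revisit the installation documentation linked above before moving on to the next step.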
4531
4632
<br>
4733

34+
---
35+
4836
### 2. Download the Workflow Files
4937

50-
All files required for utilizing the GL_RefAnnotTable-A workflow for generating reference annotation tables are in the [workflow_code](workflow_code) directory. To get a copy of latest GL_RefAnnotTable version on to your system, run the following command:
38+
All files required for utilizing the GL_RefAnnotTable-A workflow for generating reference annotation tables are in the [workflow_code](workflow_code) directory. To get a copy of the latest GL_RefAnnotTable-A version onto your system, run the following commands:
5139

5240
```bash
5341
curl -LO https://github.com/nasa/GeneLab_Data_Processing/releases/download/GL_RefAnnotTable-A_1.1.0/GL_RefAnnotTable-A_1.1.0.zip
54-
```
42+
unzip GL_RefAnnotTable-A_1.1.0.zip
43+
```
5544

5645
<br>
5746

58-
### 3. Setup Execution Permission for Workflow Scripts
47+
---
5948

60-
Once you've downloaded the GL_RefAnnotTable-A workflow directory as a zip file, unzip the workflow then `cd` into the GL_RefAnnotTable-A_1.1.0 directory on the CLI. Next, run the following command to set the execution permissions for the R script:
49+
### 3. Fetch Singularity Images
50+
51+
Although Singularity can fetch images from a URL, doing so may cause issues as detailed [here](https://github.com/nextflow-io/nextflow/issues/1210).
52+
53+
To avoid this issue, run the following command to fetch the Singularity images prior to running the GL_RefAnnotTable-A workflow:
54+
> Note: This command should be run in the location containing the `GL_RefAnnotTable-A_1.1.0` directory that was downloaded in [step 2](#2-download-the-workflow-files) above. Depending on your network speed, fetching the images will take ~20 minutes.
6155
6256
```bash
63-
unzip GL_RefAnnotTable-A_1.1.0.zip
64-
cd GL_RefAnnotTable-A_1.1.0
65-
chmod -R u+x *R
57+
bash GL_RefAnnotTable-A_1.1.0/bin/prepull_singularity.sh GL_RefAnnotTable-A_1.1.0/config/software/by_docker_image.config
58+
```
59+
60+
Once complete, a `singularity` folder containing the Singularity images will be created. Run the following command to export this folder as a Singularity configuration environment variable:
61+
62+
```bash
63+
export SINGULARITY_CACHEDIR=$(pwd)/singularity
6664
```
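
Before running the workflow, you may want to confirm which image files were actually pulled, since the run commands need the exact file name. The sketch below is an illustration only: `list_images` is a hypothetical helper, and it assumes the pulled images end in `.img` or `.sif`.

```bash
# Hypothetical helper: list container image files in a directory.
# Defaults to $SINGULARITY_CACHEDIR, then to ./singularity.
list_images() {
    dir=${1:-${SINGULARITY_CACHEDIR:-./singularity}}
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # Assumption: pulled images use the .img or .sif extension.
    find "$dir" -maxdepth 1 -type f \( -name '*.img' -o -name '*.sif' \) | sort
}

# Example (uncomment after the images have been fetched):
# list_images "$SINGULARITY_CACHEDIR"
```

If the listed file name differs from the one used in the run commands, adjust the image path in those commands accordingly.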
6765

6866
<br>
6967

68+
---
69+
7070
### 4. Run the Workflow
7171

72-
While in the GL_RefAnnotTable workflow directory, you are now able to run the workflow. Below is an example of how to run the workflow to build an annotation table for Mus musculus (mouse):
72+
While in the location containing the `GL_RefAnnotTable-A_1.1.0` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below is an example of how to run the workflow to build an annotation table for Mus musculus (mouse):
7373

7474
```bash
75-
Rscript GL-DPPD-7110-A_build-genome-annots-tab.R 'Mus musculus'
75+
singularity exec -B $(pwd)/GL_RefAnnotTable-A_1.1.0:/work \
76+
$SINGULARITY_CACHEDIR/gl-refannottable_v1.0.0.sif \
77+
bash -c "cd /work && Rscript GL-DPPD-7110-A_build-genome-annots-tab.R 'Mus musculus'"
7678
```
7779

7880
**Input data:**
7981

80-
- No input files are required. Specify the target organism using a positional command line argument. `Mus musculus` is used in the example above. To see a list of all available organisms, run `Rscript GL-DPPD-7110-A_build-genome-annots-tab.R` without positional arguments. The correct argument for each organism can also be found in the 'species' column of the [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
82+
- No input files are required. Specify the target organism using a positional command line argument. `Mus musculus` is used in the example above. To see a list of all available organisms, run the command without positional arguments. The correct argument for each organism can also be found in the 'species' column of the [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
8183

8284
- Optional: a reference table CSV can be supplied as a second positional argument instead of using the default [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
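
As an alternative way to see the valid organism arguments, the 'species' column can be read directly from the reference table. This is a rough sketch, not part of the workflow: `list_species` is a hypothetical helper, and it assumes a simple CSV in which no field contains an embedded comma.

```bash
# Hypothetical helper: print the values of the 'species' column of a CSV.
# Assumption: fields do not contain embedded commas.
list_species() {
    awk -F',' '
        NR == 1 {
            # Locate the species column from the header row.
            for (i = 1; i <= NF; i++) {
                name = $i
                gsub(/"/, "", name)
                if (name == "species") col = i
            }
            next
        }
        col {
            value = $col
            gsub(/"/, "", value)
            print value
        }
    ' "$1"
}

# Example (adjust the path to your copy of the reference table):
# list_species /path/to/GL-DPPD-7110-A_annotations.csv
```

Each printed value can be passed, quoted, as the first positional argument to the workflow script.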
8385

@@ -86,12 +88,18 @@ Rscript GL-DPPD-7110-A_build-genome-annots-tab.R 'Mus musculus'
8688
- *-GL-annotations.tsv (Tab-delimited table of gene annotations)
8789
- *-GL-build-info.txt (Text file containing information used to create the annotation table, including tool and tool versions and date of creation)
8890

91+
<br>
92+
93+
---
94+
8995
### 5. Run the annotations database creation function as a stand-alone script
9096

9197
When the workflow is run, if the reference table does not specify an annotations database for the target_organism in the `annotations` column, the `install_annotations` function, defined in the `install-org-db.R` script, will be executed. This script will locally create and install an annotations database R package using AnnotationForge. This function can also be run as a stand-alone script from the command line:
9298

9399
```bash
94-
Rscript install-org-db.R 'Bacillus subtilis' /path/to/GL-DPPD-7110-A_annotations.csv
100+
singularity exec -B $(pwd)/GL_RefAnnotTable-A_1.1.0:/work \
101+
$SINGULARITY_CACHEDIR/gl-refannottable_v1.0.0.sif \
102+
bash -c "cd /work && Rscript install-org-db.R 'Bacillus subtilis' /path/to/GL-DPPD-7110-A_annotations.csv"
95103
```
96104

97105
**Input data:**
@@ -104,52 +112,4 @@ Rscript install-org-db.R 'Bacillus subtilis' /path/to/GL-DPPD-7110-A_annotations
104112

105113
- org.*.eg.db/ (species-specific annotation database, as a local R package)
106114

107-
### 6. Run the Workflow Using Docker or Singularity
108-
109-
Rather than running the workflow in your local environment, you can use a Docker or Singularity container. This method ensures that all dependencies are correctly installed.
110-
111-
1. **Pull the container image:**
112-
113-
Docker:
114-
```bash
115-
docker pull quay.io/nasa_genelab/gl-refannottable:v1.0.0
116-
```
117-
118-
Singularity:
119-
```bash
120-
singularity pull docker://quay.io/nasa_genelab/gl-refannottable:v1.0.0
121-
```
122-
123-
2. **Download the workflow files:**
124-
125-
```bash
126-
curl -LO https://github.com/nasa/GeneLab_Data_Processing/releases/download/GL_RefAnnotTable-A_1.1.0/GL_RefAnnotTable-A_1.1.0.zip
127-
unzip GL_RefAnnotTable-A_1.1.0.zip
128-
```
129-
130-
3. **Run the workflow:**
131-
132-
Docker:
133-
```bash
134-
docker run -it -v $(pwd)/GL_RefAnnotTable-A_1.1.0:/work \
135-
quay.io/nasa_genelab/gl-refannottable:v1.0.0 \
136-
bash -c "cd /work && Rscript GL-DPPD-7110-A_build-genome-annots-tab.R 'Mus musculus'"
137-
```
138-
139-
Singularity:
140-
```bash
141-
singularity exec -B $(pwd)/GL_RefAnnotTable-A_1.1.0:/work \
142-
gl-refannottable_v1.0.0.sif \
143-
bash -c "cd /work && Rscript GL-DPPD-7110-A_build-genome-annots-tab.R 'Mus musculus'"
144-
```
145-
146-
**Input data:**
147-
148-
- No input files are required. Specify the target organism using a positional command line argument. `Mus musculus` is used in the example above. To see a list of all available organisms, run `Rscript GL-DPPD-7110-A_build-genome-annots-tab.R` without positional arguments. The correct argument for each organism can also be found in the 'species' column of the [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
149-
150-
- Optional: a reference table CSV can be supplied as a second positional argument instead of using the default [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
151-
152-
**Output data:**
153-
154-
- *-GL-annotations.tsv (Tab delineated table of gene annotations)
155-
- *-GL-build-info.txt (Text file containing information used to create the annotation table, including tool and tool versions and date of creation)
115+
<br>
@@ -0,0 +1,32 @@
1+
2+
#!/usr/bin/env bash
3+
4+
# Addresses issue: https://github.com/nextflow-io/nextflow/issues/1210
5+
6+
CONFILE=${1:-nextflow.config}
7+
OUTDIR=${2:-./singularity}
8+
9+
if [ ! -e $CONFILE ]; then
10+
echo "$CONFILE does not exist"
11+
exit 1
12+
fi
13+
14+
TMPFILE=`mktemp`
15+
16+
CURDIR=$(pwd)
17+
18+
mkdir -p $OUTDIR
19+
20+
cat ${CONFILE}|grep 'container'|perl -lane 'if ( $_=~/container\s*\=\s*\"(\S+)\"/ ) { $_=~/container\s*\=\s*\"(\S+)\"/; print $1 unless ( $1=~/^\s*$/ or $1=~/\.sif/ or $1=~/\.img/ ) ; }' > $TMPFILE
21+
22+
cd ${OUTDIR}
23+
24+
while IFS= read -r line; do
25+
name=$line
26+
name=${name/:/-}
27+
name=${name//\//-}
28+
echo $name
29+
singularity pull ${name}.img docker://$line
30+
done < $TMPFILE
31+
32+
cd $CURDIR
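
To make the naming step in the loop above concrete, the following sketch reproduces the same two substitutions (first `:` replaced by `-`, then every `/` replaced by `-`) using `sed`, and appends the `.img` suffix passed to `singularity pull`. The helper name `image_file_name` is hypothetical, and the example URI is taken from the Docker image referenced elsewhere in this commit; whether the config file lists exactly that URI is an assumption.

```bash
# Hypothetical helper: derive the output file name the prepull loop
# would use for a given container URI.
image_file_name() {
    # Replace the first ':' and then all '/' with '-', then add '.img'.
    printf '%s.img\n' "$(printf '%s' "$1" | sed -e 's/:/-/' -e 's|/|-|g')"
}

image_file_name 'quay.io/nasa_genelab/gl-refannottable:v1.0.0'
# prints: quay.io-nasa_genelab-gl-refannottable-v1.0.0.img
```

Under that assumption, the pulled file name would differ from the `gl-refannottable_v1.0.0.sif` path used in the run commands, so it is worth checking the contents of the `singularity` folder after the script finishes.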
