SALTY,Salmonella enterica,serovar Typhimurium str. LT2,,ncbi,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/006/945/GCF_000006945.2_ASM694v2/GCF_000006945.2_ASM694v2_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/006/945/GCF_000006945.2_ASM694v2/GCF_000006945.2_ASM694v2_genomic.gtf.gz,99287,org.SentericaserovarTyphimuriumstrLT2.eg.db,https://figshare.com/ndownloader/files/48354385,https://figshare.com/ndownloader/files/48354391
- Salmonella enterica subsp. enterica serovar Typhimurium str. LT2
- Serratia liquefaciens ATCC 27592
- Staphylococcus aureus MRSA252
- Streptococcus mutans UA159
- Vibrio fischeri ES114
- Added AnnotationForge helper script install-org-db.R to create organism-specific annotation packages (org.*.eg.db) in R if not available on Bioconductor. Used for:
  - Bacillus subtilis subsp. subtilis 168
  - Brachypodium distachyon
  - Escherichia coli str. K-12 substr. MG1655
  - Oryzias latipes
  - Salmonella enterica subsp. enterica serovar Typhimurium str. LT2
- Added NCBI as a source for FASTA and GTF files

### Fixed

- Fixed processing for ECOLI

### Changed

- Updated Ensembl versions
  - Animals: Ensembl release 112
  - Plants: Ensembl plants release 59
  - Bacteria: Ensembl bacteria release 59
- Removed org.EcK12.eg.db and replaced it with a locally created annotations database, as it is no longer available on Bioconductor
- Changed the first argument of GL-DPPD-7110-A_build-genome-annots-tab.R from the 'name' column value to the 'species' column value (e.g., 'Mus musculus' instead of 'MOUSE')
# GL_RefAnnotTable Workflow Information and Usage Instructions

## General workflow info
The current GeneLab Reference Annotation Table (GL_RefAnnotTable) pipeline is implemented as an R workflow that can be run from a command line interface (CLI) using bash. The workflow can be used even if you are unfamiliar with R, but if you want to learn more about R, visit the [R-project about page here](https://www.r-project.org/about.html). Additionally, an introduction to R along with installation help and information about using R for bioinformatics can be found [here at Happy Belly Bioinformatics](https://astrobiomike.github.io/R/basics).
## Utilizing the workflow

1. [Install R and R packages](#1-install-r-and-r-packages)
2. [Download the workflow files](#2-download-the-workflow-files)
3. [Setup Execution Permission for Workflow Scripts](#3-setup-execution-permission-for-workflow-scripts)
4. [Run the workflow](#4-run-the-workflow)
5. [Run the annotations database creation function as a stand-alone script](#5-run-the-annotations-database-creation-function-as-a-stand-alone-script)

<br>

### 1. Install R and R packages

We recommend installing R via the [Comprehensive R Archive Network (CRAN)](https://cran.r-project.org/) as follows:

1. Select the [CRAN Mirror](https://cran.r-project.org/mirrors.html) closest to your location.
2. Click the link under the "Download and Install R" section that matches your operating system.
3. Click the R-4.4.0 package that matches your machine to download it.
4. Double click on the R-4.4.0.pkg downloaded in step 3 and follow the installation instructions.

Once R is installed, open a CLI terminal and run the following command to start R:

```bash
R
```

Within an active R environment, run the following commands to install the required R packages:
### 2. Download the workflow files

All files required to use the GL_RefAnnotTable workflow to generate reference annotation tables are in the [workflow_code](workflow_code) directory. To get a copy of the latest GL_RefAnnotTable version onto your system, run the following command:
### 3. Setup Execution Permission for Workflow Scripts

Once you've downloaded the GL_RefAnnotTable workflow directory as a zip file, unzip it, then `cd` into the GL_RefAnnotTable-A_1.1.0 directory on the CLI. Next, run the following command to set the execution permissions for the R scripts:

```bash
chmod -R u+x *R
```

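To confirm that the permission change took effect, a quick check can list any R scripts that are still not executable. This is a minimal sketch; `list_non_executable` is a hypothetical helper name and is not part of the workflow:

```bash
# Hypothetical check (not part of GL_RefAnnotTable).
# Prints every *.R file in a directory that the current user cannot execute.
# $1: directory to inspect (defaults to the current directory)
list_non_executable() {
  local dir="${1:-.}" f
  for f in "$dir"/*.R; do
    [ -e "$f" ] || continue   # the glob matched nothing
    [ -x "$f" ] || echo "$f"  # report files without execute permission
  done
}

# Example call from the workflow directory:
#   list_non_executable .
```

If the function prints nothing, every R script it found is executable.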
<br>

### 4. Run the Workflow

While in the GL_RefAnnotTable workflow directory, you are now able to run the workflow. Below is an example of how to run the workflow to build an annotation table for Mus musculus (mouse):
- No input files are required. Specify the target organism using a positional command line argument. `Mus musculus` is used in the example above. To see a list of all available organisms, run `Rscript GL-DPPD-7110-A_build-genome-annots-tab.R` without positional arguments. The correct argument for each organism can also be found in the 'species' column of the [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv) file.

- Optional: a reference table CSV can be supplied as a second positional argument instead of using the default [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)

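The two positional arguments described above can be wrapped in a small shell function. This is only a sketch: `build_annotations` is a hypothetical helper name, not part of the workflow, and it assumes that `Rscript` is on the `PATH` and that it is called from the workflow directory.

```bash
# Hypothetical convenience wrapper (not part of GL_RefAnnotTable).
# $1: value from the 'species' column, e.g. 'Mus musculus' (required)
# $2: path to an alternative reference table CSV (optional)
build_annotations() {
  local species="${1:-}" ref_table="${2:-}"
  if [ -z "$species" ]; then
    echo "usage: build_annotations '<species>' [reference_table.csv]" >&2
    return 2
  fi
  if [ -n "$ref_table" ]; then
    # Use the supplied reference table instead of the default one.
    Rscript GL-DPPD-7110-A_build-genome-annots-tab.R "$species" "$ref_table"
  else
    Rscript GL-DPPD-7110-A_build-genome-annots-tab.R "$species"
  fi
}

# Example calls:
#   build_annotations 'Mus musculus'
#   build_annotations 'Mus musculus' my_annotations.csv
```

The species value is quoted because it contains a space; without the quotes the shell would pass `Mus` and `musculus` as two separate arguments.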
**Output data:**

- \*-GL-annotations.tsv (tab-delimited table of gene annotations)
- \*-GL-build-info.txt (text file containing the information used to create the annotation table, including tools, tool versions, and date of creation)

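After a run, the presence of both output files can be checked from the shell. The sketch below assumes the outputs are written to the directory being inspected; `check_outputs` is a hypothetical helper name, not part of the workflow.

```bash
# Hypothetical post-run check (not part of GL_RefAnnotTable).
# $1: directory to inspect (defaults to the current directory)
check_outputs() {
  local dir="${1:-.}" status=0 suffix
  for suffix in GL-annotations.tsv GL-build-info.txt; do
    # If the glob matches nothing it stays literal, so the -e test fails.
    set -- "$dir"/*-"$suffix"
    if [ -e "$1" ]; then
      echo "found: $1"
    else
      echo "missing: *-$suffix" >&2
      status=1
    fi
  done
  return "$status"
}

# Example call from the directory where the workflow was run:
#   check_outputs .
```

The function returns a non-zero status when either file is missing, so it can be chained after the workflow call with `&&`.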
### 5. Run the annotations database creation function as a stand-alone script

When the workflow is run, if the reference table does not specify an annotations database for the target organism in the `annotations` column, the `install_annotations` function, defined in the `install-org-db.R` script, will be executed. This function locally creates and installs an annotations database R package using AnnotationForge. It can also be run as a stand-alone script from the command line:
- The target organism must be specified as the first positional command line argument; `Bacillus subtilis` is used in the example above. The correct argument for each organism can be found in the 'species' column of the [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv) file.

- The path to a local reference table must also be supplied as the second positional argument

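Because the first argument must match a value in the 'species' column, it can help to print that column from the local reference table before running the script. This is a minimal sketch, assuming the CSV has a header row with a column named `species` and no quoted fields that contain commas; `list_species` is a hypothetical helper name, not part of the workflow.

```bash
# Hypothetical helper (not part of GL_RefAnnotTable).
# Prints each distinct value of the 'species' column of a reference table CSV.
# $1: path to the reference table CSV
list_species() {
  awk -F',' '
    { sub(/\r$/, "") }   # tolerate Windows line endings
    NR == 1 {
      # Locate the species column by its header name.
      for (i = 1; i <= NF; i++) if ($i == "species") col = i
      if (!col) { print "no species column in header" > "/dev/stderr"; exit 1 }
      next
    }
    $col != "" && !seen[$col]++ { print $col }
  ' "$1"
}

# Example call:
#   list_species path/to/GL-DPPD-7110-A_annotations.csv
```

Looking the column up by name rather than by position keeps the sketch working if columns are added or reordered in the reference table.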
Output data:

- org.*.eg.db/ (species-specific annotation database, as a local R package)