Commit f646141

Merge pull request #142 from torres-alexis/DEV_RNAseq_vG_pr
Updated Eukaryotic Pipeline to GL-DPPD-7101-G, added Prokaryotic Pipeline GL-DPPD-7115, updated Workflow to [NF_RCP] 2.0.0
2 parents f48e788 + 829828d commit f646141

230 files changed, 40684 insertions(+), 2521 deletions(-)

Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md

Lines changed: 4 additions & 4 deletions
@@ -38,7 +38,7 @@ Amanda Saravia-Butler (GeneLab Data Processing Lead)
 
 <!-- Included R packages -->
 - Assay-specific suffixes were added where needed for GeneLab repo ("GLAmpSeq")
-- The ITS UNITE reference database used was updated to "UNITE_v2023_July2023.RData", from http://www2.decipher.codes/Classification/TrainingSets/
+- The ITS UNITE reference database used was updated to "UNITE_v2023_July2023.RData", from https://www2.decipher.codes/data/Downloads/TrainingSets/
 - Several program versions were updated (all versions listed in [Software used](#software-used) below)
 
 ---
@@ -103,8 +103,8 @@ Amanda Saravia-Butler (GeneLab Data Processing Lead)
 
 |Program used| Database| Relevant Links|
 |:-----|:-----:|--------:|
-|DECIPHER| SILVA SSU r138 | [http://www2.decipher.codes/Classification/TrainingSets/SILVA_SSU_r138_2019.RData](http://www2.decipher.codes/Classification/TrainingSets/)|
-|DECIPHER| UNITE v2020 | [http://www2.decipher.codes/Classification/TrainingSets/UNITE_v2020_February2020.RData](http://www2.decipher.codes/Classification/TrainingSets/)|
+|DECIPHER| SILVA SSU r138 | [https://www2.decipher.codes/data/Downloads/TrainingSets/SILVA_SSU_r138_2019.RData](https://www2.decipher.codes/data/Downloads/TrainingSets/)|
+|DECIPHER| UNITE v2023 | [https://www2.decipher.codes/data/Downloads/TrainingSets/UNITE_v2023_July2023.RData](https://www2.decipher.codes/data/Downloads/TrainingSets/)|
 
 ---
 
@@ -443,7 +443,7 @@ dna <- DNAStringSet(getSequences(seqtab.nochim))
 
 Downloading the reference R taxonomy object:
 ```R
-download.file( url="http://www2.decipher.codes/Classification/TrainingSets/SILVA_SSU_r138_2019.RData", destfile="SILVA_SSU_r138_2019.RData")
+download.file( url="https://www2.decipher.codes/data/Downloads/TrainingSets/SILVA_SSU_r138_2019.RData", destfile="SILVA_SSU_r138_2019.RData")
 ```
 
 **Parameter Definitions:**
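
The link changes in this file all follow one pattern: the old `http://www2.decipher.codes/Classification/TrainingSets/` prefix is replaced by `https://www2.decipher.codes/data/Downloads/TrainingSets/`. As a rough sketch of how the same rewrite could be applied to other stale links, the Python helper below swaps the prefix. The function name `migrate_decipher_url` is hypothetical and is not part of the GeneLab repository.

```python
# Illustrative only: both prefixes are taken from the diff above;
# the helper name is a hypothetical one chosen for this sketch.
OLD_PREFIX = "http://www2.decipher.codes/Classification/TrainingSets/"
NEW_PREFIX = "https://www2.decipher.codes/data/Downloads/TrainingSets/"


def migrate_decipher_url(url: str) -> str:
    """Return url with the old TrainingSets prefix swapped for the new one."""
    if url.startswith(OLD_PREFIX):
        return NEW_PREFIX + url[len(OLD_PREFIX):]
    # Already migrated, or not a DECIPHER training-set link: leave untouched.
    return url


if __name__ == "__main__":
    print(migrate_decipher_url(OLD_PREFIX + "SILVA_SSU_r138_2019.RData"))
```

Links that do not start with the old prefix are returned unchanged, so the helper can be run repeatedly over the same file without altering already-updated URLs.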

Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/CHANGELOG.md

Lines changed: 9 additions & 1 deletion
@@ -1,5 +1,13 @@
 # Workflow change log
 
+## [1.2.3](https://github.com/nasa/GeneLab_Data_Processing/tree/SW_AmpIllumina-B_1.2.3/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B)
+- Fixed broken decipher reference database links to the following:
+  - 16S: https://www2.decipher.codes/data/Downloads/TrainingSets/SILVA_SSU_r138_2019.RData
+  - ITS: https://www2.decipher.codes/data/Downloads/TrainingSets/UNITE_v2023_July2023.RData
+  - 18S: https://www2.decipher.codes/data/Downloads/TrainingSets/PR2_v4_13_March2021.RData
+- Visualizations default setting is now set to TRUE
+  - Disable with optional `run_workflow.py` argument `--visualizations FALSE` or setting `config.yaml` `enable_visualizations` to "FALSE"
+
 ## [1.2.2](https://github.com/nasa/GeneLab_Data_Processing/tree/SW_AmpIllumina-B_1.2.2/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B)
 - Visualizations are now optional with the default being off.
   - Enable with optional `run_workflow.py` argument `--visualizations TRUE` or setting `config.yaml` `enable_visualizations` to "TRUE"
@@ -36,4 +44,4 @@
 
 <br>
 
-All previous workflow changes were associated with [version A of the GeneLab Amplicon Seq Illumina Pipeline](../../Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-A.md), and can be found in the [change log of the SW_AmpIllumina-A workflow](../SW_AmpIllumina-A/CHANGELOG.md).
+All previous workflow changes were associated with [version A of the GeneLab Amplicon Seq Illumina Pipeline](../../Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-A.md), and can be found in the [change log of the SW_AmpIllumina-A workflow](../SW_AmpIllumina-A/CHANGELOG.md).
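
The 1.2.3 entry above flips the visualization default to TRUE while keeping a `--visualizations FALSE` opt-out. A minimal sketch of that kind of flag, assuming an `argparse`-based wrapper, might look like the following. It is not the actual `run_workflow.py` code, and `parse_true_false` and `build_parser` are hypothetical names used only for illustration.

```python
import argparse


def parse_true_false(value: str) -> bool:
    """Convert the strings TRUE/FALSE (any case) to a bool."""
    text = value.strip().upper()
    if text == "TRUE":
        return True
    if text == "FALSE":
        return False
    raise argparse.ArgumentTypeError(f"expected TRUE or FALSE, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single, default-on visualization switch."""
    parser = argparse.ArgumentParser(description="Hypothetical workflow wrapper")
    parser.add_argument(
        "--visualizations",
        type=parse_true_false,
        default=True,  # enabled unless the caller opts out
        help="TRUE (default) or FALSE",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--visualizations", "FALSE"])
    print(args.visualizations)
```

Parsing the value explicitly, rather than using a bare store-true switch, keeps the `TRUE`/`FALSE` spelling that the change log documents for both the command-line argument and the `config.yaml` setting.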

GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A.md

Lines changed: 864 additions & 0 deletions
Large diffs are not rendered by default.
@@ -0,0 +1,24 @@
+name,species,strain,ensemblVersion,ref_source,fasta,gtf,taxon,bioconductor_annotations,custom_annotations,genelab_annots_link,genelab_annots_info_link
+ARABIDOPSIS,Arabidopsis thaliana,,59,ensembl_plants,https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz,https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/gtf/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.59.gtf.gz,3702,org.At.tair.db,,https://figshare.com/ndownloader/files/48354355,https://figshare.com/ndownloader/files/48354352
+BACSU,Bacillus subtilis,subsp. subtilis 168,59,ensembl_bacteria,https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/release-59/fasta/bacteria_0_collection/bacillus_subtilis_subsp_subtilis_str_168_gca_000009045/dna/Bacillus_subtilis_subsp_subtilis_str_168_gca_000009045.ASM904v1.dna.toplevel.fa.gz,https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/release-59/gtf/bacteria_0_collection/bacillus_subtilis_subsp_subtilis_str_168_gca_000009045/Bacillus_subtilis_subsp_subtilis_str_168_gca_000009045.ASM904v1.59.gtf.gz,224308,,org.Bsubtilissubspsubtilis168.eg.db,https://figshare.com/ndownloader/files/48354346,https://figshare.com/ndownloader/files/48354349
+BRADI,Brachypodium distachyon,,59,ensembl_plants,https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/fasta/brachypodium_distachyon/dna/Brachypodium_distachyon.Brachypodium_distachyon_v3.0.dna.toplevel.fa.gz,https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/gtf/brachypodium_distachyon/Brachypodium_distachyon.Brachypodium_distachyon_v3.0.59.gtf.gz,15368,,org.Bdistachyon.eg.db,https://figshare.com/ndownloader/files/48354370,https://figshare.com/ndownloader/files/48354361
+BRARP,Brassica rapa,,59,ensembl_plants,http://ftp.ensemblgenomes.org/pub/plants/release-59/fasta/brassica_rapa/dna/Brassica_rapa.Brapa_1.0.dna.toplevel.fa.gz,http://ftp.ensemblgenomes.org/pub/plants/release-59/gtf/brassica_rapa/Brassica_rapa.Brapa_1.0.59.gtf.gz,,,,,
+WORM,Caenorhabditis elegans,,112,ensembl,https://ftp.ensembl.org/pub/release-112/fasta/caenorhabditis_elegans/dna/Caenorhabditis_elegans.WBcel235.dna.toplevel.fa.gz,https://ftp.ensembl.org/pub/release-112/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.112.gtf.gz,6239,org.Ce.eg.db,,https://figshare.com/ndownloader/files/48354373,https://figshare.com/ndownloader/files/48354364
+ZEBRAFISH,Danio rerio,,112,ensembl,http://ftp.ensembl.org/pub/release-112/fasta/danio_rerio/dna/Danio_rerio.GRCz11.dna.primary_assembly.fa.gz,http://ftp.ensembl.org/pub/release-112/gtf/danio_rerio/Danio_rerio.GRCz11.112.gtf.gz,7955,org.Dr.eg.db,,https://figshare.com/ndownloader/files/48354388,https://figshare.com/ndownloader/files/48354367
+FLY,Drosophila melanogaster,,112,ensembl,http://ftp.ensembl.org/pub/release-112/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP6.46.dna.toplevel.fa.gz,http://ftp.ensembl.org/pub/release-112/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.46.112.gtf.gz,7227,org.Dm.eg.db,,https://figshare.com/ndownloader/files/48354382,https://figshare.com/ndownloader/files/48354376
+ERCC,,,,ThermoFisher,https://assets.thermofisher.com/TFS-Assets/LSG/manuals/ERCC92.zip,https://assets.thermofisher.com/TFS-Assets/LSG/manuals/ERCC92.zip,,,,,
+ECOLI,Escherichia coli,str. K-12 substr. MG1655,59,ensembl_bacteria,https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/release-59/fasta/bacteria_0_collection/escherichia_coli_str_k_12_substr_mg1655_gca_000005845/dna/Escherichia_coli_str_k_12_substr_mg1655_gca_000005845.ASM584v2.dna.toplevel.fa.gz,https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/release-59/gtf/bacteria_0_collection/escherichia_coli_str_k_12_substr_mg1655_gca_000005845/Escherichia_coli_str_k_12_substr_mg1655_gca_000005845.ASM584v2.59.gtf.gz,511145,,org.EcolistrK12substrMG1655.eg.db,https://figshare.com/ndownloader/files/48354379,https://figshare.com/ndownloader/files/48354394
+HUMAN,Homo sapiens,,112,ensembl,https://ftp.ensembl.org/pub/release-112/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz,https://ftp.ensembl.org/pub/release-112/gtf/homo_sapiens/Homo_sapiens.GRCh38.112.gtf.gz,9606,org.Hs.eg.db,,https://figshare.com/ndownloader/files/48354445,https://figshare.com/ndownloader/files/48354448
+,Lactobacillus acidophilus,NCFM,,ncbi,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/011/985/GCF_000011985.1_ASM1198v1/GCF_000011985.1_ASM1198v1_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/011/985/GCF_000011985.1_ASM1198v1/GCF_000011985.1_ASM1198v1_genomic.gtf.gz,272621,,,https://figshare.com/ndownloader/files/49061254,https://figshare.com/ndownloader/files/49061257
+MOUSE,Mus musculus,,112,ensembl,https://ftp.ensembl.org/pub/release-112/fasta/mus_musculus/dna/Mus_musculus.GRCm39.dna.primary_assembly.fa.gz,https://ftp.ensembl.org/pub/release-112/gtf/mus_musculus/Mus_musculus.GRCm39.112.gtf.gz,10090,org.Mm.eg.db,,https://figshare.com/ndownloader/files/48354460,https://figshare.com/ndownloader/files/48354457
+,Mycobacterium marinum,M,,ncbi,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/018/345/GCF_000018345.1_ASM1834v1/GCF_000018345.1_ASM1834v1_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/018/345/GCF_000018345.1_ASM1834v1/GCF_000018345.1_ASM1834v1_genomic.gtf.gz,216594,,,https://figshare.com/ndownloader/files/49061260,https://figshare.com/ndownloader/files/49061263
+ORYSJ,Oryza sativa,Japonica,59,ensembl_plants,https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/fasta/oryza_sativa/dna/Oryza_sativa.IRGSP-1.0.dna.toplevel.fa.gz,https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/gtf/oryza_sativa/Oryza_sativa.IRGSP-1.0.59.gtf.gz,39947,,,https://figshare.com/ndownloader/files/48354451,https://figshare.com/ndownloader/files/48354454
+ORYLA,Oryzias latipes,,112,ensembl,http://ftp.ensembl.org/pub/release-112/fasta/oryzias_latipes/dna/Oryzias_latipes.ASM223467v1.dna.toplevel.fa.gz,http://ftp.ensembl.org/pub/release-112/gtf/oryzias_latipes/Oryzias_latipes.ASM223467v1.112.gtf.gz,8090,,org.Olatipes.eg.db,https://figshare.com/ndownloader/files/48354463,https://figshare.com/ndownloader/files/48354466
+,Pseudomonas aeruginosa,UCBPP-PA14,,ncbi,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/014/625/GCF_000014625.1_ASM1462v1/GCF_000014625.1_ASM1462v1_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/014/625/GCF_000014625.1_ASM1462v1/GCF_000014625.1_ASM1462v1_genomic.gtf.gz,208963,,,https://figshare.com/ndownloader/files/49061266,https://figshare.com/ndownloader/files/49061269
+RAT,Rattus norvegicus,,112,ensembl,http://ftp.ensembl.org/pub/release-112/fasta/rattus_norvegicus/dna/Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa.gz,http://ftp.ensembl.org/pub/release-112/gtf/rattus_norvegicus/Rattus_norvegicus.mRatBN7.2.112.gtf.gz,10116,org.Rn.eg.db,,https://figshare.com/ndownloader/files/48354472,https://figshare.com/ndownloader/files/48354475
+YEAST,Saccharomyces cerevisiae,S288C,112,ensembl,http://ftp.ensembl.org/pub/release-112/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz,http://ftp.ensembl.org/pub/release-112/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.112.gtf.gz,559292,org.Sc.sgd.db,,https://figshare.com/ndownloader/files/48354469,https://figshare.com/ndownloader/files/48354478
+SALTY,Salmonella enterica,serovar Typhimurium str. LT2,,ncbi,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/006/945/GCF_000006945.2_ASM694v2/GCF_000006945.2_ASM694v2_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/006/945/GCF_000006945.2_ASM694v2/GCF_000006945.2_ASM694v2_genomic.gtf.gz,99287,,org.SentericaserovarTyphimuriumstrLT2.eg.db,https://figshare.com/ndownloader/files/49061272,https://figshare.com/ndownloader/files/49061275
+,Serratia liquefaciens,ATCC 27592,,ncbi,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/422/085/GCF_000422085.1_ASM42208v1/GCF_000422085.1_ASM42208v1_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/422/085/GCF_000422085.1_ASM42208v1/GCF_000422085.1_ASM42208v1_genomic.gtf.gz,1346614,,,https://figshare.com/ndownloader/files/49061278,https://figshare.com/ndownloader/files/49061281
+,Staphylococcus aureus,MRSA252,,ncbi,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/011/505/GCF_000011505.1_ASM1150v1/GCF_000011505.1_ASM1150v1_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/011/505/GCF_000011505.1_ASM1150v1/GCF_000011505.1_ASM1150v1_genomic.gtf.gz,282458,,,https://figshare.com/ndownloader/files/49061284,https://figshare.com/ndownloader/files/49061287
+,Streptococcus mutans,UA159,,ncbi,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/007/465/GCF_000007465.2_ASM746v2/GCF_000007465.2_ASM746v2_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/007/465/GCF_000007465.2_ASM746v2/GCF_000007465.2_ASM746v2_genomic.gtf.gz,210007,,,https://figshare.com/ndownloader/files/49061290,https://figshare.com/ndownloader/files/49061293
+,Vibrio fischeri,ES114,,ncbi,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/011/805/GCF_000011805.1_ASM1180v1/GCF_000011805.1_ASM1180v1_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/011/805/GCF_000011805.1_ASM1180v1/GCF_000011805.1_ASM1180v1_genomic.gtf.gz,312309,,,https://figshare.com/ndownloader/files/49061296,https://figshare.com/ndownloader/files/49061299
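
Because the reference table above carries one FASTA URL and one GTF URL per organism, a quick consistency check can catch a URL pasted into the wrong column. The Python sketch below flags rows whose `fasta` value does not end in a FASTA-style suffix. The function name, the suffix list, and the `example.org` sample rows are assumptions made for illustration; only the column names come from the table.

```python
import csv
import io
from typing import List

# Assumed set of acceptable FASTA archive suffixes for this sketch.
FASTA_SUFFIXES = (".fa.gz", ".fna.gz", ".fasta.gz")

# Hypothetical sample rows with placeholder URLs (not real downloads).
SAMPLE = """name,species,fasta,gtf
MOUSE,Mus musculus,https://example.org/Mus_musculus.dna.primary_assembly.fa.gz,https://example.org/Mus_musculus.gtf.gz
,Example bacterium,https://example.org/genome_genomic.gtf.gz,https://example.org/genome_genomic.gtf.gz
"""


def suspicious_fasta_rows(csv_text: str) -> List[str]:
    """Return a label for each row whose 'fasta' URL lacks a FASTA suffix."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = row["fasta"]
        if url and not url.endswith(FASTA_SUFFIXES):
            # Prefer the species as the label; fall back to the short name.
            flagged.append(row["species"] or row["name"])
    return flagged


if __name__ == "__main__":
    print(suspicious_fasta_rows(SAMPLE))
```

A check like this would also flag the ERCC row, whose FASTA and GTF both point at the same `.zip` archive, so a real implementation would likely need an allowlist for such entries.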

GeneLab_Reference_Annotations/README.md

Lines changed: 5 additions & 2 deletions
@@ -1,6 +1,6 @@
 # GeneLab pipeline for generating reference annotation tables
 
-> **The document [`GL-DPPD-7110.md`](Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110/GL-DPPD-7110.md) holds an overview and example commands for how GeneLab generates reference annotation tables. See the [Repository Links](#repository-links) descriptions below for more information.**
+> **The document [`GL-DPPD-7110-A.md`](Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A.md) holds an overview and example commands for how GeneLab generates reference annotation tables. See the [Repository Links](#repository-links) descriptions below for more information.**
 
 ---
 ## Repository Links
@@ -17,6 +17,9 @@
 
 ---
 
-**Developed and maintained by:**
+**Developed by:**
 Mike Lee
 
+**Maintained by:**
+Alexis Torres
+Crystal Han
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [1.1.0](https://github.com/nasa/GeneLab_Data_Processing/blob/DEV_GeneLab_Reference_Annotations_vGL-DPPD-7110-A/GeneLab_Reference_Annotations/Workflow_Documentation/GL_RefAnnotTable-A)
+
+### Added
+
+- Added software:
+  - AnnotationForge version 1.46.0
+  - biomaRt version 2.60.1
+  - GO.db version 3.19.1
+- Added support for:
+  - Bacillus subtilis, subsp. subtilis 168
+  - Brachypodium distachyon
+  - Escherichia coli, str. K-12 substr. MG1655
+  - Oryzias latipes
+  - Lactobacillus acidophilus NCFM
+  - Mycobacterium marinum M
+  - Oryza sativa Japonica
+  - Pseudomonas aeruginosa UCBPP-PA14
+  - Salmonella enterica subsp. enterica serovar Typhimurium str. LT2
+  - Serratia liquefaciens ATCC 27592
+  - Staphylococcus aureus MRSA252
+  - Streptococcus mutans UA159
+  - Vibrio fischeri ES114
+- Added AnnotationForge helper script install-org-db.R to create
+  organism-specific annotation packages (org.*.eg.db) in R if not available on
+  Bioconductor. Used for:
+  - Bacillus subtilis, subsp. subtilis 168
+  - Brachypodium distachyon
+  - Escherichia coli, str. K-12 substr. MG1655
+  - Oryzias latipes
+  - Salmonella enterica subsp. enterica serovar Typhimurium str. LT2
+- Added NCBI as a source for FASTA and GTF files
+
+### Fixed
+
+- Fixed processing for ECOLI
+
+### Changed
+
+- Updated Ensembl versions:
+  - Animals: Ensembl release 112
+  - Plants: Ensembl plants release 59
+  - Bacteria: Ensembl bacteria release 59
+- Updated software:
+  - tidyverse version updated from 1.3.2 to 2.0.0
+  - STRINGdb version updated from 2.8.4 to 2.16.4
+  - PANTHER.db version updated from 1.0.11 to 1.0.12
+  - rtracklayer version updated from 1.56.1 to 1.64.0
+  - Bioconductor version updated from 3.15.1 to 3.19
+- Removed org.EcK12.eg.db and replaced it with a locally created annotations
+  database, as it is no longer available on Bioconductor
+- Changed the first argument of GL-DPPD-7110-A_build-genome-annots-tab.R from
+  the 'name' column value to the 'species' column value (e.g., 'Mus musculus' instead of 'MOUSE')
+
+
+## [1.0.0](https://github.com/nasa/GeneLab_Data_Processing/releases/tag/GL_RefAnnotTable_1.0.0)
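
The last "Changed" item above notes that the build script now takes the `species` column value rather than the `name` column value. The build script itself is an R script; purely as an illustration of a species-keyed lookup, the Python sketch below finds a row by species. The function name `find_by_species` and the case-insensitive matching are assumptions, and the sample rows are trimmed to three columns with values copied from the reference table in this commit.

```python
import csv
import io
from typing import Dict, Optional

# Three columns only; values copied from the reference table in this commit.
SAMPLE = """name,species,taxon
MOUSE,Mus musculus,10090
,Vibrio fischeri,312309
"""


def find_by_species(csv_text: str, species: str) -> Optional[Dict[str, str]]:
    """Return the first row whose 'species' matches, ignoring case and padding."""
    wanted = species.strip().lower()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["species"].strip().lower() == wanted:
            return row
    return None  # no organism with that species value


if __name__ == "__main__":
    print(find_by_species(SAMPLE, "Mus musculus"))
    print(find_by_species(SAMPLE, "Vibrio fischeri"))
```

In the table, several NCBI-sourced organisms have an empty `name` field, so in this sketch they can only be reached through the `species` value.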
