-- target_organism (name of the target organism, output from [step 1](#1-define-variables-and-output-file-names))
-- target_taxid (taxonomic identifier for the target organism, output from [step 1](#1-define-variables-and-output-file-names))
+- `target_org_db` (variable specifying the name of the org.db R package for the target organism, output from [step 1](#1-define-variables-and-output-file-names))
+- `ref_table` (variable specifying the reference table containing organism-specific information, output from [step 1](#1-define-variables-and-output-file-names))
+- `target_organism` (variable specifying the full species name of the target organism, output from [step 1](#1-define-variables-and-output-file-names))
+- `target_taxid` (variable specifying the taxonomic identifier for the target organism, output from [step 1](#1-define-variables-and-output-file-names))

 **Output Data:**

-- target_org_db (updated name of the org.db R package, if it was created locally)
+- `target_org_db` (variable specifying the updated name of the org.db R package, if it was created locally)
 - Locally installed org.db package (if the package is not available on Bioconductor, a new package is created and installed)

-- gtf_link (URL to the GTF file for the target organism, output from [step 1](#1-define-variables-and-output-file-names))
-- target_org_db (name of the org.eg.db R package for the target organism, output from [steps 1](#1-define-variables-and-output-file-names) and [2](#2-create-the-organism-package-if-it-is-not-hosted-by-bioconductor))
-- target_organism (name of the target organism, output from [step 1](#1-define-variables-and-output-file-names))
-- currently_accepted_orgs (list of currently supported organisms, output from [step 0](#0-set-up-environment))
-- ref_tab_path (path to the reference table CSV file, output from [step 0](#0-set-up-environment))
+- `gtf_link` (variable specifying the URL to the GTF file for the target organism, output from [step 1](#1-define-variables-and-output-file-names))
+- `target_org_db` (variable specifying the name of the org.eg.db R package for the target organism, output from [steps 1](#1-define-variables-and-output-file-names) or [2](#2-create-the-organism-package-if-it-is-not-hosted-by-bioconductor))
+- `target_organism` (variable specifying the full species name of the target organism, output from [step 1](#1-define-variables-and-output-file-names))
+- `currently_accepted_orgs` (variable specifying the list of currently supported organisms, output from [step 0](#0-set-up-environment))
+- `ref_tab_path` (variable specifying the path to the reference table CSV file, output from [step 0](#0-set-up-environment))

 **Output Data:**

-- GTF (data frame containing the GTF file for the target organism)
-- no_org_db (list of organisms that do not use org.db annotations due to inconsistent gene names across GTF and org.db)
+- `GTF` (variable holding the data frame containing the GTF file for the target organism)
+- `no_org_db` (variable specifying the list of organisms that do not use org.db annotations due to inconsistent gene names across GTF and org.db)

 <br>

@@ -465,14 +463,14 @@ if (target_organism == "Salmonella enterica") {

 **Input Data:**

-- GTF (data frame containing the parsed GTF file for the target organism, output from [step 3](#3-load-annotation-databases))
-- target_organism (target organism's full species name, output from [step 1](#1-define-variables-and-output-file-names))
-- gtf_keytype_mappings (list of keys to extract from the GTF, for each organism)
+- `GTF` (variable holding the data frame containing the parsed GTF file for the target organism, output from [step 3](#3-load-annotation-databases))
+- `target_organism` (variable specifying the full species name of the target organism, output from [step 1](#1-define-variables-and-output-file-names))
+- `gtf_keytype_mappings` (variable specifying the list of keys to extract from the GTF, for each organism)

 **Output Data:**

-- annot_gtf (initial annotation table derived from the GTF file, containing only the relevant columns for the target organism)
-- primary_keytype (the name of the primary key type being used, e.g., "ENSEMBL", "TAIR", "LOCUS", based on the GTF gene_id entries)
+- `annot_gtf` (variable holding the initial annotation table derived from the GTF file, containing only the relevant columns for the target organism)
+- `primary_keytype` (variable specifying the name of the primary key type being used, e.g., "ENSEMBL", "TAIR", "LOCUS", based on the GTF gene_id entries)

 <br>

@@ -579,17 +577,17 @@ if (target_organism == "Saccharomyces cerevisiae") {

 **Input Data:**

-- annot_gtf (initial annotation table derived from the GTF file, output from [step 4](#4-build-initial-annotation-table))
-- target_organism (target organism's full species name, output from [step 1](#1-define-variables-and-output-file-names))
-- no_org_db (list of organisms that do not use annotations from an org.db, output from [step 3](#3-load-annotation-databases))
-- primary_keytype (the name of the primary key type being used, output from [step 4](#4-build-initial-annotation-table))
-- target_org_db (name of the org.eg.db R package for the target organism, output from [steps 1](#1-define-variables-and-output-file-names) and [2](#2-create-the-organism-package-if-it-is-not-hosted-by-bioconductor))
+- `annot_gtf` (variable holding the initial annotation table derived from the GTF file, output from [step 4](#4-build-initial-annotation-table))
+- `target_organism` (variable specifying the full species name of the target organism, output from [step 1](#1-define-variables-and-output-file-names))
+- `no_org_db` (variable specifying the list of organisms that do not use annotations from an org.db, output from [step 3](#3-load-annotation-databases))
+- `primary_keytype` (variable specifying the name of the primary key type being used, output from [step 4](#4-build-initial-annotation-table))
+- `target_org_db` (variable specifying the name of the org.eg.db R package for the target organism, output from [steps 1](#1-define-variables-and-output-file-names) or [2](#2-create-the-organism-package-if-it-is-not-hosted-by-bioconductor))

 **Output Data:**

-- annot_orgdb (updated annotation table with additional keys from the organism-specific org.db)
-- orgdb_query (the key type used to map to the org.db)
-- orgdb_keytype (the name of the key type in the org.db)
+- `annot_orgdb` (variable holding the updated annotation table with GTF and organism-specific org.db annotations)
+- `orgdb_query` (variable specifying the key type used to map to the org.db)
+- `orgdb_keytype` (variable specifying the name of the key type in the org.db)

 <br>

@@ -624,7 +622,6 @@ stringdb_query <- if (!is.null(stringdb_query_list[[target_organism]])) {

-- annot_orgdb (annotation table with GTF and org.db annotations, output from [step 5](#5-add-orgdb-keys))
-- target_organism (target organism's full species name, output from [step 1](#1-define-variables-and-output-file-names))
-- primary_keytype (the name of the primary key type being used, output from [step 4](#4-build-initial-annotation-table))
-- target_taxid (taxonomic identifier for the target organism, output from [step 1](#1-define-variables-and-output-file-names))
+- `annot_orgdb` (variable holding the annotation table with GTF and organism-specific org.db annotations, output from [step 5](#5-add-orgdb-keys))
+- `target_organism` (variable specifying the full species name of the target organism, output from [step 1](#1-define-variables-and-output-file-names))
+- `primary_keytype` (variable specifying the name of the primary key type being used, output from [step 4](#4-build-initial-annotation-table))
+- `target_taxid` (variable specifying the taxonomic identifier for the target organism, output from [step 1](#1-define-variables-and-output-file-names))

 **Output Data:**

-- annot_stringdb (updated annotation table with added STRING IDs)
-- no_stringdb (list of organisms that do not use STRING annotations)
-- stringdb_query (the key type used for mapping to STRING database)
-- uses_old_locus (list of organisms where GTF gene_id entries do not match those in STRING, so entries in OLD_LOCUS are used to query STRING)
+- `annot_stringdb` (variable holding the updated annotation table with GTF, organism-specific org.db, and STRING annotations)
+- `no_stringdb` (variable specifying the list of organisms that do not use STRING annotations)
+- `stringdb_query` (variable specifying the key type used for mapping to STRING database)
+- `uses_old_locus` (variable specifying the list of organisms where GTF gene_id entries do not match those in STRING, so entries in OLD_LOCUS are used to query STRING)

 <br>

@@ -736,7 +733,6 @@ if (!(target_organism %in% no_panther_db)) {
 pantherdb_keytype = "ENTREZ"

 # Retrieve target organism PANTHER GO slim annotations database using the UNIPROT / PANTHER short name
-target_short_name <- target_species_designation
 pthOrganisms(PANTHER.db) <- target_short_name

 # Define a function to retrieve GO slim IDs for a given gene's ENTREZIDs, which may include entries separated by a "|"
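
The last context line above describes a helper that must cope with ENTREZ fields holding several IDs joined by "|". The diff does not show the helper itself, so the following is only a minimal base R sketch of the splitting step. The function name `get_go_slim_ids`, the `go_slim_lookup` list standing in for the PANTHER.db query, the ID-to-term pairings, and the "|"-joined output format are all assumptions made for illustration, not taken from the pipeline.

```r
# Hypothetical lookup standing in for a PANTHER.db query.
# The ENTREZ IDs and their GO pairings are placeholders, not real annotations.
go_slim_lookup <- list(
  "1001" = c("GO:0000001", "GO:0000002"),
  "1002" = c("GO:0000002", "GO:0000003")
)

# Split a "|"-separated ENTREZ field, look up each ID, and return the
# unique GO slim IDs joined by "|" (NA when nothing is found).
get_go_slim_ids <- function(entrez_field, lookup) {
  if (is.na(entrez_field) || entrez_field == "") {
    return(NA_character_)
  }
  # fixed = TRUE treats "|" literally instead of as regex alternation
  ids <- strsplit(entrez_field, "|", fixed = TRUE)[[1]]
  # IDs missing from the lookup yield NULL entries, which unlist() drops
  go_ids <- unique(unlist(lookup[ids], use.names = FALSE))
  if (length(go_ids) == 0) {
    return(NA_character_)
  }
  paste(go_ids, collapse = "|")
}

get_go_slim_ids("1001|1002", go_slim_lookup)
```

Passing `fixed = TRUE` to `strsplit()` matters here: without it, "|" is read as a regular-expression alternation and the field is split into single characters.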
@@ -768,17 +764,13 @@ if (!(target_organism %in% no_panther_db)) {

 **Input Data:**

-- annot_orgdb (annotation table with GTF and org.db annotations, output from [step 5](#5-add-orgdb-keys))
-- target_organism (target organism's full species name, output from [step 1](#1-define-variables-and-output-file-names))
-- primary_keytype (the name of the primary key type being used, output from [step 4](#4-build-initial-annotation-table))
-- target_taxid (taxonomic identifier for the target organism, output from [step 1](#1-define-variables-and-output-file-names))
+- `annot_stringdb` (variable holding the annotation table with GTF, organism-specific org.db, and STRING annotations, output from [step 6](#6-add-string-ids))
+- `target_organism` (variable specifying the full species name of the target organism, output from [step 1](#1-define-variables-and-output-file-names))

 **Output Data:**

-- annot_stringdb (updated annotation table with added STRING IDs)
-- no_stringdb (list of organisms that do not use STRING annotations)
-- stringdb_query (the key type used for mapping to STRING database)
-- uses_old_locus (list of organisms where the 'gene_id' column in the GTF dataframe does not match STRING identifiers, so the 'old_locus_tag' column from the GTF dataframe is used to query STRING instead)
+- `annot_pantherdb` (variable holding the updated annotation table with GTF, organism-specific org.db, STRING, and PANTHER GO Slim annotations)
+- `no_panther_db` (variable specifying the list of organisms that do not use PANTHER annotations)

-- annot_pantherdb (annotation table with GTF, org.db, STRING, and PANTHER annotations, output from [step 7](#7-add-gene-ontology-go-slim-ids))
-- primary_keytype (the name of the primary key type being used, output from [step 4](#4-build-initial-annotation-table))
-- out_table_filename (name of the output annotation table file, output from [step 1](#1-define-variables-and-output-file-names))
-- out_log_filename (name of the output log file, output from [step 1](#1-define-variables-and-output-file-names))
-- GL_DPPD_ID (GeneLab Data Processing Pipeline Document ID, output from [step 0](#0-set-up-environment))
-- gtf_link (URL to the GTF file for the target organism, output from [step 1](#1-define-variables-and-output-file-names))
-- target_org_db (name of the org.eg.db R package for the target organism, output from [steps 1](#1-define-variables-and-output-file-names) and [2](#2-create-the-organism-package-if-it-is-not-hosted-by-bioconductor))
-- no_org_db (list of organisms that do not use org.db annotations, output from [step 3](#3-load-annotation-databases))
+- `annot_pantherdb` (variable holding the updated annotation table with GTF, organism-specific org.db, STRING, and PANTHER GO Slim annotations, output from [step 7](#7-add-gene-ontology-go-slim-ids))
+- `primary_keytype` (variable specifying the name of the primary key type being used, output from [step 4](#4-build-initial-annotation-table))
+- `out_table_filename` (variable specifying the name of the output annotation table file, output from [step 1](#1-define-variables-and-output-file-names))
+- `out_log_filename` (variable specifying the name of the output log file, output from [step 1](#1-define-variables-and-output-file-names))
+- `GL_DPPD_ID` (variable specifying the GeneLab Data Processing Pipeline Document ID, output from [step 0](#0-set-up-environment))
+- `gtf_link` (variable specifying the URL to the GTF file for the target organism, output from [step 1](#1-define-variables-and-output-file-names))
+- `target_org_db` (variable specifying the name of the org.eg.db R package for the target organism, output from [steps 1](#1-define-variables-and-output-file-names) or [2](#2-create-the-organism-package-if-it-is-not-hosted-by-bioconductor))
+- `no_org_db` (variable specifying the list of organisms that do not use org.db annotations, output from [step 3](#3-load-annotation-databases))

 **Output Data:**

-- annot (final annotation table with annotations from the GTF, org.db, STRING, and PANTHER)
-- ***-GL-annotations.tsv** (annot saved as a tab-delimited table file)
+- `annot` (variable holding the final annotation table with GTF, organism-specific org.db, STRING, and PANTHER GO Slim annotations)
+- ***-GL-annotations.tsv** (final annotation table saved as a tab-delimited table file)
 - ***-GL-build-info.txt** (annotation table build information log file)