Which human genes are implicated in tumor development?
geneOncoX is an R package that addresses this question by integrating a number of resources on the functional roles of cancer genes and on their representation in commercially available targeted sequencing assays (gene panels). The integrated annotations include the following resources:
- IntOGen - compendium of mutational cancer driver genes
- Network of Cancer Genes - collection of curated cancer genes
- CanVar-UK - cancer predisposition genes
- CancerMine - text-mined predictions of tumor suppressor genes, proto-oncogenes and cancer drivers
- DNA repair genes - collection of genes involved in DNA repair
- Genomics England PanelApp - collections of cancer gene panels used in clinical diagnostics
- TSO500 targets - cancer genes targeted by Illumina's TSO500 gene panel
- F1CDx targets - cancer genes targeted by Foundation Medicine's FoundationOne CDx (F1CDx) gene panel
The package offers a few pre-processed datasets, along with metadata, that the user can retrieve and use for their own projects or set-ups. The package utilizes the googledrive R package to download the pre-processed and documented datasets to a local cache directory provided by the user.
remotes::install_github('sigven/geneOncoX')
The package currently offers five functions, each of which retrieves a specific dataset that can be used for gene annotation purposes.
- get_basic() - retrieves basic, non-transcript-specific gene annotations. Includes tumor suppressor gene/oncogene/driver annotations from multiple resources, NCBI gene summary descriptions, as well as multiple predictions/scores when it comes to gene indispensability and loss-of-function tolerance
- get_gencode() - retrieves two datasets (grch37 and grch38) with human gene transcripts from GENCODE, including cross-references to RefSeq, UniProt, APPRIS, and MANE
- get_alias() - retrieves a list of gene synonyms, indicating which synonyms are ambiguous or nonambiguous (with respect to primary gene symbols)
- get_predisposition() - retrieves a list of genes of relevance for cancer predisposition, utilizing multiple resources, including CanVar-UK, Genomics England PanelApp, TCGA's PanCancer study, and manually contributed entries
- get_panels() - retrieves a collection of > 40 different panels for various cancer conditions, as found in the Genomics England PanelApp
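As a rough sketch of how one of these functions might be called: the example below creates a temporary cache directory and, if geneOncoX is installed, retrieves the basic gene annotations into it. The argument name `cache_dir` is an assumption based on the description of a user-provided cache directory above; check the function documentation (e.g. `?geneOncoX::get_basic`) for the exact signature. The call downloads data, so it needs network access.

```r
# Directory where downloaded datasets will be cached (here a temporary one;
# use a persistent path to avoid repeated downloads across sessions).
cache_dir <- file.path(tempdir(), "geneOncoX_cache")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# Only attempt the retrieval when the package is actually installed.
if (requireNamespace("geneOncoX", quietly = TRUE)) {
  # Assumption: the cache directory is passed through an argument
  # named `cache_dir`.
  gene_basic <- geneOncoX::get_basic(cache_dir = cache_dir)

  # Show the top-level structure of the returned object.
  str(gene_basic, max.level = 1)
}
```

The other retrieval functions listed above would presumably be called in the same way, each with its own cache directory argument.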
Technically, each dataset comes as a list object in R with
- a metadata data frame that lists URLs, citations, and versions of underlying resources
- a records data frame that contains the actual gene/transcript annotations
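To illustrate this two-part layout, here is a minimal sketch that works on a mock object with the same shape. The list `dataset` and all of its column names and values (`source`, `version`, `symbol`, `flag`) are hypothetical placeholders invented for this example; the real datasets have their own columns, which can be inspected with `colnames()`.

```r
# Hypothetical stand-in for a dataset returned by one of the get_*() functions.
# Column names and values are placeholders, not real geneOncoX content.
dataset <- list(
  metadata = data.frame(
    source = c("ResourceA", "ResourceB"),
    version = c("v1", "v2"),
    stringsAsFactors = FALSE
  ),
  records = data.frame(
    symbol = c("GENE1", "GENE2", "GENE3"),
    flag = c(TRUE, FALSE, TRUE),
    stringsAsFactors = FALSE
  )
)

# Check that the object has the two expected elements.
stopifnot(is.list(dataset))
stopifnot(all(c("metadata", "records") %in% names(dataset)))

# The metadata element documents the underlying resources and their versions.
print(dataset$metadata)

# The records element holds the annotations; subset it like any data frame.
flagged <- dataset$records[dataset$records$flag, , drop = FALSE]
print(flagged$symbol)
```

With a real dataset, the same pattern applies: read `metadata` to see which resource versions were integrated, and filter or join `records` for the annotations themselves.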
If you use the datasets provided with geneOncoX, make sure you properly cite the original publications of the resources integrated, and that you comply with the licensing terms:
- IntOGen - Martínez-Jiménez et al., Nat Rev Cancer, 2020 - CC0 1.0
- CancerMine - Lever et al., Nat Methods, 2019 - CC0 1.0
- Network of Cancer Genes - Repana et al., Genome Biol, 2019 - Open Access
- DNA repair genes database - Wood et al., Science, 2001 - Open Access
- dbNSFP - Liu et al., Genome Med, 2020 - Open Access
- Genomics England PanelApp - Martin et al., Nat Genet, 2019 - Commercial use requires separate agreement with GEL, see licensing terms
- GENCODE - Frankish et al., Nucleic Acids Res, 2021 - Open Access
sigven AT ifi.uio.no
