Skip to content

lcgroff2/HepaRG_sql

Repository files navigation

SQL scripting in R for insertion of MetSim JSON formatted NoSQL/MongoDB metabolic predictions into the MySQL database VSSTox generated by Charles N. Lowe.

  • Full R Script for conversion of JSON formatted unstructured predicted metabolites into MySQL relational database format, followed by insertion into database. (DB connection URLs removed from document for security)
  • Requires JSON UTF-8 formatted input files (Example), which process the unstructured metabolism data via the R-script into MySQL relational database formatting. Example of MongoDB JSON output structure given below for one parent and two of its phase I metabolite children:
    {"datetime": "2025-03-14_12h55m10s",
     "software": "OASIS TIMES",
     "version": "2.31.2.82",
     "params": {"depth": 3,
                "organism": "Rat",
                "site_of_metabolism": false,
                "model": ["In Vitro Rat Liver S9 v12.18"]
               },
     "input": {"smiles": "C1=CC=C2C(=C1)C=C1C=CC3=CC=CC4=CC=C2C1=C34",
               "casrn": "50-32-8",
               "hcd_smiles": "C1=CC2C=C3C4C5C(C=CC=4C=2C=C1)=CC=CC=5C=C3",
               "inchikey": "FMMWHPNWAFZXNH-UHFFFAOYNA-N",
               "dtxsid": "DTXSID2020139",
               "chem_name": "Benzo[a]pyrene",
               "likelihood": null
              },
    "output": [{"precursor": {"smiles": "c1cccc2cc3ccc4cccc5ccc(c12)c3c45",
                              "casrn": "50-32-8",
                              "hcd_smiles": "C1=C2C(C=C3C4C5=C(C=CC2=4)C=CC=C5C=C3)=CC=C1",
                              "inchikey": "FMMWHPNWAFZXNH-UHFFFAOYNA-N",
                              "dtxsid": "DTXSID2020139",
                              "chem_name": "Benzo[a]pyrene",
                              "likelihood": null
                             },
                "successors": [{"enzyme": "[phase I]",
                                "mechanism": "Arene Epoxide Formation|PAH Arene Epoxide Formation",
                                "metabolite": {"smiles": "C1=CC2C(c3cc4ccc5cccc6ccc(c13)c4c56)O2",
                                               "casrn": "36504-65-1",
                                               "hcd_smiles": "C1C2=C(C=C3C4C5=C(C=CC2=4)C=CC=C5C=C3)C2OC2C=1",
                                               "inchikey": "OLLMQFHYRYHKTD-UHFFFAOYNA-N",
                                               "dtxsid": "DTXSID80957806",
                                               "chem_name": "6b,7a-Dihydrobenzo[1,12]tetrapheno[8,9-b]oxirene",
                                               "likelihood": null
                                              },
                                "generation": 1
                               },
                               {"enzyme": "[phase I]",
                                "mechanism": "Aromatic C-Hydroxylation|PAH Aromatic C-Hydroxylation",
                                "metabolite": {"smiles": "Oc1c2ccccc2c2ccc3cccc4ccc1c2c34",
                                               "casrn": "33953-73-0",
                                               "hcd_smiles": "OC1C2=C3C4C(C=C2)=CC=CC=4C=CC3=C2C=1C=CC=C2",
                                               "inchikey": "MGTMTTGZMPDIQS-UHFFFAOYNA-N",
                                               "dtxsid": "DTXSID30955534",
                                               "chem_name": "Benzo[pqr]tetraphen-6-ol",
                                               "likelihood": null
                                              },
                                "generation": 1
                               }
                              ]
              }]
    }
  • Example SQL Test Query Script for retrieving Distributed Structure-Searchable Toxicity (DSSTox) Database Substance Identifier (DTXSID, sid in code) based on InChIKey structural identifiers with commands from MariaDB R Package
sid_from_inchikey <- function(inchikey) {
  #try MySQL query on the inchikey derived from SMILES result of in silico tool:
  new_sid <- dbGetQuery(DSSTox,paste0("select gs.dsstox_substance_id
                                       from generic_substances gs
                                       left join generic_substance_compounds gsc on gsc.fk_generic_substance_id = gs.id
                                       left join compounds c on c.id = gsc.fk_compound_id where c.jchem_inchi_key = '",inchikey,"'"))
  if (nrow(new_sid) == 1){
    return(new_sid)
  } else if (nrow(new_sid) == 0)
{
    #try MySQL query on <inchikey first block>-UHFFFAOYSA-N (standard InChIKey mapping for no stereochemistry - neutral charge):
    new_sid <- dbGetQuery(DSSTox,paste0("select gs.dsstox_substance_id
                                         from generic_substances gs
                                         left join generic_substance_compounds gsc on gsc.fk_generic_substance_id = gs.id
                                         left join compounds c on c.id = gsc.fk_compound_id
                                         where c.jchem_inchi_key = '",str_split(inchikey,"-")[[1]][1],"-UHFFFAOYSA-N'"))
}
  if (nrow(new_sid) == 1){
    return(new_sid)
  } else if (nrow(new_sid) == 0)
{
    #try MySQL query on <inchikey first block>-UHFFFAOYNA-N (Nonstandard InChIKey mapping for no stereochemistry - neutral charge):
    new_sid <- dbGetQuery(DSSTox,paste0("select gs.dsstox_substance_id from generic_substances gs
                                         left join generic_substance_compounds gsc on gsc.fk_generic_substance_id = gs.id
                                         left join compounds c on c.id = gsc.fk_compound_id
                                         where c.jchem_inchi_key = '",str_split(inchikey,"-")[[1]][1],"-UHFFFAOYNA-N'"))
    return(new_sid)
  }

}

The results of processing each DTXSID through VSSTox turns the unstructured database into the following relational output:

About

SQL scripting in R for insertion of MetSim metabolic predictions into MySQL database.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages