Browsing resources and releases
To retrieve information about genomes, samples, datasets, and clusters from MetaRefSGB, we provide an inspector tool that is integrated into the pipeline and can be called by running the following command in your terminal:
MetaRefSGB --inspect --genome=663737656 --db=~/db --release=Jan21
In particular, this command searches for the MetaRefSGB Unique Genome Identifier 663737656 in the Jan21 release and prints the result on screen as a dictionary, as reported below.
{
    "hits": [
        {
            "category": "Metagenome-assembled Genome",
            "closest_references": [
                "152744499"
            ],
            "completeness": "96.64",
            "contamination": "1.34",
            "dataset_id": "AsnicarF_2020",
            "ecosystem": "Host-associated",
            "ecosystem_category": "Human,Mammals",
            "ecosystem_subtype": "Gut",
            "ecosystem_type": "Digestive system",
            "fgb": "FGB1476",
            "ggb": "GGB3740",
            "mag_id": "AsnicarF_2020__833__bin.16",
            "metarefsgb_id": "663737656",
            "notes": null,
            "sample_id": "833",
            "sgb": "SGB5075",
            "sgb_centroid": "384699434",
            "specific_ecosystem": "Fecal",
            "strain_heterogeneity": "50.0"
        }
    ]
}
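Note that numeric values such as completeness and contamination appear as strings in the example. The following is a minimal Python sketch of how such a record could be post-processed, assuming the printed dictionary is valid JSON with the shape shown above. The helper name `hit_quality` and the thresholds are illustrative assumptions, not part of MetaRefSGB.

```python
import json

# Abridged copy of the example record shown above.
raw = """
{
    "hits": [
        {
            "completeness": "96.64",
            "contamination": "1.34",
            "metarefsgb_id": "663737656",
            "notes": null,
            "sgb": "SGB5075"
        }
    ]
}
"""


def hit_quality(hit):
    """Return (completeness, contamination) as floats for one hit (hypothetical helper)."""
    return float(hit["completeness"]), float(hit["contamination"])


result = json.loads(raw)
for hit in result["hits"]:
    completeness, contamination = hit_quality(hit)
    # Thresholds below are assumptions chosen for illustration only.
    passes = completeness >= 90.0 and contamination <= 5.0
    print(hit["metarefsgb_id"], hit["sgb"], completeness, contamination, passes)
```

Converting the strings explicitly avoids accidental lexicographic comparisons when filtering many records.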
Remember that all the releases are linked together. This means that when you specify a release, all the previous releases up to the specified one will be loaded.
Similarly, the inspector can be used by also specifying a sample (e.g. --sample=833), a dataset (e.g. --dataset=AsnicarF_2020), or a cluster (e.g. --cluster=SGB5075, --cluster=GGB3740, --cluster=FGB1476).
Remember to always add the --db argument followed by the path to the main folder of the MetaRefSGB database, which is the same folder used when running the main pipeline that assigns new genomes to SGBs.
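When the inspector is called from a script, the command line can be assembled programmatically. The sketch below uses only the flags mentioned in this section; the helper `build_inspect_command` is a hypothetical name, and passing a single selector per call is an assumption of this sketch rather than a documented constraint. The home directory is expanded explicitly because no shell is involved when the command is passed as an argument list.

```python
import os
import shlex

# Selector flags mentioned in this section; this sketch passes one per call.
SELECTORS = ("genome", "sample", "dataset", "cluster")


def build_inspect_command(selector, value, db, release):
    """Build the argument list for one inspector call (hypothetical helper)."""
    if selector not in SELECTORS:
        raise ValueError(f"unknown selector: {selector}")
    return [
        "MetaRefSGB",
        "--inspect",
        f"--{selector}={value}",
        # Expand "~" here: an argument list bypasses shell expansion.
        f"--db={os.path.expanduser(db)}",
        f"--release={release}",
    ]


cmd = build_inspect_command("cluster", "SGB5075", "~/db", "Jan21")
print(shlex.join(cmd))
# To actually run it (requires MetaRefSGB to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```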
If you need to run the inspector on many IDs, you are encouraged to use the --file argument followed by the path to a one-column file with a predefined list of genomes, samples, or datasets. Please note that the first line of this file must contain a header that describes the data in your list. For instance, if you need to run the inspector on a list of MetaRefSGB Unique Genome Identifiers, you can run the following command:
MetaRefSGB --inspect --file=~/mygenomes.txt --db=~/db --release=Jan21
where ~/mygenomes.txt should look like the following snippet:
# metarefsgb_id
663737656
841942266
618704549
Remember to change the header to sample_id or dataset_id if you need to search for multiple samples or datasets. The --file option cannot be used to search for clusters.
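A list file in this format can be generated from a script. The following is a minimal sketch that mirrors the layout of the snippet above (a `# `-prefixed header followed by one ID per line); the helper name `write_id_list` is hypothetical, and the demonstration writes to a temporary directory rather than to your home folder.

```python
import tempfile
from pathlib import Path

# Headers named in this section; clusters are not supported by --file.
ALLOWED_HEADERS = ("metarefsgb_id", "sample_id", "dataset_id")


def write_id_list(path, header, ids):
    """Write a one-column list file with a header line (hypothetical helper)."""
    if header not in ALLOWED_HEADERS:
        raise ValueError(f"unsupported header: {header}")
    lines = [f"# {header}"] + [str(item) for item in ids]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    list_path = Path(tmp) / "mygenomes.txt"
    write_id_list(list_path, "metarefsgb_id", ["663737656", "841942266", "618704549"])
    content = list_path.read_text(encoding="utf-8")

print(content, end="")
```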
Note that the output is printed on screen by default. To write it to a file instead, use the --output argument as shown below:
MetaRefSGB --inspect --genome=663737656 --db=~/db --release=Jan21 --output=~/663737656.json
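A saved result can then be loaded for downstream analysis. The sketch below assumes that the file written by --output contains the same JSON structure that is printed on screen, which is not confirmed here; the helper `load_hits` is a hypothetical name, and a small stand-in file is created so the example runs without MetaRefSGB installed.

```python
import json
import tempfile
from pathlib import Path


def load_hits(path):
    """Load an inspector output file and index its hits by metarefsgb_id."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return {hit["metarefsgb_id"]: hit for hit in data.get("hits", [])}


# Stand-in for a file produced with --output (abridged example record).
example = {
    "hits": [
        {
            "dataset_id": "AsnicarF_2020",
            "metarefsgb_id": "663737656",
            "sgb": "SGB5075",
        }
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    output_path = Path(tmp) / "663737656.json"
    output_path.write_text(json.dumps(example), encoding="utf-8")
    hits = load_hits(output_path)

print(hits["663737656"]["sgb"])
```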
This tool can also be used to inspect the MetaRefSGB Data Models (MDM) in order to help the contributors understand how to share their data. You can inspect the MDM by running:
MetaRefSGB --inspect --schema=MAG
This will print the content of the MAG model on screen. To inspect the genome and metadata models as well, just replace MAG with genome or metadata. Please have a look at the MDM Schema section of this Wiki for a detailed explanation of how we manage data in MetaRefSGB.