
How to link up NIDMFSL metadata with other sources #155

@mih

First, a bit of background. Here is the document that nidmfsl gives me for a t-statistic volume (I have cut the parts that are irrelevant for now):

            {
              "@id": "niiri:d83e9fb8-1e39-42f9-9c46-d408335a0150",
              "@type": [
                "nidm_StatisticMap",
                "prov:Entity"
              ],
              "crypto:sha512": "bbcfe23bf0d12500bf5db37ccca98c45ef6393912680552fb3646c5a50afc02ea023063e2171a0350db8f1a3921fc2c5dc327e0c21e38b123d7df8dd7f673c5b",
              "dct:format": "image/nifti",
              "nfo:fileName": "TStatistic.nii.gz",
              "prov:atLocation": {
                "@type": "http://www.w3.org/2001/XMLSchema#anyURI",
                "@value": "TStatistic.nii.gz"
              },
              ...
            }

This looks great. However, I can get more information on this file from other sources:

    {
        "@context": {...},
        "@graph": [
          {
            "@id": "6398025b6f926d7d31202e29020331a80093ee89",
            "@type": "Dataset",
            "contentbytesize": 11061996,
            "dateCreated": "2019-05-09T10:10:11+02:00",
            "dateModified": "2019-05-09T10:11:40+02:00",
            "hasContributor": {
              "@id": "40395047096243tr832094023407237402304972"
            },
            "identifier": "dda42d6c-7231-11e9-a901-0050b6902ef0",
            "version": "0-2-g6398025",
            "hasPart": [
              {
                "@id": "MD5E-s384103--fbc66513e45fd3319e17ba74421fb484.nii.gz",
                "@type": "DigitalDocument",
                "name": "down/cope1.feat/stats/tstat1.nii.gz",
                "contentbytesize": 384103
              },
              ...
    }

And I get even more from extractors for the NIfTI data type, from a provenance extractor that tells me which command generated this file, etc. Of course, I want to connect all this information. The question is: how can I do this with minimal effort and interference? Without any changes, I would have to compute SHA512 on all files in my dataset, fish out the documents with a matching crypto:sha512 property, and add some kind of sameAs property pointing to the alternative @id for the same file. Doable, but not nice, given the context (light pun intended ;-).
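The brute-force route can be sketched in Python. This is a minimal sketch, assuming the nidmfsl output has been loaded as a list of JSON-LD nodes (dicts) and that the other source's @id for each file is available as a mapping from dataset-relative path to @id. The function names and the `sameAs` key are placeholders, not part of nidmfsl or DataLad.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA512 digest of a file, reading it in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def link_by_sha512(root, nidm_docs, id_for_path):
    """Add a sameAs link to each NIDM node whose crypto:sha512 matches
    a file under `root`.

    root        -- dataset root directory
    nidm_docs   -- list of JSON-LD nodes (dicts) from the nidmfsl output
    id_for_path -- hypothetical input: dataset-relative POSIX path -> @id
                   used by the other metadata source
    """
    root = Path(root)
    # Hash every file once and index its relative path by digest.
    path_for_digest = {}
    for p in root.rglob("*"):
        rel = p.relative_to(root)
        if ".git" in rel.parts or not p.is_file():
            continue  # skip repository internals and directories
        path_for_digest[sha512_of(p)] = rel.as_posix()
    for doc in nidm_docs:
        rel = path_for_digest.get(doc.get("crypto:sha512"))
        if rel is not None and rel in id_for_path:
            # "sameAs" is a placeholder property name, not a fixed vocabulary term
            doc["sameAs"] = {"@id": id_for_path[rel]}
    return nidm_docs
```

Note that this hashes every file in the dataset just to find a handful of matches, which is exactly the cost described above.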

The context is: this is a DataLad dataset. All files and their versions are already uniquely identified, and so are the datasets that contain them. I want DataLad to be able to report the information on a file that I can get from nidmfsl.

I see two ways of making this happen:

1) Teach nidmfsl about DataLad and its IDs, so it can use them instead of coming up with new random ones (I don't think this is the most feasible approach, but it would certainly be an amazing one!).
2) Teach nidmfsl not to build a metadata pack, but to leave all files exactly where they are and use relative paths (relative to a given root, the dataset location in my case) as nfo:fileName. As the metadata extraction is guaranteed to happen on the same content, I can then easily match files by relative path.
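With relative paths in place, the join becomes a dictionary lookup instead of hashing the whole dataset. A minimal sketch, assuming nidmfsl (hypothetically) wrote the dataset-relative path into nfo:fileName and that the DataLad `hasPart` records have been loaded as dicts. `link_by_relpath` and the `sameAs` key are illustrative names only.

```python
import posixpath


def link_by_relpath(nidm_docs, datalad_parts):
    """Link NIDM nodes to DataLad file records by dataset-relative path.

    Hypothetically assumes a "just describe" mode in which nidmfsl writes
    the dataset-relative path into nfo:fileName instead of a pack name.
    Returns the number of nodes that were linked.
    """
    # DataLad's hasPart entries carry the relative path as "name" and
    # the git-annex key as "@id"; normpath makes "./a/b" match "a/b".
    key_for_path = {posixpath.normpath(part["name"]): part["@id"]
                    for part in datalad_parts}
    linked = 0
    for doc in nidm_docs:
        name = doc.get("nfo:fileName")
        if name is None:
            continue
        key = key_for_path.get(posixpath.normpath(name))
        if key is not None:
            doc["sameAs"] = {"@id": key}  # placeholder property name
            linked += 1
    return linked


if __name__ == "__main__":
    parts = [{"@id": "MD5E-s384103--fbc66513e45fd3319e17ba74421fb484.nii.gz",
              "@type": "DigitalDocument",
              "name": "down/cope1.feat/stats/tstat1.nii.gz"}]
    docs = [{"@id": "niiri:d83e9fb8-1e39-42f9-9c46-d408335a0150",
             "nfo:fileName": "./down/cope1.feat/stats/tstat1.nii.gz"}]
    print(link_by_relpath(docs, parts))  # prints 1
```

A path join only works because extraction runs on the same content the dataset tracks; with the standardized pack names there is nothing to join on.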

This second approach would also solve another issue: I am not interested in the result package with the files renamed to a standardized form. I actually throw it away. DataLad already gives me the means to obtain exactly the file content I need, once I know the file key, hence my interest in connecting information to this file key (among other things). With this information, a simple helper could generate NIDM result packages from the pre-extracted metadata alone. So at present, properties like "nfo:fileName": "TStatistic.nii.gz" are actually invalid for a description of dataset content, because that name only exists inside the pack, not in the dataset.
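Such a helper might look roughly like this. It is a sketch under stated assumptions: the metadata nodes carry dataset-relative paths in nfo:fileName, the annexed content is present locally (e.g. after `datalad get`), and the `pack_names` mapping from NIDM @id to standardized file name is a hypothetical input. nidmfsl's actual naming scheme is not reproduced here.

```python
import shutil
from pathlib import Path


def build_pack(root, nidm_docs, pack_names, out_dir):
    """Assemble a results pack from pre-extracted metadata.

    root       -- dataset root; file content must already be present locally
    nidm_docs  -- JSON-LD nodes whose nfo:fileName is dataset-relative
    pack_names -- hypothetical mapping: NIDM @id -> standardized pack name,
                  e.g. "TStatistic.nii.gz"
    out_dir    -- directory to write the pack into
    Returns copies of the packed nodes with nfo:fileName rewritten to the
    pack name; the input nodes are left untouched.
    """
    root, out_dir = Path(root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    packed = []
    for doc in nidm_docs:
        target = pack_names.get(doc.get("@id"))
        if target is None:
            continue  # not part of the pack
        # copy2 follows symlinks, so annexed files are copied by content
        shutil.copy2(root / doc["nfo:fileName"], out_dir / target)
        packed.append({**doc, "nfo:fileName": target})
    return packed
```

Keeping the renaming in a separate step means the stored metadata always describes the dataset as it is, and a pack can be regenerated on demand.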

So it would be nice to have a "just describe" mode, in addition to the "pack and describe" mode that I can currently use.

Does that make sense? I'd love to get pointers on the direction in which I should be looking/moving. It would be great if I could have everything implemented by OHBM.

Thx much!
