Feedback and ideas to improve dataset_description IDS #63
Replies: 10 comments 8 replies
-
Note that the idea there is to make quickly evolve
PS: note that this IDS was already heavily modified between DD3 and DD4.
-
If uri is filled automatically when dataset_description is created, I think it is useful to know the original path the dataset was created in, even if the dataset moves around. But if uri is just another regular field that is filled by the user as the data is created, I don't see much utility in it and, as Prasad suggests, a UUID would be more useful.
-
As currently defined (current data entry URI), I see little to no interest in storing a path to self: if you can read this value, you must already know the URI/path for the data entry. Answering remark 4 above may help identify what's needed to replace this quantity.
-
The main usefulness I see is to know the origin of the data as it gets copied around. But for that, the URI is really not adequate, since there is no specification of what a path means, and the path can change at any time, as a user decides to edit it. I think this is one of your points, Olivier, in your previous comment. I go back to the UUID, which seems more useful. I can imagine how even experimental data will get a UUID assigned, so that it can be filled here. Of course, the UUID should be updated each time the user changes the data and saves it, and it is not clear how this will be implemented or enforced. It is possible to lose the one-to-one connection between the UUID and the data entry. Perhaps add an automated UUID deletion/creation process in the access layer? On the other hand, since we might be moving away from the need to have an access layer, we might have to add an independent tool that generates and writes a UUID into a data entry / dataset_description ...
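To make the "independent tool" idea concrete, here is a minimal sketch, not an existing IMAS feature. It assumes the identifier is derived from the stored content (a version-5 UUID over a SHA-256 digest), so that a copy keeps the same UUID and an edit produces a new one. The namespace URL, the function name `content_uuid` and the sample payloads are all hypothetical.

```python
import hashlib
import uuid

# Hypothetical namespace, for illustration only; nothing like it is defined
# in the data dictionary.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/hypothetical-data-entry")


def content_uuid(payload: bytes) -> uuid.UUID:
    """Derive a UUID from the serialized content of a data entry."""
    digest = hashlib.sha256(payload).hexdigest()
    return uuid.uuid5(NAMESPACE, digest)


# Made-up payloads standing in for the serialized data entry.
original = b"ip=1.0e6;b0=5.3"
copied = bytes(original)       # same content stored at another location
edited = b"ip=1.2e6;b0=5.3"    # the user changed a value and saved

same_after_copy = content_uuid(original) == content_uuid(copied)
changed_after_edit = content_uuid(original) != content_uuid(edited)
```

With this approach the stored UUID field itself would have to be excluded from the hashed payload, otherwise writing the UUID would change the content it identifies.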
-
I think this discussion jumped too quickly into details around the UUID or DOI notions.
From the exchanges so far in this thread, I identify a first action: removing the unclear or ill-purposed fields like
I believe the second step will be to better define the role of the
Considering that the rest of the structure expects to capture some information about either the pulse of a given experiment or a simulation, I would find it natural to group these two concerns as structures. Consequently, I propose to group all pulse-related data under a common struct
Current structure (for reference):
├── machine
-
We have had the discussion before: should the summary IDS contain information that is not stored anywhere else, or should it be based on data that is stored in other IDSes? For me, as a provider of summary data for AUG pulses, the important fields are dataset_description.data_entry.user: dpc (who to contact in the case of a problem). As a provider of simulation results, the most important fields are dataset_description.simulation.workflow and potentially the comments_after
-
I think parent_entry should also have a URI field, and -- if we ever get FAIR compliant -- a DOI field.
-
To be continued as #83.
-
From the perspective of
We can, however, get this information from a different place (e.g. |
-
Hello,
We need to discuss a few fields in dataset_description and form an opinion about them.
Do we see a purpose in storing uri within the dataset_description IDS?
https://imas-data-dictionary.readthedocs.io/en/latest/generated/ids/dataset_description.html#dataset_description-type
When a simulation is created, the URI can point to the local path. However, unless explicitly updated, that URI remains unchanged, even if the data entry files are later moved or copied to a different location. As a result, the stored URI might no longer be valid or relevant.
Should we store a GUID or something else that can uniquely identify the simulation regardless of folder or URI?
I would appreciate your opinion on this.
Extending data_type_identifier within dataset_description type
While scraping through multiple YAML files that are used for storing data entries, we found a predictive type. Should we add it to the data_type_identifier? Currently we have experimental and simulation.
https://imas-data-dictionary.readthedocs.io/en/latest/generated/identifier/data_type_identifier.html#identifier-utilities-data_type_identifier.xml
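As a rough illustration of what the extension would mean for code that reads such YAML files, the sketch below models the identifier as a plain Python enum. The class `DataType` and the helper `parse_data_type` are hypothetical and not part of any IMAS library; only the three labels come from this discussion, and the data dictionary's own index values are deliberately not reproduced.

```python
from enum import Enum


class DataType(Enum):
    """Hypothetical stand-in for data_type_identifier, keyed by label only."""
    EXPERIMENTAL = "experimental"
    SIMULATION = "simulation"
    PREDICTIVE = "predictive"  # the proposed addition


def parse_data_type(label: str) -> DataType:
    """Map a label read from a YAML file to a known type.

    Raises ValueError for labels outside the identifier, which is what
    happens to "predictive" as long as it is not added.
    """
    return DataType(label.strip().lower())


parsed = parse_data_type(" Predictive ")
```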
Addition of a responsible person field in the dataset_description
In the dataset_description IDS there is a field provider within the structure ids_properties. Most of the time it is a Unix user name, from which it is difficult to guess the actual person. I would recommend having a structure that specifies the name of the responsible person, the person who executed the simulation, and their email IDs and organization. This would help users contact the person who originally ran the simulation.
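A possible shape for such a structure, sketched as Python dataclasses purely for discussion. The names `Person`, `Responsibility`, `responsible` and `executed_by` are hypothetical and are not existing data dictionary nodes; the email check is only a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    email: str
    organization: str

    def __post_init__(self) -> None:
        # Placeholder sanity check, not real email validation.
        if "@" not in self.email:
            raise ValueError(f"not an email address: {self.email!r}")


@dataclass(frozen=True)
class Responsibility:
    responsible: Person   # person answerable for the data
    executed_by: Person   # person who actually ran the simulation

    def contacts(self) -> list[str]:
        """Return the distinct contact emails, responsible person first."""
        emails = [self.responsible.email, self.executed_by.email]
        return list(dict.fromkeys(emails))


# Made-up example values.
owner = Person("A. Physicist", "a.physicist@example.org", "Example Lab")
runner = Person("B. Modeller", "b.modeller@example.org", "Example Lab")
entry = Responsibility(responsible=owner, executed_by=runner)
```

Keeping the two roles separate covers the case where the Unix user who launched the run is not the person to contact about the results.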
Thanks & regards,
Prasad