Description
I am implementing a dataset-level metadata extractor. I think I need to be able to report multiple, individual metadata records. In principle, one should be able to build these records so that they can be reported in a nested fashion (thereby reporting just a single object). However, in my case I have no control over the nature of these documents, and they might be linked (or not) in different ways.
What is a desirable approach here?
- an arbitrary top-level key that maps onto an array?
- a JSON-LD style @graph top-level key (as a realization of the above)?
- something else?
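To make the @graph option concrete, here is a minimal sketch, assuming each record is already a plain dict. The helper name `wrap_as_graph` and the example records are hypothetical and not part of any extractor API; only the `@graph` and `@context` keywords come from JSON-LD.

```python
import json
from typing import Any, Iterable, Optional


def wrap_as_graph(
    records: Iterable[dict],
    context: Optional[Any] = None,
) -> dict:
    """Bundle independent records under a JSON-LD style "@graph" key.

    Hypothetical helper: the records are kept as-is, so they may or may
    not reference each other (e.g. via "@id").
    """
    document: dict = {}
    if context is not None:
        # A shared context is optional and applies to all records.
        document["@context"] = context
    document["@graph"] = list(records)
    return document


# Illustrative records only, not real extractor output.
records = [
    {"@id": "rec-1", "name": "first record"},
    {"@id": "rec-2", "name": "second record", "isPartOf": {"@id": "rec-1"}},
]
document = wrap_as_graph(records, context="https://schema.org")
serialized = json.dumps(document)
```

This still yields a single object, so it fits a single-return-value interface, but all records must be held in memory at once.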
Related: We might be talking about a lot of data to return. If I see things correctly, I need to load many standalone records into memory and report them via immediate_data as a single dict, so that they can be written out as JSON again. I have yet to understand why meta-extract turns a single return value of type ExtractorResult into a result record, rather than dealing with result records directly. Dealing with result records directly would make the standard machinery of seamlessly switching between return values and generator yields applicable to metadata extractors too.
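To illustrate what I mean by switching between return values and generator yields, here is a generic sketch in plain Python. It does not use the actual meta-extract or ExtractorResult interfaces; `iter_records`, `write_json_lines`, `extract_one`, and `extract_many` are hypothetical names, and the records are made up.

```python
import io
import json
from collections.abc import Iterable, Iterator, Mapping
from typing import Union


def iter_records(result: Union[Mapping, Iterable[Mapping]]) -> Iterator[Mapping]:
    """Treat a single record and a stream of records uniformly."""
    if isinstance(result, Mapping):
        yield result
    else:
        yield from result


def write_json_lines(result: Union[Mapping, Iterable[Mapping]], stream) -> int:
    """Write one JSON document per line; return the number of records."""
    count = 0
    for record in iter_records(result):
        # Only one record needs to be in memory at a time.
        stream.write(json.dumps(record) + "\n")
        count += 1
    return count


def extract_one() -> dict:
    """Hypothetical extractor that returns a single record."""
    return {"name": "a"}


def extract_many() -> Iterator[dict]:
    """Hypothetical extractor that yields records one by one."""
    for name in ("a", "b", "c"):
        yield {"name": name}


buffer = io.StringIO()
n_single = write_json_lines(extract_one(), buffer)
n_many = write_json_lines(extract_many(), buffer)
```

With a consumer like this, an extractor that yields records never has to assemble them into one large dict first.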