How to report multiple documents on extract #391

@mih

I am implementing a dataset-level metadata extractor, and I think I need to be able to report multiple, individual metadata records. In principle, one should be able to build these records so that they can be reported in a nested fashion (thereby reporting just a single object). However, in my case I have no control over the nature of these documents, and they might be linked (or not) in different ways.

What is a desirable approach here?

  • an arbitrary top-level key that maps onto an array?
  • a JSON-LD style @graph top-level key (as a realization of the above)?
  • something else?
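To make the first two options concrete, here is a minimal sketch in plain Python. The record contents, the `records` key name, and the schema.org context are assumptions made up for illustration; nothing here reflects what any extractor actually returns.

```python
import json

# Hypothetical standalone records; in practice their content and the way
# they link to each other are outside the extractor's control.
records = [
    {"@id": "doc-1", "name": "first document"},
    {"@id": "doc-2", "name": "second document", "isPartOf": {"@id": "doc-1"}},
]

# Option 1: an arbitrary top-level key (here "records") mapping onto an array.
as_array = {"records": records}

# Option 2: the same idea, using the JSON-LD "@graph" keyword as the key.
# The "@context" value is an assumption chosen for illustration.
as_graph = {"@context": "https://schema.org/", "@graph": records}

print(json.dumps(as_graph, indent=2))
```

Both variants carry the same flat list of records. The `@graph` form only adds that links such as `isPartOf` are expressed through `@id` references instead of nesting.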

Related: We might be talking about a lot of stuff to return. If I see things correctly, I need to load many standalone records into memory and report them via immediate_data as a single dict, so that they can be written out as JSON (again). I have yet to understand why meta-extract turns a single return value of type ExtractorResult into a result record, rather than dealing with result records directly. Doing the latter would make the standard machinery of seamlessly switching between return values and generator yields applicable to metadata extractors too.
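The memory concern can be illustrated with a rough sketch that contrasts collecting everything into one dict with yielding records one at a time. All function names below are hypothetical and are not part of meta-extract or any extractor API. The sketch only assumes that each source record can be parsed independently.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def iter_records(sources: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record at a time (hypothetical loader)."""
    for src in sources:
        yield json.loads(src)


def report_as_single_dict(sources: Iterable[str]) -> dict:
    """Collect all records in memory and return them as one object."""
    return {"@graph": list(iter_records(sources))}


def report_as_stream(sources: Iterable[str], out: TextIO) -> int:
    """Write each record as one JSON line; only one record is held at a time."""
    count = 0
    for record in iter_records(sources):
        out.write(json.dumps(record) + "\n")
        count += 1
    return count


# Made-up input: two standalone records given as JSON strings.
sources = ['{"@id": "doc-1"}', '{"@id": "doc-2"}']
buf = io.StringIO()
n = report_as_stream(sources, buf)
```

With the generator variant, a consumer could handle each record as it arrives, which is the kind of switch between return values and yields described above.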
