Process crawled JSON-LD to multiple levels, possibly using another library #4

@justinccdev

At the moment, bsbang-crawl does only a very hokey top-level pass over the captured JSON-LD. This extracts very little information, mainly because it was written as a proof of concept, where even a small amount of extracted data is still useful.

However, this will need to become much more sophisticated in the long term, crawling nested JSON-LD structures to arbitrary depth. We probably don't want to write this code ourselves (unless it's very easy) but instead use a library such as https://github.com/digitalbazaar/pyld, if it has appropriate facilities.
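
To give a rough idea of what "arbitrary depth" could look like if we did write it ourselves, here is a minimal sketch using only the standard library. It is not the current bsbang-crawl code; `walk_jsonld` and the `sample` document are hypothetical names made up for illustration, and the sketch deliberately ignores `@context` resolution, which is the part a library like pyld would presumably handle for us.

```python
import json
from typing import Any, Iterator, Tuple

Path = Tuple[str, ...]


def walk_jsonld(node: Any, path: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (path, value) for every leaf value, descending to any depth.

    Hypothetical helper: treats JSON-LD as plain JSON and does not
    resolve @context, @id references, or compact IRIs.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk_jsonld(value, path + (key,))
    elif isinstance(node, list):
        # List items share the path of the property that holds them.
        for item in node:
            yield from walk_jsonld(item, path)
    else:
        yield path, node


# Made-up sample document, only for illustration.
sample = json.loads("""
{
  "@context": "http://schema.org",
  "@type": "DataCatalog",
  "name": "Example catalog",
  "keywords": ["alpha", "beta"],
  "provider": {
    "@type": "Organization",
    "name": "Example Org",
    "address": {
      "@type": "PostalAddress",
      "addressCountry": "GB"
    }
  }
}
""")

leaves = list(walk_jsonld(sample))

for leaf_path, value in leaves:
    print(".".join(leaf_path), "=", value)
```

A walker like this is easy, but it only sees the document as written. Equivalent JSON-LD can be expressed in differently shaped JSON depending on the context, so normalising first (for example with pyld's `jsonld.expand` or `jsonld.flatten`) is likely still worth investigating before indexing.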

We also need to check that this isn't obviated by Apache Nutch if we switch to that for crawling.
