
Wrong borehole and metadata extraction, verb-based approach #226

@letao


Continuing the investigations from #198 and #222, we should find a better way to identify false-positive boreholes: document pages that contain similar material descriptions but are not themselves boreholes. For example: 35613_22.pdf

In this issue, we should investigate whether the absence of verbs in the text can serve as an indication that the document in question is a borehole.
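
To make the idea concrete, here is a minimal sketch of what a verb-based check could look like. It is not the project's implementation. The function names, the `TOY_VERBS` list and the `max_verb_ratio` threshold are all hypothetical, and the tiny verb list only stands in for a real part-of-speech tagger, which would have to match the language of the documents.

```python
import re
from typing import Callable, List

# Hypothetical stand-in for a real part-of-speech tagger: a tiny hand-picked
# set of verb forms. A real pipeline would plug in a proper tagger instead.
TOY_VERBS = frozenset(
    {"is", "are", "was", "were", "has", "have", "shows", "indicates", "contains"}
)


def toy_is_verb(token: str) -> bool:
    """Return True if the token is in the toy verb list (assumption, not a real tagger)."""
    return token.lower() in TOY_VERBS


def split_words(text: str) -> List[str]:
    """Extract alphabetic words, ignoring digits, punctuation and underscores."""
    return re.findall(r"[^\W\d_]+", text)


def verb_ratio(text: str, is_verb: Callable[[str], bool] = toy_is_verb) -> float:
    """Fraction of words in the text that the given predicate classifies as verbs."""
    words = split_words(text)
    if not words:
        return 0.0
    return sum(1 for word in words if is_verb(word)) / len(words)


def looks_like_borehole_profile(
    text: str,
    max_verb_ratio: float = 0.02,  # hypothetical threshold, would need tuning
    is_verb: Callable[[str], bool] = toy_is_verb,
) -> bool:
    """Flag a page as a plausible borehole profile if it has text but almost no verbs."""
    if not split_words(text):
        return False  # a page without any words gives no evidence either way
    return verb_ratio(text, is_verb) <= max_verb_ratio


if __name__ == "__main__":
    profile_like = "Sand, fine, brown, silty. Gravel, coarse, grey, dense."
    report_like = "The sample was taken near the river and shows that the soil contains clay."
    print(looks_like_borehole_profile(profile_like))
    print(looks_like_borehole_profile(report_like))
```

Passing the verb predicate in as a parameter keeps the heuristic independent of any particular tagger, so the threshold could be tuned against the known false positives without touching the rest of the logic.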

As an example, we should aim to eliminate false positives similar to the file 45004_8.pdf:

Image

Pages from which data is extracted but which are not actual borehole profiles are available in the S3 bucket under false_positive_boreprofiles.
