Merge layer descriptions spanning multiple pages #243

@TicaGit

Description

Layer descriptions that span multiple pages in a PDF are currently split into separate layers. This results in incomplete entries, with unknown start or end depths, even though they represent a single continuous layer.

For example, in I95RV03900_bp_19640403_Essertines-1.pdf, the layer from 1942 m to 1952 m spans pages 80–81. It is currently returned as two separate layers:

  • One with the text from page 80 with an unknown end depth
  • One with the text from page 81 with an unknown start depth

Ideally, we would recognise that this is a continuation of the same layer description and return all of the data as a single layer.

Proposed Solution:
Update assign_layers_to_boreholes() to:

  • Check whether the last interval on a page has a known start depth but no end depth
  • Check whether the first interval on the next page has a known end depth but no start depth
  • If both conditions are met, merge the two intervals into a single layer
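
The merge rule above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Layer` dataclass and the `merge_cross_page_layers()` helper are hypothetical names and not the project's actual data model, and the sketch assumes the layers arrive in reading order with a page number attached. How this would be wired into assign_layers_to_boreholes() is left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    """Hypothetical stand-in for an extracted layer (not the project's real class)."""

    text: str
    page: int
    start: Optional[float] = None  # start depth in metres, None if unknown
    end: Optional[float] = None  # end depth in metres, None if unknown


def merge_cross_page_layers(layers: list[Layer]) -> list[Layer]:
    """Merge a layer left open at the end of one page with its continuation on the next.

    Assumes `layers` is sorted in reading order (page, then position on the page).
    """
    merged: list[Layer] = []
    for layer in layers:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and layer.page == prev.page + 1  # first interval on the next page
            and prev.start is not None and prev.end is None  # open at the bottom
            and layer.start is None and layer.end is not None  # open at the top
        ):
            # Combine both halves into one layer, anchored on the first page.
            merged[-1] = Layer(
                text=f"{prev.text} {layer.text}",
                page=prev.page,
                start=prev.start,
                end=layer.end,
            )
        else:
            merged.append(layer)
    return merged


# Example modelled on the Essertines-1 case described above (texts are made up).
layers = [
    Layer("first half of description", page=80, start=1942.0),
    Layer("second half of description", page=81, end=1952.0),
]
result = merge_cross_page_layers(layers)
```

Note that this sketch only handles a description split across two consecutive pages. A description spanning three or more pages would leave a middle fragment with neither depth known, which would need separate handling.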
