Description
Layer descriptions that span multiple pages in a PDF are currently split into separate layers. This results in incomplete entries, with unknown start or end depths, even though they represent a single continuous layer.
For example, in I95RV03900_bp_19640403_Essertines-1.pdf, the layer from 1942 m to 1952 m spans pages 80–81. It is currently returned as two separate layers:
- One with the text from page 80 with an unknown end depth
- One with the text from page 81 with an unknown start depth
Ideally, we would recognise that this is a continuation of the same layer description and return all of the data as a single layer.
Proposed Solution:
Update assign_layers_to_boreholes() to:
- Check if the last interval on a page has a known start depth but no end depth
- Check if the first interval on the next page has a known end depth but no start depth
- If both conditions are met, merge the intervals into a single layer
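
The check above could look roughly like the sketch below. It is only an illustration of the merge rule, not the actual code in assign_layers_to_boreholes(): the `Layer` dataclass and the `merge_page_spanning_layers()` helper are hypothetical names, and the sketch assumes layers arrive as a flat list in document order, each carrying its page number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    """Hypothetical stand-in for the project's layer type."""

    text: str
    page: int
    start: Optional[float] = None  # start depth in metres, None if unknown
    end: Optional[float] = None  # end depth in metres, None if unknown


def merge_page_spanning_layers(layers: list[Layer]) -> list[Layer]:
    """Merge a layer that was split in two by a page break.

    Assumes `layers` is sorted in document order, so a change in page
    number between two neighbours means the first is the last interval
    on its page and the second is the first interval on the next page.
    """
    merged: list[Layer] = []
    for layer in layers:
        previous = merged[-1] if merged else None
        is_continuation = (
            previous is not None
            and layer.page == previous.page + 1
            # Last interval on the previous page: start known, end unknown.
            and previous.start is not None
            and previous.end is None
            # First interval on this page: end known, start unknown.
            and layer.start is None
            and layer.end is not None
        )
        if is_continuation:
            # Replace the incomplete entry with one combined layer.
            merged[-1] = Layer(
                text=f"{previous.text} {layer.text}",
                page=previous.page,
                start=previous.start,
                end=layer.end,
            )
        else:
            merged.append(layer)
    return merged


# Example modelled on the case above (the text values are placeholders).
example = [
    Layer("text from page 80", page=80, start=1942.0),
    Layer("text from page 81", page=81, end=1952.0),
]
result = merge_page_spanning_layers(example)
assert len(result) == 1
assert (result[0].start, result[0].end) == (1942.0, 1952.0)
```

This sketch only handles a description split across two consecutive pages. A layer spanning three or more pages would produce a middle fragment with neither depth known, which would need separate handling.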