@hnc-leebd
Contributor

In the dataloader, I added a case that detects titles with the same font name, a large font size, and bold style, a pattern occasionally seen in papers.

Because bold style already earns bonus points toward being treated as a heading, I added logic that determines whether a short, single-line text node is isolated and, if so, assigns it additional bonus points.

This logic needs the bounding box of each text node, so the overall test code had to be expanded to supply it.

  • Improve heading detection by adding an isolated bold case.
  • Update NodeUtilsTest to also consider the bounding box of each text node.
  • Add AssertJ as a test-scoped dependency so fluent assertions are available.
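
For readers skimming the thread, the isolated bold case described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class, field, and method names (`IsolatedBoldHeadingSketch`, `TextNode`, `BoundingBox`, `isolatedBoldBonus`) and all thresholds are hypothetical, and the sketch assumes PDF-style page coordinates in which y grows upward.

```java
import java.util.List;

/** Hypothetical sketch; none of these names or constants come from the actual PR. */
class IsolatedBoldHeadingSketch {

    /** Axis-aligned bounding box; y grows upward, as in PDF page space (assumption). */
    static final class BoundingBox {
        final double left, bottom, right, top;

        BoundingBox(double left, double bottom, double right, double top) {
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.top = top;
        }

        double height() {
            return top - bottom;
        }
    }

    /** Simplified stand-in for a text node produced by the loader. */
    static final class TextNode {
        final String text;
        final boolean bold;
        final int lineCount;
        final BoundingBox box;

        TextNode(String text, boolean bold, int lineCount, BoundingBox box) {
            this.text = text;
            this.bold = bold;
            this.lineCount = lineCount;
            this.box = box;
        }
    }

    // Assumed thresholds, chosen only for illustration.
    static final int MAX_HEADING_CHARS = 80;
    static final double MIN_GAP_FACTOR = 0.5;      // required gap = half the node's height
    static final double ISOLATED_BOLD_BONUS = 0.2;

    /** Vertical distance between two boxes; 0 when they overlap vertically. */
    static double verticalGap(BoundingBox a, BoundingBox b) {
        if (a.bottom >= b.top) {
            return a.bottom - b.top; // a lies above b
        }
        if (b.bottom >= a.top) {
            return b.bottom - a.top; // b lies above a
        }
        return 0.0;
    }

    /** A node counts as isolated when both neighbours (if any) are far enough away. */
    static boolean isIsolated(TextNode node, TextNode previous, TextNode next) {
        double minGap = node.box.height() * MIN_GAP_FACTOR;
        boolean clearAbove = previous == null || verticalGap(previous.box, node.box) >= minGap;
        boolean clearBelow = next == null || verticalGap(node.box, next.box) >= minGap;
        return clearAbove && clearBelow;
    }

    /** Extra score for a short, single-line, bold node that stands apart from its neighbours. */
    static double isolatedBoldBonus(List<TextNode> nodes, int index) {
        TextNode node = nodes.get(index);
        boolean shortSingleLine =
                node.lineCount == 1 && node.text.trim().length() <= MAX_HEADING_CHARS;
        if (!node.bold || !shortSingleLine) {
            return 0.0;
        }
        TextNode previous = index > 0 ? nodes.get(index - 1) : null;
        TextNode next = index + 1 < nodes.size() ? nodes.get(index + 1) : null;
        return isIsolated(node, previous, next) ? ISOLATED_BOLD_BONUS : 0.0;
    }
}
```

In this sketch the bonus is added on top of whatever score bold text already receives, and the isolation test depends only on bounding boxes, which is why tests for logic of this kind need box coordinates for every node.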

Contributor Author

@hnc-leebd hnc-leebd left a comment

I will push the revised commit.

@hnc-leebd hnc-leebd force-pushed the improve/heading-detection branch from 4fc1dd2 to 4abd0fc September 25, 2025 14:16
@hnc-leebd hnc-leebd force-pushed the improve/heading-detection branch from 4abd0fc to ce5cccd September 25, 2025 14:18
@hnc-leebd
Contributor Author

@MaximPlusov
I squashed the third commit and the new commit.
I've incorporated all of the feedback you gave me.
Thank you so much for the detailed review!

@MaximPlusov MaximPlusov merged commit 6b278ab into veraPDF:integration Sep 25, 2025
5 of 7 checks passed
@hnc-leebd hnc-leebd deleted the improve/heading-detection branch September 25, 2025 23:09
