
When parsing a PDF with markdown, some parts of text are getting lost. #690


Open
gulabyang opened this issue Apr 23, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@gulabyang

Describe the bug
I'm parsing a PDF and requesting markdown output, but some portions of the text are missing from it (e.g. a list item such as "A. 3% of a long sentence here" may be dropped entirely). When I request text output instead, everything is in place.

Files
API endpoint:

6.     Deductibles
       Each claim for loss or damage as insured against arising out of any one (1)
       Occurrence shall be adjusted separately and from the amount of each such
       adjusted loss shall be deducted the sum of $500,000 except;
       Earthquake:
           With respect to locations in Puerto Rico and the Pacific Northwest, the
           deductible shall be:
           A.  3% of the value of each separate building or structure involved in the loss
               and/or the contents of each separate building or structure involved in the
               loss at the time of loss;
           B.  3% of the net profits and continuing expenses attributable to the
               operations at each separate building or structure which has sustained
               loss or damage for the 12 month period immediately following the
               covered loss or damage, considering the most probable experience of the
               business over that period had no loss or damage occurred..
       Named Storm, including resulting Flood:
           ...... Rest of the content

next.js library getText()

6. Deductibles
Each claim for loss or damage as insured against arising out of any one (1) Occurrence shall be adjusted separately and from the amount of each such adjusted loss shall be deducted the sum of $500,000 except;

Earthquake:

With respect to locations in Puerto Rico and the Pacific Northwest, the deductible shall be:
Named Storm, including resulting Flood:
...... Rest of the content

Blocks A & B (the two 3% clauses) are missing entirely.
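
To pinpoint which passages were dropped, the complete output can be compared line by line against the incomplete one. The sketch below is a minimal helper for that, assuming both outputs are already available as strings. `normalize` and `findMissingLines` are hypothetical names for illustration, not part of any parsing library, and the markdown stripping is deliberately crude (it does not handle backslash escapes or tables).

```typescript
// Hypothetical helper (not a library API): lowercases, drops common
// markdown punctuation, and collapses whitespace so the two outputs
// can be compared despite different line wrapping and emphasis markers.
function normalize(s: string): string {
  return s
    .replace(/[*_`#>]/g, " ") // crude removal of markdown markers
    .replace(/\s+/g, " ")     // collapse newlines and runs of spaces
    .trim()
    .toLowerCase();
}

// Returns every non-empty line of `completeOutput` whose normalized
// content does not occur anywhere in the normalized `partialOutput`.
function findMissingLines(completeOutput: string, partialOutput: string): string[] {
  const haystack = normalize(partialOutput);
  return completeOutput
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .filter((line) => !haystack.includes(normalize(line)));
}

// Usage with a shortened excerpt of the two outputs from this issue.
const completeSample = [
  "Earthquake:",
  "    With respect to locations in Puerto Rico and the Pacific Northwest, the",
  "    deductible shall be:",
  "    A.  3% of the value of each separate building or structure involved in the loss",
  "Named Storm, including resulting Flood:",
].join("\n");

const partialSample = [
  "Earthquake:",
  "",
  "With respect to locations in Puerto Rico and the Pacific Northwest, the deductible shall be:",
  "Named Storm, including resulting Flood:",
].join("\n");

console.log(findMissingLines(completeSample, partialSample));
```

Because the comparison is done on whitespace-collapsed text, lines that were merely re-wrapped are not reported; only content that is absent from the second output is returned.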

Job ID
97958a6b-2187-4d50-8bc5-57cf89f2e229

Client:
Please remove untested options:

  • TypeScript Library

Additional context
Default parsing.

@gulabyang gulabyang added the bug Something isn't working label Apr 23, 2025