pydantic validation error of JobResult calling LlamaParse.parser.parse() #688

Open
brianb08 opened this issue Apr 18, 2025 · 1 comment
Labels
bug Something isn't working

Describe the bug
Today I started getting pydantic validation errors on some calls to LlamaParse.parser.parse():

...
pydantic_core._pydantic_core.ValidationError: 2 validation errors for JobResult
pages.1.images.0.original_width
  Field required [type=missing, input_value={'name': 'img_p1_1.png', ...9, 'y': 684.69894064375}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/missing
pages.1.images.0.original_height
  Field required [type=missing, input_value={'name': 'img_p1_1.png', ...9, 'y': 684.69894064375}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/missing
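
For anyone triaging this, the error paths above (pages.1.images.0.original_width) suggest that an image entry in the raw response lacks two keys the JobResult model requires. A minimal sketch of a check that lists such entries in a raw result dict follows. The dict shape is inferred only from the error paths, the helper name and the sample payload are hypothetical, and nothing here is part of llama-parse itself.

```python
from typing import Any

# Fields the traceback reports as missing from the image entry.
REQUIRED_IMAGE_FIELDS = ("original_width", "original_height")


def find_missing_image_fields(result: dict[str, Any]) -> list[str]:
    """Return pydantic-style dotted paths for required image fields that are absent."""
    missing = []
    for page_idx, page in enumerate(result.get("pages") or []):
        for image_idx, image in enumerate(page.get("images") or []):
            for field in REQUIRED_IMAGE_FIELDS:
                if field not in image:
                    missing.append(f"pages.{page_idx}.images.{image_idx}.{field}")
    return missing


# Hypothetical payload shaped like the error paths; values are illustrative only.
sample = {
    "pages": [
        {"images": []},
        {"images": [{"name": "img_p1_1.png", "y": 684.69894064375}]},
    ]
}

for path in find_missing_image_fields(sample):
    print(path)
```

Running a check like this against the raw JSON for a failing job, if it can be obtained, would show whether the fields are absent server-side or dropped by the client.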

Files
Some example PDFs can be provided if necessary.

Job ID
Two example Job IDs showing this error:
ddab5846-6a36-48d3-916b-bacd47b96e42
8e1bb36d-385d-4023-904f-f3572b424e03

Client:
Using Python llama-parse 0.6.12

Additional context
Excerpts of client code:

...
        self.parser = LlamaParse(
            result_type="markdown",
            verbose=True,
            api_key=settings.LLAMA_CLOUD_API_KEY,
            show_progress=False,
            parsing_instruction=parsing_instruction,
            language="en",
            take_screenshot=True,
            auto_mode=True,
        )
...
        job_result: JobResult = self.parser.parse(
            pdf_file,
            { "file_name": db_document.name })
...
brianb08 added the bug label Apr 18, 2025

ngallo1 commented Apr 23, 2025

I am getting the same validation errors, also when validating images, on the original_width and original_height properties. Have you tried setting target_pages in your parse options to skip the problematic pages (page 1 in your case) to see how the rest of the document parses? When I did this, the rest of the document failed to parse as well: the resulting markdown just said NO CONTENT HERE for most of the pages. This suggests that something about the format of the specific PDF trips up the parser, beyond the pydantic validation errors. Is there a good workaround for this?
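
To make the target_pages experiment above concrete, here is a minimal sketch of building the page list. It assumes target_pages accepts a comma-separated string of 0-based page numbers, which should be checked against the llama-parse documentation. The helper name is hypothetical, and the caller has to know the total page count.

```python
def build_target_pages(total_pages: int, skip: set[int]) -> str:
    """Build a comma-separated, 0-based page list that leaves out the pages in `skip`."""
    return ",".join(str(page) for page in range(total_pages) if page not in skip)


# Skip the page at index 1 in a five-page document.
print(build_target_pages(5, {1}))  # 0,2,3,4
```

The resulting string would be passed as target_pages when constructing LlamaParse. Note that the index in the error path (pages.1) is a position in the returned pages list, which may not match the PDF page number if pages were already filtered.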
