Skip to content

Conversation

yuming-long
Copy link
Contributor

Summary

saw error for pinecone

500: Error in uploader - [UnstructuredIngestError] http error: (400) .... Vector dimension 1536 does not match the dimension of the index 1024

I believe this should be a 400

so I need to raise UnstructuredIngestError here

unfortunately its defined twice:

from starlette.responses import RedirectResponse
from unstructured_ingest.data_types.file_data import BatchFileData, FileData, file_data_from_dict
from unstructured_ingest.error import UnstructuredIngestError
from unstructured_ingest.errors_v2 import UnstructuredIngestError as UnstructuredIngestErrorV2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i really don't get why we have a errors_v2 ////
cc @potter-potter ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errors v2 was when we started to move toward the new error system.
I think it might be there now to be backward compatible. if i recall in someones pr it was removed and caused all sorts of havoc because it was in lots of repos.

@potter-potter
Copy link

@claude please review

@potter-potter
Copy link

@claude please review

claude you lazy sucker

@potter-potter potter-potter requested a review from Copilot October 1, 2025 17:59
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds proper error handling for UnstructuredIngestError exceptions in the ETL API, ensuring that HTTP status codes are correctly propagated instead of defaulting to 500. The change addresses an issue where Pinecone validation errors (400) were being returned as 500 errors.

  • Imports both versions of UnstructuredIngestError to handle the duplicate definitions
  • Adds specific exception handling to catch and properly format these errors with their status codes
  • Includes comprehensive test coverage for various error scenarios

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
unstructured_platform_plugins/etl_uvicorn/api_generator.py Adds exception handling for UnstructuredIngestError types with proper status code propagation
unstructured_platform_plugins/version.py Version bump to 0.0.41
test/assets/exception_status_code.py Adds test helper functions for UnstructuredIngestError scenarios
test/api/test_api.py Comprehensive test coverage for UnstructuredIngestError handling

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

)
except (UnstructuredIngestError, UnstructuredIngestErrorV2) as exc:
logger.error(
f"UnstructuredIngestError: {str(exc)} (status_code={exc.status_code})",
Copy link

Copilot AI Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message 'UnstructuredIngestError:' is misleading when the caught exception might be UnstructuredIngestErrorV2. Consider using a generic message like 'Unstructured ingest error:' or dynamically include the exception type name.

Suggested change
f"UnstructuredIngestError: {str(exc)} (status_code={exc.status_code})",
f"{exc.__class__.__name__}: {str(exc)} (status_code={exc.status_code})",

Copilot uses AI. Check for mistakes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v2 removed so should be fine here

Comment on lines 185 to 187
error = UnstructuredIngestErrorV2(
"Async gen test UnstructuredIngestErrorV2 with None status_code"
)
Copy link

Copilot AI Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The string literal is split across lines unnecessarily. Consider keeping the error message on a single line for better readability, or if line length is a concern, use implicit string concatenation or a shorter variable name.

Suggested change
error = UnstructuredIngestErrorV2(
"Async gen test UnstructuredIngestErrorV2 with None status_code"
)
error = UnstructuredIngestErrorV2("Async gen test UnstructuredIngestErrorV2 with None status_code")

Copilot uses AI. Check for mistakes.

@potter-potter
Copy link

@yuming-long I think we can remove the v2 stuff since it is just for backward compatibility. ya?

This reverts commit 368a02f.
@yuming-long
Copy link
Contributor Author

done ^ - i realize the v2 class is never used :)

Copy link

@potter-potter potter-potter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@yuming-long yuming-long merged commit c4e21cc into main Oct 1, 2025
9 checks passed
@yuming-long yuming-long deleted the yuming/throw_ingest_error branch October 1, 2025 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants