Releases: Unstructured-IO/unstructured-api
0.0.56
- Add max_characters param for chunking. This param gives users additional control to "chunk" elements into larger or smaller CompositeElements.
- Bump unstructured to 0.10.28
- Make sure chipperv2 is called when hi_res_model_name==chipper
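
As a rough illustration of the chunking params, the sketch below builds the form fields a client might send alongside a file. The helper name build_chunking_form is hypothetical, and the "by_title" strategy value and the pairing of max_characters with chunking_strategy are assumptions, not something these notes spell out.

```python
def build_chunking_form(max_characters: int, strategy: str = "by_title") -> dict[str, str]:
    """Build form fields for a chunked partition request (hypothetical helper)."""
    if max_characters <= 0:
        raise ValueError("max_characters must be a positive integer")
    return {
        # Assumption: max_characters is sent together with chunking_strategy.
        "chunking_strategy": strategy,
        # Form fields travel as strings, so the integer is converted here.
        "max_characters": str(max_characters),
    }


form = build_chunking_form(1500)
```

The resulting dict could then be passed as the form data of a multipart upload next to the file itself.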
0.0.55
- Bump unstructured to 0.10.26
- Bring parent_id metadata field back after fixing a backwards compatibility bug
- Restrict Chipper usage to one at a time. The model is very resource intense, and this will prevent issues while we improve it.
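
One way to allow only a single call at a time is a semaphore with one slot. This is a minimal sketch of that general technique, not the project's actual implementation; run_exclusively and the RuntimeError it raises are hypothetical choices made for illustration.

```python
import threading

# A single slot: at most one call may hold it at any moment.
_model_slot = threading.BoundedSemaphore(1)


def run_exclusively(func, *args, **kwargs):
    """Run func if the slot is free; otherwise fail fast instead of queueing."""
    if not _model_slot.acquire(blocking=False):
        raise RuntimeError("model is busy; retry later")
    try:
        return func(*args, **kwargs)
    finally:
        # Always free the slot, even if func raised.
        _model_slot.release()
```

Failing fast rather than blocking keeps a resource-heavy model from accumulating a backlog of waiting requests; a server could translate the error into a "try again later" response.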
0.0.54
- Bump unstructured to 0.10.25
- Use a generator when splitting pdfs in parallel mode
- Add a default memory minimum for 503 check
- Fix an UnboundLocalError when an invalid docx file is caught
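
To show why a generator helps when splitting, here is a small sketch that yields page ranges lazily instead of building every split up front. The function name iter_page_ranges and the pages_per_split parameter are hypothetical and do not reflect the project's own code.

```python
from typing import Iterator


def iter_page_ranges(num_pages: int, pages_per_split: int) -> Iterator[tuple[int, int]]:
    """Yield 1-based, inclusive (start, end) page ranges one at a time."""
    if pages_per_split <= 0:
        raise ValueError("pages_per_split must be a positive integer")
    for start in range(1, num_pages + 1, pages_per_split):
        # The last range is clipped to the final page of the document.
        yield start, min(start + pages_per_split - 1, num_pages)
```

Because each range is produced only when requested, a consumer can dispatch one split at a time without holding all of them in memory.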
0.0.53
- Bump unstructured to 0.10.23
- Simplify the error message for BadZipFile errors
0.0.52
- Bump unstructured to 0.10.21
- Fix an unhandled error when a non pdf file is sent with content-type pdf
- Fix an unhandled error when a non docx file is sent with content-type docx
- Fix an unhandled error when a non-Unstructured json schema is sent
0.0.51
- Bump unstructured to 0.10.19
0.0.50
- Bump unstructured to 0.10.18
0.0.49
- Remove spurious whitespace in app-start.sh. This fixes deployments in some envs such as Google Cloud Run.
0.0.48
- Adds languages kwarg. ocr_languages will eventually be deprecated and replaced by languages to specify what languages to use for OCR
- Adds a startup log and other minor cleanups
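
A client passing several OCR languages might repeat the form field once per language. The sketch below assumes that encoding and Tesseract-style codes such as "eng"; both are assumptions, and build_language_fields is a hypothetical helper.

```python
def build_language_fields(languages: list[str]) -> list[tuple[str, str]]:
    """Return one ("languages", code) form field per requested OCR language."""
    # Assumption: a multi-valued form field is sent by repeating its name.
    return [("languages", code) for code in languages]


fields = build_language_fields(["eng", "deu"])
```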
0.0.47
- Adds chunking_strategy kwarg and associated params. These params allow users to "chunk" elements into larger or smaller CompositeElements.
- Remove parent_id from the element metadata. New metadata fields are causing errors with existing installs. We'll re-add this once a fix is widely available.
- Fix some pdfs incorrectly returning a "file is encrypted" error. The pypdf.is_encrypted check caused us to return this error even if the file is readable.
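
The encrypted-file fix can be pictured as checking readability rather than trusting the encryption flag alone. The sketch below is an assumption about the general idea, not the project's actual fix; it accepts any reader-like object that exposes is_encrypted and pages (as pypdf's PdfReader does), and the helper name is hypothetical.

```python
def is_unreadable_encrypted(reader) -> bool:
    """Return True only if the file is flagged encrypted AND its pages cannot be read."""
    if not getattr(reader, "is_encrypted", False):
        return False
    try:
        # Some encrypted PDFs are still readable; probing the page list
        # distinguishes them from files that truly need a password.
        len(reader.pages)
    except Exception:
        return True
    return False
```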