diff --git a/pages/docs/configuration/librechat_yaml/object_structure/ocr.mdx b/pages/docs/configuration/librechat_yaml/object_structure/ocr.mdx
index 4c2466b64..6ffb970aa 100644
--- a/pages/docs/configuration/librechat_yaml/object_structure/ocr.mdx
+++ b/pages/docs/configuration/librechat_yaml/object_structure/ocr.mdx
@@ -17,6 +17,7 @@ There are 4 main fields under `ocr`:
 - You only need the following environment variables to get started: `OCR_API_KEY` and `OCR_BASEURL`.
 - OCR functionality allows the application to extract text from images, which can then be processed by AI models.
 - The default strategy is `mistral_ocr`, which uses Mistral's OCR capabilities.
+- The `tika_ocr` strategy requires a `baseURL` but not an API key.
 - You can also configure a custom OCR service by setting the strategy to `custom_ocr`.
 - If using the default Mistral OCR, you may optionally specify a specific Mistral model to use.
 - Environment variable parsing is supported for `apiKey`, `baseURL`, and `mistralModel` parameters.
@@ -83,7 +84,7 @@ ocr:
@@ -95,4 +96,5 @@ ocr:
 
 **Available Strategies:**
 - `mistral_ocr`: Uses Mistral's OCR capabilities.
+- `tika_ocr`: Uses Apache Tika and Tesseract.
 - `custom_ocr`: Uses a custom OCR service specified by the baseURL.
diff --git a/pages/docs/features/ocr.mdx b/pages/docs/features/ocr.mdx
index 8c9a37c83..813649fbb 100644
--- a/pages/docs/features/ocr.mdx
+++ b/pages/docs/features/ocr.mdx
@@ -25,8 +25,9 @@ Currently, OCR is **only available as an agent capability**. This means you must
 
 OCR can be enabled in the LibreChat configuration file (`librechat.yaml`). The OCR configuration supports two strategies:
 
-1. **Mistral OCR** (Default and currently the only available option)
-2. **Custom OCR** (Planned for future releases)
+1. **Mistral OCR** (Default)
+2. **Tesseract OCR** (Self-hosted, using Apache Tika)
+3. **Custom OCR** (Planned for future releases)
 
 ### Basic Configuration Example
 
@@ -46,7 +47,7 @@ ocr:
   mistralModel: "mistral-ocr-latest" # Optional: Specify Mistral model, defaults to "mistral-ocr-latest"
   apiKey: "your-mistral-api-key" # Optional: Defaults to OCR_API_KEY env variable
   baseURL: "https://api.mistral.ai/v1" # Optional: Defaults to OCR_BASEURL env variable, or Mistral's API if no variable set
-  strategy: "mistral_ocr" # Optional: Defaults to "mistral_ocr" (only option currently available)
+  strategy: "mistral_ocr" # Optional: Defaults to "mistral_ocr"; can also be set to "tika_ocr"
 ```
 
 ## Mistral OCR
@@ -69,6 +70,16 @@ Currently, LibreChat uses Mistral's OCR API as the default and only available OC
 - Maximum file size: 50 MB
 - Maximum document length: 1,000 pages
 
+## Tika OCR
+
+Apache Tika combined with Tesseract provides self-hosted OCR.
+
+### Key Features of Tika OCR
+
+- **Locally Hostable**: Run OCR on your own infrastructure without paying for a hosted API
+- **Tesseract 4**: Uses an LSTM-based neural network to recognize characters
+- **Image Handling**: Extracts text from image formats
+
 ### Future Plans
 
 - Mistral plans to make their OCR API available through their cloud partners, such as GCP and AWS, and enterprise self-hosting for organizations with stringent data privacy requirements ([source](https://mistral.ai/fr/news/mistral-ocr)).
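
Note: for readers of this change, the `tika_ocr` setup it documents could look roughly like the `librechat.yaml` fragment below. This is only a sketch under assumptions. The diff does not show a Tika example, so the `baseURL` value is hypothetical: `http://localhost:9998` is the default address of a locally running Apache Tika server, and the exact URL or path LibreChat expects is not stated here. `apiKey` is omitted because the change says this strategy does not need one.

```yaml
# Hypothetical example, not taken from the diff above.
ocr:
  strategy: "tika_ocr"             # selects the Apache Tika + Tesseract strategy
  baseURL: "http://localhost:9998" # assumption: a local Tika server on its default port
```

As with the Mistral strategy, the `baseURL` could presumably also come from the `OCR_BASEURL` environment variable instead of being written into the file.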