Use Mistral OCR to convert PDFs and images to markdown.
Install the required dependencies:
pip install mistralai
- Set up your Mistral API key
- Edit the environment variable, replacing "Your_API_KEY" with your actual key
- Run the script
python mistral_ocr.py
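A common failure at this step is running the script with the placeholder still in place. The sketch below shows one way to fail early in that case. The variable name `MISTRAL_API_KEY` and the helper `load_api_key` are assumptions for illustration; they are not taken from mistral_ocr.py, which may read its key differently.

```python
import os

# Placeholder string named in the setup steps above.
PLACEHOLDER = "Your_API_KEY"


def load_api_key(env=None, var_name="MISTRAL_API_KEY"):
    """Return the API key from the environment.

    Hypothetical helper: var_name is an assumed variable name.
    Raises RuntimeError if the key is unset, empty, or still the placeholder.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"Set {var_name} to a real Mistral API key before running the script."
        )
    return key
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.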
The script creates a folder named ocr_results_[PDF filename] in the working directory, containing:
- complete.md: The extracted text content in markdown format
- images/ (if images are found): A directory containing any extracted images
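The output layout described above can be sketched with the standard library alone. This is an illustration, not the script's actual code: `write_results` is a hypothetical helper, and it assumes "[PDF filename]" means the file name without its extension and that images arrive as a mapping of file name to raw bytes.

```python
from pathlib import Path


def write_results(pdf_path, markdown, images=None, base_dir="."):
    """Write OCR output in the layout described above and return the folder.

    Hypothetical helper. Assumes the folder suffix is the PDF name without
    its extension, and that images maps file names to raw bytes.
    """
    out_dir = Path(base_dir) / f"ocr_results_{Path(pdf_path).stem}"
    out_dir.mkdir(parents=True, exist_ok=True)

    # The extracted text always goes into complete.md.
    (out_dir / "complete.md").write_text(markdown, encoding="utf-8")

    # The images/ directory is created only when there is something to store.
    if images:
        img_dir = out_dir / "images"
        img_dir.mkdir(exist_ok=True)
        for name, data in images.items():
            (img_dir / name).write_bytes(data)

    return out_dir
```

For example, `write_results("report.pdf", text, {"img-0.jpeg": data})` would produce `ocr_results_report/complete.md` and `ocr_results_report/images/img-0.jpeg` under the current directory.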