A comparative analysis of document processing using Azure Document Intelligence ☁️, Markitdown 📝, and Apache Tika 🦅.
This project benchmarks and compares the capabilities of three document intelligence solutions:
- Azure Document Intelligence ☁️: Cloud-based AI-powered document analysis by Microsoft Azure. (Layout model) Documentation
- Markitdown 📝: An open-source tool for extracting and converting document content. git
- Apache Tika 🦅: A content analysis toolkit for extracting metadata and text from various documents. git
The sample.pdf
in the input/
directory is used for the input.
The extracted results are stored in the output/
directory.
MIT License.