kimtth/azure-document-intelligence-vs-markitdown-vs-tika

Azure Document Intelligence vs Markitdown vs Tika

A comparative analysis of document processing using Azure Document Intelligence ☁️, Markitdown 📝, and Apache Tika 🦅.

Overview 🔍

This project benchmarks and compares the capabilities of three document intelligence solutions:

  • Azure Document Intelligence ☁️: Cloud-based, AI-powered document analysis from Microsoft Azure, used here with the layout model.
  • Markitdown 📝: An open-source tool for extracting and converting document content.
  • Apache Tika 🦅: A content analysis toolkit for extracting metadata and text from various document types.

Input

The input is sample.pdf, located in the input/ directory.

Output

The extracted results are stored in the output/ directory.
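
The input-to-output flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the function names (`extract_with_markitdown`, `extract_with_tika`, `run_comparison`) and the output file names are hypothetical, and it assumes the `markitdown` and `tika` Python packages are installed (Tika also needs a Java runtime). Azure Document Intelligence is left out because it needs an endpoint and key; it could be plugged in as another extractor function.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_with_markitdown(pdf_path: Path) -> str:
    # Imported lazily so the script still loads when markitdown is missing.
    from markitdown import MarkItDown

    return MarkItDown().convert(str(pdf_path)).text_content


def extract_with_tika(pdf_path: Path) -> str:
    # The tika package talks to a Tika server, which requires Java.
    from tika import parser

    parsed = parser.from_file(str(pdf_path))
    return parsed.get("content") or ""


def run_comparison(
    pdf_path: Path,
    output_dir: Path,
    extractors: Dict[str, Callable[[Path], str]],
) -> Dict[str, Path]:
    """Run each extractor on pdf_path and write one text file per tool."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written: Dict[str, Path] = {}
    for name, extract in extractors.items():
        try:
            text = extract(pdf_path)
        except Exception as exc:  # a missing dependency should not stop the rest
            print(f"{name}: skipped ({exc})")
            continue
        out_path = output_dir / f"{name}.txt"  # hypothetical naming scheme
        out_path.write_text(text, encoding="utf-8")
        written[name] = out_path
    return written


if __name__ == "__main__":
    run_comparison(
        Path("input/sample.pdf"),
        Path("output"),
        {"markitdown": extract_with_markitdown, "tika": extract_with_tika},
    )
```

Keeping each tool behind a plain `Path -> str` function makes it easy to compare the resulting files side by side and to add or drop a tool without touching the loop.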

License

MIT License.
