Transform PDF tables into HTML with Gemini 2.5, which detects each table's layout and content and converts it into a viewable HTML file.
This experimental tool leverages Google's Gemini 2.5 Flash Preview model to parse complex tables from PDF documents and convert them into clean HTML that preserves the exact layout, structure, and data.
PDF tables are notoriously difficult to extract accurately. Standard conversion tools often produce:
- Misaligned columns and rows
- Lost formatting and merged cells
- Garbled text and numbers
- Completely broken layouts
This tool achieves ~80% layout accuracy while maintaining nearly 100% data accuracy for most tables.
- Preserves Complex Table Structures - Handles merged cells, nested headers, and multi-line content
- Maintains Visual Fidelity - Recreates the visual appearance of tables with proper CSS
- Extracts Text with High Accuracy - Particularly effective with numerical data
- Direct PDF Processing - Sends PDF data directly to the model without intermediary conversions
- Thinking Mode - Uses Gemini's unique thinking capability for improved analysis
- Token Usage Reporting - Tracks processing efficiency
This project explores how AI models understand and parse structured PDF content. Rather than using OCR or traditional table extraction libraries, this tool gives the raw PDF to Gemini and uses specialized prompting techniques to optimize the extraction process.
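The approach described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the `google-genai` Python SDK is installed, and the function names (`pdf_to_html`, `strip_code_fences`), the default prompt text, and the fence-stripping step are all hypothetical. The model identifier is left as a parameter because the exact name of the preview model is not stated here.

```python
import re
from pathlib import Path

FENCE = "`" * 3  # three backticks, spelled this way to keep the example readable

# Hypothetical prompt; the prompt the project really uses is not shown in this README.
DEFAULT_PROMPT = (
    "Convert every table in this PDF into a single HTML document. "
    "Preserve merged cells with rowspan/colspan and keep all values exactly."
)


def strip_code_fences(text: str) -> str:
    """Return the body of a fenced (optionally html-tagged) block, else the text itself."""
    pattern = re.escape(FENCE) + r"(?:html)?\s*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    return (match.group(1) if match else text).strip()


def pdf_to_html(pdf_path: str, model: str, api_key: str,
                prompt: str = DEFAULT_PROMPT, thinking_budget: int = 24000) -> str:
    """Send the raw PDF bytes plus a prompt to Gemini and return the HTML reply."""
    # Imported lazily so strip_code_fences works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=api_key)
    response = client.models.generate_content(
        model=model,
        contents=[
            # The PDF is passed as-is: no OCR or intermediate conversion.
            types.Part.from_bytes(data=Path(pdf_path).read_bytes(),
                                  mime_type="application/pdf"),
            prompt,
        ],
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=thinking_budget),
        ),
    )
    # usage_metadata carries the token counts reported by the API.
    print("Total tokens:", response.usage_metadata.total_token_count)
    return strip_code_fences(response.text)
```

The fence-stripping helper is there because a model may wrap its reply in a Markdown code fence; if the reply is already bare HTML, it is returned unchanged.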
- Clone this repository:
  git clone https://github.com/lesteroliver911/gemini-pdf-table-extractor
  cd gemini-pdf-table-extractor
- Install the required dependencies:
  pip install -r requirements.txt
- Set up your Google API key:
  - Create a .env file in the project root directory
  - Add your Google API key:
    GOOGLE_API_KEY=your_api_key_here
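For readers curious how a script might pick up the key from the .env file, here is a small standard-library sketch. It is an assumption about the mechanism, not the project's code (which may well use a package such as python-dotenv); `load_env_file` and `require_api_key` are hypothetical names.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a file into os.environ and return the parsed pairs."""
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        loaded[key] = value
        # Variables already set in the environment take precedence.
        os.environ.setdefault(key, value)
    return loaded


def require_api_key():
    """Return GOOGLE_API_KEY or exit with a clear message if it is missing."""
    load_env_file()
    key = os.environ.get("GOOGLE_API_KEY")
    if not key or key == "your_api_key_here":
        raise SystemExit("GOOGLE_API_KEY is not set; add it to your .env file.")
    return key
```

Using `setdefault` means a key exported in the shell wins over the one in the file, which is a common convention but a design choice of this sketch.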
Basic usage:
python main.py path/to/your/document.pdf
This will generate an HTML file with the same name in the same directory.
Advanced options:
python main.py path/to/your/document.pdf --output custom_output.html --thinking-budget 24000
Arguments:
- --output: Specify a custom output file path
- --thinking: Enable thinking mode (default: True)
- --thinking-budget: Set the thinking token budget (default: 24000)
- --prompt: Provide a custom prompt for conversion
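The command-line options listed above could be wired up along these lines. This is a sketch using Python's standard argparse module, not the contents of main.py: `build_parser` and `resolve_output` are hypothetical names, and handling `--thinking` with `BooleanOptionalAction` (Python 3.9+, which also creates a `--no-thinking` switch) is an assumption about how a default-True flag might be implemented.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(
        description="Convert PDF tables to HTML with Gemini.")
    parser.add_argument("pdf_path", help="Path to the input PDF")
    parser.add_argument("--output", default=None,
                        help="Custom output file path")
    # Assumption: a boolean flag defaulting to True; --no-thinking turns it off.
    parser.add_argument("--thinking", action=argparse.BooleanOptionalAction,
                        default=True, help="Enable thinking mode")
    parser.add_argument("--thinking-budget", type=int, default=24000,
                        help="Thinking token budget")
    parser.add_argument("--prompt", default=None,
                        help="Custom prompt for conversion")
    return parser


def resolve_output(args):
    """Use --output if given, else the PDF's name with an .html extension."""
    if args.output:
        return Path(args.output)
    return Path(args.pdf_path).with_suffix(".html")


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    print("Would write HTML to:", resolve_output(cli_args))
```

`resolve_output` reflects the documented default of writing the HTML file next to the input with the same base name.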
This project is an exploration of AI-powered PDF parsing capabilities. While it achieves strong results for many tables, complex documents with unusual layouts may present challenges. The extraction accuracy will improve as the underlying models advance.
