The PDF Insight Extractor is a Streamlit-based web application that analyzes PDF documents and extracts insights from their text, images, and tables. It uses an OpenAI GPT model to process multimodal inputs and answer user queries about the document.
- 📄 Upload PDF files for processing.
- 🔍 Extract insights from:
  - Text: Understand and analyze textual content.
  - Images: Extract context and meaning from embedded visuals.
  - Tables: Retrieve structured data from tables.
- 💬 Ask questions about the document's content and get detailed responses.
- Python 3.8 or above
- Pip for package management
- OpenAI API key (stored in a `.env` file)
- Clone the Repository
  - `git clone https://github.com/Harshita1195/pdf-insight-extractor.git`
  - `cd pdf-insight-extractor`
- Install Dependencies
  - `pip install -r requirements.txt`
- Set OpenAI API Key
  - Create a `.env` file in the project directory.
  - Add the following line: `OPENAI_API_KEY=your_openai_api_key`
- Run the Application
  - `streamlit run app.py`
- Open the provided local URL in your web browser.
- Upload a PDF File
  - Drag and drop or select a PDF file via the file uploader.
- Process the PDF
  - The application converts each page into a base64-encoded image for analysis.
- Ask a Query
  - Enter a query in natural language, such as:
    - "What data is presented in the table on page 2?"
    - "Summarize the text on page 1."
    - "Describe the image on page 3."
  - Click "Submit Query" to receive a detailed response.
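The page-to-image conversion in the "Process the PDF" step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the application's actual code: the function names `png_bytes_to_base64` and `pdf_to_base64_images` and the `zoom` parameter are hypothetical, and PyMuPDF is assumed to be installed.

```python
import base64
from typing import List


def png_bytes_to_base64(png_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(png_bytes).decode("ascii")


def pdf_to_base64_images(pdf_bytes: bytes, zoom: float = 2.0) -> List[str]:
    """Render each page of a PDF to PNG and return one base64 string per page.

    `zoom` is a hypothetical knob: 1.0 renders at PyMuPDF's default
    resolution, larger values give sharper (and larger) images.
    """
    # Imported lazily so the helper above can be used without PyMuPDF.
    import fitz  # PyMuPDF

    images: List[str] = []
    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        for page in doc:
            # Scale the page uniformly in both directions before rasterizing.
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            images.append(png_bytes_to_base64(pix.tobytes("png")))
    return images
```

In a Streamlit app, the input bytes could plausibly come from the uploaded file object, for example `pdf_to_base64_images(uploaded_file.read())`, where `uploaded_file` is whatever the file uploader returned.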
- PDF to Image Conversion: Converts PDF pages to base64-encoded images using the `fitz` library for processing with OpenAI's GPT model.
- Query Handling: Processes user queries using the LangChain OpenAI integration (`ChatOpenAI`).
- Streamlit Interface:
  - Provides an intuitive user interface for uploading PDFs and entering queries.
  - Highlights key capabilities for text, image, and table extraction.
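One way the query-handling component might pass a question and the page images to `ChatOpenAI` is sketched below. This is an assumption-laden illustration rather than the project's implementation: `build_message_content` and `ask` are hypothetical names, the `gpt-4o` model name is a placeholder for whichever vision-capable model is configured, and the `langchain-openai` and `langchain-core` packages are assumed to be installed.

```python
from typing import Any, Dict, List


def build_message_content(query: str, images_b64: List[str]) -> List[Dict[str, Any]]:
    """Build a multimodal message body: the query text followed by one
    base64 PNG data URL per page image."""
    content: List[Dict[str, Any]] = [{"type": "text", "text": query}]
    for img in images_b64:
        content.append(
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{img}"},
            }
        )
    return content


def ask(query: str, images_b64: List[str], model: str = "gpt-4o"):
    """Send the query and page images to the model and return its reply.

    The default model name is an assumption, not taken from the project.
    """
    # Imported lazily so build_message_content works without LangChain.
    from langchain_core.messages import HumanMessage
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(model=model)  # reads OPENAI_API_KEY from the environment
    message = HumanMessage(content=build_message_content(query, images_b64))
    return llm.invoke([message]).content
```

Keeping the payload construction separate from the network call makes the message format easy to inspect or unit-test without an API key.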
- Streamlit: For building the web interface.
- PyMuPDF (fitz): For PDF processing.
- Pillow: For image handling.
- LangChain: For OpenAI GPT model integration.
- python-dotenv (`dotenv`): For environment variable management.
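As a rough sketch of how the API key might be read from the `.env` file at startup, assuming the `python-dotenv` package is installed: the helper names `read_api_key` and `load_key_from_dotenv` are hypothetical and do not come from the project.

```python
import os
from typing import Mapping, Optional


def read_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY from the given mapping (default: os.environ)."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to the .env file")
    return key


def load_key_from_dotenv() -> str:
    """Load variables from a .env file, then read the key."""
    # Imported lazily so read_api_key works without python-dotenv.
    from dotenv import load_dotenv

    # Reads a .env file if one is found; by default, variables already
    # present in the environment are not overridden.
    load_dotenv()
    return read_api_key()
```

Failing early with a clear error when the key is missing tends to be easier to debug than an authentication error raised later by the first model call.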
This project is licensed under the MIT License. See the LICENSE file for details.