This project is a Python-based Streamlit application that processes and analyzes unstructured data from PDFs and images using Google Gemini’s Large Language Model (LLM) API. It extracts text from PDFs and images and answers user queries based on the extracted data.
- Multiple File Uploads: Users can upload multiple PDFs and images.
- Text Extraction: Extract text from PDFs and images using PyMuPDF (fitz) and Tesseract.
- Query Processing: Use Google Gemini’s generative AI to answer user queries based on the extracted text.
- User-Friendly Interface: Streamlit-powered UI for easy file upload and query submission.
Before you begin, ensure you have the following installed:
- Python 3.8+
- Streamlit
- Tesseract
- Google Gemini API Access
git clone https://github.com/your-username/gemini-multi-file-processor.git
cd gemini-multi-file-processor
Here's the given content rewritten in Markdown (`.md`) format:
```markdown
## Environment Variables
To use Google Gemini’s API, you'll need to set up environment variables. The project uses `dotenv` to manage these.
### Create a `.env` file in the project directory:
```bash
touch .env
GEMINI_API_KEY=your-google-gemini-api-key
streamlit run app.py
http://localhost:8501
- Upload your PDF and image files using the interface.
- Input your query (e.g., "What is the main topic of the handwritten notes?").
- The application will process the files, extract the text, and provide answers based on the extracted data.
- File Upload: Users upload PDFs and images through the Streamlit interface.
- Text Extraction:
- PDFs: The application uses
PyMuPDF
(fitz) to extract images embedded in PDFs andTesseract
for Optical Character Recognition (OCR) to extract text from the images and PDFs.
- PDFs: The application uses
- Knowledge Base Creation: All extracted text is compiled into a centralized knowledge base.
- Gemini API Querying: The knowledge base and user query are sent to Google Gemini’s LLM API, which generates answers based on the data.
- Response Display: The generated response is shown in the Streamlit interface.
This `.md` content is properly formatted for inclusion in a `README.md` or documentation file.