PDF query chatbot using the Gemini Pro model and Llama-Index
Gemini-File is a Streamlit web application that lets users upload PDF files, index their contents with the Llama-Index library, and query the documents using the Gemini Pro model.
- Upload PDF files for indexing.
- Perform text queries on the indexed documents.
- Powered by the Gemini Pro model and Hugging Face embeddings.
!! Running this code on a machine with a GPU is strongly recommended !!
Before you begin, ensure you have the following installed:
- Python (>=3.6)
- Streamlit
- Llama-Index library
- Google API key (set as an environment variable)
You can get a Google Gemini API key from the Google AI Developer website; signing up is easy and the key is free.
The app reads the Google API key from an environment variable, so make sure it is set correctly before running the app.
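As a minimal sketch of that check, the snippet below fails fast when the key is missing. The variable name `GOOGLE_API_KEY` and the helper `require_api_key` are assumptions for illustration; use whichever name app.py actually reads.

```python
import os


def require_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in `var_name`, or raise if it is unset or blank.

    `GOOGLE_API_KEY` is an assumed default, not confirmed by the project.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it before running the app."
        )
    return key
```

Calling `require_api_key()` at the top of the app would surface a missing key as a clear error before any indexing starts.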
- Clone the repository:
git clone https://github.com/AjayK47/Gemini-File.git
- Install dependencies:
pip install -r requirements.txt
- Run the Streamlit app:
streamlit run app.py
- Access the app in your web browser.
- Use the "Upload your PDF" button to upload a PDF file.
- Indexing the file into storage takes some time, depending on the size of the file.
- Click the search or submit button to run the query; the app will then produce a response.
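To make the index-then-query flow above concrete, here is a toy sketch in plain Python. It is not the app's code and does not use Llama-Index or Gemini: word-count vectors stand in for the Hugging Face embeddings, and the function names (`embed`, `build_index`, `query`) are hypothetical. The real app would additionally pass the retrieved text to the Gemini Pro model to generate an answer.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: List[str]) -> List[Tuple[str, Counter]]:
    """Indexing step: embed every chunk once, up front."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def query(index: List[Tuple[str, Counter]], question: str, top_k: int = 1) -> List[str]:
    """Query step: return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Made-up chunks standing in for text extracted from an uploaded PDF.
chunks = [
    "Streamlit serves the web interface for uploading files.",
    "Embeddings turn each text chunk into a vector.",
    "The language model writes an answer from the retrieved chunks.",
]
index = build_index(chunks)
best = query(index, "How is a text chunk turned into a vector?")
```

Here `best` holds the second chunk, the only one that shares words with the question. The sketch also shows why indexing takes time on large files: every chunk is embedded before the first query can run.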
You can customize the embedding model used for document indexing. Edit the 'app.py' file and modify the 'HuggingFaceEmbedding' instantiation:
# Example using a different Hugging Face model
embed_model_custom = HuggingFaceEmbedding(model_name="your/own-model-name")
You can find the text embedding model best suited to your needs with the help of the MTEB Leaderboard.
Contributions are encouraged! Fork the repository, create a feature branch, make changes, push to the branch, and open a pull request.
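One possible way to switch embedding models without editing app.py each time is to read the model name from an environment variable. The sketch below is an illustration only: the variable `EMBED_MODEL_NAME`, the helper `resolve_embed_model_name`, and the fallback model are assumptions, not part of the project.

```python
import os

# Assumed fallback for illustration; the model app.py ships with may differ.
DEFAULT_EMBED_MODEL = "BAAI/bge-small-en-v1.5"


def resolve_embed_model_name(env_var: str = "EMBED_MODEL_NAME") -> str:
    """Return the model name from `env_var`, falling back to the default."""
    name = os.environ.get(env_var, "").strip()
    return name or DEFAULT_EMBED_MODEL


# In app.py the result could then be passed on, for example:
# embed_model_custom = HuggingFaceEmbedding(model_name=resolve_embed_model_name())
```

With this in place, setting `EMBED_MODEL_NAME` before `streamlit run app.py` would select a different model for that run.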
- Use Open Source Embedding Models: Explore integrating open-source embedding models instead of relying on proprietary models such as those behind the Gemini API.
- Improved UI/UX: Enhance the user interface and experience for better usability.
- Scalability: Optimize the application for large document collections and improve search speed.
- Dockerization: Provide a Docker container for easy deployment.