AI Chat with PDF is an application that lets you hold natural conversations with your PDF documents. Using large language models and HuggingFace embeddings, it answers questions about the content of your PDF files in a conversational manner.

Features:
- 📄 Upload and process multiple PDF documents simultaneously
- 💬 Interactive chat interface with conversation history
- 🔍 Semantic search using HuggingFace embeddings
- 🧠 Powered by LangChain and HuggingFace models
- 🚀 Modern Streamlit-based web interface
- 🌍 Multilingual support (French/English)
Prerequisites:
- Python 3.8 or higher
- Poetry (recommended) or pip
- HuggingFace API token (optional, needed only for some models)
Installation:
- Clone the repository with `git clone https://github.com/djili/aichatpdf.git`, then change into it with `cd aichatpdf`.
- Install dependencies with Poetry (`poetry install`) or with pip (`pip install -r requirements.txt`).
- Set up environment variables:
  - Copy `.env.example` to `.env`.
  - Configure your preferred models and API keys:
    - `HUGGINGFACEHUB_API_TOKEN=your_hf_token` for HuggingFace models (recommended)
    - `OPENAI_API_KEY=your_openai_key` for OpenAI models (optional)
Usage:
- Start the application with `poetry run streamlit run app.py`, or with `streamlit run app.py` if you installed the dependencies with pip.
- Open your web browser and navigate to http://localhost:8501.
- Upload one or more PDF files using the file uploader.
- Start chatting with your documents by typing questions in the chat interface.
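The `.env` file from the installation steps is a plain list of `KEY=VALUE` lines. As an illustration only, here is a hypothetical standard-library sketch of how such a file can be read into the environment. The repository may well rely on a library such as python-dotenv instead, and the helper names `parse_env_text` and `load_env_file` are invented for this example.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes, e.g. "abc" -> abc.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Load a .env file into os.environ without overriding existing variables."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values = parse_env_text(env_path.read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Calling `load_env_file()` at startup would expose `HUGGINGFACEHUB_API_TOKEN` and `OPENAI_API_KEY` through `os.environ`; because of `setdefault`, variables already exported in the shell take precedence over the file.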
How it works:
- The application extracts the text content of each uploaded PDF with PyPDF2.
- The text is split into manageable chunks using recursive text splitting.
- These chunks are converted into vector embeddings with HuggingFace's instructor-xl model and indexed with FAISS for similarity search.
- When you ask a question, the system embeds it and runs a semantic (vector similarity) search to find the most relevant text chunks.
- The retrieved chunks, together with the conversation history, are passed to the language model, which generates the answer.
- Because the chat interface keeps the conversation history, follow-up questions can refer to earlier answers.
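To make the chunking step concrete, here is a simplified, hypothetical sketch of recursive text splitting in plain Python. It tries coarse separators first (blank lines, then newlines, then spaces) and falls back to finer ones, down to single characters, only when a piece is still longer than `chunk_size`. It mirrors the idea behind LangChain's `RecursiveCharacterTextSplitter` but is not its implementation: in particular it has no chunk overlap, and the default of 200 characters is an arbitrary assumption rather than the application's setting.

```python
from typing import List, Sequence

# Coarse-to-fine separators: paragraphs, lines, words, single characters.
DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")


def recursive_split(text: str, chunk_size: int = 200,
                    separators: Sequence[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Split text into chunks of at most chunk_size characters (no overlap)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text.strip() else []

    # Use the first separator that occurs in the text; "" means single characters.
    sep, finer = "", ()
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, finer = candidate, separators[i + 1:]
            break
    pieces = list(text) if sep == "" else text.split(sep)

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        merged = current + sep + piece if current else piece
        if len(merged) <= chunk_size:
            current = merged  # keep growing the current chunk
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks
```

Splitting on paragraph and line boundaries first keeps related sentences in the same chunk, which tends to give the embedding model more coherent passages than fixed-width slicing.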
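The retrieval step can be illustrated the same way. The sketch below stores one vector per chunk and returns the `k` chunks with the highest cosine similarity to the query, which is the kind of nearest-neighbour search FAISS performs far more efficiently at scale. To stay self-contained, the hypothetical `toy_embed` function uses a normalised bag of words instead of instructor-xl, so it only rewards shared words rather than shared meaning; the `InMemoryIndex` class is likewise invented for this example.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an embedding model: L2-normalised word counts."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def cosine_similarity(a: Vector, b: Vector) -> float:
    # Both vectors have unit length, so their dot product is the cosine similarity.
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class InMemoryIndex:
    """Brute-force vector index holding one embedding per text chunk."""

    def __init__(self) -> None:
        self._chunks: List[str] = []
        self._vectors: List[Vector] = []

    def add(self, chunks: List[str]) -> None:
        for chunk in chunks:
            self._chunks.append(chunk)
            self._vectors.append(toy_embed(chunk))

    def search(self, query: str, k: int = 3) -> List[Tuple[float, str]]:
        """Return the k (score, chunk) pairs most similar to the query."""
        query_vector = toy_embed(query)
        scored = [(cosine_similarity(query_vector, vector), chunk)
                  for chunk, vector in zip(self._chunks, self._vectors)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

Swapping `toy_embed` for a real embedding model and the list scan for a FAISS index changes the quality and speed of the search, not its shape: embed the chunks once, embed each question, return the closest chunks.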
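Finally, the way retrieved context and conversation history come together can be sketched as a small session object that keeps the most recent turns and assembles them into a prompt. The `ChatSession` class, the five-turn default limit, and the prompt wording are all assumptions for illustration; the application presumably handles this step through LangChain, and its actual prompt may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ChatSession:
    """Hypothetical holder for recent (question, answer) turns."""
    max_turns: int = 5
    history: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, question: str, answer: str) -> None:
        self.history.append((question, answer))
        # Drop the oldest turns so the prompt stays bounded.
        if len(self.history) > self.max_turns:
            self.history = self.history[len(self.history) - self.max_turns:]

    def build_prompt(self, question: str, context_chunks: List[str]) -> str:
        context = "\n---\n".join(context_chunks)
        past = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.history)
        return (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Conversation so far:\n{past or '(none)'}\n\n"
            f"User: {question}\nAssistant:"
        )
```

Capping the history keeps the prompt within the model's context window, at the cost of forgetting turns older than `max_turns`.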
Technologies used:
- Streamlit - Web application framework
- LangChain - Framework for developing applications with LLMs
- HuggingFace - For embeddings and language models
- FAISS - Efficient similarity search
- PyPDF2 - PDF text extraction
- Sentence Transformers - For generating embeddings
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments:
- Built with ❤️ using amazing open-source libraries
- Inspired by the growing ecosystem of AI-powered document processing tools
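Note: Handle sensitive documents with care. Depending on the models you configure, text from your PDFs may be sent to third-party APIs such as OpenAI or the HuggingFace Inference API, so be aware of the data you process through the application.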