This repository provides a simple, fully open-source chatbot application built with Streamlit that lets you chat about the contents of PDF files using a large language model (LLM). The chatbot runs on CPU and uses Retrieval-Augmented Generation (RAG) to answer questions from information stored in PDF documents, using LangChain for document processing and retrieval, FAISS for vector storage, and Hugging Face's Transformers library for running the language model.
## Features

- Interactive Q&A: Ask questions about the contents of PDF files and get answers generated by an LLM.
- PDF Document Retrieval: PDFs stored in the `docs` directory are loaded and processed for easy access during chats.
- Conversational Memory (to be implemented): The chatbot will maintain a chat history to provide contextually relevant responses within the conversation.
- Streamlit Chat UI: Simple, intuitive interface using Streamlit, supporting conversation-based interaction with your PDF data.
## Requirements

- Python 3.8 or higher
- The dependencies listed in `requirements.txt`, installed with `pip install -r requirements.txt`. If you are working with the notebook, running the first cell is sufficient.
- 16 GB of RAM
## Usage

Add your PDF files to the `docs` directory. The chatbot loads these PDFs, processes the text, and creates a vector store for retrieval. In the notebook example, I loaded and processed my results report from the Understand Myself personality test.
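Discovering the PDFs to load can be sketched with the standard library. The `docs` path comes from this README; the helper name is illustrative, not necessarily what `chatbot.py` uses:

```python
from pathlib import Path
from typing import List

def find_pdfs(docs_dir: str = "docs") -> List[Path]:
    """Return all PDF files in the docs directory, sorted for stable ordering."""
    return sorted(Path(docs_dir).glob("*.pdf"))
```

Each returned path can then be handed to a PDF loader for text extraction.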
To start the chatbot, run `streamlit run chatbot.py`.
Use the text input field to ask questions about the data within your PDF documents. The chatbot will retrieve relevant information and generate answers based on the content of your PDFs.
## How It Works

- Model Loading: The LaMini-T5-738M model is loaded from Hugging Face as a text2text-generation pipeline.
- PDF Loading and Text Splitting: PDFs are processed using the PyPDFLoader and split into smaller chunks with RecursiveCharacterTextSplitter.
- Vector Store Creation: Text chunks are converted into embeddings and stored in a FAISS vector store for efficient retrieval.
- Question-Answering Chain: A ConversationalRetrievalChain combines the LLM's responses with retrieved content to provide informed answers.
- Streamlit Interface: The chatbot UI displays past user inputs and generated responses.
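The splitting step can be illustrated with a dependency-free sketch. `RecursiveCharacterTextSplitter` is more sophisticated (it prefers paragraph and sentence boundaries before falling back to raw characters), but the core idea is fixed-size windows with overlap; the sizes below are illustrative, not the app's actual settings:

```python
from typing import List

def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    # Slide a fixed-size window across the text. Consecutive chunks share
    # `overlap` characters so an answer spanning a chunk boundary is not lost.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk is then embedded and indexed so the retriever can match it against questions.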
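The retrieval step at the heart of the pipeline can also be sketched without the real libraries. In the app, a neural embedding model and FAISS handle the vector search and the LLM generates the final answer; here a toy bag-of-words embedding and cosine similarity stand in for them, purely to show the mechanics:

```python
import math
from collections import Counter
from typing import List

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words counts. The real app uses a neural
    # sentence-embedding model; this only illustrates the idea.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: List[str], k: int = 1) -> List[str]:
    # Rank chunks by similarity to the question, like a FAISS nearest-
    # neighbour search, and keep the top k.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, chunks: List[str]) -> str:
    # The retrieved chunks are stuffed into the prompt the LLM answers from.
    context = "\n".join(retrieve(question, chunks))
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"
```

The LLM then generates an answer conditioned on the retrieved context rather than on its parametric knowledge alone.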
## Notes

- Chat History: A full chat history mechanism is not yet implemented; a placeholder `chat_history` is passed in for future enhancements.
- Model Resource Requirements: Ensure you have sufficient memory for running the LaMini-T5-738M model; 16 GB of RAM should be sufficient.
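When chat history is implemented, the `chat_history` placeholder could grow into something like the sketch below. The `(question, answer)` pair format is the convention LangChain's conversational chains commonly expect; the function name and the turn limit are illustrative:

```python
from typing import List, Tuple

def update_history(history: List[Tuple[str, str]], question: str, answer: str,
                   max_turns: int = 5) -> List[Tuple[str, str]]:
    # Append the latest exchange and keep only the most recent turns so the
    # prompt assembled from history stays within the model's context window.
    history = history + [(question, answer)]
    return history[-max_turns:]
```

In a Streamlit app, this list would typically live in session state so it survives reruns.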
## Future Improvements

- Complete Chat History: Implement a robust chat history that retains context across conversations.
- Improved Document Processing: Enhance text chunking and embedding strategies for better context extraction.