ChatterDocs is a smart tool that lets you talk to your PDF files like you're chatting with a person. It breaks down the document into parts, figures out what each part means, and helps find the most relevant answers to your questions using a powerful AI brain called Mistral 7B, which runs locally using Ollama.
- AI-Powered Q&A: Ask questions about any PDF and receive contextually relevant answers.
- TF-IDF Based Search: Efficiently ranks document chunks by term-frequency relevance to your query.
- Fast & Lightweight: Upgraded to Mistral 7B, improving performance by ~35–45% over LLaMA2.
- Simple API Interface: Communicate using easy-to-use REST endpoints.
- Custom Chunking & Embeddings: Adjustable chunk size and tokenization logic for precision.
- Offline-Ready: Works with locally hosted models via Ollama—no external APIs required.
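The TF-IDF chunk ranking mentioned above can be sketched in plain JavaScript. This is an illustrative sketch only, not the project's actual code: the function names (`tokenize`, `tfidfVectors`, `rankChunks`), the whitespace tokenizer, and the smoothed-IDF formula are all assumptions.

```javascript
// Illustrative TF-IDF ranking sketch (NOT ChatterDocs' real implementation).

function tokenize(text) {
  // Lowercase and split on non-word characters.
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

function tfidfVectors(docs) {
  const tokenized = docs.map(tokenize);
  // Document frequency: in how many docs does each term appear?
  const df = new Map();
  for (const tokens of tokenized) {
    for (const t of new Set(tokens)) df.set(t, (df.get(t) || 0) + 1);
  }
  const n = docs.length;
  // One sparse vector (term -> weight) per document.
  return tokenized.map((tokens) => {
    const tf = new Map();
    for (const t of tokens) tf.set(t, (tf.get(t) || 0) + 1);
    const vec = new Map();
    for (const [t, count] of tf) {
      const idf = Math.log(n / df.get(t)) + 1; // smoothed IDF (assumed formula)
      vec.set(t, (count / tokens.length) * idf);
    }
    return vec;
  });
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (const [t, w] of a) { na += w * w; if (b.has(t)) dot += w * b.get(t); }
  for (const [, w] of b) nb += w * w;
  return dot === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function rankChunks(chunks, question) {
  // Vectorize chunks and question together so they share one vocabulary.
  const vecs = tfidfVectors([...chunks, question]);
  const qVec = vecs.pop();
  return chunks
    .map((text, i) => ({ text, score: cosine(vecs[i], qVec) }))
    .sort((a, b) => b.score - a.score);
}
```

The chunk sharing the most query terms, weighted by how rare those terms are across the document, rises to the top of the ranking.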
Before setting up ChatterDocs, ensure that you have the following:
- Node.js: The project is built using Node.js. You can check if you have it installed by running:

  ```
  node -v
  ```

  If you don't have it installed, download it from https://nodejs.org.
- Mistral 7B Model: The application uses the Mistral 7B model, served by Ollama, for processing document queries. Ensure you have Ollama set up on your machine and the model running. You can install Ollama by downloading it from https://ollama.com. Once installed, run the Mistral 7B model:

  ```
  ollama run mistral
  ```
- Embeddings File: The system uses pre-processed embeddings of your documents. Ensure that you have an `embeddings/embeddings.json` file with the document embeddings saved in it.
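The exact schema of `embeddings/embeddings.json` depends on how the embeddings were generated; the layout below is purely a hypothetical illustration (the field names `chunk` and `embedding` are assumptions, and real embedding vectors are far longer than three numbers):

```json
[
  {
    "chunk": "Text of the first document chunk",
    "embedding": [0.12, -0.08, 0.33]
  },
  {
    "chunk": "Text of the second document chunk",
    "embedding": [0.05, 0.21, -0.17]
  }
]
```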
- Clone the Repository: Clone the ChatterDocs repository to your local machine:

  ```
  git clone https://github.com/Shivarora22/chatterdocs.git
  cd chatterdocs
  ```

- Install Dependencies: Install all the necessary dependencies:

  ```
  npm install
  ```
Once you have the dependencies installed, follow these steps to run the project:
- Start the Server: Use the following command to start the project:

  ```
  node app.js
  ```

  This will launch the server locally on http://localhost:3000.

- Start the Mistral 7B Model: Ensure that you have Mistral 7B running locally (via `ollama run mistral`). This will handle the processing of queries.

- Project Structure: Create two directories in the project root, `embeddings` and `data`, and add your PDF file inside the `data` directory.
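The directory setup above can be done in one command, run from the chatterdocs project root:

```shell
# Create the directories the app expects; -p makes this a no-op if they already exist.
mkdir -p data embeddings
```

Then copy your PDF into `data/` before uploading it through the API.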
POST /embed

Upload a PDF and automatically create embeddings from its content.

Content-Type: `multipart/form-data`

| Key  | Type | Description        |
|------|------|--------------------|
| file | File | PDF file to upload |
POST /ask

This is the main endpoint that will receive queries and return contextually relevant answers from your documents.

Request body:

```
{
  "question": "Your_Question_Here"
}
```
Create embeddings from a PDF:

```
curl -X POST http://localhost:3000/embed \
  -F "file=@./data/your-document.pdf"
```

Ask a question:

```
curl -X POST http://localhost:3000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "Summarize the introduction"}'
```