dangkv/private-gpt

🔒 PrivateGPT - Your Personal AI Assistant

Chat with your documents privately, locally, and securely ✨

Features • Installation • Usage • Architecture • Contributing


🌟 What is PrivateGPT?

PrivateGPT transforms your personal documents into an intelligent, conversational AI assistant. Unlike cloud-based solutions, everything runs locally on your machine, ensuring your sensitive data never leaves your control.

🚀 Why PrivateGPT?

  • 🔐 Complete Privacy: Your documents never leave your machine
  • 📚 Smart Document Understanding: Chat with PDFs, Word docs, and text files
  • ⚡ Real-time Responses: Answers stream to the screen as they are generated
  • 🎯 Context-Aware: Answers are grounded in passages retrieved from your documents
  • 🛠️ Easy Setup: Simple installation and intuitive interface

✨ Features

🔍 Intelligent Document Processing

  • Supports multiple formats: PDF, DOCX, DOC, TXT
  • Automatic text chunking for optimal retrieval
  • Vector embeddings for semantic search

💬 Natural Conversation

  • Streamlit-based chat interface with a design-first philosophy
  • Real-time streaming responses
  • Context-aware conversations
  • Emoji support for better readability

🏗️ Robust Architecture

  • RAG (Retrieval-Augmented Generation) pipeline
  • ChromaDB for vector storage
  • Ollama for local LLM inference
  • Modular and extensible design

🛠️ Installation

Prerequisites

Before getting started, ensure you have the following installed:

  • Python 3.8+ 🐍
  • Ollama (for local LLM inference) 🤖
  • Homebrew (for macOS users) 🍺

Step 1: Install Ollama

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Windows (using WSL)
curl -fsSL https://ollama.ai/install.sh | sh

Step 2: Download Required Models

# Download the embedding model
ollama pull nomic-embed-text

# Download the language model
ollama pull mistral
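
To confirm that both models were pulled, you can ask the local Ollama server which models it holds. The sketch below is illustrative and not part of the repository: it assumes Ollama is serving on its default address and uses Ollama's `GET /api/tags` endpoint; `missing_models` and `fetch_tags` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Assumption: Ollama listens on its default local address.
OLLAMA_BASE_URL = "http://localhost:11434"
REQUIRED_MODELS = ("nomic-embed-text", "mistral")


def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Return the required models that are absent from an /api/tags payload."""
    # Installed names carry a tag suffix such as "mistral:latest",
    # so compare on the part before the colon.
    installed = {m["name"].split(":")[0] for m in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url=OLLAMA_BASE_URL):
    """Ask the local Ollama server which models it has pulled."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)
```

With Ollama running, `missing_models(fetch_tags())` returns an empty list when both models are available; any names it returns still need `ollama pull`.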

Step 3: Clone and Setup PrivateGPT

# Clone the repository
git clone https://github.com/dangkv/private-gpt.git
cd private-gpt

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

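Step 4: Create requirements.txt (if missing)

The pip install command in Step 3 reads this file. If your checkout does not include it, create it with the following contents and rerun the install: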

streamlit>=1.28.0
langchain>=0.0.350
langchain-community>=0.0.10
chromadb>=0.4.15
ollama>=0.1.0
pypdf>=3.17.0
python-docx>=0.8.11
docx2txt>=0.8

📁 Data Folder Structure

PrivateGPT reads your documents from the data folder and writes its generated files there as well:

data/
├── raw/                    # 📄 Your unstructured documents go here
│   ├── document1.pdf
│   ├── report.docx
│   ├── notes.txt
│   └── subfolder/
│       └── more_docs.pdf
├── processed/              # 🔧 Processed document chunks (auto-generated)
└── chroma_db/              # 🗄️ Vector database storage (auto-generated)
    ├── chroma.sqlite3
    └── index/

📋 Setting Up Your Documents

  1. Create the data directory (if it doesn't exist):

    mkdir -p data/raw
  2. Add your documents to the data/raw/ folder:

    • Supported formats: .pdf, .docx, .doc, .txt
    • You can organize documents in subfolders
    • No fixed size limit, although larger files take longer to ingest
  3. Example structure:

    data/raw/
    ├── company_policies/
    │   ├── employee_handbook.pdf
    │   └── code_of_conduct.docx
    ├── research_papers/
    │   ├── ai_trends_2024.pdf
    │   └── machine_learning_guide.pdf
    └── personal_notes/
        ├── meeting_notes.txt
        └── project_ideas.docx
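
As a rough illustration of how an ingestion step could discover these files, the sketch below walks data/raw/ recursively and keeps only the supported extensions. It is not the repository's actual ingestion code; `find_documents` is a hypothetical helper, and the default path is taken from the layout above.

```python
from pathlib import Path

# Extensions taken from the supported-formats list in this README.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt"}


def find_documents(raw_dir="data/raw"):
    """Return every supported file under raw_dir, subfolders included, sorted by path."""
    root = Path(raw_dir)
    return sorted(
        path
        for path in root.rglob("*")
        # Compare extensions case-insensitively so REPORT.PDF is also picked up.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `find_documents()` from the project root lists the files that a subsequent ingestion run would be expected to process.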
    

🚀 Usage

Quick Start

  1. Start the services:

    make start
  2. Ingest your documents:

    make db-ingest
  3. Open your browser and go to http://localhost:8501

  4. Start chatting with your documents! 💬

  5. Shut down the services when you are done:

    make stop

Available Commands

PrivateGPT ships with a Makefile that wraps the common management tasks:

# 🎯 Main Commands
make start          # Start all services
make stop           # Stop all services
make status         # Check service status

# 📊 Streamlit Management
make streamlit-start    # Start Streamlit app
make streamlit-stop     # Stop Streamlit app
make streamlit-reset    # Restart Streamlit

# 🤖 Ollama Management
make llm-start      # Start Ollama service
make llm-stop       # Stop Ollama service
make llm-status     # Check Ollama status

# 🗄️ Database Management
make db-ingest      # Process and ingest documents
make db-reset       # Reset database
make db-backup      # Create database backup
make db-check       # Check database status

# 🧹 Cleanup
make clean          # Clean temporary files
make clean-all      # Deep clean everything
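
If you want to check the services without the Makefile, a simple TCP probe is enough to see whether anything is listening. The sketch below is an assumption-laden stand-in, not what `make status` actually runs: it uses the ports mentioned in this README (8501 for Streamlit, 11434 for Ollama), and `is_listening` and `service_status` are hypothetical names.

```python
import socket

# Ports assumed from this README: Streamlit on 8501, Ollama on 11434.
SERVICES = {"streamlit": 8501, "ollama": 11434}


def is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False


def service_status(services=SERVICES):
    """Map each service name to 'up' or 'down'."""
    return {name: "up" if is_listening(port) else "down" for name, port in services.items()}
```

A probe like this only shows that a port is open; it does not verify that the process behind it is healthy.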

Manual Document Ingestion

If you prefer to run the ingestion manually:

python scripts/ingest_documents.py

🏗️ Architecture

PrivateGPT follows a modular RAG (Retrieval-Augmented Generation) architecture:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   📄 Documents   │ ── │  📝 Ingestion   │ ── │  🗄️ Vector DB   │
│   (PDF, DOCX,   │    │  (Chunking &    │    │  (ChromaDB)     │
│    TXT files)   │    │   Embedding)    │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  💬 Streamlit   │ ── │  🧠 RAG         │ ── │  🔍 Retrieval   │
│   Interface     │    │  Pipeline       │    │   (Semantic     │
│                 │    │                 │    │    Search)      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                       ┌─────────────────┐
                       │  🤖 Generation  │
                       │   (Ollama +     │
                       │    Mistral)     │
                       └─────────────────┘
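
The query path in the diagram (retrieve, then build a prompt, then generate) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's pipeline: `build_prompt` and `answer` are hypothetical names, the prompt wording is invented for the example, and `retrieve` and `generate` are passed in as plain callables so the sketch does not depend on ChromaDB or Ollama.

```python
def build_prompt(question, chunks):
    """Combine retrieved chunks and the user question into one prompt string."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, retrieve, generate, top_k=5):
    """Run the query path: retrieve chunks, build the prompt, generate a reply."""
    # retrieve(question, top_k) -> list of chunk strings (the vector DB lookup).
    chunks = retrieve(question, top_k)
    # generate(prompt) -> reply string (the local LLM call).
    return generate(build_prompt(question, chunks))
```

Injecting the two callables keeps each stage replaceable, which matches the modular design described above.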

Key Components

  • 🔄 Ingestion Pipeline: Processes documents and creates vector embeddings
  • 🔍 Retrieval System: Finds relevant document chunks using semantic search
  • 🤖 Generation Engine: Uses Ollama with Mistral model for response generation
  • 💬 Chat Interface: Streamlit-based UI with real-time streaming
  • 🗄️ Vector Database: ChromaDB for efficient similarity search
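
To make "semantic search" concrete: each chunk is stored with an embedding vector, and retrieval returns the chunks whose vectors are closest to the query's vector. The sketch below is a brute-force illustration using cosine similarity in plain Python. It is not how ChromaDB is implemented, and the project's actual distance metric is not stated in this README; `cosine_similarity` and `top_k` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, indexed_chunks, k=5):
    """Return the texts of the k chunks most similar to the query.

    indexed_chunks is a list of (text, vector) pairs.
    """
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

A vector database avoids scanning every chunk like this by using an index, but the ranking idea is the same.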

🔧 Configuration

Customize PrivateGPT by modifying config.py:

# Model configurations
EMBEDDING_MODEL = "nomic-embed-text"    # Change embedding model
LLM_MODEL = "mistral"                   # Change language model
OLLAMA_BASE_URL = "http://localhost:11434"

# Chunking parameters
CHUNK_SIZE = 1000                       # Adjust chunk size
CHUNK_OVERLAP = 200                     # Adjust overlap

# Retrieval parameters
TOP_K_RETRIEVAL = 5                     # Number of documents to retrieve
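
To see how CHUNK_SIZE and CHUNK_OVERLAP interact, here is a minimal sliding-window splitter. It assumes both values are measured in characters, which this README does not state, and it is a simplified stand-in for whatever text splitter the project uses; `chunk_text` is a hypothetical name.

```python
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200


def chunk_text(text, chunk_size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

With the defaults, each window advances 800 characters, so a 2,500-character document yields three chunks and consecutive chunks share 200 characters. Larger overlap preserves more context across chunk boundaries at the cost of storing more chunks.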

📝 Development Guidelines

  • Follow PEP 8 style guidelines
  • Add docstrings to all functions and classes
  • Include type hints where appropriate
  • Write unit tests for new features
  • Update documentation as needed

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • LangChain for the RAG framework
  • Ollama for local LLM inference
  • ChromaDB for vector storage
  • Streamlit for the web interface
  • Mistral for the language model

🎉 Ready to chat with your documents privately?

Get Started

Made with ❤️ by the PrivateGPT Team