AI Document Bot is a Telegram bot that transforms PDF documents
into interactive, queryable knowledge bases. Upload any PDF document
and ask questions about its content using either the cloud-based Groq API or local Ollama models.
- Python 3.8+
- Telegram Bot Token (get from @botfather)
- Groq API Key (optional, get from groq.com)
- Ollama (optional, for local AI - ollama.ai)
Note
To experience the bot in action, access it on Telegram via @AI_Docz_Bot. No Groq API key or Ollama model is required.
git clone https://github.com/jafarbekyusupov/ai-docs-tgbot.git
cd ai-docs-tgbot
python -m venv venv
source venv/bin/activate
Tip
On Windows:
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
Create a .env file in the root directory:
TELEGRAM_BOT_TOKEN=your_telegram_bot_token_here
GROQ_API_KEY=your_groq_api_key_here # optional
Install Ollama from https://ollama.ai
ollama pull llama3.2 # or any other model you prefer
python run.py
- Start the Bot - send /start to see the welcome message
- Configure AI Service (Optional) - use /settings to choose between Groq and Ollama (Groq is set as default)
- Upload PDF - send any PDF document (max 20 MB, Telegram's limit)
- Ask Questions - start asking questions about your document content
Command | Description |
---|---|
/start | show welcome message and basic instructions |
/settings | choose between groq and ollama ai services |
/models | list and switch between available ollama models |
/status | check ai services status and current configuration |
/clear | clear current document and start over |
/debug <query> | see detailed search results for debugging |
/help | show help information |
Groq (cloud)
- pros: fast response times, no local setup required
- cons: requires an api key, sends data to the cloud
- models: llama3-8b-8192 (default)
Ollama (local)
- pros: completely private, no api costs, multiple model choices
- cons: slower response times, requires local installation
- models: llama3.2, mistral, codellama, and many more
ai-docs-tgbot/
├── config.py # configuration and environment variables
├── run.py # application entry point
├── document_bot.py # main bot class and setup
├── bot_handlers.py # telegram message and callback handlers
├── document_processor.py # pdf text extraction and segmentation
├── ai_processor.py # groq and ollama ai integration
├── vector_search.py # faiss-based semantic search
├── ollama.py # ollama client implementation
├── requirements.txt # python dependencies
└── .env # environment variables (create this file on your own machine)
- extracts text from pdf files using PyPDF2
- analyzes document structure and identifies headers
- segments text into meaningful chunks for better retrieval
- supports both advanced and simple segmentation strategies
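The two segmentation strategies listed above could look roughly like the sketch below. It is an illustration only: the header heuristic (numbered lines or all-caps lines), the function names `is_header`, `segment_by_headers` and `segment_simple`, and the `max_chars` default are assumptions, not the rules used in document_processor.py.

```python
import re
from typing import List, Tuple

# Assumed heuristic: a header is a short line that is either numbered
# ("2.1 Methods") or written entirely in upper case ("INTRODUCTION").
HEADER_RE = re.compile(r"^(\d+(\.\d+)*\.?\s+\S.*|[A-Z][A-Z0-9 \-]{2,})$")


def is_header(line: str) -> bool:
    line = line.strip()
    return 0 < len(line) <= 80 and bool(HEADER_RE.match(line))


def segment_by_headers(text: str) -> List[Tuple[str, str]]:
    """'Advanced' strategy: split text into (header, body) pairs."""
    sections: List[Tuple[str, str]] = []
    header, body = "", []
    for line in text.splitlines():
        if is_header(line):
            # Close the previous section before starting a new one.
            if header or body:
                sections.append((header, "\n".join(body).strip()))
            header, body = line.strip(), []
        else:
            body.append(line)
    if header or body:
        sections.append((header, "\n".join(body).strip()))
    return sections


def segment_simple(text: str, max_chars: int = 500) -> List[str]:
    """'Simple' strategy: group paragraphs into chunks of about max_chars."""
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

A caller might try `segment_by_headers` first and fall back to `segment_simple` when no headers are found, which is one plausible reading of "advanced and simple" here.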
- creates embeddings using sentence-transformers
- implements multiple search strategies:
  - Semantic Search - finds content based on meaning
  - Keyword Search - matches specific terms
  - Fuzzy Search - handles partial matches
  - Section Search - searches within document sections
- uses faiss for efficient similarity search
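To make the keyword and fuzzy strategies concrete, here is a minimal sketch using only the standard library. The scoring formulas, the `cutoff` value and the function names are assumptions for illustration; vector_search.py may score matches differently.

```python
import difflib
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphanumeric runs as terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query: str, segment: str) -> float:
    """Fraction of distinct query terms that appear verbatim in the segment."""
    terms = set(tokenize(query))
    if not terms:
        return 0.0
    return len(terms & set(tokenize(segment))) / len(terms)


def fuzzy_score(query: str, segment: str, cutoff: float = 0.8) -> float:
    """Fraction of query terms that have a close match in the segment."""
    terms = tokenize(query)
    if not terms:
        return 0.0
    words = list(set(tokenize(segment)))
    hits = sum(
        1
        for term in terms
        if difflib.get_close_matches(term, words, n=1, cutoff=cutoff)
    )
    return hits / len(terms)


segment = "Our refund policy lasts 30 days"
exact = keyword_score("refnd policy", segment)   # misspelled term is missed
fuzzy = fuzzy_score("refnd policy", segment)     # misspelled term is recovered
```

The misspelled query shows why both strategies are useful: the exact keyword pass only finds "policy", while the fuzzy pass also matches "refnd" to "refund".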
- supports both groq and ollama apis
- handles model selection and switching
- manages api calls and error handling
- provides consistent interface for different ai services
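One way to get a consistent interface over both services is a small protocol plus a router, sketched below. All names here (`AIBackend`, `CallableBackend`, `AIProcessor`, `build_prompt`) are hypothetical, and the stub lambdas stand in for real Groq and Ollama calls; ai_processor.py may be organized differently.

```python
from typing import Callable, Dict, Protocol


class AIBackend(Protocol):
    """Anything that can answer a question given retrieved context."""

    def answer(self, question: str, context: str) -> str: ...


def build_prompt(question: str, context: str) -> str:
    return (
        "Answer using only the document excerpts below.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )


class CallableBackend:
    """Adapts a plain `prompt -> text` function to the AIBackend interface."""

    def __init__(self, complete: Callable[[str], str]) -> None:
        self._complete = complete

    def answer(self, question: str, context: str) -> str:
        return self._complete(build_prompt(question, context))


class AIProcessor:
    """Routes questions to the currently selected backend."""

    def __init__(self, backends: Dict[str, AIBackend], default: str = "groq") -> None:
        if default not in backends:
            raise ValueError(f"unknown default backend: {default}")
        self._backends = backends
        self.current = default

    def switch(self, name: str) -> None:
        if name not in self._backends:
            raise ValueError(f"unknown backend: {name}")
        self.current = name

    def answer(self, question: str, context: str) -> str:
        return self._backends[self.current].answer(question, context)


# Stub completions; a real setup would wrap the Groq client and Ollama here.
processor = AIProcessor(
    {
        "groq": CallableBackend(lambda prompt: "[groq] ok"),
        "ollama": CallableBackend(lambda prompt: "[ollama] ok"),
    }
)
```

With this shape, the handlers only call `processor.answer(...)` and `processor.switch(...)`, so adding another service means adding one more entry to the dictionary.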
- processes telegram messages and commands
- manages user sessions and preferences
- handles file uploads and user interactions
- provides inline keyboards for easy configuration
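Since sessions are memory-based, per-user state can be as simple as a dictionary keyed by chat id. The sketch below assumes hypothetical names (`UserSession`, `SessionStore`) and fields; it is not the structure used in bot_handlers.py.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UserSession:
    service: str = "groq"            # Groq is the documented default
    ollama_model: str = "llama3.2"   # assumed default local model
    document_name: Optional[str] = None
    segments: List[str] = field(default_factory=list)


class SessionStore:
    """Keeps sessions in process memory only; everything is lost on restart."""

    def __init__(self) -> None:
        self._sessions: Dict[int, UserSession] = {}

    def get(self, chat_id: int) -> UserSession:
        # Create a fresh session with defaults on first contact.
        return self._sessions.setdefault(chat_id, UserSession())

    def clear_document(self, chat_id: int) -> None:
        # What a /clear handler might do: drop the document, keep preferences.
        session = self.get(chat_id)
        session.document_name = None
        session.segments = []


store = SessionStore()
session = store.get(42)
session.service = "ollama"
session.document_name = "report.pdf"
session.segments = ["first chunk", "second chunk"]
store.clear_document(42)
```

After `clear_document`, the user's service choice survives while the document content is gone, which matches the idea of clearing the current document and starting over.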
Important
- Semantic Search - understands the meaning of your question
- Adaptive Keyword Search - matches important document terms
- Fuzzy Matching - finds partial word matches
- Section-Based Search - searches within specific document sections
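A simple way to combine the four strategies is a weighted sum of their per-segment scores. The weights and the `combine_scores` function below are assumptions chosen for illustration; the bot's actual ranking logic is not documented here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical weights; each strategy is assumed to return scores in [0, 1].
DEFAULT_WEIGHTS: Dict[str, float] = {
    "semantic": 0.5,
    "keyword": 0.25,
    "fuzzy": 0.15,
    "section": 0.10,
}


def combine_scores(
    per_strategy: Dict[str, Dict[int, float]],
    weights: Optional[Dict[str, float]] = None,
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Merge {strategy: {segment_id: score}} into one ranked list."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    totals: Dict[int, float] = {}
    for strategy, scores in per_strategy.items():
        weight = weights.get(strategy, 0.0)
        for segment_id, score in scores.items():
            totals[segment_id] = totals.get(segment_id, 0.0) + weight * score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


results = combine_scores(
    {
        "semantic": {0: 0.9, 1: 0.2},
        "keyword": {1: 1.0},
        "fuzzy": {1: 1.0},
    }
)
```

In this example segment 1 outranks segment 0 even though its semantic score is lower, because the keyword and fuzzy strategies both agree on it.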
use /debug <your question> to see exactly how the bot finds relevant information:
- view search strategies and their results
- see similarity scores for different content segments
- understand why certain answers were selected
the document processor automatically:
- identifies document headers and sections
- creates logical text segments
- preserves context across segment boundaries
- handles various document formats and structures
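A common way to preserve context across segment boundaries is to let consecutive chunks share some words. The sketch below shows that sliding-window technique; the `size` and `overlap` defaults are assumptions, and the document processor may preserve context by other means.

```python
from typing import List


def chunk_with_overlap(
    words: List[str], size: int = 200, overlap: int = 40
) -> List[List[str]]:
    """Split words into windows of `size`, each sharing `overlap` words
    with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        # Stop once the window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks


chunks = chunk_with_overlap(list("abcdefghij"), size=4, overlap=1)
```

Here each chunk starts with the last item of the previous one, so a sentence that straddles a boundary is still seen whole by at least one chunk when the overlap is large enough.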
Warning
TELEGRAM_BOT_TOKEN=your_bot_token
GROQ_API_KEY=your_groq_api_key
OLLAMA_BASE_URL=http://localhost:11434 # or the port you set it to
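For reference, these variables could be read in Python along the lines below. This is a sketch, not the contents of config.py: the `Settings` class and `load_settings` function are hypothetical, and it assumes the .env values have already been loaded into the process environment (for example by a dotenv loader or the shell).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    telegram_bot_token: str
    groq_api_key: Optional[str]   # optional: only needed for the Groq service
    ollama_base_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three variables, failing early if the bot token is missing."""
    token = env.get("TELEGRAM_BOT_TOKEN", "").strip()
    if not token:
        raise RuntimeError("TELEGRAM_BOT_TOKEN is not set")
    return Settings(
        telegram_bot_token=token,
        groq_api_key=env.get("GROQ_API_KEY") or None,
        # Default Ollama address, as in the example above.
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/"),
    )


settings = load_settings({"TELEGRAM_BOT_TOKEN": "123:abc"})
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without touching real credentials.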
Note
- switch between groq and ollama
- select different ollama models
- view service status and availability
Package | Purpose |
---|---|
pyTelegramBotAPI | telegram bot framework |
PyPDF2 | pdf text extraction |
groq | groq ai api client |
sentence-transformers | text embeddings |
faiss-cpu | vector similarity search |
numpy | numerical computations |
- no persistent storage of document content
- user sessions are memory-based only
- api keys are environment-based
- local ollama option for complete privacy
- efficient text segmentation algorithms
- normalized vector embeddings for better search
- combined search strategies for improved accuracy
- fallback mechanisms for robust operation
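Why normalized embeddings help: once vectors are scaled to unit length, their inner product equals their cosine similarity, so ranking depends on direction (meaning) rather than magnitude. The pure-Python sketch below illustrates this with toy 2-D vectors; the real bot uses sentence-transformers embeddings and faiss, and the helper names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Rank vectors by inner product with the query after normalizing both."""
    q = l2_normalize(query)
    scored = [(i, inner_product(q, l2_normalize(v))) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Vector 2 points in the same direction as the query, only with a larger norm.
ranking = top_k([2.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [10.0, 10.0]], k=1)
```

The same idea applies to a faiss inner-product index such as `IndexFlatIP` over normalized vectors, although whether the bot uses that particular index type is not stated in this document.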
AI Docs TGBot @ jafarbekyusupov
Star this Repo • Report Bug • Request Feature