**GithubBot**
An open-source, LLM-based intelligent analysis bot for GitHub repositories
Chat with your codebase, gain deep insights, and automate code understanding
Note: This project is currently under active development and is not yet ready for production use.
GithubBot is a powerful AI framework designed to revolutionize how developers interact with codebases. It automatically "learns" an entire GitHub repository, including all its code and documentation, and answers any question about it in natural language through an intelligent chatbot, from "What does this function do?" to "How do I implement a new feature?".
- **Intelligent Code Q&A**: Provides precise, context-aware code explanations and suggestions based on Retrieval-Augmented Generation (RAG).
- **Fully Automated Processing**: Simply provide a GitHub repository URL; the code is automatically cloned, parsed, chunked, vectorized, and indexed.
- **Highly Extensible**: Easily swap or extend LLMs, embedding models, and vector databases. Supports providers such as OpenAI, Azure, Cohere, and HuggingFace.
- **Hybrid Search**: Combines vector search with BM25 keyword search to ensure optimal context retrieval for different types of queries.
- **Asynchronous Task Handling**: Uses Celery and Redis to manage time-consuming repository indexing tasks, keeping the API responsive and stable.
- **One-Click Deployment**: Ships with a complete Docker Compose setup, so you can launch all services (API, worker, databases, etc.) with a single command.
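The hybrid-search feature above can be illustrated with reciprocal rank fusion (RRF), one common way to merge a vector-search ranking with a BM25 ranking. This is a minimal sketch of the general technique; the function, the chunk ids, and the use of RRF specifically are assumptions for illustration, not GithubBot's actual implementation.

```python
# Illustrative sketch: merge a vector-search ranking and a BM25 ranking
# with reciprocal rank fusion (RRF). Chunk ids are hypothetical.

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked highly in any list accumulate more score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["chunk_a", "chunk_b", "chunk_c"]  # hypothetical vector-search result
bm25_hits = ["chunk_c", "chunk_a", "chunk_d"]    # hypothetical BM25 result
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

A document that appears near the top of both rankings (like `chunk_a` here) outranks one that is top in only a single list, which is why fusion tends to be robust across query types.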
GithubBot uses a modern microservices architecture to ensure system scalability and maintainability. The core process is divided into two stages: "Data Ingestion" and "Query Answering".
*(Architecture diagrams: "Data Ingestion Flow" and "Query Answering Flow"; images omitted)*
- Backend: FastAPI, Python 3.10+
- AI / RAG: LangChain, OpenAI, Cohere, HuggingFace (extendable)
- Database: PostgreSQL (metadata), ChromaDB (vector storage)
- Task Queue: Celery, Redis
- Containerization: Docker, Docker Compose
- Data Validation: Pydantic
You can get GithubBot up and running in minutes with Docker.
- Docker: Install Docker
- Docker Compose: Usually included with Docker Desktop.
- Git: To clone this project.
```bash
git clone https://github.com/oGYCo/GithubBot.git
cd GithubBot
```
The project uses a `.env` file to manage sensitive information and configuration. Please note: the project includes a `.env.example` file; you need to create your own `.env` file from it.

```bash
cp .env.example .env
```

Then edit the `.env` file and add at least your OpenAI API key:
```bash
# .env
# --- LLM and Embedding Model API Keys ---
# At least one model key is required
OPENAI_API_KEY="sk-..."
# AZURE_OPENAI_API_KEY=
# ANTHROPIC_API_KEY=
# ... other API keys
```
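Since at least one provider key is required, a startup check along these lines could validate the configuration. The variable names come from the `.env` example above; the check itself is an illustrative assumption, not GithubBot's actual validation code.

```python
# Hypothetical startup check: require at least one known LLM provider key.
PROVIDER_KEYS = ("OPENAI_API_KEY", "AZURE_OPENAI_API_KEY", "ANTHROPIC_API_KEY")

def has_llm_key(env: dict) -> bool:
    """Return True if at least one known provider key is set and non-empty."""
    return any(env.get(name) for name in PROVIDER_KEYS)

# Examples with minimal hypothetical environments:
ok = has_llm_key({"OPENAI_API_KEY": "sk-..."})   # key present
missing = has_llm_key({"OPENAI_API_KEY": ""})    # key set but empty
```

In practice you would pass `dict(os.environ)` and fail fast with a clear error message when no key is found.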
For Linux/macOS:

```bash
chmod +x start.sh
./start.sh
```
For Windows:

- Method 1 (Batch file): Double-click `start.bat`, or run it in Command Prompt: `start.bat`
- Method 2 (PowerShell): Right-click `start.ps1` and choose "Run with PowerShell", or run it in PowerShell: `.\start.ps1`
Build and start all services manually:
```bash
docker compose up --build -d
```
This command will start the API service, Celery worker, PostgreSQL, Redis, and ChromaDB.
Wait a moment for the services to initialize, then check if all containers are running correctly:
```bash
docker compose ps
```
You should see the status of all services as `running` or `healthy`.
Once all services are running, you can access:
- API Documentation: http://localhost:8000/docs
- API Root: http://localhost:8000
- Flower (Task Monitor): http://localhost:5555
- Health Check: http://localhost:8000/health
| Service | Port | Monitor URL | Description |
|---|---|---|---|
| API Service | 8000 | http://localhost:8000/health | Main API interface |
| API Documentation | 8000 | http://localhost:8000/docs | Swagger documentation |
| Flower | 5555 | http://localhost:5555 | Task queue monitoring |
| PostgreSQL | 5432 | - | Database service |
| Redis | 6380 | - | Cache and message queue |
| ChromaDB | 8001 | - | Vector database (host port; container-internal 8000) |
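The health endpoint from the table can also be checked from a script. The JSON shape assumed below (a `status` field reporting `"healthy"`) is a guess based on common FastAPI health-check conventions, not the documented response format.

```python
import json
from urllib.request import urlopen

HEALTH_URL = "http://localhost:8000/health"

def looks_healthy(payload: dict) -> bool:
    """Assumes the endpoint returns something like {"status": "healthy"}."""
    return str(payload.get("status", "")).lower() in {"healthy", "ok", "running"}

def check(url: str = HEALTH_URL) -> bool:
    # Network call: requires the Docker Compose services to be up.
    with urlopen(url, timeout=5) as resp:
        return looks_healthy(json.load(resp))

# Pure example (no network), using a hypothetical response payload:
sample = looks_healthy({"status": "healthy"})
```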
```bash
# Stop all services
docker compose down

# Restart all services
docker compose restart
```
```bash
# View all service logs
docker compose logs -f

# View specific service logs
docker compose logs -f api
docker compose logs -f worker
```
1. **API keys not set**
   - Ensure at least one LLM API key is set in the `.env` file
   - Recommended: set `OPENAI_API_KEY`
2. **Port conflicts**
   - Check whether ports 8000, 5555, 5432, 6380, and 8001 are already in use
   - Use `netstat -an | grep :8000` to check port status
3. **Docker not running**
   - Ensure Docker Desktop is running
   - Check the Docker icon in the system tray
4. **Memory issues**
   - Ensure the system has enough memory to run all containers
   - Recommended: at least 4 GB of available memory
5. **Network connection issues**
   - Ensure you can reach Docker Hub
   - In China, you may need to configure a Docker registry mirror
6. **WSL2 not enabled (Windows)**
   - Docker Desktop requires WSL2 support
   - Refer to the WSL2 installation guide
7. **Firewall blocking (Windows)**
   - Ensure Windows Firewall allows Docker network access
Once the services are running, the API will be available at http://localhost:8000. You can access the interactive API documentation (Swagger UI) at http://localhost:8000/docs.
Send a `POST` request to the following endpoint to start analyzing a repository. This is an asynchronous operation; the API immediately returns a task ID (`session_id`).

- URL: `/api/v1/repos/analyze`
- Method: `POST`
- Body:
```json
{
  "repo_url": "https://github.com/tiangolo/fastapi",
  "embedding_config": {
    "provider": "openai",
    "model_name": "text-embedding-3-small",
    "api_key": "your-openai-api-key"
  }
}
```
Example (using cURL):

```bash
curl -X 'POST' \
  'http://localhost:8000/api/v1/repos/analyze' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "repo_url": "https://github.com/tiangolo/fastapi",
    "embedding_config": {
      "provider": "openai",
      "model_name": "text-embedding-3-small",
      "api_key": "your-openai-api-key"
    }
  }'
```
Use the `session_id` returned from the previous step to check the analysis progress.

- URL: `/api/v1/repos/status/{session_id}`
- Method: `GET`
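The status check above lends itself to a simple polling loop. The sketch below assumes `FAILED` as a possible terminal state alongside the documented `SUCCESS`; the helper names and the stubbed status sequence are hypothetical.

```python
import time

# SUCCESS is documented; FAILED is an assumed additional terminal state.
TERMINAL_STATES = {"SUCCESS", "FAILED"}

def is_done(status: str) -> bool:
    return status.upper() in TERMINAL_STATES

def wait_for(session_id: str, fetch_status, interval: float = 5.0) -> str:
    """Poll `fetch_status(session_id)` until a terminal state is reached.

    In practice `fetch_status` would GET /api/v1/repos/status/{session_id}.
    """
    while True:
        status = fetch_status(session_id)
        if is_done(status):
            return status
        time.sleep(interval)

# Example with a stubbed status sequence instead of real HTTP calls:
states = iter(["PENDING", "PROCESSING", "SUCCESS"])
final = wait_for("demo-session", lambda _sid: next(states), interval=0.0)
```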
Once the repository status changes to `SUCCESS`, you can start asking questions.
- URL: `/api/v1/repos/query`
- Method: `POST`
- Body:
```json
{
  "session_id": "your-session-id",
  "question": "How to handle CORS in FastAPI?",
  "generation_mode": "service",
  "llm_config": {
    "provider": "openai",
    "model_name": "gpt-4",
    "api_key": "your-openai-api-key",
    "temperature": 0.7,
    "max_tokens": 1000
  }
}
```
Example (using cURL):

```bash
curl -X 'POST' \
  'http://localhost:8000/api/v1/repos/query' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "session_id": "your-session-id",
    "question": "How to handle CORS in FastAPI?",
    "generation_mode": "service"
  }'
```
You can customize almost every aspect of the application in the `.env` file.
| Variable Name | Description | Default Value |
|---|---|---|
| `APP_NAME` | Application name | `"GithubBot"` |
| `APP_VERSION` | Application version | `"0.1.0"` |
| `DEBUG` | Debug mode | `False` |
| `LOG_LEVEL` | Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL) | `"INFO"` |
| `API_KEY` | API access key (optional) | `""` |
| `CORS_ORIGINS` | Allowed CORS origins (comma-separated) | `"http://localhost:3000,http://127.0.0.1:3000"` |
| Variable Name | Description | Default Value |
|---|---|---|
| `API_HOST` | API host address | `"0.0.0.0"` |
| `API_PORT` | Port for the API service to listen on | `8000` |
| Variable Name | Description | Default Value |
|---|---|---|
| `DATABASE_URL` | Complete PostgreSQL connection URL | `"postgresql+psycopg2://user:password@postgres:5432/repoinsight"` |
| `POSTGRES_USER` | PostgreSQL username | `"user"` |
| `POSTGRES_PASSWORD` | PostgreSQL password | `"password"` |
| `POSTGRES_DB` | PostgreSQL database name | `"repoinsight"` |
| `POSTGRES_HOST` | PostgreSQL host | `"postgres"` |
| `POSTGRES_PORT` | PostgreSQL port | `5432` |
| Variable Name | Description | Default Value |
|---|---|---|
| `REDIS_URL` | Complete Redis connection URL | `"redis://redis:6379/0"` |
| `REDIS_HOST` | Redis service address | `"redis"` |
| `REDIS_PORT` | Redis port | `6379` |
| Variable Name | Description | Default Value |
|---|---|---|
| `CHROMADB_HOST` | ChromaDB host | `"chromadb"` |
| `CHROMADB_PORT` | ChromaDB port | `8000` |
| `CHROMADB_CLIENT_TIMEOUT` | ChromaDB client timeout (seconds) | `120` |
| `CHROMADB_SERVER_TIMEOUT` | ChromaDB server timeout (seconds) | `120` |
| `CHROMADB_MAX_RETRIES` | ChromaDB connection max retries | `5` |
| `CHROMADB_RETRY_DELAY` | ChromaDB connection retry delay (seconds) | `3` |
| Variable Name | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key |
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint |
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `COHERE_API_KEY` | Cohere API key |
| `GOOGLE_API_KEY` | Google API key |
| `HUGGINGFACE_HUB_API_TOKEN` | HuggingFace API token |
| `MISTRAL_API_KEY` | Mistral API key |
| `QWEN_API_KEY` | Qwen API key |
| `DASHSCOPE_API_KEY` | DashScope API key |
| Variable Name | Description | Default Value |
|---|---|---|
| `GIT_CLONE_DIR` | Directory for Git repository clones | `"/repo_clones"` |
| `CHUNK_SIZE` | Maximum size of text chunks | `1000` |
| `CHUNK_OVERLAP` | Overlap size between text chunks | `200` |
| `EMBEDDING_BATCH_SIZE` | Batch size for embedding processing | `32` |
| `VECTOR_SEARCH_TOP_K` | Number of documents returned by vector search | `10` |
| `BM25_SEARCH_TOP_K` | Number of documents returned by BM25 search | `10` |
| Variable Name | Description | Default Value |
|---|---|---|
| `ALLOWED_FILE_EXTENSIONS` | List of allowed file extensions (JSON array) | `[".py", ".js", ".jsx", ".ts", ".tsx", ".java", ".cpp", ".c", ".h", ".hpp", ".cs", ".php", ".rb", ".go", ".rs", ".swift", ".kt", ".scala", ".md", ".txt", ".rst", ".json", ".yaml", ".yml", ".toml", ".ini", ".cfg", ".sh", ".sql", ".html", ".css", ".vue", "dockerfile", "makefile", "readme", "license", "changelog"]` |
| `EXCLUDED_DIRECTORIES` | List of directories to exclude (JSON array) | `[".git", "node_modules", "dist", "build", "venv", ".venv", "target"]` |
| Variable Name | Description | Default Value |
|---|---|---|
| `CELERY_BROKER_URL` | Celery broker URL | `"redis://redis:6379/0"` |
Contributions of all kinds are welcome, whether it's reporting a bug, submitting a feature request, or contributing code.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License. See the LICENSE file for details.