An autonomous AI-powered assistant that automatically answers questions and triages issues on a GitHub repository using Google's Gemini API and a local vector database.
Repo Helper AI acts as a smart assistant for your GitHub repository. By listening for new issues via webhooks, it leverages the power of large language models and your project's documentation to automatically classify issues and provide relevant answers to common questions, significantly reducing the burden on human maintainers.
- Intelligent Issue Triage: Automatically classifies newly opened issues into categories like Bug Report, Feature Request, Question, or Installation Problem using AI.
- Context-Aware Answers (RAG): Uses a Retrieval-Augmented Generation (RAG) system powered by ChromaDB and your documentation (`/docs` folder) to find relevant information and answer user questions directly within the issue thread.
- Automated Issue Actions: Posts AI-generated answers as comments and applies relevant labels to issues using the GitHub API.
- Autonomous Webhook Listener: Runs as a FastAPI application, triggered automatically by configured GitHub webhooks.
- Self-Hosted Knowledge Base: Stores and queries your documentation using a local ChromaDB instance (typically run via Docker), keeping your data private.
- Configurable: Easily set up and configure the bot using environment variables (`.env`).
The system operates on an event-driven architecture triggered by GitHub webhooks:
- Webhook Trigger: A user creates a new issue on GitHub. A configured webhook sends the issue details (payload) to your running application.
- FastAPI Server: A self-hosted Python server built with FastAPI receives and validates the webhook payload.
- AI Triage: The server makes a call to the configured Gemini API to classify the issue type based on its title and body.
- Vector DB Query (RAG): If the issue is classified as a "Question" or "Installation Problem", the server queries a local ChromaDB vector database (which contains indexed knowledge from the `/docs` folder) to find the documentation chunks most relevant to the user's question.
- AI Answer Generation: The server makes a second call to the Gemini API, providing the user's full question and the retrieved context from the vector database. Gemini uses this information to generate a helpful, contextually relevant answer.
- GitHub API Action: The server uses the PyGithub library to interact with the GitHub API, posting the generated answer as a comment on the issue and applying the appropriate labels based on the AI triage result.
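The end-to-end flow above can be sketched as a single dispatch function. This is a minimal illustration, not the repository's actual code: `handle_issue`, `classify`, `retrieve`, and `answer` are hypothetical names standing in for the Gemini and ChromaDB calls.

```python
from typing import Callable

# Categories that trigger the RAG answering path (per the flow above)
ANSWERABLE = {"Question", "Installation Problem"}

def handle_issue(
    issue: dict,
    classify: Callable[[str, str], str],
    retrieve: Callable[[str], list],
    answer: Callable[[str, list], str],
) -> dict:
    """Triage an issue; for answerable categories, also draft a reply."""
    category = classify(issue["title"], issue["body"])      # first Gemini call
    result = {"label": category, "comment": None}
    if category in ANSWERABLE:
        context = retrieve(issue["body"])                   # ChromaDB lookup
        result["comment"] = answer(issue["body"], context)  # second Gemini call
    return result

# Stubbed usage; a real deployment would pass Gemini/ChromaDB-backed callables.
triaged = handle_issue(
    {"title": "App crashes on start", "body": "Stack trace attached"},
    classify=lambda title, body: "Bug Report",
    retrieve=lambda query: [],
    answer=lambda query, ctx: "",
)
```

Keeping the classifier, retriever, and answerer as injected callables makes the dispatch logic testable without network access.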
- Python: The core programming language.
- FastAPI: A modern, fast (high-performance) web framework for building APIs, used here to receive webhooks.
- uvicorn: An ASGI server that runs the FastAPI application.
- python-dotenv: For loading environment variables from a `.env` file.
- PyGithub: A Python library to interact with the GitHub API.
- Google Generative AI: Python client for accessing the Gemini API for AI tasks (triage and answering).
- ChromaDB: An open-source vector database used to store and query embeddings of your project's documentation.
- Docker: Used to easily run the ChromaDB service.
Follow these steps to get your own Repo Helper AI running.
- Python 3.9+
- Docker (to run the ChromaDB vector database)
- A GitHub repository where you want to deploy the bot.
- A GitHub Personal Access Token (classic) with the `repo` scope.
- A Google Gemini API key from Google AI Studio.
- A strong, random string for your GitHub webhook secret.
```bash
git clone https://github.com/LMLK-seal/repo-helper-ai.git
cd repo-helper-ai
```
This project uses a `.env` file to manage secret keys and configuration.
- Open the `.env` file and fill in your details:

```
GITHUB_TOKEN=YOUR_GITHUB_PAT
GEMINI_API_KEY=YOUR_GEMINI_API_KEY
GITHUB_WEBHOOK_SECRET=YOUR_RANDOM_SECRET_STRING
REPO_OWNER=your-github-username
REPO_NAME=your-repository-name
```

- Replace the placeholders with your actual values.
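Assuming python-dotenv has already populated the process environment, the application might validate these settings at startup along these lines. This is a sketch; `load_config` and `REQUIRED_KEYS` are illustrative names, not the project's actual API.

```python
import os

# The five settings expected in .env, per the example above
REQUIRED_KEYS = [
    "GITHUB_TOKEN",
    "GEMINI_API_KEY",
    "GITHUB_WEBHOOK_SECRET",
    "REPO_OWNER",
    "REPO_NAME",
]

def load_config(env=os.environ) -> dict:
    """Return the required settings, failing fast if any are missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}
```

Failing fast at startup surfaces a missing token immediately instead of producing a confusing error on the first webhook delivery.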
It is highly recommended to use a virtual environment to manage dependencies.
```bash
# Create the virtual environment
python -m venv venv

# Activate it (Windows)
.\venv\Scripts\activate
# Or on macOS/Linux:
source venv/bin/activate

# Install all required packages
pip install -r requirements.txt
```
These instructions are for running the bot on your local machine for development and testing purposes.
In a separate terminal, start the ChromaDB service using Docker. This provides the knowledge base for your documentation.
```bash
docker run -p 8000:8000 chromadb/chroma
```
Before the bot can answer questions based on your documentation, you need to index it. This script reads files from the `./docs` directory and stores their embeddings in ChromaDB.

```bash
python vector_db.py
```
(Re-run this command whenever you add, remove, or significantly change files in your `./docs` folder.)
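Internally, an indexing script like `vector_db.py` typically splits each document into overlapping chunks before embedding them. The sketch below shows only that chunking step with hypothetical helper names; the real script's chunk sizes and ChromaDB calls may differ.

```python
from pathlib import Path

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list:
    """Split a document into fixed-size, overlapping chunks for embedding."""
    chunks, start = [], 0
    step = size - overlap  # overlap keeps context across chunk boundaries
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks

def load_doc_chunks(docs_dir: str = "docs") -> dict:
    """Chunk every Markdown file under the docs directory."""
    return {
        path.name: chunk_text(path.read_text(encoding="utf-8"))
        for path in Path(docs_dir).glob("*.md")
    }
```

The overlap means a sentence split at a chunk boundary still appears whole in at least one chunk, which helps retrieval quality.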
Start the main FastAPI application server. The `--reload` flag is useful for development, but you would typically use a process manager like `gunicorn` or `pm2` for production deployment (see the Deployment Guide in `docs/06_deployment.md`).

```bash
uvicorn main:app --reload
```
The bot is now running, but GitHub needs to know where to send issue events.
- Since the bot is likely running locally, you'll need a way for GitHub to reach it over the internet. Tools like `ngrok` or `localtunnel` can help. For example, using ngrok: `ngrok http 8000`
- Copy the `https://` forwarding URL provided by ngrok (e.g., `https://abcdef123456.ngrok.io`).
- In your GitHub repository, go to Settings > Webhooks > Add webhook.
- Payload URL: Paste your public forwarding URL and add `/api/github/webhook` to the end (e.g., `https://abcdef123456.ngrok.io/api/github/webhook`).
- Content type: Set to `application/json`.
- Secret: Paste the `GITHUB_WEBHOOK_SECRET` from your `.env` file. This is crucial for security.
- Which events would you like to trigger this webhook?: Select "Let me select individual events." and check Issues.
- Ensure "Active" is checked.
- Click Add webhook.
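The Secret field matters because GitHub signs each delivery with it, sending the signature in the `X-Hub-Signature-256` header. A server can verify that signature with the standard library alone; the sketch below is a generic example, not necessarily how this repository implements the check.

```python
import hashlib
import hmac

def verify_signature(payload: bytes, secret: str, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the shared secret."""
    expected = "sha256=" + hmac.new(
        secret.encode(), payload, hashlib.sha256
    ).hexdigest()
    # compare_digest performs a constant-time comparison to avoid timing leaks
    return hmac.compare_digest(expected, signature_header)
```

A handler should reject any request whose signature does not verify, since anyone who discovers the public URL can POST arbitrary payloads to it.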
Your Repo Helper AI is now live and ready to assist with new issues opened in your repository!
Please read the following considerations before deploying this project for production.
The full version of this bot, which includes the ability to answer questions using a vector database (RAG), is memory-intensive.
- Local vs. Cloud: Your local development machine (with 8GB+ of RAM) can easily handle loading the `ChromaDB` client and `GoogleGenerativeAI` models into memory.
- Free-Tier Cloud Services: Cloud hosting providers like Render offer a "Free Tier" that is excellent for simple web services but is typically limited to 512MB of RAM.
- Memory Exhaustion: When the bot attempts to query the vector database (`vector_db.query_vector_db`), memory usage can spike beyond the 512MB limit. The cloud platform's automated system will detect this memory exhaustion and immediately terminate (`SIGTERM`) the application to protect the platform. This is not a bug in the code, but a resource limitation of the hosting environment.
Recommendation: To run the full-featured bot 24/7, it is highly recommended to use a paid hosting plan (e.g., Render's "Starter" plan) that provides at least 1GB of RAM.
To allow you to run this bot on a free tier, this repository provides a lightweight, memory-efficient alternative. This version performs all the AI-powered triage and labeling but disables the question-answering (RAG) feature to avoid memory exhaustion.
- Functionality: The bot will still intelligently classify issues and post a polite confirmation message, but it will not query the vector database or attempt to answer questions.
- How to Use:
- Go to the `/triage_only_version` folder in this repository.
- Copy the `github_handler.py` file from that folder.
- Use it to replace the `github_handler.py` file in your main project directory.
- Commit and push this change to your repository.
- This simplified version has a very low memory footprint and will run smoothly on any free-tier hosting plan.
To run the bot 24/7 without needing to keep your local computer on, you should deploy it to a cloud service. Below is a guide for deploying on Render, which offers a free tier suitable for this project.
- Push Your Code to GitHub: Your project, including all `.py` files, the `docs` folder, `requirements.txt`, and your professional `README.md`, must be in a GitHub repository.
- Create a `build.sh` file: Some cloud services need help setting up. Create a file named `build.sh` in your project's root directory with the following content. This script runs your `vector_db.py` script automatically during deployment to build the knowledge base.

```bash
#!/usr/bin/env bash
# exit on error
set -o errexit

pip install -r requirements.txt

# Run the vector DB indexing script
python vector_db.py
```
- Sign up for a free account at render.com using your GitHub account.
- In the Render Dashboard, click New + and select Web Service.
- Connect the GitHub repository containing your bot's code.
- Give your service a unique name. The region does not matter.
- Set the Build Command to: `./build.sh`
- Set the Start Command to: `gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app`
- Under the Advanced section, go to Environment Variables. Add all the keys from your local `.env` file here (`GITHUB_TOKEN`, `GEMINI_API_KEY`, etc.). Do not use a `.env` file in production.
- Click Create Web Service. Render will automatically build and deploy your application.
- Once deployed, Render will give you a public `https://...onrender.com` URL. Use this URL for your GitHub webhook's Payload URL. Now your bot is live 24/7!
Here is your checklist for the "Manage webhook" page to make your Render deployment live:
- Go to your repository settings, click "Webhooks" in the left sidebar, and then "Add webhook".
- Payload URL: Paste the new public URL provided by Render (e.g., https://your-app-name.onrender.com/api/github/webhook).
- Content type: Keep it as application/json.
- Secret: Ensure this contains the exact same GITHUB_WEBHOOK_SECRET value that you entered into Render's environment variables.
- Events: Make sure you have selected "Let me select individual events" and that only the Issues box is checked.
- Active: Ensure this is checked.
Once you save the webhook with these settings, your bot will be fully operational in the cloud, 24/7. Go ahead and create a new issue in your repository to see it in action.
Once installed and configured, the bot is automatically triggered when a new issue is opened in the configured GitHub repository.
- Open an issue: Simply create a new issue in your repository.
- Observe: The bot will automatically classify the issue and, if it's a question or installation problem, attempt to provide an answer based on your documentation within a few moments by posting a comment. It will also add relevant labels.
Contributions are welcome! If you'd like to contribute, please refer to the `docs/contributing.md` file for detailed guidelines on reporting bugs, suggesting enhancements, and submitting pull requests.
This project is licensed under the MIT License. See the `docs/license.md` file for details.