Git Insight Orchestrator Agent

Screenshots: Chat Interface (laptop) and Repository Analysis (mobile)

Git Insight Orchestrator Agent is an AI-powered chatbot that lets you query any public GitHub repository by analyzing its source code. It clones the repository, chunks and embeds the code files, stores the embeddings in ChromaDB for semantic search, and passes the most relevant code segments to a language model (Gemini) to give deep, contextual answers about the codebase.

Features

  • Repository Analysis: Clone and analyze any public GitHub repository
  • Semantic Search: Find relevant code sections using natural language queries
  • AI-Powered Answers: Get explanations and insights about the codebase
  • Vector Database: Efficient storage and retrieval of code embeddings
  • Modern UI: Clean, futuristic interface for optimal user experience

Technology Stack

  • Backend: Python, Flask
  • Vector Database: ChromaDB
  • Embeddings: Google Embedding 001
  • LLM: Gemini 2.5 Flash
  • Frontend: HTML, CSS, JavaScript
  • Repository Handling: GitPython

Workflow

graph TD
    A[GitHub Repository URL] --> B[Clone Repository]
    B --> C[Extract Code Files]
    C --> D[Generate Embeddings]
    D --> E[Store in ChromaDB]
    E --> F[User Query]
    F --> G[Similarity Search]
    G --> H[Generate Response with Gemini 2.5 Flash]
    H --> I[Display Results]
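A minimal end-to-end sketch of this pipeline, assuming a LangChain-based implementation. The file-extension filter, paths, and function names below are illustrative, not the repository's actual code:

# Illustrative sketch of the workflow above (not the repository's actual modules).
from pathlib import Path

from git import Repo                                   # GitPython
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_community.vectorstores import Chroma

CODE_EXTENSIONS = {".py", ".js", ".ts", ".java", ".go", ".md"}  # assumed filter

def analyze_repository(repo_url: str, clone_dir: str = "workspace/repo") -> Chroma:
    """Clone a repository, chunk its code files, and store embeddings in ChromaDB."""
    Repo.clone_from(repo_url, clone_dir)

    # Extract code files and split them into overlapping chunks.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks, metadatas = [], []
    for path in Path(clone_dir).rglob("*"):
        if path.is_file() and path.suffix in CODE_EXTENSIONS:
            text = path.read_text(errors="ignore")
            for chunk in splitter.split_text(text):
                chunks.append(chunk)
                metadatas.append({"source": str(path)})

    # Embed the chunks and persist them in ChromaDB.
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    return Chroma.from_texts(
        chunks, embeddings, metadatas=metadatas,
        persist_directory="db", collection_name="code_embeddings",
    )

def answer(db: Chroma, question: str) -> str:
    """Run a similarity search and ask Gemini to answer with the retrieved context."""
    docs = db.similarity_search(question, k=5)
    context = "\n\n".join(d.page_content for d in docs)
    llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
    prompt = f"Answer using this repository context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt).content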

Installation

Prerequisites

  • Python 3.9+
  • Git
  • Google Cloud API key (for Gemini and Embeddings)

Setup

  1. Clone the repository:
git clone https://github.com/jasjeev013/Git-Insight-Orchestrator-Agent.git
cd Git-Insight-Orchestrator-Agent
  2. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install dependencies:
pip install -r requirements.txt
  4. Create a .env file and add your API key (see the loading sketch below):
GOOGLE_API_KEY=your_google_api_key
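The application needs this key at startup to call the Gemini and embedding APIs. A minimal sketch of how the .env file is typically loaded, assuming python-dotenv is used (not necessarily how app.py actually does it):

# Sketch of loading the API key from .env (assumes python-dotenv).
import os
from dotenv import load_dotenv

load_dotenv()                              # reads GOOGLE_API_KEY from .env
api_key = os.environ["GOOGLE_API_KEY"]     # raises KeyError if the key is missing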

Usage

  1. Start the Flask server:
python app.py
  2. Open your browser to http://localhost:5000

  3. Enter a GitHub repository URL and click "Analyze"

  4. Once processed, you can ask questions about the repository

API Endpoints

  • POST /analyze - Submit a GitHub repository for analysis
  • POST /chat - Submit a query about the analyzed repository
  • GET /status - Check processing status
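A hypothetical client sketch for these endpoints using the requests library; the JSON field names (repo_url, query) are assumptions, not a documented request schema:

# Hypothetical client for the endpoints above (field names are assumptions).
import requests

BASE = "http://localhost:5000"

# Submit a repository for analysis.
requests.post(f"{BASE}/analyze", json={"repo_url": "https://github.com/pallets/flask"})

# Check the processing status.
print(requests.get(f"{BASE}/status").json())

# Ask a question about the analyzed repository.
resp = requests.post(f"{BASE}/chat", json={"query": "Where is the request routing implemented?"})
print(resp.json())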

Configuration

Modify config.py to adjust these settings:

# Chunking parameters
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200

# Database settings
PERSIST_DIRECTORY = "db"
COLLECTION_NAME = "code_embeddings"

# Model settings
EMBEDDING_MODEL = "models/embedding-001"
LLM_MODEL = "gemini-1.5-flash"
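For illustration, this is how the chunking parameters would typically feed a text splitter; a sketch only, assuming config.py is importable as a module and a LangChain splitter is used:

# Sketch of consuming the chunking settings (assumes `import config` works).
from langchain_text_splitters import RecursiveCharacterTextSplitter
import config

splitter = RecursiveCharacterTextSplitter(
    chunk_size=config.CHUNK_SIZE,        # max characters per chunk
    chunk_overlap=config.CHUNK_OVERLAP,  # characters shared between adjacent chunks
)
chunks = splitter.split_text(open("app.py").read())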

Development

To contribute to the project:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature-branch)
  3. Commit your changes (git commit -am 'Add new feature')
  4. Push to the branch (git push origin feature-branch)
  5. Create a new Pull Request

Testing

Run the test suite with:

python -m pytest tests/
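An example test along these lines, as a sketch that assumes app.py exposes a Flask application object named app:

# tests/test_status.py (illustrative; assumes `app` is the Flask instance in app.py)
from app import app

def test_status_endpoint_returns_ok():
    client = app.test_client()
    response = client.get("/status")
    assert response.status_code == 200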

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Google for the Gemini models and embeddings
  • ChromaDB team for the vector database
  • LangChain for the LLM integration framework

Support

For issues or questions, please open an issue on GitHub or contact jasjeev99@gmail.com
