A command-line tool for indexing and searching local code and documentation using semantic search powered by OpenAI embeddings and ChromaDB.
KB-Index (Knowledge Base Index) is a tool that helps developers create a searchable knowledge base from their local files. It uses OpenAI's text embeddings to create semantic representations of your code and documentation, storing them in a ChromaDB vector database for efficient similarity search.
Key Features:
- Index code and documentation files with semantic understanding
- Search your codebase using natural language queries
- Highlight and format search results for easy reading
- Multiple output formats (pretty, JSON, markdown)
- Simple configuration management
Install with the script below. The downloaded file will be located in ~/.local/bin/
curl -s https://scripts.sandybridge.io/install.sh | bash -s kb_index kb
Prerequisites:
- Rust and Cargo (install via rustup)
- An OpenAI API key
- A running ChromaDB instance (local or remote)
To build from source:
- Clone the repository:
  git clone https://github.com/thesandybridge/kb-index.git
  cd kb-index
- Build the project:
  cargo build --release
- Install the binary (optional):
  cargo install --path .
KB-Index requires configuration before first use:
You can set your OpenAI API key in one of two ways:
- Using the config command (recommended):
  kb config --set-api-key="your_openai_api_key_here"
- Using an environment variable:
  export OPENAI_API_KEY="your_openai_api_key_here"
By default, KB-Index connects to ChromaDB at http://localhost:8000. You can modify this in the config file located at:
- Linux: $HOME/.config/kb-index/config.toml
- macOS: $HOME/Library/Application Support/kb-index/config.toml
- Windows: %APPDATA%\kb-index\config.toml
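For scripts that need to locate the config file, the per-platform paths above can be sketched as a small Rust helper. This is an illustration only, not KB-Index's own code: the `Platform` enum and the `config_path` and `current_config_path` functions are hypothetical names, and the sketch assumes the `HOME` and `APPDATA` environment variables are set as usual.

```rust
use std::env;

/// Platforms named in the config-path list (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Platform {
    Linux,
    MacOs,
    Windows,
}

/// Build the documented config path for a platform.
/// `base` is $HOME on Linux and macOS, and %APPDATA% on Windows.
pub fn config_path(platform: Platform, base: &str) -> String {
    match platform {
        Platform::Linux => format!("{}/.config/kb-index/config.toml", base),
        Platform::MacOs => format!(
            "{}/Library/Application Support/kb-index/config.toml",
            base
        ),
        Platform::Windows => format!("{}\\kb-index\\config.toml", base),
    }
}

/// Resolve the path for the machine this code runs on.
/// Returns None on other platforms or when the environment variable is unset.
pub fn current_config_path() -> Option<String> {
    let (platform, var) = match env::consts::OS {
        "linux" => (Platform::Linux, "HOME"),
        "macos" => (Platform::MacOs, "HOME"),
        "windows" => (Platform::Windows, "APPDATA"),
        _ => return None,
    };
    env::var(var).ok().map(|base| config_path(platform, &base))
}
```

Calling `current_config_path()` returns the expected location of config.toml for the current user, or `None` when the platform is not one of the three listed.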
Example config.toml:
chroma_host = "http://localhost:8000"
openai_api_key = "your_openai_api_key_here"
openai_completion_model = "gpt-4"
openai_embedding_model = "text-embedding-3-large"
file_extensions = ["md", "rs", "tsx", "ts", "js", "jsx", "html"]
syntax_theme = "gruvbox-dark"
You can view your current configuration with:
kb config --show
Index a single file or directory:
kb index /path/to/your/code
This will:
- Recursively find all supported files (md, rs, tsx, ts, js, jsx)
- Split them into manageable chunks
- Generate embeddings using OpenAI
- Store them in ChromaDB
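The first step above, recursive file discovery, could look roughly like the following sketch. It uses only the Rust standard library and is not taken from the KB-Index source: `SUPPORTED_EXTENSIONS`, `has_supported_extension`, and `collect_files` are hypothetical names, and the extension list simply mirrors the one stated above.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Extensions listed above as supported (assumed list for this sketch).
pub const SUPPORTED_EXTENSIONS: [&str; 6] = ["md", "rs", "tsx", "ts", "js", "jsx"];

/// True if `path` ends in one of the given extensions (case-sensitive).
pub fn has_supported_extension(path: &Path, extensions: &[&str]) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| extensions.contains(&ext))
}

/// Recursively collect matching files under `root`.
/// `root` may also be a single file, mirroring `kb index <file-or-directory>`.
/// Note: this sketch does not guard against symlink cycles.
pub fn collect_files(root: &Path, extensions: &[&str]) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    if root.is_file() {
        if has_supported_extension(root, extensions) {
            found.push(root.to_path_buf());
        }
        return Ok(found);
    }
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        if path.is_dir() {
            found.extend(collect_files(&path, extensions)?);
        } else if has_supported_extension(&path, extensions) {
            found.push(path);
        }
    }
    // read_dir order is platform-dependent, so sort for stable output.
    found.sort();
    Ok(found)
}
```

Each returned path would then be read, chunked, embedded, and stored, as described in the remaining steps.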
Search your indexed files with natural language:
kb query "How does the authentication system work?"
Options:
- --top-k or -k: Number of results to return (default: 5)
- --format or -f: Output format (options: pretty, json, markdown)
Examples:
# Get 10 results
kb query "How to connect to the database" --top-k 10
# Output in markdown format
kb query "Error handling patterns" --format markdown
# Output in JSON format for programmatic use
kb query "API endpoints for users" --format json
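For programmatic use, the query command can also be invoked from another program. The sketch below shows one possible way to do that from Rust using only the standard library. It assumes the `kb` binary is on your PATH, and it returns the output as raw text because the JSON schema is not documented here. `OutputFormat`, `query_args`, and `run_query` are hypothetical names introduced for illustration.

```rust
use std::io;
use std::process::Command;

/// Output formats listed in the options above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OutputFormat {
    Pretty,
    Json,
    Markdown,
}

impl OutputFormat {
    /// The value passed to `--format`.
    pub fn as_str(self) -> &'static str {
        match self {
            OutputFormat::Pretty => "pretty",
            OutputFormat::Json => "json",
            OutputFormat::Markdown => "markdown",
        }
    }
}

/// Build the argument list for `kb query` from the documented flags.
pub fn query_args(query: &str, top_k: usize, format: OutputFormat) -> Vec<String> {
    vec![
        "query".to_string(),
        query.to_string(),
        "--top-k".to_string(),
        top_k.to_string(),
        "--format".to_string(),
        format.as_str().to_string(),
    ]
}

/// Run `kb` (assumed to be on PATH) and return its stdout as text.
/// A non-zero exit status is turned into an error carrying stderr.
pub fn run_query(query: &str, top_k: usize, format: OutputFormat) -> io::Result<String> {
    let output = Command::new("kb")
        .args(query_args(query, top_k, format))
        .output()?;
    if !output.status.success() {
        let message = String::from_utf8_lossy(&output.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, message));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

Passing the query as a separate argument avoids shell quoting issues, since no shell is involved when the process is spawned this way.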
KB-Index operates in two main phases:
- Indexing Phase:
  - Files are read and split into chunks of approximately 10 lines each
  - Each chunk is converted to a vector embedding using OpenAI's text-embedding-3-large model
  - Embeddings are stored in ChromaDB along with metadata about the source file
- Query Phase:
  - Your natural language query is converted to an embedding using the same model
  - ChromaDB performs a similarity search to find the most relevant chunks
  - Results are displayed with syntax highlighting and source information
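To make the two phases more concrete, here is a minimal sketch of line-based chunking and a brute-force top-k similarity search. It is an illustration under stated assumptions, not KB-Index's implementation: the function names are hypothetical, the embeddings are plain `f32` vectors standing in for what the OpenAI model would return, and cosine similarity is used only as an example, since the actual metric depends on how the ChromaDB collection is configured.

```rust
/// Split text into chunks of at most `lines_per_chunk` lines each.
pub fn chunk_lines(text: &str, lines_per_chunk: usize) -> Vec<String> {
    assert!(lines_per_chunk > 0, "lines_per_chunk must be positive");
    let lines: Vec<&str> = text.lines().collect();
    lines
        .chunks(lines_per_chunk)
        .map(|chunk| chunk.join("\n"))
        .collect()
}

/// Cosine similarity of two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Indices of the `k` stored vectors most similar to `query`, best first.
/// A vector database does this far more efficiently; this is a linear scan.
pub fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(index, vector)| (index, cosine_similarity(query, vector)))
        .collect();
    // Sort by descending similarity.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(index, _)| index).collect()
}
```

With a chunk size of 10, a 25-line file yields three chunks (10, 10, and 5 lines). Each chunk would be embedded and stored during indexing, and `top_k` mirrors what the similarity search returns for a query embedding.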
Currently, KB-Index supports the following file types:
- Markdown (.md)
- Rust (.rs)
- TypeScript (.ts, .tsx)
- JavaScript (.js, .jsx)
Use Cases:
- Project Onboarding: Quickly find relevant code when joining a new project
- Documentation Search: Search through documentation written in Markdown
- Code Navigation: Find implementations of specific features across a large codebase
- Knowledge Management: Build a searchable knowledge base of your team's code and documentation
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments:
- OpenAI for their text embedding and completion models
- ChromaDB for the vector database
- Rust and its amazing ecosystem
Built with ❤️ by method.