A modular vector database interface supporting multiple backends (Weaviate, Milvus) with a unified API and flexible embedding strategies.
- Multi-backend support: Weaviate and Milvus vector databases
- Flexible embedding strategies: Support for pre-computed vectors and multiple embedding models
- Unified API: Consistent interface across different vector database implementations
- Factory pattern: Easy creation and switching between database types
- MCP Server: Model Context Protocol server for AI agent integration with multi-database support
- CLI Tool: Command-line interface for vector database operations with YAML configuration
- Document management: Write, read, delete, and query documents
- Collection management: List and manage collections across vector databases
- Query functionality: Natural language querying with semantic search across documents
- Metadata support: Rich metadata handling for documents
- Environment variable substitution: Dynamic configuration with
{{ENV_VAR_NAME}}
syntax - Safety features: Confirmation prompts for destructive operations with
--force
flag bypass
First, clone the repository and navigate into the directory:
git clone https://github.com/AI4quantum/maestro-knowledge.git
cd maestro-knowledge
You will need Python 3.11+ and uv.
Create and activate a virtual environment:
uv venv
source .venv/bin/activate
Next, install the required dependencies:
uv sync
This should be rerun after pulling changes to ensure all dependencies are up-to-date.
from src.vector_db import create_vector_database
# Create a vector database (defaults to Weaviate)
db = create_vector_database("weaviate", "MyCollection")
# Set up the database
db.setup()
# Write documents with default embedding
documents = [
{
"url": "https://example.com/doc1",
"text": "This is a document about machine learning.",
"metadata": {"topic": "ML", "author": "Alice"}
}
]
db.write_documents(documents, embedding="default")
# List documents
docs = db.list_documents(limit=10)
print(f"Found {len(docs)} documents")
# Query documents using natural language
results = db.query("What is the main topic of the documents?", limit=5)
print(f"Query results: {results}")
# Clean up
db.cleanup()
The project includes a Go-based CLI tool for managing vector databases through the MCP server. For comprehensive CLI usage, installation, and examples, see cli/README.md.
Quick CLI Examples:
# Build and use the CLI
cd cli && go build -o maestro-k src/*.go
# List vector databases
./maestro-k vectordb list
# Create vector database from YAML
./maestro-k vectordb create config.yaml
# Query documents
./maestro-k query "What is the main topic?" --vdb=my-database
The project includes a Model Context Protocol (MCP) server that exposes vector database functionality to AI agents. For detailed MCP server documentation, configuration, and examples, see src/maestro_mcp/README.md.
Quick MCP Server Usage:
# Start the MCP server
./start.sh
# Stop the MCP server
./stop.sh
# Check server status
./stop.sh status
The library supports flexible embedding strategies for both vector databases. For detailed embedding model support and usage examples, see src/maestro_mcp/README.md.
- Weaviate: Supports built-in vectorizers and external embedding models
- Milvus: Supports pre-computed vectors and OpenAI embedding models
- Environment Variables: Set
OPENAI_API_KEY
for OpenAI embedding models
# Check supported embeddings
supported = db.supported_embeddings()
print(f"Supported embeddings: {supported}")
# Write documents with specific embedding
db.write_documents(documents, embedding="text-embedding-3-small")
See the examples/ directory for usage examples:
- Weaviate Example - Demonstrates Weaviate with different embedding models and querying
- Milvus Example - Shows Milvus with pre-computed vectors, embedding models, and querying
- MCP Server Example - Demonstrates MCP server integration including query functionality
The project includes several utility scripts for development and testing:
# Code quality and formatting
./tools/lint.sh # Run Python linting and formatting checks
cd cli && ./lint.sh # Run Go linting and code quality checks
# MCP server management
./start.sh # Start the MCP server
./stop.sh # Stop the MCP server
# Testing
./test.sh [COMMAND] # Run tests with options: cli, mcp, all, help
./test-integration.sh # Run CLI integration tests
./tools/e2e.sh all # Run end-to-end tests
# CLI tool
cd cli && ./build.sh # Build the CLI tool
# Run all tests (CLI + MCP + Integration)
./test.sh all
# Run specific test suites
./test.sh cli # Run only CLI tests
./test.sh mcp # Run only MCP server tests
./test.sh help # Show test command help
# Run comprehensive test suite (recommended before PR)
./tools/lint.sh && cd cli && ./lint.sh && cd .. && ./test.sh all
# Run integration and end-to-end tests
./test-integration.sh # CLI integration tests
./tools/e2e.sh all # Complete e2e workflows
# Monitor logs in real-time
./tools/tail-logs.sh status # Show service status
./tools/tail-logs.sh all # Tail all service logs
The project maintains high code quality standards through comprehensive linting and automated checks.
- ruff: Fast Python linter and formatter
- Formatting: Consistent code style across Python files
- Import sorting: Organized and clean imports
- CI Integration: Automated Python linting in CI/CD
- staticcheck: Detects unused code, unreachable code, and other quality issues
- golangci-lint: Advanced Go linting with multiple analyzers
- go fmt: Consistent Go code formatting
- go vet: Static analysis for potential bugs
- Dependency management: Clean and verified module dependencies
- Race condition detection: Thread safety validation
- CI Integration: Automated Go linting in CI/CD with quality gates
# Python quality checks
./tools/lint.sh
# Go quality checks (CLI)
cd cli && ./lint.sh
# All quality checks
./tools/lint.sh && cd cli && ./lint.sh
maestro-knowledge/
├── src/ # Source code
│ ├── db/ # Vector database implementations
│ │ ├── vector_db_base.py # Abstract base class
│ │ ├── vector_db_weaviate.py # Weaviate implementation
│ │ ├── vector_db_milvus.py # Milvus implementation
│ │ └── vector_db_factory.py # Factory function
│ ├── maestro_mcp/ # MCP server implementation
│ │ ├── server.py # Main MCP server
│ │ ├── mcp_config.json # MCP client configuration
│ │ └── README.md # MCP server documentation
│ └── vector_db.py # Main module exports
├── cli/ # Go CLI tool
│ ├── src/ # Go source code
│ ├── tests/ # CLI tests
│ ├── examples/ # CLI usage examples
│ ├── build.sh # Build script
│ ├── lint.sh # Go linting and code quality script
│ └── README.md # CLI documentation
├── start.sh # MCP server start script
├── stop.sh # MCP server stop script
├── tools/ # Development tools
│ ├── lint.sh # Code linting and formatting
│ ├── e2e.sh # End-to-end testing script
│ ├── test-integration.sh # Integration tests
│ └── tail-logs.sh # Real-time log monitoring script
├── test.sh # Test runner script (CLI, MCP, Integration)
├── tests/ # Test suite
│ ├── test_vector_db_*.py # Vector database tests
│ ├── test_mcp_server.py # MCP server tests
│ ├── test_query_*.py # Query functionality tests
│ ├── test_integration_*.py # Integration tests
│ ├── test_vector_database_yamls.py # YAML schema validation tests
│ └── yamls/ # YAML configuration examples
│ ├── test_local_milvus.yaml
│ └── test_remote_weaviate.yaml
├── examples/ # Usage examples
│ ├── weaviate_example.py # Weaviate usage
│ ├── milvate_example.py # Milvus usage
│ └── mcp_example.py # MCP server usage
├── schemas/ # JSON schemas
│ ├── vector-database-schema.json # Vector database configuration schema
│ └── README.md # Schema documentation
└── docs/ # Documentation
├── CONTRIBUTING.md # Contribution guidelines
├── CLI_UX_REVIEW.md # CLI UX review and improvements
└── PRESENTATION.md # Project presentation
VECTOR_DB_TYPE
: Default vector database type (defaults to "weaviate")OPENAI_API_KEY
: Required for OpenAI embedding modelsMAESTRO_KNOWLEDGE_MCP_SERVER_URI
: MCP server URI for CLI toolMILVUS_URI
: Milvus connection URI. Important: Do not use quotes around the URI value in your.env
file (e.g.,MILVUS_URI=http://localhost:19530
instead ofMILVUS_URI="http://localhost:19530"
).- Database-specific environment variables for Weaviate and Milvus connections
For detailed environment variable usage in CLI and MCP server, see their respective README files.
See CONTRIBUTING.md for contribution guidelines.
Before submitting a pull request, run the comprehensive test suite:
./tools/lint.sh && ./test.sh all
This ensures code quality, functionality, and integration with the CLI tool.
For a complete development workflow that tests everything end-to-end:
./start.sh && ./tools/e2e.sh fast && ./stop.sh
This workflow:
- Starts the MCP server
- Runs the fast end-to-end test suite
- Stops the MCP server
This is useful for quickly validating that your changes work correctly in a real environment.
The project includes comprehensive log monitoring capabilities:
# Show service status with visual indicators
./tools/tail-logs.sh status
# Monitor all logs in real-time
./tools/tail-logs.sh all
# Monitor specific service logs
./tools/tail-logs.sh mcp # MCP server logs
./tools/tail-logs.sh cli # CLI logs
# View recent logs
./tools/tail-logs.sh recent
Log Monitoring Features:
- 📡 Real-time tailing - Monitor logs as they're generated
- ✅ Visual status indicators - Clear service status with checkmarks and X marks
- 🌐 Port monitoring - Check service availability on ports
- 📄 Log file management - Automatic detection and size tracking
- 🔍 System integration - macOS system log monitoring for debugging
- 🎯 Service-specific monitoring - Tail individual service logs or all at once
Apache 2.0 License - see LICENSE file for details.