A production-ready, modular vector search platform that abstracts the complexity of semantic search operations while providing flexibility for various use cases. Built with a plugin-based architecture, comprehensive API, and enterprise-grade deployment options.
Get started in under 5 minutes:
# Clone and start with Docker Compose
git clone https://github.com/shubham-web/reusable-vector-search-service
cd reusable-vector-search-service
docker-compose up -d
# Verify installation
curl http://localhost:8000/health
# View API documentation
open http://localhost:8000/docs
- Plugin-based design - Each component is independently replaceable
- Clean interfaces - Well-defined contracts for all components
- Hot-swappable - Change providers without service restart
- Embedders: HuggingFace, OpenAI, Cohere, local ONNX models
- Vector Databases: Qdrant, Pinecone, Weaviate, ChromaDB, Milvus, pgvector
- Chunkers: Recursive, semantic, fixed-size, custom strategies
- Retrievers: Similarity, hybrid, multi-vector, filtered search
- Scalable: Horizontal scaling with Kubernetes
- Monitoring: Prometheus metrics, health checks, structured logging
- Security: API key authentication, RBAC, network policies
- Performance: Async processing, caching, batch operations
- RESTful API with OpenAPI documentation
- Python SDK and client libraries
- Configuration-driven setup with environment variables
- Comprehensive examples and tutorials
graph TB
Client[Client Applications] --> API[FastAPI REST API]
API --> Services[Service Layer]
Services --> Plugins[Plugin System]
Plugins --> Chunkers[Chunkers<br/>• Recursive<br/>• Semantic<br/>• Custom]
Plugins --> Embedders[Embedders<br/>• HuggingFace<br/>• OpenAI<br/>• Custom]
Plugins --> Databases[Vector DBs<br/>• Qdrant<br/>• Pinecone<br/>• Custom]
Services --> Cache[Redis Cache]
Services --> Monitor[Monitoring<br/>Prometheus]
style API fill:#e1f5fe
style Plugins fill:#f3e5f5
style Services fill:#e8f5e8
# Example: Complete document processing workflow
import asyncio

from vector_search_client import VectorSearchClient

async def main():
    client = VectorSearchClient("http://localhost:8000")

    # 1. Create collection
    await client.create_collection("documents", "My document collection")

    # 2. Ingest documents (automatic chunking and embedding)
    documents = [
        {
            "id": "doc1",
            "content": "Your document content here...",
            "metadata": {"category": "tutorial", "author": "John Doe"}
        }
    ]
    await client.ingest_documents("documents", documents)

    # 3. Search with semantic understanding
    results = await client.search("documents", "How to implement vector search?")
    print(results)

asyncio.run(main())
The service is built around four main interfaces:
Split documents into optimal chunks for embedding:
class ChunkerInterface:
def chunk_document(self, document: Document) -> List[Document]:
"""Split document into chunks."""
pass
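For illustration, a minimal fixed-size chunker satisfying this contract might look like the following (a hypothetical sketch, not the shipped implementation; the Document dataclass here stands in for the repo's own document type):

# Hypothetical sketch: fixed-size chunking with character overlap
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Document:
    id: str
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)

class FixedSizeChunker:
    def __init__(self, chunk_size: int = 500, overlap: int = 50):
        self.chunk_size = chunk_size
        self.overlap = overlap

    def chunk_document(self, document: Document) -> List[Document]:
        """Split a document into overlapping fixed-size chunks."""
        step = self.chunk_size - self.overlap
        return [
            Document(
                id=f"{document.id}#chunk{i}",
                content=document.content[start:start + self.chunk_size],
                metadata={**document.metadata, "parent_id": document.id},
            )
            for i, start in enumerate(range(0, len(document.content), step))
        ]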
Convert text to high-dimensional vectors:
class EmbedderInterface:
def embed_texts(self, texts: List[str]) -> np.ndarray:
"""Generate embeddings for texts."""
pass
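As a concrete example, an embedder backed by sentence-transformers could satisfy this contract (a sketch assuming the sentence-transformers package; the built-in HuggingFace embedder may differ in detail):

# Sketch: embedder built on sentence-transformers
from typing import List

import numpy as np
from sentence_transformers import SentenceTransformer

class MiniLMEmbedder:
    def __init__(self, model_id: str = "sentence-transformers/all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_id)

    def embed_texts(self, texts: List[str]) -> np.ndarray:
        """Generate embeddings for texts; returns shape (len(texts), dim)."""
        return self.model.encode(texts, convert_to_numpy=True)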
Store and retrieve vectors efficiently:
class VectorDBInterface:
async def search_similar(self, collection: str, query_vector: np.ndarray,
top_k: int) -> List[Tuple[Document, float]]:
"""Search for similar documents."""
pass
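To make the contract concrete, here is a toy in-memory implementation of search_similar using cosine similarity (illustrative only; the shipped backends delegate to Qdrant, Pinecone, and the other supported stores):

# Toy in-memory vector store with cosine-similarity search
from typing import Any, Dict, List, Tuple

import numpy as np

class InMemoryVectorDB:
    def __init__(self):
        # collection name -> list of (document, vector) pairs
        self._collections: Dict[str, List[Tuple[Any, np.ndarray]]] = {}

    def upsert(self, collection: str, document: Any, vector: np.ndarray) -> None:
        self._collections.setdefault(collection, []).append((document, vector))

    async def search_similar(self, collection: str, query_vector: np.ndarray,
                             top_k: int) -> List[Tuple[Any, float]]:
        """Rank stored vectors by cosine similarity to the query."""
        q = query_vector / np.linalg.norm(query_vector)
        scored = [
            (doc, float(np.dot(q, vec / np.linalg.norm(vec))))
            for doc, vec in self._collections.get(collection, [])
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]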
Implement advanced search strategies:
class RetrieverInterface:
async def retrieve(self, query: str, top_k: int,
filters: Dict[str, Any]) -> List[Document]:
"""Retrieve relevant documents."""
pass
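A basic similarity retriever simply composes an embedder and a vector database; a sketch under the assumption that both implement the interfaces above:

# Sketch: similarity retriever composed from the interfaces above
from typing import Any, Dict, List

class SimilarityRetriever:
    def __init__(self, embedder, db, collection: str):
        self.embedder = embedder
        self.db = db
        self.collection = collection

    async def retrieve(self, query: str, top_k: int,
                       filters: Dict[str, Any]) -> List[Any]:
        """Embed the query, search the store, and apply metadata filters."""
        query_vector = self.embedder.embed_texts([query])[0]
        hits = await self.db.search_similar(self.collection, query_vector, top_k)
        return [doc for doc, _score in hits
                if all(doc.metadata.get(k) == v for k, v in filters.items())]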
vector-search-service/
├── 🐳 Docker & Deployment
│ ├── Dockerfile # Production-ready container
│ ├── docker-compose.yml # Local development stack
│ └── docker/
│ ├── docker-compose.prod.yml # Production deployment
│ └── kubernetes/ # K8s manifests
├── 📚 Documentation
│ ├── docs/getting-started.md # Quick start guide
│ ├── docs/api-reference.md # Complete API docs
│ ├── docs/configuration.md # Configuration guide
│ ├── docs/deployment.md # Deployment instructions
│ └── docs/extending.md # Plugin development
├── 💡 Examples
│ ├── examples/basic_usage.py # Simple usage examples
│ ├── examples/advanced_search.py # Advanced features
│ └── examples/custom_provider.py # Custom components
├── ⚙️ Source Code
│ ├── src/api/ # FastAPI application
│ ├── src/core/ # Core implementations
│ ├── src/services/ # Business logic
│ ├── src/plugins/ # Plugin system
│ └── src/config/ # Configuration management
└── 🔧 Configuration
├── configs/default.yaml # Default settings
└── .env.example # Environment template
- Python 3.11+
- Docker & Docker Compose
- 4GB+ RAM (recommended)
# Clone repository
git clone https://github.com/shubham-web/reusable-vector-search-service
cd reusable-vector-search-service
# Start all services
docker-compose up -d
# Check status
docker-compose ps
# Setup Python environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Start dependencies
docker-compose up -d qdrant redis
# Run service
python -m uvicorn src.main:app --reload
# Apply Kubernetes manifests
kubectl apply -f docker/kubernetes/
# Health check
curl http://localhost:8000/health
# Create collection
curl -X POST "http://localhost:8000/collections" \
-H "Content-Type: application/json" \
-d '{"name": "test", "description": "Test collection"}'
# Ingest document
curl -X POST "http://localhost:8000/collections/test/ingest" \
-H "Content-Type: application/json" \
-d '{
"documents": [{
"id": "doc1",
"content": "Vector search enables semantic similarity matching.",
"metadata": {"category": "tutorial"}
}]
}'
# Search
curl -X POST "http://localhost:8000/collections/test/search" \
-H "Content-Type: application/json" \
-d '{"query": "semantic search", "top_k": 5}'
# API Configuration
export VSS_API_PORT=8000
export VSS_API_HOST=0.0.0.0
# Database Configuration
export VSS_DATABASE_TYPE=qdrant
export VSS_DATABASE_HOST=localhost
export VSS_DATABASE_PORT=6333
# Embedder Configuration
export VSS_EMBEDDER_TYPE=huggingface
export VSS_EMBEDDER_CONFIG_MODEL_ID=sentence-transformers/all-MiniLM-L6-v2
# Cache Configuration
export VSS_CACHE_TYPE=redis
export VSS_CACHE_REDIS_HOST=localhost
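These variables map onto the service's configuration model. As a sketch of the pattern (assuming pydantic-settings; the actual schema lives in src/config/ and may differ):

# Hypothetical sketch: env-prefixed settings with pydantic-settings
from pydantic_settings import BaseSettings, SettingsConfigDict

class APISettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="VSS_API_")

    host: str = "0.0.0.0"
    port: int = 8000

class DatabaseSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="VSS_DATABASE_")

    type: str = "qdrant"
    host: str = "localhost"
    port: int = 6333

api = APISettings()  # picks up VSS_API_HOST and VSS_API_PORT from the environment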
# configs/production.yaml
service:
environment: "production"
api:
workers: 4
cors_origins: ["https://yourdomain.com"]
components:
embedder:
type: "huggingface"
config:
model_id: "sentence-transformers/all-MiniLM-L6-v2"
device: "cuda" # Use GPU for better performance
batch_size: 64
database:
type: "qdrant"
config:
host: "qdrant.yourdomain.com"
api_key: "${QDRANT_API_KEY}"
https: true
infrastructure:
cache:
type: "redis"
config:
redis_host: "redis.yourdomain.com"
redis_password: "${REDIS_PASSWORD}"
features:
async_processing: true
batch_embeddings: true
embedding_cache: true
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Service health check |
| /collections | GET, POST | Manage collections |
| /collections/{name}/ingest | POST | Ingest documents |
| /collections/{name}/search | POST | Semantic search |
| /collections/{name}/search/hybrid | POST | Hybrid search |
| /docs | GET | Interactive API documentation |
# Basic semantic search
{
"query": "machine learning algorithms",
"top_k": 10,
"include_metadata": true
}
# Search with metadata filtering
{
"query": "neural networks",
"top_k": 5,
"metadata_filter": {
"category": "research",
"publication_date": {"$gte": "2024-01-01"}
}
}
# Hybrid search (semantic + keyword)
# alpha: 0.7 weights results 70% semantic, 30% keyword
{
"query": "transformer architecture",
"top_k": 10,
"alpha": 0.7,
"rerank": {"enabled": true}
}
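The same requests can be issued from any HTTP client. For example, a filtered search with the requests library (the exact response body shape is an assumption; check /docs for the authoritative schema):

# Filtered semantic search against the REST API
import requests

response = requests.post(
    "http://localhost:8000/collections/test/search",
    json={
        "query": "neural networks",
        "top_k": 5,
        "metadata_filter": {"category": "research"},
    },
    timeout=10,
)
response.raise_for_status()
for hit in response.json().get("results", []):
    print(hit)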
# Custom embedder example
from typing import List

import numpy as np

from src.plugins.interfaces.base import EmbedderInterface

class CustomEmbedder(EmbedderInterface):
    def __init__(self, config):
        # Initialize your custom embedder (load model, read config values, etc.)
        self.config = config

    def embed_texts(self, texts: List[str]) -> np.ndarray:
        # Your embedding logic: return one vector per input text,
        # shaped (len(texts), embedding_dim)
        embeddings = ...
        return embeddings
# Register the plugin
from src.plugins.registry import get_registry
registry = get_registry()
registry.register_embedder("custom", CustomEmbedder)
# Use your custom component
components:
embedder:
type: "custom"
config:
model_path: "/path/to/your/model"
custom_param: "value"
# Initialize swarm
docker swarm init
# Deploy stack
docker stack deploy -c docker/docker-compose.prod.yml vector-search
# Create namespace
kubectl create namespace vector-search
# Apply configurations
kubectl apply -f docker/kubernetes/secrets.yaml
kubectl apply -f docker/kubernetes/configmap.yaml
kubectl apply -f docker/kubernetes/deployment.yaml
kubectl apply -f docker/kubernetes/service.yaml
- AWS EKS: Deployment Guide
- Google GKE: Deployment Guide
- Azure AKS: Deployment Guide
The service exposes Prometheus metrics at /metrics:
- Request rates and latencies
- Embedding generation times
- Database query performance
- Cache hit rates
- Error rates and types
- Liveness: /health - Service is running
- Readiness: /ready - Service can accept traffic
- Deep Health: Includes dependency status
Structured JSON logging with configurable levels:
{
"timestamp": "2024-01-15T10:30:00Z",
"level": "INFO",
"service": "vector-search-service",
"message": "Document ingested successfully",
"document_id": "doc123",
"processing_time_ms": 45
}
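A log line like the one above can be produced with a small JSON formatter on the standard logging module (a minimal sketch; the service's own logging setup may differ):

# Minimal JSON log formatter using the standard library
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "service": "vector-search-service",
            "message": record.getMessage(),
        }
        # Merge structured fields passed via logger.info(..., extra={"fields": {...}})
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("vector-search-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Document ingested successfully",
            extra={"fields": {"document_id": "doc123", "processing_time_ms": 45}})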
# API Key authentication
curl -H "X-API-Key: your-api-key" http://localhost:8000/collections
- TLS/SSL encryption
- Network policies for Kubernetes
- CORS configuration
- Rate limiting
- Input validation and sanitization
- Metadata filtering
- Audit logging
- Secure secret management
- Document repositories: Search across company documents
- Knowledge bases: Semantic search in wikis and documentation
- Customer support: Find relevant articles and solutions
- Product search: Find products by description, not just keywords
- Recommendation systems: Similar product discovery
- Content matching: Match user queries to product descriptions
- Literature review: Find related research papers
- Citation analysis: Discover relevant citations
- Knowledge discovery: Explore connections between concepts
- Media libraries: Search videos, images, and documents
- News aggregation: Find related articles and stories
- Content recommendation: Suggest similar content to users
| Operation | Latency (p95) | Throughput |
|---|---|---|
| Document Ingestion | 150ms | 1000 docs/sec |
| Semantic Search | 25ms | 500 queries/sec |
| Hybrid Search | 35ms | 300 queries/sec |
| Embedding Generation | 45ms | 2000 texts/sec |
- Use GPU for embedding generation
- Enable caching for repeated queries
- Batch operations for bulk processing (see the sketch after this list)
- Tune chunk sizes for your content
- Configure connection pooling for databases
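For example, batching ingestion amortizes per-request overhead (a sketch against the documented ingest endpoint; batch_size is a tuning knob, not a fixed recommendation):

# Batched ingestion: one request per batch instead of per document
import requests

def ingest_in_batches(documents, collection="documents", batch_size=100):
    url = f"http://localhost:8000/collections/{collection}/ingest"
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        resp = requests.post(url, json={"documents": batch}, timeout=30)
        resp.raise_for_status()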
We welcome contributions! Please see our Contributing Guide for details.
# Clone repository
git clone https://github.com/shubham-web/reusable-vector-search-service
cd reusable-vector-search-service
# Setup development environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
# Run tests
pytest
# Format code
black src/
isort src/
# Type checking
mypy src/
- Implement the appropriate interface
- Add configuration schema
- Write comprehensive tests
- Update documentation
- Submit pull request
- Getting Started - Quick start guide
- API Reference - Complete API documentation
- Configuration - Configuration options
- Deployment - Production deployment
- Extending - Plugin development guide
- Basic Usage - Simple operations
- Advanced Search - Complex queries
- Custom Provider - Plugin development
- Documentation: Check our comprehensive docs
- Issues: GitHub Issues
- Discussions: GitHub Discussions
This project is licensed under the MIT License - see the LICENSE file for details.
- FastAPI for the excellent web framework
- Qdrant for high-performance vector search
- HuggingFace for transformer models and embeddings
- Pydantic for data validation and settings management
- Docker and Kubernetes communities for containerization standards
⭐ Star this repository if you find it useful!
Made with ❤️ by Shubham P