A sophisticated search engine implementation leveraging multiple RAG (Retrieval-Augmented Generation) approaches for enhanced document retrieval and querying.
This project implements three cutting-edge RAG strategies:
**1. Contextual Retrieval.** Based on Anthropic's research, this implementation:
- Preserves broader document context during retrieval
- Uses a two-stage retrieval process
- Generates contextual explanations for each chunk
- Implements caching with Portkey.ai to optimize LLM calls
https://www.anthropic.com/news/contextual-retrieval
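A minimal sketch of the chunk-enrichment step, assuming the OpenAI chat API; the prompt wording, model choice, and helper name are illustrative, not the project's exact code:

```python
# Sketch: generate a contextual explanation for a chunk, then prepend it
# before embedding (model and prompt are illustrative assumptions).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CONTEXT_PROMPT = """<document>
{document}
</document>
Here is a chunk from the document:
<chunk>
{chunk}
</chunk>
Write a short context that situates this chunk within the overall document,
to improve retrieval of the chunk. Answer with the context only."""

def contextualize_chunk(document: str, chunk: str) -> str:
    """Return the chunk prefixed with an LLM-generated context string."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": CONTEXT_PROMPT.format(document=document, chunk=chunk),
        }],
    )
    context = response.choices[0].message.content.strip()
    return f"{context}\n\n{chunk}"  # this enriched text is what gets embedded
```

Because the full document is resent for every chunk, LLM caching (see the Portkey note below) keeps the repeated document tokens cheap.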
**2. Late Chunking.** Following Jina AI's methodology:
- Delays text segmentation until after embedding
- Improves semantic coherence in long documents
- Reduces information loss during chunking
- Optimizes token usage for large documents
https://jina.ai/news/late-chunking-in-long-context-embedding-models/
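A rough sketch of the idea using a long-context Hugging Face model; the model name, fixed span size, and mean pooling are simplifying assumptions:

```python
# Sketch: embed the whole document first, then pool token embeddings into
# chunk vectors, so every chunk "sees" the full document context.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "jinaai/jina-embeddings-v2-base-en"  # 8k-token context window
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL, trust_remote_code=True)

def late_chunk(text: str, span_size: int = 256) -> list[torch.Tensor]:
    """Token-level encoding first, segmentation into chunks afterwards."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)
    with torch.no_grad():
        token_embeddings = model(**inputs).last_hidden_state[0]  # (tokens, dim)
    return [
        token_embeddings[i : i + span_size].mean(dim=0)  # one vector per span
        for i in range(0, token_embeddings.size(0), span_size)
    ]
```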
**3. Hybrid Search.** Combines multiple search strategies:
- Vector search for semantic understanding
- Keyword search for precise matching
- BM25 scoring for relevance ranking
- Configurable weights between search types
https://www.elastic.co/search-labs/tutorials/search-tutorial/vector-search/hybrid-search
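A minimal sketch of the weighted combination using the official Elasticsearch Python client; the index name, field names, and score-normalization scheme are assumptions:

```python
# Sketch: run BM25 and kNN separately, normalize each result list by its top
# score, then blend with a configurable weight alpha (all names hypothetical).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # or your ES_URL for cloud

def hybrid_search(query_text: str, query_vector: list[float],
                  alpha: float = 0.5, k: int = 10):
    bm25 = es.search(index="docs", query={"match": {"text": query_text}}, size=k)
    knn = es.search(
        index="docs",
        knn={"field": "embedding", "query_vector": query_vector,
             "k": k, "num_candidates": 5 * k},
        size=k,
    )
    scores: dict[str, float] = {}
    for weight, hits in ((1 - alpha, bm25["hits"]["hits"]),
                         (alpha, knn["hits"]["hits"])):
        top = max((h["_score"] for h in hits), default=1.0) or 1.0
        for h in hits:
            scores[h["_id"]] = scores.get(h["_id"], 0.0) + weight * h["_score"] / top
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

With `alpha=1.0` this degenerates to pure vector search, with `alpha=0.0` to pure BM25.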
Multiple Processing Strategies:
- Basic RAG with naive chunking
- Context-aware RAG with semantic preservation
- Late chunking for improved segmentation
- Hybrid search combining multiple approaches
Advanced Retrieval:
- Comparative search across implementations
- Context-aware document understanding
- Semantic similarity matching
- Optimized caching for LLM operations
Tech Stack:
- FastAPI backend
- PostgreSQL with pgvector for vector storage
- Elasticsearch for hybrid search
- Multiple embedding models:
  - OpenAI embeddings
  - Google AI embeddings
  - Jina AI embeddings
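A hypothetical helper for switching between the three backends via their LangChain integrations (the class names follow LangChain's docs; the model choices and the dispatch pattern are assumptions):

```python
# Sketch: one factory per embedding backend; each class reads its API key
# (OPENAI_API_KEY, GOOGLE_API_KEY, JINA_API_KEY) from the environment.
from langchain_community.embeddings import JinaEmbeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_openai import OpenAIEmbeddings

EMBEDDERS = {
    "openai": lambda: OpenAIEmbeddings(model="text-embedding-3-small"),
    "google": lambda: GoogleGenerativeAIEmbeddings(model="models/embedding-001"),
    "jina": lambda: JinaEmbeddings(model_name="jina-embeddings-v2-base-en"),
}

def get_embedder(name: str):
    """E.g. get_embedder("jina").embed_query("late chunking") -> vector."""
    return EMBEDDERS[name]()
```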
Document Processing Pipeline:
- Document ingestion (PDF support)
- Text extraction and cleaning
- Semantic chunking
- Embedding generation
- Vector storage and indexing
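A sketch of how these steps could fit together, assuming `pypdf` for extraction, naive fixed-size chunking, OpenAI embeddings, and a pgvector table named `chunks` (all hypothetical):

```python
# Sketch: extract -> clean -> chunk -> embed -> store (names are assumptions).
import os
import re

import psycopg
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()

def ingest_pdf(path: str, chunk_size: int = 1000) -> None:
    # Extract and clean the raw text.
    text = " ".join(page.extract_text() or "" for page in PdfReader(path).pages)
    text = re.sub(r"\s+", " ", text).strip()
    # Naive fixed-size chunking; the strategy-specific endpoints vary this step.
    chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]
    # Embed all chunks in one call and store them in pgvector.
    vectors = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    with psycopg.connect(os.environ["POSTGRES_CONNECTION_STRING"]) as conn:
        for chunk, item in zip(chunks, vectors.data):
            conn.execute(
                "INSERT INTO chunks (content, embedding) VALUES (%s, %s::vector)",
                (chunk, str(item.embedding)),
            )
```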
Upload endpoints:
- `POST /upload-pdf`: Main upload endpoint with optimized processing
- `POST /upload-pdf/naive`: Basic RAG implementation
- `POST /upload-pdf/contextual`: Contextual RAG implementation
- `POST /upload-pdf/jina`: Late chunking implementation
- `POST /upload-pdf/elasticsearch`: Hybrid search implementation
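Example call against the main endpoint, assuming the server runs locally on port 8000 and expects a multipart field named `file` (both assumptions):

```python
import requests

# Upload a PDF for ingestion; host, port, and field name may differ.
with open("paper.pdf", "rb") as f:
    resp = requests.post("http://localhost:8000/upload-pdf", files={"file": f})
print(resp.status_code, resp.json())
```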
Query endpoints:
- `POST /query/naive`: Basic vector search
- `POST /query/contextual`: Context-aware search
- `POST /query/jina`: Late chunking search
- `POST /query/hybrid-search`: Combined vector and keyword search
- `POST /query/comparative`: Comparative results across implementations
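And a matching query example; the JSON body shape is an assumption:

```python
import requests

# Ask the hybrid-search endpoint a question; adjust the schema to your API.
resp = requests.post(
    "http://localhost:8000/query/hybrid-search",
    json={"query": "What is late chunking?"},
)
print(resp.json())
```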
OpenAI caching via Portkey: https://portkey.ai/docs/integrations/llms/openai/prompt-caching-openai
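A minimal sketch of routing OpenAI calls through Portkey's gateway so cached responses can be reused; the header helper follows Portkey's SDK, but verify the details against the docs linked above:

```python
import os

from openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

# Point the standard OpenAI client at Portkey's gateway; the OpenAI key is
# still read from OPENAI_API_KEY, while Portkey adds caching/observability.
client = OpenAI(
    base_url=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        provider="openai",
        api_key=os.environ["PORTKEY_API_KEY"],
    ),
)
# Repeated long prompt prefixes (e.g. the full document sent for every chunk
# in contextual retrieval) can then benefit from prompt caching.
```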
Setup:
- Clone the repository
- Install dependencies
- Set the following environment variables:
  - `POSTGRES_CONNECTION_STRING`: your PostgreSQL connection string
  - `OPENAI_API_KEY`: your OpenAI API key
  - `GOOGLE_API_KEY`: your Google AI API key
  - `PORTKEY_API_KEY`: your Portkey API key
  - `JINA_API_KEY`: your Jina AI API key
  - `ES_URL`: your Elasticsearch URL (for a cloud deployment)
Alternatively, set up your own Elasticsearch service with Docker: https://python.langchain.com/docs/integrations/vectorstores/elasticsearch/