A production-style Retrieval-Augmented Generation stack that plans queries and chooses between:
- Local path (Hybrid RAG): BM25 + vector search fused via Reciprocal Rank Fusion (RRF); optional HyDE booster.
- Global path (GraphRAG): builds a knowledge graph with community summaries and answers “overview / themes / across the corpus” questions.
Why these choices?
- RRF is a robust, rank-based way to combine heterogeneous result lists (BM25, kNN/neural) without score normalization. It’s now a first-class feature in OpenSearch search pipelines.
- GraphRAG adds a global reasoning mode by querying precomputed community summaries (great for “what are the main themes?”-type queries).
- HyDE improves recall on tough queries by generating a “hypothetical” answer passage, embedding it, and searching for the closest real docs.
- RAG evaluation is built in (RAGAS, TruLens “RAG triad”, DeepEval’s pytest-like checks) so you can track faithfulness, answer/context relevance, and regressions over time.
- 🔎 Hybrid retrieval (BM25 + vector) with RRF fusion
- 🕸️ GraphRAG “global search” path using community summaries
- 🧪 Evals: RAGAS metrics, TruLens “RAG triad”, DeepEval tests
- ⚙️ FastAPI API + Docker Compose for OpenSearch and Neo4j
- 🧰 Scripts for ingestion, indexing, graph build, and end-to-end runs
- 🧯 Optional HyDE booster for hard queries
- 📈 Dataset hooks for HotpotQA / MuSiQue / 2WikiMultihopQA (multi-hop QA)
```mermaid
flowchart LR
    U[User Query] --> P[Planner]
    P -->|local| HR[Hybrid Retriever<br/>BM25 + Vector + RRF]
    P -->|global| GR[GraphRAG<br/>Community Summaries]
    HR --> A[Answer + Citations]
    GR --> A
```
- Hybrid path: BM25 + vector searches run in parallel; results are fused with RRF (rank-based) to get resilient top-k. Optionally trigger HyDE when recall looks weak. (OpenSearch, arXiv)
- GraphRAG path: pre-compute a graph and community summaries; at query time, select relevant communities and synthesize a globally grounded answer. (Microsoft GitHub)
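The rank-based fusion step in the hybrid path can be sketched in a few lines (a minimal illustration, not this repo's actual retriever code):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score each doc as the sum of 1/(k + rank)
    over all result lists it appears in. Only ranks are used, so no
    score normalization across BM25 and kNN is needed. k ~ 60 is the
    usual rank constant."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked decently in both lists beats one that tops only one list:
bm25_hits = ["d3", "d1", "d7"]
knn_hits = ["d1", "d9", "d3"]
fused = rrf_fuse([bm25_hits, knn_hits])  # "d1" and "d3" rise to the top
```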
- Python 3.10+
- Docker + Docker Compose
```shell
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
docker compose -f docker/compose.yml up -d
python scripts/setup_opensearch.py
```
Note: If you use the `sentence-transformers/all-MiniLM-L6-v2` embedder, set the vector dimension to 384 in your index mapping (or the setup script); that model outputs 384-dim vectors. (Hugging Face)
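For example, a minimal OpenSearch index body with a 384-dim `knn_vector` field might look like this (the field and mapping names here are illustrative, not this repo's actual schema; check `scripts/setup_opensearch.py` for the real one):

```python
# Illustrative index body for a hybrid BM25 + kNN index.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "text": {"type": "text"},  # BM25 side of the hybrid search
            "embedding": {
                "type": "knn_vector",
                "dimension": 384,  # all-MiniLM-L6-v2 output size
            },
        }
    },
}
```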
```shell
python scripts/ingest.py
uvicorn app.api:app --reload
```
Open http://localhost:8000/docs and try:
```json
{"query": "overview of the example file"}
```
→ should route global (GraphRAG) if configured.
```json
{"query": "what does the example mention about hybrid search?"}
```
→ should route local (hybrid + RRF).
- Run BM25 and vector queries in parallel and combine via RRF (sum of reciprocal ranks). In OpenSearch 2.19+, you can also enable engine-side RRF via a search pipeline (`score-ranker-processor`) for hybrid queries. (OpenSearch Docs, OpenSearch)
- Builds a corpus graph and community summaries, then answers global questions by aggregating the most relevant communities; use local graph queries for entity-centric questions. (Microsoft GitHub)
- When recall looks weak, generate a short hypothetical answer, embed it, and re-query; fuse with RRF. (arXiv)
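The HyDE booster can be sketched as below; the `llm_generate`, `embed`, and `vector_search` callables are assumed interfaces, not this repo's actual functions:

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Minimal RRF: sum reciprocal ranks across result lists."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def hyde_retrieve(question, llm_generate, embed, vector_search):
    """HyDE sketch: generate a hypothetical answer passage, embed it,
    retrieve real docs near that embedding, then fuse with the
    plain-question results via RRF."""
    hypothetical = llm_generate(f"Write a short passage answering: {question}")
    direct_hits = vector_search(embed(question))
    hyde_hits = vector_search(embed(hypothetical))
    return rrf_fuse([direct_hits, hyde_hits])
```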
This repo includes three complementary evaluation tools:
- RAGAS: faithfulness, answer relevance, context precision/recall, etc. (Ragas)
- TruLens: the RAG triad — context relevance, groundedness, answer relevance. (TruLens)
- DeepEval: pytest-like tests for RAG apps to run in CI. (GitHub)
Run a small example:
```shell
python evals/run_ragas.py
pytest -q
```
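To make the metric intuitions concrete, here is a toy, exact-match illustration of context precision; the real RAGAS metric uses LLM judgments rather than string matching:

```python
def toy_context_precision(retrieved_chunks, gold_relevant):
    """Toy version of context precision: the share of retrieved chunks
    that are actually relevant. Illustrative only; RAGAS judges
    relevance with an LLM instead of exact-match labels."""
    if not retrieved_chunks:
        return 0.0
    hits = sum(1 for chunk in retrieved_chunks if chunk in gold_relevant)
    return hits / len(retrieved_chunks)
```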
Hook up a small dev slice from one or more of:
- HotpotQA — ~113k multi-hop QA with supporting facts. (arXiv)
- MuSiQue — carefully composed 2–4 hop questions. (ACL Anthology)
- 2WikiMultihopQA — includes reasoning paths/evidence. (arXiv)
You can let OpenSearch do the fusion:
- Create an RRF search pipeline (rank constant ~60).
- Use a hybrid query (e.g., `match` + neural/kNN) with `?search_pipeline=...`. References and examples: official hybrid search guides and RRF pipeline docs. (OpenSearch Docs)
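As a sketch, the pipeline body and a hybrid query could look roughly like this (index/field names and the vector are placeholders; verify the exact processor syntax against the docs for your OpenSearch version):

```python
# PUT /_search/pipeline/rrf-pipeline with this body:
rrf_pipeline = {
    "description": "Engine-side RRF for hybrid queries",
    "phase_results_processors": [
        {
            "score-ranker-processor": {
                "combination": {"technique": "rrf", "rank_constant": 60}
            }
        }
    ],
}

# POST /my-index/_search?search_pipeline=rrf-pipeline with this body:
hybrid_query = {
    "query": {
        "hybrid": {
            "queries": [
                {"match": {"text": {"query": "hybrid search"}}},
                # Placeholder vector; in practice, embed the query text.
                {"knn": {"embedding": {"vector": [0.0] * 384, "k": 10}}},
            ]
        }
    }
}
```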
- Embedding dimension: `all-MiniLM-L6-v2` outputs 384-dim vectors; set the index `knn_vector` mapping accordingly. (Hugging Face)
- Planner: route to global if the query asks for themes/overview/unification across documents; otherwise local. (You can also use retrieval-agreement heuristics.)
- Switching to engine-side RRF: once you register a neural model in OpenSearch, prefer the built-in RRF search pipeline for performance and simplicity. (OpenSearch)
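The planner's themes/overview routing mentioned above can be sketched as a keyword heuristic; the cue list below is illustrative, not the repo's actual planner logic:

```python
import re

# Illustrative cue list; extend it, or replace with
# retrieval-agreement heuristics for harder cases.
GLOBAL_CUES = re.compile(
    r"\b(overview|themes?|summar\w*|main (ideas|topics)"
    r"|across the (corpus|documents))\b",
    re.IGNORECASE,
)

def route(query: str) -> str:
    """Return 'global' for corpus-wide questions, else 'local'."""
    return "global" if GLOBAL_CUES.search(query) else "local"
```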
- Streamlit demo: planner viz, per-retriever top-k, fused list, citations
- Add ColBERT-style late interaction retriever ablation
- Expand evals with larger HotpotQA/MuSiQue/2Wiki dev slices
- Add latency/cost dashboards
- OpenSearch hybrid search & RRF (tutorials, pipelines, best practices). (OpenSearch Docs, OpenSearch)
- GraphRAG modes (global/local) & community summaries. (Microsoft GitHub)
- HyDE (Hypothetical Document Embeddings). (arXiv)
- RAG evaluation: RAGAS metrics, TruLens RAG Triad, DeepEval framework. (Ragas, TruLens, GitHub)
- `all-MiniLM-L6-v2` model card (384-dim embeddings). (Hugging Face)
MIT