# Aquiles-RAG

High-performance Retrieval-Augmented Generation (RAG) on Redis, Qdrant, or PostgreSQL (pgvector).

FastAPI • Redis / Qdrant / PostgreSQL • Async • Embedding-agnostic
- Features
- Tech Stack
- Requirements
- Installation
- Configuration & Connection Options
- Usage
- Architecture
- License
## Features

- High performance: vector search powered by Redis HNSW, Qdrant, or PostgreSQL with pgvector.
- Simple API: endpoints for index creation, insertion, querying, and optional re-ranking.
- Embedding-agnostic: works with any embedding model (OpenAI, Llama 3, HuggingFace, etc.).
- Interactive setup wizard: `aquiles-rag configs` walks you through full configuration for Redis, Qdrant, or PostgreSQL.
- Sync & async clients: `AquilesRAG` (requests) and `AsyncAquilesRAG` (httpx), both with `embedding_model` and `metadata` support.
- Extensible: designed to integrate into ML pipelines, microservices, or serverless deployments; supports an optional re-ranker stage for improved result ordering.
## Tech Stack

- Python 3.9+
- FastAPI
- Redis, Qdrant or PostgreSQL + pgvector as vector store
- NumPy
- Pydantic
- Jinja2
- Click (CLI)
- Requests (sync client)
- HTTPX (async client)
- Platformdirs (config management)
## Requirements

- Redis (standalone or cluster), or Qdrant (HTTP / gRPC), or PostgreSQL with the `pgvector` extension
- Python 3.9+
- pip

Optional: run Redis locally with Docker:

```shell
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack-server:latest
```
## Installation

Install from PyPI:

```shell
pip install aquiles-rag
```

Or install from source:

```shell
git clone https://github.com/Aquiles-ai/Aquiles-RAG.git
cd Aquiles-RAG
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# optional development install
pip install -e .
```
## Configuration & Connection Options

Configuration is persisted at:

```
~/.local/share/aquiles/aquiles_config.json
```

The previous manual per-flag config flow has been replaced by an interactive wizard. Run:

```shell
aquiles-rag configs
```

The wizard prompts for everything required for Redis, Qdrant, or PostgreSQL (host, ports, TLS/gRPC options, API keys, admin user) and writes `aquiles_config.json` to the standard location at the end.

The wizard also includes optional re-ranker configuration (enable/disable, execution provider, model name, concurrency, preload), so you can activate a re-ranking stage that scores `(query, doc)` pairs after the vector store returns candidates.

If you prefer automation, generate the same JSON schema the wizard writes and place it at `~/.local/share/aquiles/aquiles_config.json` before starting the server (or use the `deploy` pattern described below).
Aquiles-RAG supports multiple Redis connection modes:

- Local cluster:

```python
RedisCluster(host=host, port=port, decode_responses=True)
```

- Standalone local:

```python
redis.Redis(host=host, port=port, decode_responses=True)
```

- Remote with TLS/SSL:

```python
redis.Redis(host=host, port=port, username=username or None,
            password=password or None, ssl=True, decode_responses=True,
            ssl_certfile=ssl_certfile, ssl_keyfile=ssl_keyfile,
            ssl_ca_certs=ssl_ca_certs)
```

- Remote without TLS/SSL:

```python
redis.Redis(host=host, port=port, username=username or None,
            password=password or None, decode_responses=True)
```
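The modes above differ only in the keyword arguments passed to the client constructor, so selection can be sketched as a small factory. The config keys below are illustrative, not the wizard's exact schema:

```python
def redis_kwargs(cfg: dict) -> dict:
    """Build keyword arguments for redis.Redis / RedisCluster from a config dict.

    Illustrative sketch: config key names here are assumptions, not the
    exact schema written by `aquiles-rag configs`.
    """
    kwargs = {"host": cfg["host"], "port": cfg["port"], "decode_responses": True}
    if cfg.get("username") or cfg.get("password"):
        # Remote instances typically authenticate; empty strings become None.
        kwargs["username"] = cfg.get("username") or None
        kwargs["password"] = cfg.get("password") or None
    if cfg.get("ssl"):
        # TLS mode: pass through certificate paths when provided.
        kwargs.update(
            ssl=True,
            ssl_certfile=cfg.get("ssl_certfile"),
            ssl_keyfile=cfg.get("ssl_keyfile"),
            ssl_ca_certs=cfg.get("ssl_ca_certs"),
        )
    return kwargs

print(redis_kwargs({"host": "localhost", "port": 6379}))
```

The resulting dict would be splatted into `redis.Redis(**kwargs)` or `RedisCluster(**kwargs)` depending on the configured mode.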
If you select PostgreSQL in the wizard, it prompts for connection and pool settings for your Postgres instance. Note: Aquiles-RAG does not run DB migrations automatically; if you use Postgres, you must prepare the `pgvector` and `pgcrypto` extensions, tables, and indexes yourself.
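A preparation script might contain DDL along these lines (run it with `psql` or your migration tool). The table layout, names, and the 768 dimension are assumptions to adapt to your deployment, not a schema Aquiles-RAG mandates:

```python
# Illustrative preparation DDL for PostgreSQL + pgvector.
# Table/column names and the vector dimension (768) are assumptions;
# adapt them to the schema your Aquiles-RAG version expects.
PREPARE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE IF NOT EXISTS documents (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    name_chunk text NOT NULL,
    raw_text text NOT NULL,
    embedding vector(768) NOT NULL
);

-- HNSW index using cosine distance (requires pgvector >= 0.5.0)
CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents USING hnsw (embedding vector_cosine_ops);
"""

print(PREPARE_SQL)
```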
## Usage

- Interactive setup wizard (recommended):

```shell
aquiles-rag configs
```

- Serve the API:

```shell
aquiles-rag serve --host "0.0.0.0" --port 5500
```

- Deploy with a bootstrap script (pattern: a `deploy_*.py` file exposing `run()` that calls `gen_configs_file()`):

```shell
# Redis example
aquiles-rag deploy --host "0.0.0.0" --port 5500 --workers 2 deploy_redis.py

# Qdrant example
aquiles-rag deploy --host "0.0.0.0" --port 5500 --workers 2 deploy_qdrant.py

# PostgreSQL example
aquiles-rag deploy --host "0.0.0.0" --port 5500 --workers 2 deploy_postgres.py
```

The `deploy` command imports the given Python file, executes its `run()` to generate the config (writing `aquiles_config.json`), then starts the FastAPI server.
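A minimal deploy script following this pattern could look like the sketch below. A real script would call Aquiles-RAG's `gen_configs_file()` inside `run()`; here a direct JSON write (with illustrative field names) stands in, so only the contract is shown: a module exposing `run()` that produces `aquiles_config.json`.

```python
# deploy_redis.py (sketch) -- the `deploy` command imports this module and
# calls run() before starting the server. In a real script, run() would call
# Aquiles-RAG's gen_configs_file(); the direct JSON write below is a stand-in
# and the field names are illustrative, not the real schema.
import json
from pathlib import Path

def run() -> None:
    config = {
        "backend": "redis",
        "host": "127.0.0.1",
        "port": 6379,
    }
    path = Path.home() / ".local/share/aquiles/aquiles_config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))

if __name__ == "__main__":
    run()
```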
- Create Index:

```shell
curl -X POST http://localhost:5500/create/index \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "indexname": "documents",
    "embeddings_dim": 768,
    "dtype": "FLOAT32",
    "delete_the_index_if_it_exists": false
  }'
```

- Insert Chunk (ingest):

```shell
curl -X POST http://localhost:5500/rag/create \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "index": "documents",
    "name_chunk": "doc1_part1",
    "dtype": "FLOAT32",
    "chunk_size": 1024,
    "raw_text": "Text of the chunk...",
    "embeddings": [0.12, 0.34, 0.56, ...]
  }'
```

- Query Top-K:

```shell
curl -X POST http://localhost:5500/rag/query-rag \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "index": "documents",
    "embeddings": [0.78, 0.90, ...],
    "dtype": "FLOAT32",
    "top_k": 5,
    "cosine_distance_threshold": 0.6
  }'
```
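The Query Top-K call can also be issued with only the Python standard library (no SDK installed). This sketch builds the request; the `urlopen` call is commented out so it runs without a live server, and the two-element vector is a truncated stand-in for a full-dimension embedding:

```python
import json
import urllib.request

payload = json.dumps({
    "index": "documents",
    "embeddings": [0.78, 0.90],  # truncated example; use a full-dimension vector
    "dtype": "FLOAT32",
    "top_k": 5,
    "cosine_distance_threshold": 0.6,
}).encode()

req = urllib.request.Request(
    "http://localhost:5500/rag/query-rag",
    data=payload,
    headers={"X-API-Key": "YOUR_API_KEY", "Content-Type": "application/json"},
)

# Against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```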
The API supports an optional re-ranking stage (configurable on the server). When enabled, the typical flow is: vector search → candidate filtering/metadata match → optional re-ranker scores pairs to improve ordering. (See the configuration wizard to enable/disable it and set re-ranker options.)
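Conceptually, a re-ranker is just a scoring function over `(query, doc)` pairs. The sketch below shows the flow with a trivial lexical-overlap scorer standing in for the configured model; the scorer is illustrative, not Aquiles-RAG's actual re-ranker:

```python
def rerank(query: str, candidates: list[str]) -> list[str]:
    """Order candidates by a toy lexical-overlap score.

    A real re-ranker (as configured in the wizard) would replace `score`
    with a model scoring each (query, doc) pair.
    """
    q_tokens = set(query.lower().split())

    def score(doc: str) -> float:
        d_tokens = set(doc.lower().split())
        return len(q_tokens & d_tokens) / (len(d_tokens) or 1)

    return sorted(candidates, key=score, reverse=True)

docs = ["redis vector search", "cooking pasta at home", "vector search with qdrant"]
print(rerank("vector search", docs))
```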
Python client (synchronous):

```python
from aquiles.client import AquilesRAG

client = AquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")

# Create an index (returns server text)
resp_text = client.create_index("documents", embeddings_dim=768, dtype="FLOAT32")

# Insert chunks using your embedding function
def get_embedding(text):
    return embedding_model.encode(text)

responses = client.send_rag(
    embedding_func=get_embedding,
    index="documents",
    name_chunk="doc1",
    raw_text=full_text,
    embedding_model="text-embedding-v1"  # optional metadata sent with each chunk
)

# Query the index (returns parsed JSON)
results = client.query("documents", query_embedding, top_k=5)
print(results)
```
Python client (asynchronous):

```python
import asyncio
from aquiles.client import AsyncAquilesRAG

client = AsyncAquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")

async def main():
    await client.create_index("documents_async")

    responses = await client.send_rag(
        embedding_func=async_embedding_func,  # supports sync or async callables
        index="documents_async",
        name_chunk="doc_async",
        raw_text=full_text
    )

    results = await client.query("documents_async", query_embedding)
    print(results)

asyncio.run(main())
```
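The concurrent upload the async client performs can be illustrated with plain `asyncio`; `upload_chunk` here is a hypothetical stand-in for the real HTTP call, not part of the SDK:

```python
import asyncio

async def upload_chunk(index: str, name: str, chunk: str) -> str:
    """Stand-in for the real httpx POST to /rag/create."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return f"{index}/{name}: {len(chunk)} chars"

async def upload_all(index: str, name: str, chunks: list[str]) -> list[str]:
    # One task per chunk, gathered concurrently -- the pattern the
    # async client uses when uploading many chunks.
    tasks = [upload_chunk(index, f"{name}_{i}", c) for i, c in enumerate(chunks)]
    return await asyncio.gather(*tasks)

results = asyncio.run(upload_all("documents_async", "doc_async", ["alpha", "beta gamma"]))
print(results)
```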
Notes:

- Both clients accept an optional `embedding_model` parameter forwarded as metadata, which is helpful when storing/querying embeddings produced by different models.
- `send_rag` chunks text using `chunk_text_by_words()` (default ~600 words / ~1024 tokens) and uploads each chunk (concurrently in the async client).
- If the re-ranker is enabled on the server, the client can call the re-rank endpoint after receiving RAG results to re-score/re-order candidates.
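The chunking behavior described above can be sketched as follows; this is a simplified stand-in for the client's `chunk_text_by_words()`, not its exact implementation:

```python
def chunk_text_by_words(text: str, max_words: int = 600) -> list[str]:
    """Split text into word-bounded chunks of at most max_words words.

    Simplified sketch of the client's chunker, which targets roughly
    600 words (~1024 tokens) per chunk.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

chunks = chunk_text_by_words("word " * 1500)
print(len(chunks))  # 1500 words -> chunks of 600, 600, and 300 words
```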
Open the web UI (protected) at `http://localhost:5500/ui`. Use it to:

- run the setup wizard link (if available) or inspect live configs
- test `/create/index`, `/rag/create`, and `/rag/query-rag`
- access the protected Swagger UI & ReDoc after logging in
## Architecture

- Clients (HTTP/HTTPS, Python SDK, or UI playground) make asynchronous HTTP requests.
- FastAPI server: orchestration and business logic; validates requests and translates them into vector-store operations.
- Vector store: Redis (HASH + HNSW/COSINE search), Qdrant (collections + vector search), or PostgreSQL with `pgvector` and `pgcrypto` (manual DB preparation required).
- Optional re-ranker: when enabled, a re-ranking component scores `(query, doc)` pairs to improve final ordering.
Operational notes:

- Metrics (`/status/ram`): Redis exposes `INFO memory` and `memory_stats()`. The same Redis-specific metrics are not available for Qdrant, so the endpoint returns a short message explaining this. For PostgreSQL, the exposed metrics differ from both; check your Postgres monitoring tooling for memory and indexing statistics.
- Dtype handling: the server validates `dtype` for Redis (converting embeddings to the requested NumPy dtype). Qdrant accepts float arrays directly, so `dtype` is informational/compatibility metadata there. For PostgreSQL + pgvector, ensure the stored vector dimension and any normalization required for cosine/inner-product search are handled by your ingestion pipeline.
- gRPC: Qdrant can be used over HTTP or gRPC (`prefer_grpc=true` in the config); ensure your environment allows gRPC traffic as needed.
- PostgreSQL: Aquiles-RAG does not run automatic migrations for Postgres. Create the `pgvector` extension, tables, and indexes manually (or via your own migration tool) before using Postgres as a vector store.
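On the normalization point for pgvector, a minimal stdlib sketch of L2 normalization: with unit-length vectors, the inner product of two embeddings equals their cosine similarity, so an inner-product index behaves like a cosine index.

```python
import math

def l2_normalize(vec: list[float]) -> list[float]:
    """L2-normalize an embedding so <a, b> == cos(a, b) for unit vectors."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return vec  # zero vector cannot be normalized
    return [x / norm for x in vec]

v = l2_normalize([3.0, 4.0])
print(v)  # [0.6, 0.8]
```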
See the `test/` directory for automated tests:

- client tests for the Python SDK
- API tests for endpoint behavior
- `test_deploy.py` for deployment / bootstrap validation

If you add Postgres to CI, prepare the database (create the `pgvector` extension and required tables/indexes) in your test fixtures, since there are no automatic migrations.