The Multi-Agent Deep Researcher is a modular workflow system where specialized agents collaborate to perform in-depth research on any prompt. Agents are coordinated by an orchestration layer, store context in a vector DB, and iteratively refine results through structured communication.
Key use cases:

- Market Research: competitor analysis, trends, reports
- Academic Research: summarizing and synthesizing papers
- Tech Research: emerging tools, frameworks, patents

Key features:

- Multi-Agent Architecture (search, summarizer, evaluator, synthesizer)
- Orchestration Layer via LangChain / MCP / custom orchestrator
- Memory + RAG with FAISS / Pinecone / Weaviate
- Error Recovery: retries, graceful fallbacks, logging
- Extensible API (FastAPI backend with REST endpoints)
- Dockerized for reproducible environments
- CI/CD with GitHub Actions
*Screenshot: Swagger UI (`/docs`)*

*Example response from the Deep Researcher system*
```text
deep-researcher/
├── README.md
├── docs/
│   ├── architecture.md
│   ├── agents.md
│   └── dev-setup.md
├── src/
│   ├── orchestrator/          # orchestration layer
│   │   └── orchestrator.py
│   ├── agents/                # agent implementations
│   │   ├── searcher.py
│   │   ├── extractor.py
│   │   ├── summarizer.py
│   │   ├── critic.py
│   │   └── synthesizer.py
│   ├── storage/
│   │   ├── vector_store.py
│   │   └── metadata_db.py
│   └── api/
│       └── server.py          # FastAPI app
├── tests/
├── Dockerfile
├── docker-compose.yml
├── .github/workflows/ci.yml
└── examples/
    └── demo_prompt.md
```
**Searcher**
- Inputs: research prompt
- Actions: query the web, scholarly APIs (arXiv, CrossRef), news sources, and internal corpora
- Outputs: raw documents, URLs, metadata, confidence scores

**Extractor**
- Inputs: raw documents (HTML, PDF, text)
- Actions: OCR (if needed), text extraction, chunking, metadata extraction (title, authors, date, section headings)
- Outputs: chunks with embeddings, ready for vector store ingestion

**Summarizer**
- Inputs: retrieved chunks or raw text
- Actions: produce concise summaries at multiple granularities (sentence, paragraph, section)
- Outputs: summaries with provenance (source pointers)

**Critic / Evaluator**
- Inputs: summaries / synthesized content
- Actions: fact-check against source documents, evaluate reasoning quality, flag hallucinations, rank items by novelty/relevance
- Outputs: critiques, suggested revisions, confidence metrics

**Synthesizer**
- Inputs: summaries + critiques
- Actions: produce the final structured deliverable (executive summary, literature review, gaps & opportunities, recommended reading list)
- Outputs: report (Markdown), references (structured), action items

**Planner (optional)**
- Builds multi-step plans and decides whether to re-run the Searcher/Extractor on uncovered gaps.

**Monitor / Supervisor**
- Observes workflows, retries failed tasks, manages rate limits, and raises alerts.
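Because every agent consumes a payload and returns outputs plus provenance and a confidence score, the orchestrator can stay agnostic to any one implementation. A minimal sketch of one way to encode that shared contract in Python (the `Agent` protocol and `AgentResult` names here are illustrative, not the repo's actual classes):

```python
from dataclasses import dataclass, field
from typing import Any, Protocol

@dataclass
class AgentResult:
    """Uniform envelope every agent returns to the orchestrator (illustrative)."""
    outputs: dict[str, Any]                                          # agent-specific payload
    provenance: list[dict[str, Any]] = field(default_factory=list)   # source pointers
    confidence: float = 1.0                                          # 0.0-1.0, read by Critic/Monitor

class Agent(Protocol):
    name: str

    async def run(self, inputs: dict[str, Any]) -> AgentResult:
        """Consume the upstream payload and return outputs plus provenance."""
        ...
```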
- Message format (JSON)

```json
{
  "task_id": "uuid",
  "agent": "searcher",
  "prompt": "Find recent papers on retrieval-augmented generation",
  "inputs": {},
  "outputs": {},
  "status": "queued|running|done|failed",
  "meta": {"timestamp": "2025-08-01T12:00:00Z"}
}
```
- Transport
  - Simple mode: in-process orchestrator with async function calls (asyncio); see the sketch below
  - Distributed mode: RabbitMQ / Redis Streams / Celery / MCP (message passing)
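In simple mode the whole pipeline is just a chain of awaited agent calls in one process. A rough sketch, reusing the illustrative `Agent` contract above (the agent instances and output keys are assumptions, not the repo's actual API):

```python
import asyncio

async def run_pipeline(prompt: str) -> str:
    """Drive one research run end to end; each stage feeds the next."""
    found = await searcher.run({"prompt": prompt})            # raw docs + metadata
    chunks = await extractor.run(found.outputs)               # chunked, embedded text
    summary = await summarizer.run(chunks.outputs)            # summaries + provenance
    critique = await critic.run(summary.outputs)              # critiques + confidence
    final = await synthesizer.run({**summary.outputs, **critique.outputs})
    return final.outputs["report_markdown"]                   # assumed output key

# asyncio.run(run_pipeline("Latest research on quantum machine learning"))
```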
- Provenance
  - Every chunk and summary must include `source_id`, `source_url`, `page`, `bbox` (if from a PDF), and `timestamp`.
- Idempotency & retries
  - Tasks carry a `task_id` and an `attempts` counter. The orchestrator retries transient errors up to `N` attempts.
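A sketch of what that retry policy can look like, with exponential backoff and a capped attempt count (the `MAX_ATTEMPTS` value and task dict shape are assumptions, not the repo's actual code):

```python
import asyncio
import random

MAX_ATTEMPTS = 3  # assumed cap; the orchestrator's N may differ

async def run_with_retries(agent, task: dict) -> "AgentResult":
    """Retry transient failures with exponential backoff; re-raise once exhausted."""
    while True:
        task["attempts"] = task.get("attempts", 0) + 1
        try:
            return await agent.run(task["inputs"])
        except Exception:
            if task["attempts"] >= MAX_ATTEMPTS:
                task["status"] = "failed"   # permanent failure: escalate to Monitor
                raise
            # exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            delay = 2 ** (task["attempts"] - 1) + random.random()
            await asyncio.sleep(delay)
```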
- Vector store
  - Each chunk has an embedding vector plus metadata. Use sentence-transformers embeddings or OpenAI/Azure embeddings.
  - Store in Pinecone / Weaviate / FAISS (local for demo).
- Short-term vs long-term memory
  - Short-term context: session-level cache holding the current prompt and recent messages (kept in memory or in Redis with a TTL)
  - Long-term memory: vector DB plus metadata DB for persistent artifacts (reports, raw documents, citations)
- Context windowing
  - Retrieval: top-k nearest neighbors with metadata filtering (e.g., date range, domain).
  - Build the RAG context by concatenating the highest-quality chunks with provenance.
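For the local demo path, ingestion and top-k retrieval can look roughly like this. A minimal sketch assuming `sentence-transformers` and `faiss`; the model choice and the in-memory metadata sidecar are illustrative:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

# Ingestion: embed chunks and add them to a FAISS index with sidecar metadata.
chunks = ["RAG combines retrieval with generation...", "FAISS supports ANN search..."]
embeddings = model.encode(chunks, normalize_embeddings=True)

index = faiss.IndexFlatIP(embeddings.shape[1])   # inner product == cosine on normalized vectors
index.add(np.asarray(embeddings, dtype="float32"))
metadata = [{"source_id": i, "text": t} for i, t in enumerate(chunks)]

# Retrieval: top-k nearest neighbors for the current prompt.
query = model.encode(["retrieval-augmented generation"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), 2)
context = [metadata[i] for i in ids[0]]          # feed these chunks into the RAG prompt
```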
- Transient failures: retry policy with exponential backoff, capped attempts
- Permanent failures: mark the task `failed` and escalate to the Monitor, which can notify via Slack/email
- Partial results: allow downstream agents to operate on partial outputs; track a completeness score
- Logging: structured logs (JSON) with request IDs and spans for tracing (OpenTelemetry compatible)
- Metrics: Prometheus metrics for task latencies, failure rates, queue depth
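A sketch of how those metrics can be wired up with `prometheus_client` (the metric names and port are assumptions, not the repo's actual instrumentation):

```python
from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names; the real instrumentation may differ.
TASK_LATENCY = Histogram("task_latency_seconds", "Per-task latency", ["agent"])
TASK_FAILURES = Counter("task_failures_total", "Failed tasks", ["agent"])

def record_task(agent_name: str, duration_s: float, failed: bool) -> None:
    """Record one task's latency and, if it failed, bump the failure counter."""
    TASK_LATENCY.labels(agent=agent_name).observe(duration_s)
    if failed:
        TASK_FAILURES.labels(agent=agent_name).inc()

start_http_server(9100)  # expose /metrics for Prometheus to scrape
```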
```mermaid
flowchart LR
    A[User Prompt]
    A --> Orchestrator[Orchestration Layer]
    Orchestrator --> Searcher[Searcher]
    Searcher --> Extractor[Extractor]
    Extractor --> VectorDB[(Vector DB)]
    Orchestrator --> Retriever[Retriever]
    Retriever --> Summarizer[Summarizer]
    Summarizer --> Critic[Critic]
    Critic --> Synthesizer[Synthesizer]
    Synthesizer --> Output[Final Report]
    Output -->|Store| VectorDB
    Synthesizer -->|Feedback| Orchestrator
```
```mermaid
sequenceDiagram
    participant U as User
    participant O as Orchestrator
    participant S as Searcher
    participant X as Extractor
    participant V as VectorDB
    participant SUM as Summarizer
    participant C as Critic
    participant SYN as Synthesizer
    U->>O: submit prompt
    O->>S: search
    S->>X: found docs
    X->>V: store chunks
    O->>V: retrieve context
    V->>SUM: context
    SUM->>C: summary
    C->>O: critique
    O->>SYN: synthesize (with critique)
    SYN->>U: report
    alt Critic requests more search
        O->>S: additional search
    end
```
```bash
# Clone the repo and set up a local environment
git clone https://github.com/PranithChowdary/deep-researcher.git
cd deep-researcher
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Start the API server
uvicorn src.api.server:app --reload
```
```bash
curl -X POST "http://127.0.0.1:8000/research" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Latest research on quantum machine learning"}'
```
```bash
pytest tests/ -v
```
```bash
docker build -t deep-researcher .
docker run -p 8000:8000 deep-researcher
```
- `Dockerfile` for the app
- `docker-compose.yml` for local dev (FastAPI + FAISS + Redis)
- GitHub Actions: `ci.yml` runs lint and unit tests, builds the Docker image, and pushes it to GHCR
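A minimal sketch of what that workflow can look like (job layout, lint tool, and action versions are assumptions; see `.github/workflows/ci.yml` for the actual pipeline):

```yaml
# Illustrative CI workflow; the repo's ci.yml may differ.
name: ci
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: ruff check src/            # lint (tool choice assumed)
      - run: pytest tests/ -v           # unit tests
      - run: docker build -t deep-researcher .
```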
- `architecture.md`: expanded diagrams, component responsibilities, tradeoffs
- `agents.md`: each agent's input/output contract, sample prompts/templates, failure modes
- `dev-setup.md`: environment variables, local vs cloud vector store options
- `runbook.md`: monitoring, incident response, scaling recommendations
- Unit tests for agent contracts
- Integration test: end-to-end flow with a mocked LLM and local FAISS
- Evaluation metrics: ROUGE/BERTScore for summary quality, a human-evaluation checklist, latency and cost per run
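A unit test for an agent contract can be as small as the following sketch (the fake agent and assertions are illustrative; assumes the `AgentResult` contract sketched earlier and the `pytest-asyncio` plugin):

```python
import pytest

class FakeSearcher:
    """Stub agent that honors the illustrative contract without network calls."""
    name = "searcher"

    async def run(self, inputs):
        return AgentResult(outputs={"docs": []}, confidence=0.5)

@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_searcher_honors_contract():
    result = await FakeSearcher().run({"prompt": "quantum ML"})
    assert "docs" in result.outputs
    assert 0.0 <= result.confidence <= 1.0
```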
- Rate-limit external scrapers; respect robots.txt and copyright. Use only public data or licensed databases.
- Sanitize PII before storing it in the vector DB. Provide deletion endpoints for user data.
- Route low-confidence or safety-flagged outputs to human review.