A Python library for creating hierarchical multi-agent systems using LlamaIndex. Hierarchical systems are a type of multi-agent architecture in which specialized agents are coordinated by a central supervisor agent. The supervisor controls all communication flow and task delegation, deciding which agent to invoke based on the current context and task requirements. (Inspired by LangGraph Supervisor.)
- 🤖 Supervisor Workflow: Create a supervisor to orchestrate multiple specialized agents or tools using LlamaIndex's workflow engine.
- 🛠️ Agent Handoff: Built-in mechanism for the supervisor to hand off tasks to appropriate agents.
- 📜 Flexible Message History: Control how agent interactions are reflected in the overall chat history (`full_history` or `last_message`).
- 🧩 Custom Agent Integration: Use custom LlamaIndex `Workflow` classes as agents within the supervisor framework.
- 🌳 Hierarchical Structures: Build multi-level agent hierarchies by using supervisors as agents within other supervisors. Supports adding a tree structure to the context (`add_tree_structure`).
- 🏷️ Agent Name Attribution: Automatically adds the agent's name to its messages for clarity in the history (`name_addition`).
```bash
pip install llama-index-supervisor
```
Here's a simple example of a supervisor managing two specialized agents (a math expert and a research expert) using LlamaIndex components:
```python
# Ensure you have OPENAI_API_KEY set in your environment
# import os
# os.environ["OPENAI_API_KEY"] = "sk-..."

from llama_index.llms.openai import OpenAI
from llama_index_supervisor import Supervisor
from llama_index.core.agent.function_calling import FunctionAgent
from llama_index.core.tools import FunctionTool
from llama_index.core.workflow import Context

# Initialize the LLM
llm = OpenAI(model="gpt-4o", temperature=0)

# Define tools
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

def web_search(query: str) -> str:
    """Search the web for information."""
    # Replace with a real web search implementation if needed
    print(f"--- Searching web for: {query} ---")
    return (
        "Here are the headcounts for each of the FAANG companies in 2024:\n"
        "1. **Facebook (Meta)**: 67,317 employees.\n"
        "2. **Apple**: 164,000 employees.\n"
        "3. **Amazon**: 1,551,000 employees.\n"
        "4. **Netflix**: 14,000 employees.\n"
        "5. **Google (Alphabet)**: 181,269 employees."
    )

# Create function tools
add_tool = FunctionTool.from_defaults(fn=add)
multiply_tool = FunctionTool.from_defaults(fn=multiply)
search_tool = FunctionTool.from_defaults(fn=web_search)

# Create specialized agents using LlamaIndex FunctionAgent
math_agent = FunctionAgent.from_tools(
    tools=[add_tool, multiply_tool],
    llm=llm,
    name="math_expert",
    system_prompt="You are a math expert. Always use one tool at a time.",
    description="Specialized in performing mathematical calculations like addition and multiplication.",
)

research_agent = FunctionAgent.from_tools(
    tools=[search_tool],
    llm=llm,
    name="research_expert",
    system_prompt="You are a world class researcher with access to web search. Do not do any math.",
    description="Specialized in searching the web for information.",
)

# Create the supervisor workflow
# The supervisor automatically gets descriptions from the agents
supervisor = Supervisor(
    llm=llm,
    agents=[math_agent, research_agent],
    # Optional: add a tree structure representation to the system prompt
    add_tree_structure=True,
    # Optional: set a timeout for the workflow
    timeout=60,
)

# Run the workflow
# Context manages state, including memory, across workflow steps
ctx = Context(supervisor)
response = await supervisor.run(
    input="what's the combined headcount of the FAANG companies in 2024?",
    ctx=ctx,  # Pass the context
)

# Print the response directly
print(response)
```
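With the stubbed search results above, the expected total is 67,317 + 164,000 + 1,551,000 + 14,000 + 181,269 = 1,977,586, so a successful run should report a combined headcount of roughly 1,977,586 (the exact wording depends on the LLM).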
The `output_mode` parameter in the `Supervisor` constructor controls how messages from delegated agents are added back to the supervisor's main chat history:

- `output_mode="full_history"` (default): includes all messages (intermediate steps, tool calls, final response) from the delegated agent's run.
- `output_mode="last_message"`: includes only the final message generated by the delegated agent.
```python
# Example: only include the final response from agents
supervisor_last_only = Supervisor(
    llm=llm,
    agents=[math_agent, research_agent],
    output_mode="last_message",
)
```
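As a rule of thumb, `last_message` keeps the supervisor's context compact, which can help when delegated agents make many intermediate tool calls, while `full_history` gives the supervisor full visibility into each agent's reasoning at the cost of a longer prompt.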
You can create multi-level hierarchies by using `Supervisor` instances as agents within another `Supervisor`. Ensure each agent (including supervisors acting as agents) has a unique `name` and a `description`.
```python
# Assume math_agent, research_agent, writing_agent, publishing_agent are defined
# (writing_agent and publishing_agent would need tools and descriptions)

# Create mid-level supervisors
research_supervisor = Supervisor(
    llm=llm,
    agents=[research_agent, math_agent],
    name="research_supervisor",
    description="Manages a research team with experts in web research and math.",
    system_prompt="You manage a research team. Delegate tasks appropriately.",
    timeout=60,
)

writing_supervisor = Supervisor(
    llm=llm,
    # agents=[writing_agent, publishing_agent],  # Add writing/publishing agents here
    name="writing_supervisor",
    description="Manages a content team with experts in writing and publishing.",
    system_prompt="You manage a content team. Delegate tasks appropriately.",
    timeout=60,
)

# Create the top-level supervisor
top_level_supervisor = Supervisor(
    llm=llm,
    agents=[research_supervisor, writing_supervisor],
    name="top_level_supervisor",
    description="Executive supervisor coordinating research and writing teams.",
    system_prompt="You are the executive supervisor. For research/math, use research_supervisor. For content/publishing, use writing_supervisor.",
    timeout=120,  # Increase the timeout for multi-level runs
    add_tree_structure=True,  # Useful for complex hierarchies
)

# Run the top-level supervisor
ctx_top = Context(top_level_supervisor)
query = "Research the FAANG headcounts for 2024, calculate the total, and then write a brief summary."
response_top = await top_level_supervisor.run(
    input=query,
    ctx=ctx_top,
)

# Print the final response
print(response_top)
```
You can define your own agent logic by creating a class that inherits from `llama_index.core.workflow.Workflow`. This custom workflow can then be passed as an agent to the `Supervisor`.

See the `CoolAgent` example in `CoolAgent.ipynb` (Colab link) for a demonstration. The key requirement is that your custom workflow class exposes `name` and `description` attributes.
```python
# Simplified snippet from CoolAgent.ipynb
from llama_index.core.workflow import Workflow, Context, StartEvent, StopEvent, step
from llama_index.core.llms import LLM, ChatMessage
from llama_index.core.memory import ChatMemoryBuffer

class CoolAgent(Workflow):
    def __init__(self, name: str, description: str, llm: LLM,
                 system_prompt: str = "You are a cool agent. Be cool."):
        super().__init__()
        self.name = name
        self.description = description
        self.llm = llm
        self.system_message = ChatMessage(role="system", content=system_prompt)

    @step
    async def start_flow(self, ctx: Context, ev: StartEvent) -> StopEvent:
        memory: ChatMemoryBuffer = await ctx.get(
            "memory", default=ChatMemoryBuffer.from_defaults(llm=self.llm)
        )
        # Agent logic here... interacts with memory and the LLM
        user_input = await ctx.get("input", "")  # Get the input if it was passed directly
        await memory.aput(ChatMessage(role="user", content="How are you?"))  # Example interaction
        response = await self.llm.achat([self.system_message] + memory.get())
        await memory.aput(response.message)
        await ctx.set("memory", memory)  # Update the context memory
        return StopEvent(result=response.message)  # Return the result

# Create the custom agent
cool_agent = CoolAgent(
    name="cool_agent",
    description="A cool agent that does cool things.",
    llm=llm,
    system_prompt="You are a cool agent. Be super super cool.",
)

# Use it in the supervisor
supervisor_with_cool = Supervisor(
    llm=llm,
    agents=[cool_agent],  # Pass the custom agent instance
)

# Run it
result = await supervisor_with_cool.run(
    input="Ask the cool agent how it is doing."
)
print(result)
```
- The `Supervisor` automatically creates internal "handoff" tools (`transfer_to_<agent_name>`) for each agent provided.
- When the supervisor's LLM decides to delegate a task, it calls the corresponding handoff tool.
- The `Supervisor` intercepts this tool call, prepares the context (including chat history), and runs the designated agent's workflow (`agent.run(ctx=...)`).
- If `add_handoff_back_messages=True` (default), special messages are added to the history when control returns to the supervisor, indicating the handoff completion. A sketch of the handoff-tool idea follows this list.
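To make the mechanism concrete, here is a minimal sketch of what a handoff tool could look like if you built one by hand with `FunctionTool`. The `make_handoff_tool` helper and the tool body are hypothetical, for illustration only; the library's actual internals may differ:

```python
# Hypothetical sketch of a handoff tool; not the library's actual code.
from llama_index.core.tools import FunctionTool

def make_handoff_tool(agent_name: str) -> FunctionTool:
    def handoff() -> str:
        """Signal that the current task should be delegated."""
        return f"Transferring to {agent_name}"

    # The supervisor would intercept a call to this tool and run the
    # named agent's workflow instead of returning the string to the LLM.
    return FunctionTool.from_defaults(
        fn=handoff,
        name=f"transfer_to_{agent_name}",
        description=f"Delegate the current task to {agent_name}.",
    )
```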
Memory in `llama-index-supervisor` is managed through the `llama_index.core.workflow.Context` object and uses `llama_index.core.memory.ChatMemoryBuffer`.

- The `Context` is passed to the `supervisor.run()` method.
- The supervisor automatically retrieves or initializes a `ChatMemoryBuffer` within the context under the key `"memory"`.
- You can pre-populate the memory before running the supervisor:
```python
# filepath: examples.ipynb (snippet)
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.llms import ChatMessage
from llama_index.core.workflow import Context

# Assume supervisor, math_agent, research_agent are defined
ctx = Context(supervisor)  # Create a context linked to the supervisor workflow

# Pre-populate memory
memory = ChatMemoryBuffer.from_defaults(
    chat_history=[
        ChatMessage(role="user", content="What is 2+2?"),
        ChatMessage(role="assistant", content="2 + 2 = 4"),  # Example previous turn
    ]
)
await ctx.set("memory", memory)  # Set the pre-populated memory in the context

# Run the supervisor with the context containing the pre-populated memory
response = await supervisor.run(
    input="Now multiply that by 3.",  # Follow-up question
    ctx=ctx,
)
```
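Since the conversation lives in the context under the `"memory"` key, you can also read it back after a run. A small sketch, assuming the run above has completed and the supervisor has written the updated buffer back to the same key:

```python
# Read the updated memory back out of the context after the run
final_memory: ChatMemoryBuffer = await ctx.get("memory")
for msg in final_memory.get():
    print(f"{msg.role}: {msg.content}")
```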
Contributions are welcome! Please feel free to open issues or submit pull requests for any enhancements, bug fixes, or new features.