A Python library for creating hierarchical multi-agent systems using LlamaIndex. Hierarchical systems are a type of multi-agent architecture in which specialized agents are coordinated by a central supervisor agent. The supervisor controls all communication flow and task delegation, deciding which agent to invoke based on the current context and task requirements. (Inspired by LangGraph Supervisor.)
- 🤖 Supervisor Workflow: Create a supervisor to orchestrate multiple specialized agents or tools using LlamaIndex's workflow engine.
- 🛠️ Agent Handoff: Built-in mechanism for the supervisor to hand off tasks to appropriate agents.
- 📜 Flexible Message History: Control how agent interactions are reflected in the overall chat history (`full_history` or `last_message`).
- 🧩 Custom Agent Integration: Use custom LlamaIndex `Workflow` classes as agents within the supervisor framework.
- 🌳 Hierarchical Structures: Build multi-level agent hierarchies by using supervisors as agents within other supervisors. Supports adding a tree structure to the context (`add_tree_structure`).
- 🏷️ Agent Name Attribution: Automatically adds the agent's name to its messages for clarity in the history (`name_addition`).
```bash
pip install llama-index-supervisor
```
Here's a simple example of a supervisor managing two specialized agents (a math expert and a research expert) using LlamaIndex components:
```python
# Ensure you have OPENAI_API_KEY set in your environment
# import os
# os.environ["OPENAI_API_KEY"] = "sk-..."

from llama_index.llms.openai import OpenAI
from llama_index_supervisor import Supervisor
from llama_index.core.agent.function_calling import FunctionAgent
from llama_index.core.tools import FunctionTool
from llama_index.core.workflow import Context

# Initialize the LLM
llm = OpenAI(model="gpt-4o", temperature=0)

# Define tools
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

def web_search(query: str) -> str:
    """Search the web for information."""
    # Replace with a real web search implementation if needed
    print(f"--- Searching web for: {query} ---")
    return (
        "Here are the headcounts for each of the FAANG companies in 2024:\n"
        "1. **Facebook (Meta)**: 67,317 employees.\n"
        "2. **Apple**: 164,000 employees.\n"
        "3. **Amazon**: 1,551,000 employees.\n"
        "4. **Netflix**: 14,000 employees.\n"
        "5. **Google (Alphabet)**: 181,269 employees."
    )

# Create function tools
add_tool = FunctionTool.from_defaults(fn=add)
multiply_tool = FunctionTool.from_defaults(fn=multiply)
search_tool = FunctionTool.from_defaults(fn=web_search)

# Create specialized agents using LlamaIndex FunctionAgent
math_agent = FunctionAgent.from_tools(
    tools=[add_tool, multiply_tool],
    llm=llm,
    name="math_expert",
    system_prompt="You are a math expert. Always use one tool at a time.",
    description="Specialized in performing mathematical calculations like addition and multiplication.",
)

research_agent = FunctionAgent.from_tools(
    tools=[search_tool],
    llm=llm,
    name="research_expert",
    system_prompt="You are a world class researcher with access to web search. Do not do any math.",
    description="Specialized in searching the web for information.",
)

# Create the supervisor workflow
# The supervisor automatically gets descriptions from the agents
supervisor = Supervisor(
    llm=llm,
    agents=[math_agent, research_agent],
    # Optional: add a tree structure representation to the system prompt
    add_tree_structure=True,
    # Optional: set a timeout for the workflow
    timeout=60,
)

# Run the workflow
# Context manages state, including memory, across workflow steps
ctx = Context(supervisor)
response = await supervisor.run(
    input="what's the combined headcount of the FAANG companies in 2024?",
    ctx=ctx,  # Pass the context
)

# Print the response directly
print(response)
```
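With the stubbed search results above, the expected total is 67,317 + 164,000 + 1,551,000 + 14,000 + 181,269 = 1,977,586, so a successful run should report a combined headcount of roughly 1,977,586 (the exact wording depends on the LLM).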
The `output_mode` parameter in the `Supervisor` constructor controls how messages from delegated agents are added back to the supervisor's main chat history:

- `output_mode="full_history"` (default): includes all messages (intermediate steps, tool calls, final response) from the delegated agent's run.
- `output_mode="last_message"`: includes only the final message generated by the delegated agent.
```python
# Example: only include the final response from agents
supervisor_last_only = Supervisor(
    llm=llm,
    agents=[math_agent, research_agent],
    output_mode="last_message",
)
```
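As a rule of thumb, `last_message` keeps the supervisor's context compact, which can help when delegated agents make many intermediate tool calls, while `full_history` gives the supervisor full visibility into each agent's reasoning at the cost of a longer prompt.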
You can create multi-level hierarchies by using `Supervisor` instances as agents within another `Supervisor`. Ensure each agent (including supervisors acting as agents) has a unique `name` and a `description`.
```python
# Assume math_agent, research_agent, writing_agent, publishing_agent are defined
# (writing_agent and publishing_agent would need tools and descriptions)

# Create mid-level supervisors
research_supervisor = Supervisor(
    llm=llm,
    agents=[research_agent, math_agent],
    name="research_supervisor",
    description="Manages a research team with experts in web research and math.",
    system_prompt="You manage a research team. Delegate tasks appropriately.",
    timeout=60,
)

writing_supervisor = Supervisor(
    llm=llm,
    # agents=[writing_agent, publishing_agent],  # Add writing/publishing agents here
    name="writing_supervisor",
    description="Manages a content team with experts in writing and publishing.",
    system_prompt="You manage a content team. Delegate tasks appropriately.",
    timeout=60,
)

# Create the top-level supervisor
top_level_supervisor = Supervisor(
    llm=llm,
    agents=[research_supervisor, writing_supervisor],
    name="top_level_supervisor",
    description="Executive supervisor coordinating research and writing teams.",
    system_prompt="You are the executive supervisor. For research/math, use research_supervisor. For content/publishing, use writing_supervisor.",
    timeout=120,  # Increase the timeout for multi-level runs
    add_tree_structure=True,  # Useful for complex hierarchies
)

# Run the top-level supervisor
ctx_top = Context(top_level_supervisor)
query = "Research the FAANG headcounts for 2024, calculate the total, and then write a brief summary."
response_top = await top_level_supervisor.run(
    input=query,
    ctx=ctx_top,
)

# Print the final response
print(response_top)
```
You can define your own agent logic by creating a class that inherits from `llama_index.core.workflow.Workflow`. This custom workflow can then be passed as an agent to the `Supervisor`.

See the `CoolAgent` example in `CoolAgent.ipynb` (Colab link) for a demonstration. The key requirement is that your custom workflow class exposes `name` and `description` attributes.
```python
# Simplified snippet from CoolAgent.ipynb
from llama_index.core.workflow import Workflow, Context, StartEvent, StopEvent, step
from llama_index.core.llms import LLM, ChatMessage
from llama_index.core.memory import ChatMemoryBuffer

class CoolAgent(Workflow):
    def __init__(self, name: str, description: str, llm: LLM,
                 system_prompt: str = "You are a cool agent. Be cool."):
        super().__init__()
        self.name = name
        self.description = description
        self.llm = llm
        self.system_message = ChatMessage(role="system", content=system_prompt)

    @step
    async def start_flow(self, ctx: Context, ev: StartEvent) -> StopEvent:
        memory: ChatMemoryBuffer = await ctx.get(
            "memory", default=ChatMemoryBuffer.from_defaults(llm=self.llm)
        )
        # Agent logic here... interacts with memory and the LLM
        user_input = await ctx.get("input", "")  # Get the input if it was passed directly
        await memory.aput(ChatMessage(role="user", content="How are you?"))  # Example interaction
        response = await self.llm.achat([self.system_message] + memory.get())
        await memory.aput(response.message)
        await ctx.set("memory", memory)  # Update the context memory
        return StopEvent(result=response.message)  # Return the result

# Create the custom agent
cool_agent = CoolAgent(
    name="cool_agent",
    description="A cool agent that does cool things.",
    llm=llm,
    system_prompt="You are a cool agent. Be super super cool.",
)

# Use it in the supervisor
supervisor_with_cool = Supervisor(
    llm=llm,
    agents=[cool_agent],  # Pass the custom agent instance
)

# Run it
result = await supervisor_with_cool.run(
    input="Ask the cool agent how it is doing."
)
print(result)
```
- The `Supervisor` automatically creates internal "handoff" tools (`transfer_to_<agent_name>`) for each agent provided.
- When the supervisor's LLM decides to delegate a task, it calls the corresponding handoff tool.
- The `Supervisor` intercepts this tool call, prepares the context (including chat history), and runs the designated agent's workflow (`agent.run(ctx=...)`).
- If `add_handoff_back_messages=True` (default), special messages are added to the history when control returns to the supervisor, indicating the handoff completion. A sketch of the handoff-tool idea follows this list.
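To make the mechanism concrete, here is a minimal sketch of what a handoff tool could look like if you built one by hand with `FunctionTool`. The `make_handoff_tool` helper and the tool body are hypothetical, for illustration only; the library's actual internals may differ:

```python
# Hypothetical sketch of a handoff tool; not the library's actual code.
from llama_index.core.tools import FunctionTool

def make_handoff_tool(agent_name: str) -> FunctionTool:
    def handoff() -> str:
        """Signal that the current task should be delegated."""
        return f"Transferring to {agent_name}"

    # The supervisor would intercept a call to this tool and run the
    # named agent's workflow instead of returning the string to the LLM.
    return FunctionTool.from_defaults(
        fn=handoff,
        name=f"transfer_to_{agent_name}",
        description=f"Delegate the current task to {agent_name}.",
    )
```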
Memory in `llama-index-supervisor` is managed through the `llama_index.core.workflow.Context` object and uses `llama_index.core.memory.ChatMemoryBuffer`.

- The `Context` is passed to the `supervisor.run()` method.
- The supervisor automatically retrieves or initializes a `ChatMemoryBuffer` within the context under the key `"memory"`.
- You can pre-populate the memory before running the supervisor:
```python
# filepath: examples.ipynb (snippet)
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.llms import ChatMessage
from llama_index.core.workflow import Context

# Assume supervisor, math_agent, research_agent are defined
ctx = Context(supervisor)  # Create a context linked to the supervisor workflow

# Pre-populate memory
memory = ChatMemoryBuffer.from_defaults(
    chat_history=[
        ChatMessage(role="user", content="What is 2+2?"),
        ChatMessage(role="assistant", content="2 + 2 = 4"),  # Example previous turn
    ]
)
await ctx.set("memory", memory)  # Set the pre-populated memory in the context

# Run the supervisor with the context containing the pre-populated memory
response = await supervisor.run(
    input="Now multiply that by 3.",  # Follow-up question
    ctx=ctx,
)
```
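Since the conversation lives in the context under the `"memory"` key, you can also read it back after a run. A small sketch, assuming the run above has completed and the supervisor has written the updated buffer back to the same key:

```python
# Read the updated memory back out of the context after the run
final_memory: ChatMemoryBuffer = await ctx.get("memory")
for msg in final_memory.get():
    print(f"{msg.role}: {msg.content}")
```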
Contributions are welcome! Please feel free to open issues or submit pull requests for any enhancements, bug fixes, or new features.