Open Deep Research

Welcome to this open replication of OpenAI's Deep Research!

Read more about this implementation's goal and methods in our blog post.

This agent achieves 55% pass@1 on GAIA validation set, vs 67% for Deep Research.

Setup

Installation

To install it, first run

pip install -r requirements.txt
# To use browser-use
playwright install

Environment variables

The agent uses the GoogleSearchTool for web search, which requires an environment variable holding the API key for whichever search provider you select.

Depending on the model you want to use, you may also need to set environment variables. For example, to use the default o1 model, you need to set the OPENAI_API_KEY environment variable. Sign up here to get a key.
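As a sketch, the environment might look like the following. The exact search-provider variable name depends on which provider you configure; SERPAPI_API_KEY is only an assumption here:

```shell
# For the default o1 model
export OPENAI_API_KEY="sk-..."
# Hypothetical: key name depends on the configured search provider
export SERPAPI_API_KEY="..."
```

You can also place these in a .env file, as shown for the Langfuse keys below.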

Warning

The use of the default o1 model is restricted to tier-3 access: https://help.openai.com/en/articles/10362446-api-access-to-o1-and-o3-mini

Usage

Then you're good to go! Run the run.py script, as in:

python run.py --model-id "o1" "Your question here!"

Benchmarking

If --count is specified, the script uses a hardcoded seed to sample that many tasks from the dataset, so repeated runs draw the same subset.
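The seeded sampling can be illustrated with a minimal sketch; the actual seed value and dataset loading in run_gaia.py may differ:

```python
import random

def sample_tasks(tasks, count, seed=42):
    """Deterministically sample `count` tasks.

    A fixed seed means the same subset is drawn on every run,
    which keeps benchmark runs comparable. The seed value 42
    is illustrative, not necessarily what run_gaia.py uses.
    """
    rng = random.Random(seed)  # local RNG so global random state is untouched
    return rng.sample(tasks, count)

tasks = [f"task_{i}" for i in range(100)]
# Same seed, same subset on every call:
assert sample_tasks(tasks, 5) == sample_tasks(tasks, 5)
```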

Run benchmark

ls output/validation
# Pick the next run name by incrementing the number in the last filename
PYTHONPATH=. nohup python benchmarking/run_gaia.py --run-name {name} --concurrency 4 --model-id {model} --count {count} > logs.log &
# Full example
PYTHONPATH=. nohup python benchmarking/run_gaia.py --run-name test_0001 --concurrency 4 --model-id gpt-4o --count 20 > logs.log &

tail -f logs.log

Check results

PYTHONPATH=. python benchmarking/score.py --filename {run_name}.jsonl
# Full example
PYTHONPATH=. python benchmarking/score.py --filename test_0001.jsonl

Telemetry

This project uses Langfuse for telemetry and observability. Telemetry helps track agent performance, model usage, and provides insights into how the agent processes queries.

Setting up Telemetry

  1. Access the Langfuse dashboard at http://10.132.0.99:3000/ (Galadriel VPN must be turned on)
  2. Create a new project or use an existing one
  3. Generate public and secret API keys from the dashboard
  4. Add these keys to your .env file:
LANGFUSE_PUBLIC_KEY="pk-..."
LANGFUSE_SECRET_KEY="sk-..."

Running with Telemetry

Telemetry must be enabled in the create_smol_manus_agent function. You can see this in run_smol_manus.py:

agent = create_smol_manus_agent(model_id=args.model_id, telemetry=True, sandbox=sandbox)

You can view all telemetry data in the Langfuse dashboard after running queries.

Disabling Telemetry

If you want to disable telemetry, set the telemetry parameter to False when creating the agent:

agent = create_smol_manus_agent(model_id=args.model_id, telemetry=False, sandbox=sandbox)
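If you toggle telemetry often, one pattern is to drive the parameter from a CLI flag instead of editing code. A sketch using argparse; the `--no-telemetry` flag is hypothetical and not part of the actual run_smol_manus.py:

```python
import argparse

def build_parser():
    """Argument parser with a hypothetical --no-telemetry flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-id", default="o1")
    parser.add_argument("--no-telemetry", action="store_true",
                        help="disable Langfuse tracing")
    return parser

args = build_parser().parse_args(["--no-telemetry"])
# The flag would then feed the telemetry parameter, e.g.:
# agent = create_smol_manus_agent(model_id=args.model_id,
#                                 telemetry=not args.no_telemetry,
#                                 sandbox=sandbox)
```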