Welcome to this open replication of OpenAI's Deep Research!
Read more about this implementation's goal and methods in our blog post.
This agent achieves 55% pass@1 on the GAIA validation set, versus 67% for Deep Research.
To install it, first run:

```shell
pip install -r requirements.txt

# To use browser-use
playwright install
```
The agent uses the `GoogleSearchTool` for web search, which requires an environment variable with the corresponding API key, depending on the selected provider:

- `SERPAPI_API_KEY` for SerpApi: sign up here to get a key
- `SERPER_API_KEY` for Serper: sign up here to get a key
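Only one of the two keys needs to be set. As a minimal sketch (the `pick_search_provider` helper below is hypothetical, for illustration only; the actual `GoogleSearchTool` selects its provider internally and its logic may differ):

```python
import os

def pick_search_provider():
    """Return which search provider to use based on the API keys present.

    Hypothetical helper for illustration; not part of the project's code.
    """
    if os.getenv("SERPAPI_API_KEY"):
        return "serpapi"
    if os.getenv("SERPER_API_KEY"):
        return "serper"
    raise RuntimeError("Set SERPAPI_API_KEY or SERPER_API_KEY before running the agent")
```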
Depending on the model you want to use, you may need to set additional environment variables. For example, to use the default `o1` model, you need to set the `OPENAI_API_KEY` environment variable. Sign up here to get a key.
> **Warning**
> Use of the default `o1` model is restricted to tier-3 API access: https://help.openai.com/en/articles/10362446-api-access-to-o1-and-o3-mini
Then you're good to go! Run the `run.py` script, as in:

```shell
python run.py --model-id "o1" "Your question here!"
```
If `--count` is specified, a hardcoded seed is used to take samples from the dataset.
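The idea behind the hardcoded seed can be sketched as follows (a hypothetical illustration; the actual sampling code in `benchmarking/run_gaia.py` may differ):

```python
import random

def sample_tasks(tasks, count, seed=42):
    # Hypothetical sketch: a fixed seed makes the sampled subset of
    # benchmark tasks reproducible across runs with the same --count.
    rng = random.Random(seed)
    return rng.sample(tasks, count)
```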
Run benchmark:

```shell
ls output/validation
# Take run name from last filename+1
PYTHONPATH=. nohup python benchmarking/run_gaia.py --run-name {name} --concurrency 4 --model-id {model} --count {count} > logs.log &

# Full example
PYTHONPATH=. nohup python benchmarking/run_gaia.py --run-name test_0001 --concurrency 4 --model-id gpt-4o --count 20 > logs.log &
tail -f logs.log
```
Check results:

```shell
PYTHONPATH=. python benchmarking/score.py --filename {run_name}.jsonl

# Full example
PYTHONPATH=. python benchmarking/score.py --filename test_0001.jsonl
```
This project uses Langfuse for telemetry and observability. Telemetry helps track agent performance, model usage, and provides insights into how the agent processes queries.
- Access the Langfuse dashboard at http://10.132.0.99:3000/ (Galadriel VPN must be turned on)
- Create a new project or use an existing one
- Generate public and secret API keys from the dashboard
- Add these keys to your `.env` file:

  ```
  LANGFUSE_PUBLIC_KEY="pk-..."
  LANGFUSE_SECRET_KEY="sk-..."
  ```
Telemetry must be enabled in the `create_smol_manus_agent` function. You can see this in `run_smol_manus.py`:

```python
agent = create_smol_manus_agent(model_id=args.model_id, telemetry=True, sandbox=sandbox)
```
You can view all telemetry data in the Langfuse dashboard after running queries.
If you want to disable telemetry, set the `telemetry` parameter to `False` when creating the agent:

```python
agent = create_smol_manus_agent(model_id=args.model_id, telemetry=False, sandbox=sandbox)
```