Description
@icwhite
When running evaluation_script.py with a multi-agent task setup, the script fails due to a tmux session conflict. Specifically, it attempts to create a new tmux session named server_0 even though a session with that name already exists or was not properly cleaned up. This causes subprocess.run(...) to raise a CalledProcessError.
Steps to Reproduce:
- Run the following command:
  python tasks/evaluation_script.py --task_path tasks/crafting_tasks/test_tasks/2_agent.json --model gemini-2.0-flash --template_profile profiles/tasks/crafting_profile.json
- If a tmux session named server_0 already exists or is in an inconsistent state, the following error occurs:
  duplicate session: server_0
  subprocess.CalledProcessError: Command '['tmux', 'new-session', '-d', '-s', 'server_0']' returned non-zero exit status 1.
Expected Behavior:
If a tmux session named server_0 already exists, the script should either:
- Kill the existing session safely before launching a new one, or
- Generate a unique session name to avoid conflict.
Observed Behavior:
The script retries launching the world but fails each time because of the tmux session name collision. As a result, no tasks complete successfully.
Possible Fix Suggestions:
- Add a check and safe cleanup before attempting to create the session:
subprocess.run(['tmux', 'kill-session', '-t', session_name], check=False)
- Or generate unique session names using timestamps or UUIDs.
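Both suggestions could be sketched roughly as follows. This is only an illustration, not the code in evaluation_script.py: the helper names unique_session_name and start_fresh_session are hypothetical, and the injectable run parameter is an assumption added so the logic can be exercised without tmux installed.

```python
import subprocess
import uuid


def unique_session_name(base: str) -> str:
    """Append a short random suffix so concurrent or stale runs do not collide."""
    # Hypothetical naming scheme: "<base>_<8 hex chars>".
    return f"{base}_{uuid.uuid4().hex[:8]}"


def start_fresh_session(session_name: str, run=subprocess.run) -> None:
    """Kill any existing session with this name, then create a detached one."""
    # `tmux has-session` exits with status 0 only when the target session exists.
    probe = run(
        ["tmux", "has-session", "-t", session_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    if probe.returncode == 0:
        # check=False: a failure to kill should not abort the launch attempt.
        run(["tmux", "kill-session", "-t", session_name], check=False)
    # check=True: surface a real failure to create the session.
    run(["tmux", "new-session", "-d", "-s", session_name], check=True)
```

Either call start_fresh_session("server_0") to reuse the fixed name after cleanup, or pass unique_session_name("server_0") to sidestep the collision entirely. With unique names, anything that later attaches to or kills the session would need the generated name passed along.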
Environment:
- OS: macOS
- Python: 3.13 (via Anaconda)
- Command used: see above
- Script: evaluation_script.py