Hi team,
I'm a fan of your solid body of work. I ran o3-mini on WorkArena and found some unexpected behavior:
I believe the model could only produce these actions if it had been trained on your dataset and golden trajectories for WorkArena. That would contaminate the benchmark and make comparisons between new and old models unfair. From what I observed, the context contains no information that would lead to these actions. Is there another explanation?
I wanted to raise this so your team is aware, and I'd like to hear your thoughts on it. I'll link the complete trajectory log of o3-mini running on this task.
aldro61