o3-mini may have been trained on WorkArena: is this a problem? #68

@rowingchenn

Hi team,
I'm a fan of your team's series of solid work. I ran o3-mini on WorkArena and observed an unexpected behavior:

[Image]

I believe such actions could only result from training on your dataset and the golden trajectories for WorkArena, which is quite harmful to the benchmark and makes comparisons between new and old models unfair. Is there any other explanation? As far as I can tell, there is no such information in the context.

I'm just raising this so that your team is aware of it, and I'd also like to hear your thoughts. Below is the complete trajectory log of o3-mini running on the task.

2025-04-03_07-19-49_GenericAgent-o3-mini-2025-01-31_on_workarena.servicenow.infeasible-navigate-and-create-user-l2_89.zip