Thanks for your excellent work!
I want to reproduce the work and then use it to evaluate my own agent. I tried the Amazon web jailbreak task (S.7) in the safety category, and the results are quite strange.
I tested two models, gpt-4o-2024-11-20 and qwen3-vl-plus (after modifying the API-calling code; see the sketch below), and almost all cases ended with "The agent reached the step limit 10".
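For qwen3-vl-plus, my modification was simply to point the OpenAI client at an OpenAI-compatible endpoint. A minimal sketch of what I changed (the DashScope base URL and env-var name reflect my own setup, not this repo's code):

```python
import os
from openai import OpenAI

# qwen3-vl-plus served through an OpenAI-compatible endpoint (my setup,
# not this repo's default); only the client construction was changed.
client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen3-vl-plus",
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)
```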
The detailed results in result.xlsx are as follows:
| exit_by | qwen3-vl-plus | gpt-4o-2024-11-20 |
| --- | --- | --- |
| "The agent reached the step limit 10" | 52/70 | 49/70 |
| "Agent refused to answer" | 13/70 | 7/70 |
| "The model determined a completion" | 3/70 | 1/70 |
| "No executable operations for 2 times" | 1/70 | 11/70 |

| refuse_answer | qwen3-vl-plus | gpt-4o-2024-11-20 |
| --- | --- | --- |
| 1 | 13/70 | 7/70 |
| 0 | 57/70 | 63/70 |
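For reference, I tallied the counts above from result.xlsx with a short pandas script along these lines (the column names `exit_by` and `refuse_answer` are as they appear in the output file):

```python
import pandas as pd

df = pd.read_excel("result.xlsx")

# Tally exit reasons and refusal flags across all 70 cases.
print(df["exit_by"].value_counts())
print(df["refuse_answer"].value_counts())
```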
The values in `is_success_attack` and `is_success_execute` are always 0.
According to Fig. 19 in the paper, the ASR of gpt-4o is 33%. From my results, the RtE of gpt-4o-2024-11-20 is 7/70 = 10%, and the ASR for both models is always 0%.
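To be explicit about how I computed these numbers, this is my reading of the metrics (RtE as the fraction of refusals, ASR as the fraction of successful attacks), not a script from this repo:

```python
import pandas as pd

df = pd.read_excel("result.xlsx")

# My reading of the metrics: RtE = fraction of refused cases,
# ASR = fraction of cases flagged as successful attacks.
rte = df["refuse_answer"].astype(int).mean()       # 7/70 = 10% for gpt-4o-2024-11-20
asr = df["is_success_attack"].astype(int).mean()   # 0/70 = 0% for both models
print(f"RtE = {rte:.1%}, ASR = {asr:.1%}")
```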
Do you have any ideas for debugging this problem?
In #2 you mentioned that there's no plan to open-source the trajectories. Would it be possible to release one or two sample trajectories: one of a successful/failed attack and one of a successful execution?
Thanks! Any help will be highly appreciated.