Abnormal cost running SWE-bench Lite #73

Hi authors, thanks for the great work!

I'm currently testing `GPT-4o-mini` on _SWE-bench Lite_ with `text-embedding-3-small` as the embedding model. A whole run cost me more than $500, with about $450 spent during the retrieval phase. Is this expected, or did something go wrong?

In the paper, the average cost per question is $0.70 when using **GPT-4o**, which implies about $210 for the whole of SWE-bench Lite (300 questions × $0.70).

<img width="657" alt="Image" src="https://github.com/user-attachments/assets/ae80b3f0-4883-426d-ac76-55c0d7b5a086" />
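For reference, here is the arithmetic behind that comparison as a minimal sketch. The dollar figures are the ones quoted in this issue. The instance count of 300 is my assumption about the size of SWE-bench Lite, and the function and constant names are purely illustrative (not part of the project's code).

```python
# Figures quoted in this issue; the instance count is an assumption.
PAPER_COST_PER_INSTANCE_USD = 0.70   # reported in the paper for GPT-4o
NUM_INSTANCES = 300                  # assumed size of SWE-bench Lite
OBSERVED_TOTAL_USD = 500.0           # lower bound ("more than $500")
OBSERVED_RETRIEVAL_USD = 450.0       # approximate retrieval-phase spend


def expected_total(cost_per_instance: float, num_instances: int) -> float:
    """Total cost implied by a per-instance average."""
    return cost_per_instance * num_instances


def summarize() -> dict:
    """Compare the observed spend against the paper's implied baseline."""
    baseline = expected_total(PAPER_COST_PER_INSTANCE_USD, NUM_INSTANCES)
    return {
        "paper_baseline_usd": round(baseline, 2),
        "observed_over_baseline": round(OBSERVED_TOTAL_USD / baseline, 2),
        "retrieval_share": round(OBSERVED_RETRIEVAL_USD / OBSERVED_TOTAL_USD, 2),
        "observed_per_instance_usd": round(OBSERVED_TOTAL_USD / NUM_INSTANCES, 2),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

With these inputs, my run comes out at roughly 2.4 times the GPT-4o baseline from the paper, and about 90% of the spend is in retrieval, which is why I suspect the embedding/retrieval step rather than the LLM calls.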

It would be really helpful if anyone could share which LLM and embedding model they used, along with the cost of one complete run!
