Use case: Run mini-swe-agent on NFS #485

@pcmoritz

I've been enjoying running mini-swe-agent on NFS (Network File System).

The workflow is to split the dataset into shards and run several machines in parallel. Each machine processes a single shard (selected via the --slice parameter of swebench.py) and writes its results into a shared working directory that sits on NFS.
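
As a rough illustration of the sharding step, here is a small sketch that computes one slice per machine. It assumes --slice accepts a Python-style "start:stop" string; the helper name `shard_slices` and the numbers are hypothetical and only serve the example.

```python
def shard_slices(num_instances: int, num_shards: int) -> list[str]:
    """Split range(num_instances) into contiguous 'start:stop' slice specs.

    Assumption: the --slice parameter takes a Python-style 'start:stop' string.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    base, extra = divmod(num_instances, num_shards)
    specs = []
    start = 0
    for i in range(num_shards):
        # The first `extra` shards get one additional instance each.
        stop = start + base + (1 if i < extra else 0)
        specs.append(f"{start}:{stop}")
        start = stop
    return specs


if __name__ == "__main__":
    # Illustrative numbers: 500 instances spread over 4 machines.
    for machine, spec in enumerate(shard_slices(500, 4)):
        print(f"machine {machine}: --slice {spec}")
```

Each machine would then be started with its own slice and the same output directory on the NFS mount.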

There is only one small wrinkle: the preds.json file gets corrupted because it is only protected by an in-process lock (https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/run/extra/swebench.py#L49). That lock does not coordinate separate processes, let alone separate machines, so one process can read the file while another has only half written it, which leads to errors.

Currently I work around this by just commenting out the code that reads and writes preds.json (since I don't actually need that file).

There are several ways to support this workflow better if we wanted:

  • Let people fork the swebench.py script if they want to do similar things
  • Replace the in-process lock with a file lock from https://pypi.org/project/filelock/. The advantage is that mini-swe-agent would work out of the box on NFS; the disadvantage is that it introduces a new dependency (though a very small and stable one). If we want to do this, we could also make the dependency on the filelock package optional.
  • Encourage people who want to do something like this to run the shards in different directories and merge them manually at the end
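
For the second option, a minimal sketch of an optional filelock dependency could look like the following. This is not the current swebench.py code: `preds_lock` and `update_preds` are hypothetical names, and the layout of preds.json (a JSON object keyed by instance ID) is an assumption. If filelock is not installed, the sketch falls back to the thread lock alone.

```python
import json
import threading
from contextlib import contextmanager
from pathlib import Path

try:
    from filelock import FileLock  # optional third-party dependency
except ImportError:
    FileLock = None

_THREAD_LOCK = threading.Lock()


@contextmanager
def preds_lock(preds_path: Path):
    """Serialize access to preds_path (hypothetical helper).

    Always takes the in-process lock; additionally takes a lock file next
    to preds_path when the filelock package is available.
    """
    with _THREAD_LOCK:
        if FileLock is None:
            yield
        else:
            with FileLock(f"{preds_path}.lock"):
                yield


def update_preds(preds_path: Path, instance_id: str, entry: dict) -> None:
    """Read-modify-write preds.json while holding the lock."""
    with preds_lock(preds_path):
        # Assumption: preds.json is a JSON object keyed by instance ID.
        data = json.loads(preds_path.read_text()) if preds_path.exists() else {}
        data[instance_id] = entry
        preds_path.write_text(json.dumps(data, indent=2))
```

Whether a lock file is honoured across machines depends on the NFS version and mount options, so this would need testing on the actual setup.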

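For the third option, the manual merge at the end could be a short script along these lines. It is only a sketch: `merge_preds` is a hypothetical helper, and it assumes each shard directory contains its own preds.json holding a JSON object keyed by instance ID.

```python
import json
from pathlib import Path


def merge_preds(shard_dirs, output_path):
    """Merge per-shard preds.json files into a single file (hypothetical helper)."""
    merged = {}
    for shard_dir in shard_dirs:
        preds_file = Path(shard_dir) / "preds.json"
        if not preds_file.exists():
            continue  # this shard produced no predictions
        for instance_id, entry in json.loads(preds_file.read_text()).items():
            if instance_id in merged:
                # Shards are expected to be disjoint; overlap suggests a slicing mistake.
                raise ValueError(f"instance {instance_id} appears in more than one shard")
            merged[instance_id] = entry
    Path(output_path).write_text(json.dumps(merged, indent=2))
    return merged
```

Because every shard writes only to its own directory, no locking is needed while the shards run.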