This project provides a framework for testing local LLM models using the SWE-bench benchmark. It allows you to evaluate how well your local language model performs on real-world software engineering tasks.
- Install the required dependencies:
```bash
pip install -r requirements.txt
```
- Update the following paths in `test_swe_bench.py`:
  - `test_cases_file`: Path to your SWE-bench test cases JSON file
  - `model_path`: Path to your local LLM model
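As a rough sketch, those two settings might live near the top of `test_swe_bench.py`; the variable names come from the list above, while the values shown are placeholders you would replace with your own paths:

```python
# Placeholder paths -- substitute your own locations.
test_cases_file = "data/swe_bench_test_cases.json"  # SWE-bench test cases (JSON)
model_path = "models/my-local-llm"                  # local LLM model directory
```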
- Prepare your test cases in JSON format following the SWE-bench format:

```json
[
  {
    "test_case_id": "unique_id",
    "problem_description": "Description of the problem",
    "base_code": "Original code",
    "target_file": "Path to the file to modify",
    "target_line": "Line number to modify"
  }
]
```
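Before running the full benchmark, it can help to sanity-check your test case file. Below is a minimal loader that verifies each entry has the fields from the schema above; the function name and file path are illustrative, not part of the project's API:

```python
import json

# Field names taken from the JSON schema above.
REQUIRED_FIELDS = {
    "test_case_id",
    "problem_description",
    "base_code",
    "target_file",
    "target_line",
}

def load_test_cases(path):
    """Load SWE-bench-style test cases and verify each has the expected fields."""
    with open(path) as f:
        cases = json.load(f)
    for case in cases:
        missing = REQUIRED_FIELDS - case.keys()
        if missing:
            case_id = case.get("test_case_id", "<unknown>")
            raise ValueError(f"test case {case_id} is missing fields: {sorted(missing)}")
    return cases
```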
- Run the test script:
```bash
python test_swe_bench.py
```
The script will:
- Load your local LLM model
- Process each test case using SWE-bench's prompting module
- Generate solutions using your model
- Run the test cases using SWE-bench's test runner
- Save the results to `results.json`
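The flow above can be sketched as a small driver loop. Note that `generate_solution` and `run_tests` below are hypothetical stand-ins for the script's actual calls into your model and SWE-bench's prompting module and test runner:

```python
import json

def run_benchmark(test_cases, generate_solution, run_tests):
    """Run each test case through the model and test runner, then save results.

    `generate_solution(case)` and `run_tests(case, solution)` are placeholder
    callables standing in for the real model-generation and SWE-bench test steps.
    """
    results = []
    for case in test_cases:
        solution = generate_solution(case)          # model produces a candidate fix
        passed = run_tests(case, solution)          # test runner checks it
        results.append({
            "test_case_id": case["test_case_id"],
            "solution": solution,
            "passed": passed,
        })
    with open("results.json", "w") as f:
        json.dump(results, f, indent=2)
    return results
```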
The script generates a `results.json` file containing:
- Generated solutions for each test case
- Test results (pass/fail)
- Metadata about the model and generation parameters
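Once the run finishes, you can compute a quick pass rate from the output. The snippet below assumes each results entry carries a boolean `passed` key as listed above; the exact schema depends on the script, so treat this as an illustration:

```python
import json

def summarize(path="results.json"):
    """Print a simple pass/fail summary from a results file.

    Assumes each entry is a dict with a boolean "passed" key (an assumption
    about the output schema, matching the pass/fail results listed above).
    """
    with open(path) as f:
        results = json.load(f)
    passed = sum(1 for r in results if r.get("passed"))
    print(f"{passed}/{len(results)} test cases passed")
    return passed, len(results)
```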
- Python 3.8+
- CUDA-capable GPU (recommended)
- Sufficient RAM to load your local LLM model
- SWE-bench test cases in JSON format