SWE-bench Local LLM Testing

This project provides a framework for testing local LLMs on real-world software engineering tasks drawn from the SWE-bench benchmark. Using locally hosted models such as Llama 3 served via Ollama, it evaluates code-repair capabilities on Python repositories through custom test cases and a lightweight scoring framework.

Setup

  1. Install the required dependencies:
pip install -r requirements.txt
  2. Update the following paths in test_swe_bench.py (a minimal sketch follows this list):
    • test_cases_file: Path to your SWE-bench test cases JSON file
    • model_path: Path to your local LLM model
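
The exact values depend on your setup; a minimal sketch of what those two settings might look like near the top of test_swe_bench.py (the paths shown are placeholders, not defaults shipped with the project):

# Placeholder values -- point these at your own files.
test_cases_file = "data/swe_bench_test_cases.json"  # SWE-bench test cases (JSON)
model_path = "/models/llama-3-8b"                   # local LLM model weights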

Usage

  1. Prepare your test cases in JSON format following the SWE-bench schema (a filled-in example appears after this list):
[
  {
    "test_case_id": "unique_id",
    "problem_description": "Description of the problem",
    "base_code": "Original code",
    "target_file": "Path to the file to modify",
    "target_line": "Line number to modify"
  }
]
  2. Run the test script:
python test_swe_bench.py
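
A filled-in test case for step 1 might look like this (all values are illustrative, not drawn from SWE-bench):

[
  {
    "test_case_id": "requests-0001",
    "problem_description": "TypeError raised when timeout is passed as a string",
    "base_code": "def get(url, timeout=None):\n    return _send(url, timeout)",
    "target_file": "src/requests/api.py",
    "target_line": 2
  }
]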

The script will:

  1. Load your local LLM model
  2. Process each test case using SWE-bench's prompting module
  3. Generate solutions using your model
  4. Run the test cases using SWE-bench's test runner
  5. Save the results to results.json
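
In outline, that flow resembles the sketch below. The helpers build_prompt, generate, and run_tests are hypothetical stand-ins for SWE-bench's prompting module, your local model, and SWE-bench's test runner, not their actual APIs; see test_swe_bench.py for the real calls.

import json

def evaluate(test_cases_file, build_prompt, generate, run_tests):
    # build_prompt, generate, and run_tests are hypothetical callables
    # standing in for the prompting module, the model, and the test runner.
    with open(test_cases_file) as f:
        test_cases = json.load(f)

    results = []
    for case in test_cases:
        prompt = build_prompt(case)         # step 2: build the prompt
        solution = generate(prompt)         # step 3: generate a candidate fix
        passed = run_tests(case, solution)  # step 4: run the test case
        results.append({
            "test_case_id": case["test_case_id"],
            "solution": solution,
            "passed": passed,
        })

    with open("results.json", "w") as f:    # step 5: save the results
        json.dump(results, f, indent=2)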

Output

The script generates a results.json file containing:

  • Generated solutions for each test case
  • Test results (pass/fail)
  • Metadata about the model and generation parameters
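
The exact keys depend on test_swe_bench.py, but an entry in results.json might look like this (shape and values are illustrative):

[
  {
    "test_case_id": "requests-0001",
    "solution": "def get(url, timeout=None): ...",
    "passed": true,
    "metadata": {
      "model": "llama-3-8b",
      "temperature": 0.2
    }
  }
]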

Requirements

  • Python 3.8+
  • CUDA-capable GPU (recommended)
  • Sufficient RAM to load your local LLM model
  • SWE-bench test cases in JSON format
