Description
There's a significant difference between the prompt template in the Agentless source code and the prompt that was actually used for the SWE-bench Verified run (based on the output log files):
Source Code Prompt (agentless/test/select_regression_tests.py#L26):
Please identify the tests that should not be run after applying the patch to fix the issue.
These tests should be excluded as the original functionality may change due to the patch.
Actual Prompt Used (from the release archive Agentless/releases/download/v1.5.0/agentless_swebench_verified.zip):
The prompt below appears in output.jsonl and in each instance log file, e.g. agentless_swebench_verified/select_regression/select_test_logs/astropy__astropy-7166.log
Please select a subset of regression tests to ensure the original functionality of the repository is not affected after applying a patch to fix the bug.
Note that some regression tests should not be run if the patch changes the behavior of those functionalities.
Your task is to choose the necessary subset of regression tests to run.
Key Differences:
- Intent:
  - Source code: Asks to identify tests to exclude/not run
  - Actual prompt: Asks to select tests to run
- Selection Logic:
  - Source code: Exclude tests that may be affected by functionality changes
  - Actual prompt: Choose a subset to ensure original functionality isn't affected
- Output Interpretation:
  - Source code: Lists tests to remove from regression suite
  - Actual prompt: Lists tests to include in regression suite
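To make the practical impact concrete, here is a minimal sketch of how the same model output yields opposite regression suites under the two readings. The function name `apply_selection` and the `mode` values are hypothetical and for illustration only; this is not the Agentless implementation.

```python
def apply_selection(all_tests, model_listed, mode):
    """Return the regression tests to run, given the tests the model listed.

    mode="exclude": listed tests are removed (source-code prompt reading).
    mode="include": only listed tests are kept (logged Verified prompt reading).
    Both modes are illustrative assumptions, not Agentless options.
    """
    listed = set(model_listed)
    if mode == "exclude":
        return [t for t in all_tests if t not in listed]
    if mode == "include":
        return [t for t in all_tests if t in listed]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical test names, used only to show the divergence.
all_tests = ["test_a", "test_b", "test_c"]
model_listed = ["test_b"]

run_if_exclude = apply_selection(all_tests, model_listed, "exclude")
run_if_include = apply_selection(all_tests, model_listed, "include")
```

If the post-processing code assumes one reading while the prompt asks for the other, the resulting regression suite is the complement of the intended one.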
NOTE: The log agentless_swebench_lite/select_regression/select_test_logs/django__django-10924.log shows the same prompt as the source code.
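For anyone who wants to check other instances, a rough sketch for classifying which prompt variant a log contains is below. It assumes the log files include the prompt text verbatim; the marker phrases are copied from the two prompts quoted above, and the function names are hypothetical.

```python
from pathlib import Path

# Marker phrases taken from the two prompts quoted in this issue.
EXCLUDE_MARKER = "identify the tests that should not be run"
INCLUDE_MARKER = "select a subset of regression tests"


def classify_prompt(log_text):
    """Return "exclude", "include", or "unknown" for a log's text."""
    text = log_text.lower()
    has_exclude = EXCLUDE_MARKER in text
    has_include = INCLUDE_MARKER in text
    if has_exclude and not has_include:
        return "exclude"
    if has_include and not has_exclude:
        return "include"
    # Neither marker, or both markers, found: do not guess.
    return "unknown"


def classify_log_file(path):
    """Read a log file from disk and classify the prompt it contains."""
    return classify_prompt(Path(path).read_text(errors="replace"))
```

Running `classify_log_file` over every file in a `select_test_logs` directory would show whether the prompt is consistent within each release.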
@brutalsavage Could you please take a look, clarify which approach was actually used, and provide more details?