Difference between the prompt template in the source code vs. what was actually used for Verified #77

@jatinganhotra

There's a significant difference between the prompt template in the Agentless source code and the prompt that was actually used for Verified (based on the output log files):

Source Code Prompt (agentless/test/select_regression_tests.py#L26):

Please identify the tests that should not be run after applying the patch to fix the issue.
These tests should be excluded as the original functionality may change due to the patch.

Actual Prompt Used (from the output file Agentless/releases/download/v1.5.0/agentless_swebench_verified.zip):

The prompt below appears in output.jsonl and in each instance log file, for example agentless_swebench_verified/select_regression/select_test_logs/astropy__astropy-7166.log:

Please select a subset of regression tests to ensure the original functionality of the repository is not affected after applying a patch to fix the bug.
Note that some regression tests should not be run if the patch changes the behavior of those functionalities.
Your task is to choose the necessary subset of regression tests to run.

Key Differences:

  1. Intent:
    - Source code: Asks to identify tests to exclude/not run
    - Actual prompt: Asks to select tests to run
  2. Selection Logic:
    - Source code: Exclude tests that may be affected by functionality changes
    - Actual prompt: Choose a subset to ensure original functionality isn't affected
  3. Output Interpretation:
    - Source code: Lists tests to remove from regression suite
    - Actual prompt: Lists tests to include in regression suite
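
To make the third difference concrete, here is a minimal Python sketch of how the same model answer would produce different regression suites under each reading. The function names and test names are hypothetical and are not taken from the Agentless code base; this only illustrates the two interpretations.

```python
def suite_if_answer_is_exclusions(all_tests, model_answer):
    """Source-code prompt reading: the model lists tests to drop."""
    dropped = set(model_answer)
    return [t for t in all_tests if t not in dropped]


def suite_if_answer_is_selections(all_tests, model_answer):
    """Logged Verified prompt reading: the model lists tests to keep."""
    kept = set(model_answer)
    return [t for t in all_tests if t in kept]


# Hypothetical test names, for illustration only.
all_tests = ["test_a", "test_b", "test_c"]
model_answer = ["test_b"]

exclusion_suite = suite_if_answer_is_exclusions(all_tests, model_answer)
selection_suite = suite_if_answer_is_selections(all_tests, model_answer)
```

With the same answer, the two readings give complementary suites, so applying the wrong post-processing to a prompt would invert the intended test selection.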

NOTE: The logs for agentless_swebench_lite/select_regression/select_test_logs/django__django-10924.log show the same prompt as present in the source code.
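
One way to check which prompt a given log contains is to look for the opening sentence of each prompt quoted above. The sketch below assumes the prompt text appears verbatim in the log contents; the helper name `classify_prompt` and its return labels are hypothetical.

```python
# Opening sentences of the two prompts quoted in this issue.
EXCLUDE_MARKER = (
    "Please identify the tests that should not be run after applying the patch"
)
SELECT_MARKER = "Please select a subset of regression tests"


def classify_prompt(log_text: str) -> str:
    """Return which prompt variant a log's text contains (hypothetical helper)."""
    has_exclude = EXCLUDE_MARKER in log_text
    has_select = SELECT_MARKER in log_text
    if has_exclude and has_select:
        return "both"
    if has_exclude:
        return "exclude"
    if has_select:
        return "select"
    return "unknown"
```

Running this over the text of each file in the Lite and Verified select_test_logs directories would show whether the split between the two prompts is consistent across all instances.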

@brutalsavage Could you please take a look, clarify which approach is actually used, and provide more details?
