Current Limitation
Right now, the contains evaluator is limited to checking for a single, static string across all test cases. For example, we can set it to check if the model's output always contains "thank you," but we can't change that expected string from one test case to the next.
Proposed Change
We propose allowing the contains evaluator to use variables from the test set. This would let us specify a different string to check for in each individual test case by adding a column to our dataset.
For example, we could have a dataset like this:
[
  {
    "user_message": "Here is my email.",
    "expected_phrase": "thank you"
  },
  {
    "user_message": "I have a bug.",
    "expected_phrase": "sorry"
  }
]
The evaluator would then check if the LLM output contains the value from the expected_phrase variable for that specific row ({{testcase.expected_phrase}}).
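To make the proposed behavior concrete, here is a minimal sketch of such a per-row contains check. The function name, signature, and the way the testcase row is passed in are all hypothetical illustrations, not Agenta's actual evaluator API; it only shows the idea of resolving a {{testcase.<column>}} placeholder against the current row before doing the substring check.

```python
import re

# Hypothetical sketch of a per-row "contains" evaluator; names and
# signature are illustrative, not Agenta's real API.
def contains_evaluator(llm_output: str, testcase: dict,
                       template: str = "{{testcase.expected_phrase}}") -> bool:
    """Resolve {{testcase.<column>}} placeholders against the given row,
    then check whether the resolved string appears in the model output."""
    def resolve(match: re.Match) -> str:
        key = match.group(1)
        # Fall back to the raw placeholder if the column is missing.
        return str(testcase.get(key, match.group(0)))

    expected = re.sub(r"\{\{testcase\.(\w+)\}\}", resolve, template)
    # Case-insensitive containment check on the resolved value.
    return expected.lower() in llm_output.lower()
```

For the dataset above, contains_evaluator("Thank you for your email!", {"expected_phrase": "thank you"}) would pass, while the same output against the row expecting "sorry" would fail.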
Benefit
This would make the evaluator much more flexible and powerful. It would allow us to create more specific, "unit test" style evaluations where the expected content changes depending on the input, leading to more accurate testing.
The one area I've stumbled on is the evaluator. I've made a dataset with input and correct_answer, and I want to check that the text in correct_answer is in the response from the LLM. I tried to do that like this, but I think I can't use variables here (see screenshot).
...would anyone be able to put me on the right track please?
Original Request by Henri
https://agenta-hq.slack.com/archives/C05JDQWKD6E/p1755012840660029