"""Real-time detection and remediation of bad responses in RAG applications, powered by Cleanlab's TrustworthyRAG and Codex.

@@ -35,7 +36,7 @@ def __init__(
         codex_access_key (str): The [access key](/codex/web_tutorials/create_project/#access-keys) for a Codex project. Used to retrieve expert-provided answers
             when bad responses are detected, or otherwise log the corresponding queries for SMEs to answer.

-        custom_eval_thresholds (dict[str, float], optional): Custom thresholds (between 0 and 1) for specific evals.
+        eval_thresholds (dict[str, float], optional): Custom thresholds (between 0 and 1) for specific evals.
             Keys should either correspond to an Eval from [TrustworthyRAG](/tlm/api/python/utils.rag/#class-trustworthyrag)
             or a custom eval for your project. If not provided, project settings will be used.

@@ -45,9 +46,9 @@ def __init__(
         ValueError: If any threshold value is not between 0 and 1.
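
For orientation, a minimal sketch of how the renamed `eval_thresholds` parameter would be passed at construction time. The `cleanlab_codex.validator.Validator` import path and the eval names are assumptions based on the docstrings above, not something this diff confirms:

    # Sketch only: import path and eval names are assumptions, not confirmed by this diff.
    from cleanlab_codex.validator import Validator

    validator = Validator(
        codex_access_key="<YOUR_PROJECT_ACCESS_KEY>",
        eval_thresholds={
            "trustworthiness": 0.85,       # assumed TrustworthyRAG eval name
            "response_helpfulness": 0.70,  # assumed TrustworthyRAG eval name
        },
    )
    # Any threshold outside [0, 1] raises ValueError, per the Raises section above.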
"""Evaluate whether the AI-generated response is bad, and if so, request an alternate expert answer.
65
67
If no expert answer is available, this query is still logged for SMEs to answer.
66
68
@@ -71,14 +73,17 @@ def validate(
71
73
prompt (str, optional): Optional prompt representing the actual inputs (combining query, context, and system instructions into one string) to the LLM that generated the response.
72
74
form_prompt (Callable[[str, str], str], optional): Optional function to format the prompt based on query and context. Cannot be provided together with prompt, provide one or the other. This function should take query and context as parameters and return a formatted prompt string. If not provided, a default prompt formatter will be used. To include a system prompt or any other special instructions for your LLM, incorporate them directly in your custom form_prompt() function definition.
73
75
metadata (dict, optional): Additional custom metadata to associate with the query logged in the Codex Project.
74
-
options (ProjectValidateOptions, optional): Typed dict of advanced configuration options for the Trustworthy Language Model.
76
+
eval_scores (dict[str, float], optional): Scores assessing different aspects of the RAG system. If provided, TLM Trustworthy RAG will not be used to generate scores.
77
+
options (ProjectValidateOptions, optional): Typed dict of advanced TLM configuration options. See [TLMOptions](/tlm/api/python/tlm/#class-tlmoptions)
75
78
quality_preset (Literal["best", "high", "medium", "low", "base"], optional): The quality preset to use for the TLM or Trustworthy RAG API.
76
79
77
80
Returns:
78
-
dict[str, Any]: A dictionary containing:
79
-
- 'expert_answer': Alternate SME-provided answer from Codex if the response was flagged as bad and an answer was found in the Codex Project, or None otherwise.
80
-
- 'is_bad_response': True if the response is flagged as potentially bad, False otherwise. When True, a Codex lookup is performed, which logs this query into the Codex Project for SMEs to answer.
81
-
- Additional keys from a [`ThresholdedTrustworthyRAGScore`](/codex/api/python/types.validator/#class-thresholdedtrustworthyragscore) dictionary: each corresponds to a [TrustworthyRAG](/tlm/api/python/utils.rag/#class-trustworthyrag) evaluation metric, and points to the score for this evaluation as well as a boolean `is_bad` flagging whether the score falls below the corresponding threshold.
81
+
ProjectValidateResponse: A response object containing:
82
+
- eval_scores (Dict[str, EvalScores]): Evaluation scores for the original response along with a boolean flag, `failed`,
83
+
indicating whether the score is below the threshold.
84
+
- expert_answer (Optional[str]): Alternate SME-provided answer from Codex if the response was flagged as bad and
85
+
an answer was found in the Codex Project, or None otherwise.
86
+
- is_bad_response (bool): True if the response is flagged as potentially bad and triggered escalation to SMEs.
82
87
"""
    formatted_prompt = prompt
    if not formatted_prompt:
@@ -92,27 +97,14 @@ def validate(
    if not formatted_prompt:
        raise ValueError("Exactly one of prompt or form_prompt is required")  # noqa: TRY003
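
As a usage sketch of validate() and the new ProjectValidateResponse return type: field names follow the new docstring, while the `query`, `context`, and `response` arguments are assumed (they sit above the lines shown in this hunk) and the variable wiring is illustrative:

    # Sketch: assumes the `validator` from the earlier example and that validate()
    # accepts query/context/response alongside the documented optional arguments.
    def form_prompt(query: str, context: str) -> str:
        # Custom formatter per the docstring: takes query and context,
        # returns one prompt string (mutually exclusive with `prompt`).
        return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

    draft = "You can return items within 30 days."
    result = validator.validate(
        query="What is your return policy?",
        context="Returns are accepted within 30 days of purchase.",
        response=draft,
        form_prompt=form_prompt,
        metadata={"user_id": "123"},  # stored with the query logged in the Codex Project
    )

    if result.is_bad_response:
        # Serve the SME answer when Codex has one; otherwise the query was
        # already logged in the Codex Project for SMEs to answer later.
        final_answer = result.expert_answer or "Let me check with our support team."
    else:
        final_answer = draft

    for eval_name, scores in result.eval_scores.items():
        print(eval_name, scores)  # each entry carries a score plus a `failed` flag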