data_critic: add subagent feature that performs data analysis #300
base: main
Conversation
Hey there - I've reviewed your changes - here's some feedback:
- Changing data_quality from a Literal enum to a float will break any downstream consumers expecting the old values; please add a migration path or compatibility layer (one possible shape is sketched after this list).
- The data_toolset is always wrapped and injected even when AEGIS_DATA_ENABLED is false, resulting in an empty toolset in agents; consider conditionally registering and wrapping it only when enabled.
- You’ve duplicated the ‘use data_critic tool’ instructions in both SuggestImpact and SuggestRemediation—extract that into a shared directive or helper to avoid drift.
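
A minimal sketch of a compatibility layer for the first point. Only the data_quality field is shown, and the legacy label names and numeric mapping are assumptions, since the old Literal values do not appear in this PR:

```python
from typing import Any

from pydantic import BaseModel, field_validator

# Assumed legacy labels and thresholds; swap in the actual Literal values
# that data_quality used before this change.
_LEGACY_TO_FLOAT = {"low": 0.25, "medium": 0.5, "high": 0.75}


class CVEDataCriticOutput(BaseModel):
    data_quality: float  # new numeric scale, assumed 0.0-1.0

    @field_validator("data_quality", mode="before")
    @classmethod
    def _accept_legacy_labels(cls, value: Any) -> Any:
        # Keep callers working if they still pass the old string labels.
        if isinstance(value, str) and value.lower() in _LEGACY_TO_FLOAT:
            return _LEGACY_TO_FLOAT[value.lower()]
        return value

    @property
    def data_quality_label(self) -> str:
        # Legacy-style view for consumers that still expect a label.
        if self.data_quality >= 0.75:
            return "high"
        if self.data_quality >= 0.5:
            return "medium"
        return "low"
```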
## Individual Comments
### Comment 1
<location> `src/aegis_ai/toolsets/tools/data_critic/__init__.py:22-28` </location>
<code_context>
+ return await cwe_data_critic.exec(
+ cve_data.cve_id, static_context=cve_data.model_dump()
+ )
+ except ValueError:
+ return CVEDataCriticToolInput(status="failure", **cve_data)
+
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Catching only ValueError may miss other exceptions during analysis.
Consider catching additional exception types or logging unexpected errors to improve robustness.
```suggestion
    try:
        cwe_data_critic = cve.CVEDataCritic(agent=data_critic_agent)
        return await cwe_data_critic.exec(
            cve_data.cve_id, static_context=cve_data.model_dump()
        )
    except ValueError:
        return CVEDataCriticToolInput(status="failure", **cve_data)
    except Exception as exc:
        logger.exception("Unexpected error during CVE data analysis")
        return CVEDataCriticToolInput(status="failure", **cve_data)
```
</issue_to_address>
### Comment 2
<location> `src/aegis_ai/toolsets/tools/data_critic/__init__.py:28` </location>
<code_context>
+ cve_data.cve_id, static_context=cve_data.model_dump()
+ )
+ except ValueError:
+ return CVEDataCriticToolInput(status="failure", **cve_data)
+
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Returning CVEDataCriticToolInput on failure may not match expected output schema.
Returning CVEDataCriticToolInput instead of CVEDataCriticOutput on error may cause type mismatches. Please return a default CVEDataCriticOutput with an error status.
</issue_to_address>
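
A minimal sketch of that fix inside analyse_cve_data, assuming CVEDataCriticOutput accepts the fields used later in this PR (data_quality, confidence, tools_used, cve_id, disclaimer, explanation):

```python
    try:
        cwe_data_critic = cve.CVEDataCritic(agent=data_critic_agent)
        return await cwe_data_critic.exec(
            cve_data.cve_id, static_context=cve_data.model_dump()
        )
    except ValueError as exc:
        # Return the output schema on failure so callers always get one type.
        # Field set mirrors the suggestion in the second review below.
        return CVEDataCriticOutput(
            data_quality=0,
            confidence=0,
            tools_used=[],
            cve_id=cve_data.cve_id,
            disclaimer=CVEDataCriticOutput.disclaimer,
            explanation=f"CVE data analysis failed: {exc}",
        )
```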
### Comment 3
<location> `src/aegis_ai/agents/data.py:16-25` </location>
<code_context>
+ "seed": 42,
+ "response_format": {"type": "json_object"},
+ },
+ system_prompt="""
+ You are a Red Hat product security assistant assessing data quality.
+ Goals: Perform a fast and concise analysis of data quality.
+ Rules:
+ - Prefer facts over speculation; cite only provided context/tools.
+ - Never use any tools.
+ - Keep answers short and directly useful.
+ - Output must match the requested JSON schema when provided.
+ Safety: refuse harmful or unethical requests.
+ """,
+)
</code_context>
<issue_to_address>
**question:** System prompt prohibits tool usage, but agent is used for tool-based analysis.
Please update the prompt or agent configuration to avoid conflicting instructions and ensure consistent behavior.
</issue_to_address>
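
One way to resolve the conflict is to scope the rule rather than forbid tools outright, keeping the rest of the prompt intact. The wording below is illustrative only, not the final prompt:

```python
system_prompt = """
You are a Red Hat product security assistant assessing data quality.
Goals: Perform a fast and concise analysis of data quality.
Rules:
- Prefer facts over speculation; cite only provided context/tools.
- Only use the tools explicitly provided for this task; do not assume others exist.
- Keep answers short and directly useful.
- Output must match the requested JSON schema when provided.
Safety: refuse harmful or unethical requests.
"""
```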
Force-pushed e6157ac to e1262b7.
Hey there - I've reviewed your changes and they look great!
## Individual Comments
### Comment 1
<location> `src/aegis_ai/toolsets/tools/data_critic/__init__.py:27-36` </location>
<code_context>
+ pass
+
+
+async def analyse_cve_data(cve_id: CVEID, cve_data: Any) -> CVEDataCriticOutput:
+ from aegis_ai.agents.data import default as data_critic_agent
+
+ try:
+ cwe_data_critic = cve.CVEDataCritic(agent=data_critic_agent)
+ return await cwe_data_critic.exec(
+ cve_data.cve_id, static_context=cve_data.model_dump()
+ )
+ except Exception:
+ logger.warning("Unexpected error during CVE data analysis.")
+ return CVEDataCriticOutput(
+ data_quality=0,
+ confidence=0,
</code_context>
<issue_to_address>
**suggestion:** Exception handling in analyse_cve_data may mask underlying errors.
Log the exception details or stack trace to improve error traceability and simplify debugging.
```suggestion
    except Exception as exc:
        logger.exception("Unexpected error during CVE data analysis.")
        return CVEDataCriticOutput(
            data_quality=0,
            confidence=0,
            tools_used=[],
            cve_id=cve_id,
            disclaimer=CVEDataCriticOutput.disclaimer,
            explanation=f"Unexpected error during CVE data analysis: {exc}",
        )
```
</issue_to_address>
### Comment 2
<location> `src/aegis_ai/features/cve/__init__.py:24` </location>
<code_context>
+class CVEDataCritic(Feature):
+ """Based on current CVE information and context assess quality of CVE information."""
+
+ async def exec(self, cve_id: CVEID, static_context: Any = None, usage=None):
+ prompt = AegisPrompt(
+ user_instruction="Your task is to examine the provided CVE data and generate a fast and brief data assessment of its quality, confidence, completeness and internal consistency.",
</code_context>
<issue_to_address>
**question:** The exec method signature for CVEDataCritic differs from other Feature classes.
If the usage parameter is required for agent compatibility, document this difference and notify all consumers. Otherwise, align the method signature with other Feature classes for consistency.
</issue_to_address>
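
If the extra parameter is kept, one low-friction option is to make it keyword-only with a default so callers that use the common Feature signature are unaffected; a sketch, with Feature and CVEID assumed to come from the module's existing imports:

```python
from typing import Any


class CVEDataCritic(Feature):
    """Based on current CVE information and context, assess the quality of CVE information.

    Unlike other Feature classes, exec() also accepts an optional
    static_context carrying already-fetched CVE data.
    """

    async def exec(self, cve_id: CVEID, *, static_context: Any = None, usage=None):
        ...
```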
Force-pushed e1262b7 to 876b200.
/retest
when this is enabled and eval tests are run - they time out due to the call to the data subagent (which is set to use …)
The reason is that the …
ah thx @kdudka - will fix
```
)

# chain logging wrappers
data_toolset = LoggingToolset(data_toolset, no_args=True)
```
The use of no_args=True does not take effect because WrapperToolset is a dataclass and you set the flag on a different object than the one you read it from. The following patch makes it work as expected:
```diff
--- a/src/aegis_ai/toolsets/__init__.py
+++ b/src/aegis_ai/toolsets/__init__.py
@@ -7,6 +7,8 @@ import os
 import logging
 import time
+from dataclasses import dataclass
+
 from pydantic_ai.common_tools.tavily import tavily_search_tool
 from pydantic_ai.mcp import MCPServerStdio
 from pydantic_ai.toolsets import FunctionToolset, CombinedToolset
@@ -38,7 +40,10 @@ from aegis_ai.toolsets.tools.data_critic import data_critic_toolset
 logger = logging.getLogger(__name__)
+@dataclass
 class LoggingToolset(WrapperToolset):
+    no_args: bool
+
     def __init__(self, wrapped, no_args: bool = False):
         super().__init__(wrapped)
         self.no_args: bool = no_args
```

I also suggest applying the following patch to make the code more readable:
```diff
--- a/src/aegis_ai/toolsets/__init__.py
+++ b/src/aegis_ai/toolsets/__init__.py
@@ -42,18 +42,16 @@ logger = logging.getLogger(__name__)
 @dataclass
 class LoggingToolset(WrapperToolset):
-    no_args: bool
+    print_args: bool
-    def __init__(self, wrapped, no_args: bool = False):
+    def __init__(self, wrapped, print_args: bool = True):
         super().__init__(wrapped)
-        self.no_args: bool = no_args
+        self.print_args = print_args
     async def call_tool(self, name: str, tool_args: dict, ctx: RunContext, tool):  # type: ignore[override]
         # log tool call entry
-        args = str(tool_args) if tool_args else ""
+        args = str(tool_args) if self.print_args and tool_args else ""
         prefix = f"[tool call] {name}({args})"
-        if self.no_args:
-            prefix = f"[tool call] {name}()"
         start = time.time()
         logger.info(f"{prefix} started")
@@ -221,7 +219,7 @@ if use_data_tools in truthy:
 )
 # chain logging wrappers
-data_toolset = LoggingToolset(data_toolset, no_args=True)
+data_toolset = LoggingToolset(data_toolset, print_args=False)
 public_toolset = LoggingToolset(public_toolset)
 redhat_cve_toolset = LoggingToolset(redhat_cve_toolset)
 public_cve_toolset = LoggingToolset(public_cve_toolset)
```
@sourcery-ai resolve
Hi @xquery! 👋 Only authors and team members can run @sourcery-ai commands on public repos.
What
Adds a data subagent that performs data analysis.
- new `cve.CVEDataCritic` feature
- new `AEGIS_DATA_ENABLED`, `AEGIS_DATA_LLM_HOST`, `AEGIS_DATA_LLM_MODEL`, `AEGIS_DATA_OPENAPI_KEY` env vars (recc. using gemini-2.5-flash-lite)
- new `AEGIS_AGENT_MAX_RETRIES` env var for a generic setting on all agents

Note: set the DATA LLM agent env vars to use something like gemini-2.5-flash-lite, as data analysis should be very quick.
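
Roughly how the new settings might be read; the variable names come from this PR, while the default values below are assumptions for illustration only:

```python
import os

# Names from the PR description; defaults here are illustrative assumptions.
data_enabled = os.getenv("AEGIS_DATA_ENABLED", "false").lower() in ("1", "true", "yes")
data_llm_host = os.getenv("AEGIS_DATA_LLM_HOST", "")
data_llm_model = os.getenv("AEGIS_DATA_LLM_MODEL", "gemini-2.5-flash-lite")
data_api_key = os.getenv("AEGIS_DATA_OPENAPI_KEY", "")
agent_max_retries = int(os.getenv("AEGIS_AGENT_MAX_RETRIES", "3"))
```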
Why
Understanding data quality improves the assessment of confidence in an analysis. It can also directly help the analysis by choosing a more generic analysis where data quality is low.
How to Test
Related Tickets
https://issues.redhat.com/browse/AEGIS-74
Summary by Sourcery
Introduce a new data critic subagent with its own toolset and agent to perform quantitative data quality analysis, integrate it into the CVE feature pipeline, migrate the data_quality field to a numeric scale, add environment-driven configuration, and refine logging and documentation.
New Features:
Enhancements:
Documentation: