
Wazuh: Add separate parser 4.7 & 4.8 #12841


Closed
wants to merge 20 commits into from

Conversation

9alexx3
Contributor

@9alexx3 9alexx3 commented Jul 22, 2025

Description

This PR introduces two separate parsers for Wazuh (>= 4.8 (Indexer) and =< 4.7 (Legacy)) due to significant changes in the data structure introduced in version 4.8, fixing issue #12634.

The decision to split the parsers was made for long-term maintainability and compatibility, as the new format differs notably from previous versions. Key changes include:
• Renamed JSON parent keys.
• Fields such as title and agent_ip have been removed.
• The cve field is now renamed to id.
• CVSS scores of the different versions are unified under the same structure and differentiated via a version field.

OLD WAZUH V4.7:

```json
"cvss2_score": 7.5,
"cvss3_score": 9.8,
```

NEW WAZUH V4.8:

```json
"score": {
    "base": 9.1,
    "version": "3.1"
},
"score": {
    "base": 1.9,
    "version": "2.0"
},
```
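For comparison with the legacy fields, the unified 4.8 score objects can be folded back into the old per-version keys with a few lines of Python. This helper is purely illustrative and not part of the PR:

```python
# Illustrative helper (not part of this PR): fold Wazuh 4.8 unified score
# objects back into the legacy per-version keys shown above.
def split_scores(scores):
    out = {}
    for score in scores:
        major = score["version"].split(".")[0]  # "3.1" -> "3"
        out[f"cvss{major}_score"] = score["base"]
    return out

print(split_scores([
    {"base": 9.1, "version": "3.1"},
    {"base": 1.9, "version": "2.0"},
]))
# {'cvss3_score': 9.1, 'cvss2_score': 1.9}
```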

• Wazuh 4.8 includes additional metadata, such as operating system and package information, which is included in the Finding.

Other features:

  • Standard class names
  • Unit tests
  • Bugfix for the CVE field
  • Add custom dedupe
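The PR does not spell out the custom dedupe configuration, so the sketch below is an assumption for illustration only: a typical DefectDojo-style dedupe key hashes a few stable fields, here CVE plus component name and version.

```python
import hashlib

# Hypothetical dedupe key (field choice is an assumption, not the PR's
# actual configuration): hash CVE + component name + component version
# so the same vulnerability on the same package deduplicates.
def dedupe_key(cve, component_name, component_version):
    raw = f"{cve}|{component_name}|{component_version}".encode()
    return hashlib.sha256(raw).hexdigest()

print(dedupe_key("CVE-2024-1234", "openssl", "3.0.1")[:12])
```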

EXTRA INFORMATION

If you consider that a single parser with subparser logic like #12739 would be more appropriate, please let me know and I'll update manuel's PR with my subparsers branch.

TEST RESULTS:

Unit tests OK.

Also, I have checked a 289,893-line file with 5,000 findings and got 0 warnings/errors.

@github-actions github-actions bot added the settings_changes (Needs changes to settings.py based on changes in settings.dist.py included in this PR), unittests, and parser labels on Jul 22, 2025

DryRun Security

This pull request contains two findings related to potential memory and resource exhaustion vulnerabilities in JSON parsing methods, specifically in the Wazuh Legacy and Indexer parsers, where large or maliciously crafted JSON files could lead to denial of service by consuming excessive server resources during parsing.

Uncontrolled Data Consumption in dojo/tools/wazuh_legacy/parser.py
Vulnerability Uncontrolled Data Consumption
Description The WazuhLegacyParser uses json.load(file) to parse uploaded scan results. This method loads the entire file into memory. While the application does implement a maximum file size limit of 100MB for uploaded files, a malicious actor could still upload a file close to this limit. Parsing a 100MB JSON file, especially one with deeply nested structures or large arrays, could lead to significant memory consumption, potentially causing resource exhaustion or denial of service for the application, even if the file size is within the allowed limit.
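Neither parser streams its input. A simple mitigation sketch (not part of this PR; the 50 MB cap below is an arbitrary illustrative value, not a DefectDojo setting) is to bound the bytes read before handing them to the JSON decoder:

```python
import io
import json

MAX_SCAN_BYTES = 50 * 1024 * 1024  # illustrative cap, not a DefectDojo setting

def load_bounded(file, limit=MAX_SCAN_BYTES):
    """Read at most `limit` bytes before decoding, rejecting oversized
    uploads instead of buffering them wholesale with json.load()."""
    payload = file.read(limit + 1)
    if len(payload) > limit:
        raise ValueError("scan file exceeds size limit")
    return json.loads(payload)

data = load_bounded(io.BytesIO(b'{"data": {"affected_items": []}}'))
print(data)  # {'data': {'affected_items': []}}
```

This does not address deeply nested structures (a CPU concern even for small files), but it caps the memory taken by any single upload.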

```python
import json

from dojo.models import Endpoint, Finding


class WazuhLegacyParser:

    """
    Parser for the Wazuh Vulnerability Detector module used in versions 4.7
    and below (before the Vulnerability Detection refactor).
    https://github.com/wazuh/wazuh/releases/tag/v4.8.0

    Vulnerabilities with the condition "Package unfixed" are skipped because
    there is no fix out yet.
    https://github.com/wazuh/wazuh/issues/14560
    """

    def get_scan_types(self):
        return ["Wazuh =< 4.7 Scan"]

    def get_label_for_scan_types(self, scan_type):
        return "Wazuh =< 4.7 Scan"

    def get_description_for_scan_types(self, scan_type):
        return "Wazuh =< 4.7 Scan. See the documentation for a script to obtain a clean output."

    def get_findings(self, file, test):
        data = json.load(file)
        if not data:
            return []
        findings = []
        # Loop through each element in the list
        vulnerabilities = data.get("data", {}).get("affected_items", [])
        for item in vulnerabilities:
            if (
                item["condition"] != "Package unfixed"
                and item["severity"] != "Untriaged"
            ):
                cve = item.get("cve")
                package_name = item.get("name")
                package_version = item.get("version")
                description = item.get("condition")
                severity = item.get("severity").capitalize()
                links = item.get("external_references")
                cvssv3_score = item.get("cvss3_score")
                publish_date = item.get("published")
                agent_name = item.get("agent_name")
                agent_ip = item.get("agent_ip")
                detection_time = item.get("detection_time").split("T")[0]
                references = "\n".join(links) if links else None
                title = item.get("title") + " (version: " + package_version + ")"
                find = Finding(
                    title=title,
                    test=test,
                    description=description,
                    severity=severity,
                    references=references,
                    dynamic_finding=True,
                    static_finding=False,
                    component_name=package_name,
                    component_version=package_version,
                    cvssv3_score=cvssv3_score,
                    publish_date=publish_date,
                    date=detection_time,
                )
                # In some cases the agent_ip is not a reliable way to identify
                # a host, so prefer the agent_name if it exists.
                if agent_name:
                    find.unsaved_endpoints = [Endpoint(host=agent_name)]
                elif agent_ip:
                    find.unsaved_endpoints = [Endpoint(host=agent_ip)]
                if cve:
                    find.unsaved_vulnerability_ids = [cve]
                findings.append(find)
        return findings
```
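The skip rule from the docstring can be checked in isolation. This standalone sketch assumes the same `condition` and `severity` keys as the `affected_items` entries above:

```python
# Standalone sketch of the legacy parser's skip rule: entries that are
# "Package unfixed" or still "Untriaged" are not reported.
def is_reportable(item):
    return (item.get("condition") != "Package unfixed"
            and item.get("severity") != "Untriaged")

items = [
    {"condition": "Package unfixed", "severity": "High"},
    {"condition": "fixed in 1.2.3", "severity": "Untriaged"},
    {"condition": "fixed in 1.2.3", "severity": "High"},
]
print([is_reportable(i) for i in items])  # [False, False, True]
```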

Potential Denial of Service (DoS) via Large or Malformed JSON Input in dojo/tools/wazuh_indexer/parser.py
Vulnerability Potential Denial of Service (DoS) via Large or Malformed JSON Input
Description The WazuhIndexerParser uses json.load(file) to parse uploaded JSON files. This function reads the entire file into memory before parsing. While the settings.dist.py file indicates a MAX_UPLOAD_SIZE of 2GB, this limit is applied to the overall file upload, not specifically to the JSON content being parsed. If a malicious actor uploads a large, valid JSON file (up to 2GB), it could consume significant memory resources on the server, potentially leading to a Denial of Service (DoS) by exhausting available memory or CPU, especially if multiple such requests are processed concurrently. The json.load method is also vulnerable to CPU exhaustion with deeply nested JSON structures, even if the file size is small.

```python
import json
from datetime import datetime

from dojo.models import Endpoint, Finding


class WazuhIndexerParser:

    def get_scan_types(self):
        return ["Wazuh >= 4.8 Scan"]

    def get_label_for_scan_types(self, scan_type):
        return "Wazuh >= 4.8 Scan"

    def get_description_for_scan_types(self, scan_type):
        return "Wazuh vulnerability data >= 4.8 from the indexer in JSON format. See the documentation for a script to obtain a clean output."

    def get_findings(self, file, test):
        data = json.load(file)
        if not data:
            return []
        findings = []
        vulnerabilities = data.get("hits", {}).get("hits", [])
        for item_source in vulnerabilities:
            item = item_source.get("_source")
            # Get all vulnerability data
            vuln = item.get("vulnerability")
            description = vuln.get("description")
            cve = vuln.get("id")
            published_date = datetime.fromisoformat(vuln["published_at"]).date()
            references = vuln.get("reference")
            severity = vuln.get("severity")
            if severity not in {"Critical", "High", "Medium", "Low"}:
                severity = "Info"
            # Default the CVSS fields so entries without a score object do not
            # reference undefined variables below.
            cvss_score = cvss_version = cvss_major = None
            if vuln.get("score"):
                cvss_score = vuln.get("score").get("base")
                cvss_version = vuln.get("score").get("version")
                cvss_major = cvss_version.split(".")[0]
            # The agent corresponds to the endpoint
            agent = item.get("agent")
            agent_id = agent.get("id")
            agent_name = agent.get("name")
            # agent_ip = agent.get("ip")  # Maybe introduced in newer versions of Wazuh?
            description = (
                f"Agent Name/ID: {agent_name} / {agent_id}\n"
                f"{description}"
            )
            # A package in Wazuh is equivalent to a "component" in DefectDojo
            package = item.get("package")
            package_name = package.get("name")
            package_version = package.get("version")
            package_description = package.get("description")
            # Only present on some Windows agents.
            package_path = package.get("path", None)
            # Get OS information from the agent; used for the severity justification.
            name_os = kernel_os = "N/A"
            info_os = item.get("host")
            if info_os and info_os.get("os"):
                name_os = info_os.get("os").get("full", "N/A")
                kernel_os = info_os.get("os").get("kernel", "N/A")
            title = f"{cve} Affects {package_name} (Version: {package_version})"
            severity_justification = (
                f"Severity: {severity}\n"
                f"CVSS Score: {cvss_score}\n"
                f"CVSS Version: {cvss_version}\n"
                f"\nOS: {name_os}\n"
                f"Kernel: {kernel_os}\n\n"
                f"Package Name: {package_name}\n"
                f"Package Description: {package_description}"
            )
            finding = Finding(
                title=title,
                test=test,
                description=description,
                severity_justification=severity_justification,
                severity=severity,
                references=references,
                dynamic_finding=True,
                static_finding=False,
                component_name=package_name,
                component_version=package_version,
                file_path=package_path or None,
                publish_date=published_date,
                cvssv3_score=cvss_score if cvss_major == "3" else None,
            )
            finding.unsaved_vulnerability_ids = [cve]
            finding.unsaved_endpoints = [Endpoint(host=agent_name)]
            findings.append(finding)
        return findings
```
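For quick experiments without a DefectDojo install, the per-hit extraction logic can be exercised on a minimal indexer document. The sample below is hand-written for illustration, not real Wazuh output:

```python
# Hand-written sample (illustrative, not real Wazuh output) exercising the
# severity normalization and CVSS-version gating described above.
sample = {
    "hits": {"hits": [{
        "_source": {
            "vulnerability": {
                "id": "CVE-2023-1234",
                "description": "Example vulnerability.",
                "severity": "Moderate",  # not in the allowed set -> "Info"
                "published_at": "2023-05-01T00:00:00",
                "score": {"base": 1.9, "version": "2.0"},
            },
            "agent": {"id": "001", "name": "web-01"},
            "package": {"name": "openssl", "version": "3.0.1"},
        },
    }]},
}

src = sample["hits"]["hits"][0]["_source"]
vuln = src["vulnerability"]
severity = vuln["severity"]
if severity not in {"Critical", "High", "Medium", "Low"}:
    severity = "Info"
score = vuln.get("score")
# cvssv3_score is only set when the unified score reports a 3.x version
cvssv3 = score["base"] if score and score["version"].split(".")[0] == "3" else None
print(severity, cvssv3)  # Info None
```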


All finding details can be found in the DryRun Security Dashboard.

@manuel-sommer
Contributor

Please close this one @mtesauro, as we are already on the way here: #12739

@9alexx3 9alexx3 closed this Jul 23, 2025