Wazuh: Add separate parser 4.7 & 4.8 #12841
This pull request contains two findings related to potential memory and resource exhaustion in JSON parsing: both the Wazuh Legacy and Indexer parsers load uploaded JSON entirely into memory, so large or maliciously crafted files could cause denial of service by consuming excessive server resources during parsing.
Uncontrolled Data Consumption in dojo/tools/wazuh_legacy/parser.py

Vulnerability | Uncontrolled Data Consumption
---|---
Description | The `WazuhLegacyParser` uses `json.load(file)` to parse uploaded scan results. This method loads the entire file into memory. While the application does enforce a 100 MB limit on uploaded files, a malicious actor could still upload a file close to this limit. Parsing a 100 MB JSON file, especially one with deeply nested structures or large arrays, could lead to significant memory consumption, potentially causing resource exhaustion or denial of service for the application even though the file size is within the allowed limit.
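One possible mitigation is to bound how many bytes are read before handing the payload to the JSON parser. The sketch below is illustrative only: the helper name `load_bounded_json` and the 5 MB cap are assumptions, not part of DefectDojo's codebase.

```python
import io
import json

MAX_JSON_BYTES = 5 * 1024 * 1024  # hypothetical per-parser cap, far below the 100 MB upload limit


def load_bounded_json(file_obj, limit=MAX_JSON_BYTES):
    """Read at most `limit` bytes before parsing, rejecting oversized payloads.

    Sketch only: the name and limit are illustrative, not DefectDojo API.
    """
    # Read one byte past the limit so we can detect truncation.
    raw = file_obj.read(limit + 1)
    if len(raw) > limit:
        raise ValueError(f"scan file larger than {limit} bytes")
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)
```

A parser's `get_findings` could then call this helper instead of `json.load(file)`, failing fast on oversized input instead of buffering it all.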
django-DefectDojo/dojo/tools/wazuh_legacy/parser.py
Lines 1 to 87 in db2db4e
```python
import json

from dojo.models import Endpoint, Finding


class WazuhLegacyParser:

    """
    The vulnerabilities with condition "Package unfixed" are skipped because there is no fix out yet.
    https://github.com/wazuh/wazuh/issues/14560

    Parser used for the Wazuh Detector module used in older versions of 4.7 and below (before Vulnerability Detection refactor).
    https://github.com/wazuh/wazuh/releases/tag/v4.8.0
    """

    def get_scan_types(self):
        return ["Wazuh =< 4.7 Scan"]

    def get_label_for_scan_types(self, scan_type):
        return "Wazuh =< 4.7 Scan"

    def get_description_for_scan_types(self, scan_type):
        return "Wazuh =< 4.7 Scan. See the documentation for search a script to obtain a clear output."

    def get_findings(self, file, test):
        data = json.load(file)
        if not data:
            return []

        findings = []
        # Loop through each element in the list
        vulnerabilities = data.get("data", {}).get("affected_items", [])
        for item in vulnerabilities:
            if (
                item["condition"] != "Package unfixed"
                and item["severity"] != "Untriaged"
            ):
                cve = item.get("cve")
                package_name = item.get("name")
                package_version = item.get("version")
                description = item.get("condition")
                severity = item.get("severity").capitalize()
                links = item.get("external_references")
                cvssv3_score = item.get("cvss3_score")
                publish_date = item.get("published")
                agent_name = item.get("agent_name")
                agent_ip = item.get("agent_ip")
                detection_time = item.get("detection_time").split("T")[0]

                references = "\n".join(links) if links else None
                title = (
                    item.get("title") + " (version: " + package_version + ")"
                )

                find = Finding(
                    title=title,
                    test=test,
                    description=description,
                    severity=severity,
                    references=references,
                    dynamic_finding=True,
                    static_finding=False,
                    component_name=package_name,
                    component_version=package_version,
                    cvssv3_score=cvssv3_score,
                    publish_date=publish_date,
                    date=detection_time,
                )
                # In some cases the agent_ip is not the perfect way to identify a host.
                # Thus prefer the agent_name, if existent.
                if agent_name:
                    find.unsaved_endpoints = [Endpoint(host=agent_name)]
                elif agent_ip:
                    find.unsaved_endpoints = [Endpoint(host=agent_ip)]
                if cve:
                    find.unsaved_vulnerability_ids = [cve]

                findings.append(find)
        return findings
```
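To see the input shape this parser expects, here is a stand-alone replay of its extraction loop against a minimal hypothetical payload. The dojo `Finding`/`Endpoint` models are deliberately left out, so this is illustrative only; the field values are invented.

```python
import io
import json

# Hypothetical, abbreviated payload in the Wazuh =< 4.7 API shape.
payload = {
    "data": {
        "affected_items": [
            {
                "condition": "Package fixed",
                "severity": "High",
                "cve": "CVE-2023-1234",
                "name": "openssl",
                "version": "1.1.1",
                "title": "OpenSSL issue",
                "detection_time": "2024-01-02T03:04:05Z",
            },
            # Skipped by the parser: no fix is available yet.
            {"condition": "Package unfixed", "severity": "High"},
        ]
    }
}

# Same navigation and filter as WazuhLegacyParser.get_findings.
data = json.load(io.StringIO(json.dumps(payload)))
items = data.get("data", {}).get("affected_items", [])
titles = [
    f'{item["title"]} (version: {item["version"]})'
    for item in items
    if item["condition"] != "Package unfixed"
    and item["severity"] != "Untriaged"
]
print(titles)  # the "Package unfixed" entry is filtered out
```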
Potential Denial of Service (DoS) via Large or Malformed JSON Input in dojo/tools/wazuh_indexer/parser.py
Vulnerability | Potential Denial of Service (DoS) via Large or Malformed JSON Input
---|---
Description | The `WazuhIndexerParser` uses `json.load(file)` to parse uploaded JSON files. This function reads the entire file into memory before parsing. While `settings.dist.py` indicates a `MAX_UPLOAD_SIZE` of 2GB, this limit is applied to the overall file upload, not specifically to the JSON content being parsed. If a malicious actor uploads a large, valid JSON file (up to 2GB), it could consume significant memory resources on the server, potentially leading to a Denial of Service (DoS) by exhausting available memory or CPU, especially if multiple such requests are processed concurrently. `json.load` is also vulnerable to CPU exhaustion with deeply nested JSON structures, even if the file size is small.
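For the nesting concern, a cheap pre-scan of bracket depth can reject pathological input before it ever reaches the parser. This is a minimal sketch: the function name and both limits are assumptions for illustration, not DefectDojo code.

```python
import json

MAX_DEPTH = 50                # hypothetical nesting limit; real scan files are shallow
MAX_CHARS = 10 * 1024 * 1024  # hypothetical 10 MB text cap


def safe_json_loads(text):
    """Parse JSON only after checking size and bracket nesting depth.

    Sketch only: names and limits are assumptions, not DefectDojo API.
    """
    if len(text) > MAX_CHARS:
        raise ValueError("JSON payload exceeds size limit")
    depth = 0
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            # Track string state so brackets inside strings are ignored.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            if depth > MAX_DEPTH:
                raise ValueError("JSON nesting exceeds depth limit")
        elif ch in "]}":
            depth -= 1
    return json.loads(text)
```

The pre-scan is O(n) over the text and avoids handing a deeply nested document to `json.loads`, which would otherwise recurse once per nesting level.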
django-DefectDojo/dojo/tools/wazuh_indexer/parser.py
Lines 1 to 105 in db2db4e
```python
import json
from datetime import datetime

from dojo.models import Endpoint, Finding


class WazuhIndexerParser:
    def get_scan_types(self):
        return ["Wazuh >= 4.8 Scan"]

    def get_label_for_scan_types(self, scan_type):
        return "Wazuh >= 4.8 Scan"

    def get_description_for_scan_types(self, scan_type):
        return "Wazuh Vulnerability Data >= 4.8 from indexer in JSON format. See the documentation for search a script to obtain a clear output."

    def get_findings(self, file, test):
        data = json.load(file)
        if not data:
            return []

        findings = []
        vulnerabilities = data.get("hits", {}).get("hits", [])
        for item_source in vulnerabilities:
            item = item_source.get("_source")

            # Get all vulnerability data
            vuln = item.get("vulnerability")
            description = vuln.get("description")
            cve = vuln.get("id")
            published_date = datetime.fromisoformat(vuln["published_at"]).date()
            references = vuln.get("reference")
            severity = vuln.get("severity")
            if severity not in {"Critical", "High", "Medium", "Low"}:
                severity = "Info"
            # Default the CVSS fields so items without a "score" block
            # do not raise a NameError further down.
            cvss_score = cvss_version = cvss3 = None
            if vuln.get("score"):
                cvss_score = vuln.get("score").get("base")
                cvss_version = vuln.get("score").get("version")
                cvss3 = cvss_version.split(".")[0]

            # Agent is equal to the endpoint
            agent = item.get("agent")
            agent_id = agent.get("id")
            agent_name = agent.get("name")
            # agent_ip = agent.get("ip")  Maybe... will be introduced in newer versions of Wazuh?
            description = (
                f"Agent Name/ID: {agent_name} / {agent_id}\n"
                f"{description}"
            )

            # Package in Wazuh is equivalent to "component" in DD
            package = item.get("package")
            package_name = package.get("name")
            package_version = package.get("version")
            package_description = package.get("description")
            # Only present on some Windows agents.
            package_path = package.get("path", None)

            # Get information about the OS from the agent.
            # This is used for the severity justification.
            name_os = kernel_os = "N/A"
            info_os = item.get("host")
            if info_os and info_os.get("os"):
                name_os = info_os.get("os").get("full", "N/A")
                kernel_os = info_os.get("os").get("kernel", "N/A")

            title = f"{cve} Affects {package_name} (Version: {package_version})"
            severity_justification = (
                f"Severity: {severity}\n"
                f"CVSS Score: {cvss_score}\n"
                f"CVSS Version: {cvss_version}\n"
                f"\nOS: {name_os}\n"
                f"Kernel: {kernel_os}\n\n"
                f"Package Name: {package_name}\n"
                f"Package Description: {package_description}"
            )

            finding = Finding(
                title=title,
                test=test,
                description=description,
                severity_justification=severity_justification,
                severity=severity,
                references=references,
                dynamic_finding=True,
                static_finding=False,
                component_name=package_name,
                component_version=package_version,
                file_path=package_path or None,
                publish_date=published_date,
                cvssv3_score=cvss_score if cvss3 == "3" else None,
            )
            finding.unsaved_vulnerability_ids = [cve]
            finding.unsaved_endpoints = [Endpoint(host=agent_name)]
            findings.append(finding)
        return findings
```
All finding details can be found in the DryRun Security Dashboard.
Description
This PR introduces two separate parsers for Wazuh, one for >= 4.8 (Indexer) and one for =< 4.7 (Legacy), due to significant changes in the data structure introduced in version 4.8. It fixes issue #12634.
The decision to split the parsers was made for long-term maintainability and compatibility, as the new format differs notably from previous versions. Key changes include:
• Renamed JSON parent keys.
• Fields such as `title` and `agent_ip` have been removed.
• The `cve` field is now renamed to `id`.
• CVSS scores of the different versions are unified under the same structure and differentiated via a `version` field.
OLD WAZUH V4.7:
NEW WAZUH V4.8
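The changes listed above can be sketched with minimal, hypothetical fragments of the two shapes (fields abbreviated, values invented for illustration):

```python
# Hypothetical, abbreviated items illustrating the 4.7 -> 4.8 restructure.
legacy_item = {  # Wazuh =< 4.7: flat item under data.affected_items
    "cve": "CVE-2023-1234",
    "title": "Sample vulnerability title",  # removed in 4.8
    "agent_ip": "10.0.0.5",                 # removed in 4.8
    "severity": "High",
    "cvss3_score": 7.5,
}

indexer_item = {  # Wazuh >= 4.8: nested under hits.hits[]._source
    "vulnerability": {
        "id": "CVE-2023-1234",  # "cve" renamed to "id"
        "severity": "High",
        # CVSS scores unified under one structure, told apart by "version"
        "score": {"base": 7.5, "version": "3.1"},
    },
    "agent": {"id": "001", "name": "agent-01"},
    "package": {"name": "openssl", "version": "1.1.1"},
}

# Same CVE, different access paths.
assert indexer_item["vulnerability"]["id"] == legacy_item["cve"]
```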
• Wazuh 4.8 includes additional metadata, such as `Operating System` and `Package Information`, which is dumped into the Finding.

Other features:
EXTRA INFORMATION
If you consider that a single parser with subparser logic, as in #12739, would be more appropriate, please let me know and I'll update manuel's PR with my subparsers branch.
TEST RESULTS:
Unittests OK:

I have also checked with a 289,893-line file containing 5,000 findings, with 0 warnings/errors.
