
Wazuh: Add separate parser 4.7 & 4.8 #12841


Closed
wants to merge 20 commits into from

Conversation

9alexx3
Contributor

@9alexx3 9alexx3 commented Jul 22, 2025

Description

This PR introduces two separate parsers for Wazuh (>= 4.8 (Indexer) and =< 4.7 (Legacy)) due to significant changes in the data structure introduced in version 4.8, fixing issue #12634.

The decision to split the parsers was made for long-term maintainability and compatibility, as the new format differs notably from previous versions. Key changes include:
• Renamed JSON parent keys.
• Fields such as title and agent_ip have been removed.
• The cve field is now renamed to id.
• CVSS scores of the different versions are unified under the same structure and differentiated via a version field.

OLD WAZUH V4.7:

```json
"cvss2_score": 7.5,
"cvss3_score": 9.8,
```

NEW WAZUH V4.8:

```json
"score": {
    "base": 9.1,
    "version": "3.1"
},
"score": {
    "base": 1.9,
    "version": "2.0"
},
```
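For comparison with the legacy fields, the unified 4.8 score objects can be folded back into the old per-version keys with a few lines of Python. This helper is purely illustrative and not part of the PR:

```python
# Illustrative helper (not part of this PR): fold Wazuh 4.8 unified score
# objects back into the legacy per-version keys shown above.
def split_scores(scores):
    out = {}
    for score in scores:
        major = score["version"].split(".")[0]  # "3.1" -> "3"
        out[f"cvss{major}_score"] = score["base"]
    return out

print(split_scores([
    {"base": 9.1, "version": "3.1"},
    {"base": 1.9, "version": "2.0"},
]))
# {'cvss3_score': 9.1, 'cvss2_score': 1.9}
```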

• Wazuh 4.8 includes additional metadata, such as operating system and package information, which is included in the Finding.

Other features:

  • Standard class names
  • Unit tests
  • Bugfix for the CVE field
  • Add custom dedupe
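The PR does not spell out the custom dedupe configuration, so the sketch below is an assumption for illustration only: a typical DefectDojo-style dedupe key hashes a few stable fields, here CVE plus component name and version.

```python
import hashlib

# Hypothetical dedupe key (field choice is an assumption, not the PR's
# actual configuration): hash CVE + component name + component version
# so the same vulnerability on the same package deduplicates.
def dedupe_key(cve, component_name, component_version):
    raw = f"{cve}|{component_name}|{component_version}".encode()
    return hashlib.sha256(raw).hexdigest()

print(dedupe_key("CVE-2024-1234", "openssl", "3.0.1")[:12])
```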

EXTRA INFORMATION

If you consider that a single parser with subparser logic like #12739 would be more appropriate, please let me know and I'll update manuel's PR with my subparsers branch.

TEST RESULTS:

Unit tests OK.

Also, I have checked a 289,893-line file with 5,000 findings and got 0 warnings/errors.

@github-actions github-actions bot added the settings_changes (Needs changes to settings.py based on changes in settings.dist.py included in this PR), unittests, and parser labels on Jul 22, 2025

DryRun Security

This pull request contains two findings related to potential memory and resource exhaustion vulnerabilities in JSON parsing methods, specifically in the Wazuh Legacy and Indexer parsers, where large or maliciously crafted JSON files could lead to denial of service by consuming excessive server resources during parsing.

Uncontrolled Data Consumption in dojo/tools/wazuh_legacy/parser.py
Vulnerability Uncontrolled Data Consumption
Description The WazuhLegacyParser uses json.load(file) to parse uploaded scan results. This method loads the entire file into memory. While the application does implement a maximum file size limit of 100MB for uploaded files, a malicious actor could still upload a file close to this limit. Parsing a 100MB JSON file, especially one with deeply nested structures or large arrays, could lead to significant memory consumption, potentially causing resource exhaustion or denial of service for the application, even if the file size is within the allowed limit.
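Neither parser streams its input. A simple mitigation sketch (not part of this PR; the 50 MB cap below is an arbitrary illustrative value, not a DefectDojo setting) is to bound the bytes read before handing them to the JSON decoder:

```python
import io
import json

MAX_SCAN_BYTES = 50 * 1024 * 1024  # illustrative cap, not a DefectDojo setting

def load_bounded(file, limit=MAX_SCAN_BYTES):
    """Read at most `limit` bytes before decoding, rejecting oversized
    uploads instead of buffering them wholesale with json.load()."""
    payload = file.read(limit + 1)
    if len(payload) > limit:
        raise ValueError("scan file exceeds size limit")
    return json.loads(payload)

data = load_bounded(io.BytesIO(b'{"data": {"affected_items": []}}'))
print(data)  # {'data': {'affected_items': []}}
```

This does not address deeply nested structures (a CPU concern even for small files), but it caps the memory taken by any single upload.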

```python
import json

from dojo.models import Endpoint, Finding


class WazuhLegacyParser:

    """
    Parser for the Wazuh Vulnerability Detector module used in versions 4.7
    and below (before the Vulnerability Detection refactor).
    https://github.com/wazuh/wazuh/releases/tag/v4.8.0

    Vulnerabilities with the condition "Package unfixed" are skipped because
    there is no fix out yet.
    https://github.com/wazuh/wazuh/issues/14560
    """

    def get_scan_types(self):
        return ["Wazuh =< 4.7 Scan"]

    def get_label_for_scan_types(self, scan_type):
        return "Wazuh =< 4.7 Scan"

    def get_description_for_scan_types(self, scan_type):
        return "Wazuh =< 4.7 Scan. See the documentation for a script to obtain a clean output."

    def get_findings(self, file, test):
        data = json.load(file)
        if not data:
            return []
        findings = []
        # Loop through each element in the list
        vulnerabilities = data.get("data", {}).get("affected_items", [])
        for item in vulnerabilities:
            if (
                item["condition"] != "Package unfixed"
                and item["severity"] != "Untriaged"
            ):
                cve = item.get("cve")
                package_name = item.get("name")
                package_version = item.get("version")
                description = item.get("condition")
                severity = item.get("severity").capitalize()
                links = item.get("external_references")
                cvssv3_score = item.get("cvss3_score")
                publish_date = item.get("published")
                agent_name = item.get("agent_name")
                agent_ip = item.get("agent_ip")
                detection_time = item.get("detection_time").split("T")[0]
                references = "\n".join(links) if links else None
                title = item.get("title") + " (version: " + package_version + ")"
                find = Finding(
                    title=title,
                    test=test,
                    description=description,
                    severity=severity,
                    references=references,
                    dynamic_finding=True,
                    static_finding=False,
                    component_name=package_name,
                    component_version=package_version,
                    cvssv3_score=cvssv3_score,
                    publish_date=publish_date,
                    date=detection_time,
                )
                # In some cases the agent_ip is not a reliable way to identify
                # a host, so prefer the agent_name if it exists.
                if agent_name:
                    find.unsaved_endpoints = [Endpoint(host=agent_name)]
                elif agent_ip:
                    find.unsaved_endpoints = [Endpoint(host=agent_ip)]
                if cve:
                    find.unsaved_vulnerability_ids = [cve]
                findings.append(find)
        return findings
```
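The skip rule from the docstring can be checked in isolation. This standalone sketch assumes the same `condition` and `severity` keys as the `affected_items` entries above:

```python
# Standalone sketch of the legacy parser's skip rule: entries that are
# "Package unfixed" or still "Untriaged" are not reported.
def is_reportable(item):
    return (item.get("condition") != "Package unfixed"
            and item.get("severity") != "Untriaged")

items = [
    {"condition": "Package unfixed", "severity": "High"},
    {"condition": "fixed in 1.2.3", "severity": "Untriaged"},
    {"condition": "fixed in 1.2.3", "severity": "High"},
]
print([is_reportable(i) for i in items])  # [False, False, True]
```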

Potential Denial of Service (DoS) via Large or Malformed JSON Input in dojo/tools/wazuh_indexer/parser.py
Vulnerability Potential Denial of Service (DoS) via Large or Malformed JSON Input
Description The WazuhIndexerParser uses json.load(file) to parse uploaded JSON files. This function reads the entire file into memory before parsing. While the settings.dist.py file indicates a MAX_UPLOAD_SIZE of 2GB, this limit is applied to the overall file upload, not specifically to the JSON content being parsed. If a malicious actor uploads a large, valid JSON file (up to 2GB), it could consume significant memory resources on the server, potentially leading to a Denial of Service (DoS) by exhausting available memory or CPU, especially if multiple such requests are processed concurrently. The json.load method is also vulnerable to CPU exhaustion with deeply nested JSON structures, even if the file size is small.

```python
import json
from datetime import datetime

from dojo.models import Endpoint, Finding


class WazuhIndexerParser:

    def get_scan_types(self):
        return ["Wazuh >= 4.8 Scan"]

    def get_label_for_scan_types(self, scan_type):
        return "Wazuh >= 4.8 Scan"

    def get_description_for_scan_types(self, scan_type):
        return "Wazuh vulnerability data >= 4.8 from the indexer in JSON format. See the documentation for a script to obtain a clean output."

    def get_findings(self, file, test):
        data = json.load(file)
        if not data:
            return []
        findings = []
        vulnerabilities = data.get("hits", {}).get("hits", [])
        for item_source in vulnerabilities:
            item = item_source.get("_source")
            # Get all vulnerability data
            vuln = item.get("vulnerability")
            description = vuln.get("description")
            cve = vuln.get("id")
            published_date = datetime.fromisoformat(vuln["published_at"]).date()
            references = vuln.get("reference")
            severity = vuln.get("severity")
            if severity not in {"Critical", "High", "Medium", "Low"}:
                severity = "Info"
            # Default the CVSS fields so entries without a score object do not
            # reference undefined variables below.
            cvss_score = cvss_version = cvss_major = None
            if vuln.get("score"):
                cvss_score = vuln.get("score").get("base")
                cvss_version = vuln.get("score").get("version")
                cvss_major = cvss_version.split(".")[0]
            # The agent corresponds to the endpoint
            agent = item.get("agent")
            agent_id = agent.get("id")
            agent_name = agent.get("name")
            # agent_ip = agent.get("ip")  # Maybe introduced in newer versions of Wazuh?
            description = (
                f"Agent Name/ID: {agent_name} / {agent_id}\n"
                f"{description}"
            )
            # A package in Wazuh is equivalent to a "component" in DefectDojo
            package = item.get("package")
            package_name = package.get("name")
            package_version = package.get("version")
            package_description = package.get("description")
            # Only present on some Windows agents.
            package_path = package.get("path", None)
            # Get OS information from the agent; used for the severity justification.
            name_os = kernel_os = "N/A"
            info_os = item.get("host")
            if info_os and info_os.get("os"):
                name_os = info_os.get("os").get("full", "N/A")
                kernel_os = info_os.get("os").get("kernel", "N/A")
            title = f"{cve} Affects {package_name} (Version: {package_version})"
            severity_justification = (
                f"Severity: {severity}\n"
                f"CVSS Score: {cvss_score}\n"
                f"CVSS Version: {cvss_version}\n"
                f"\nOS: {name_os}\n"
                f"Kernel: {kernel_os}\n\n"
                f"Package Name: {package_name}\n"
                f"Package Description: {package_description}"
            )
            finding = Finding(
                title=title,
                test=test,
                description=description,
                severity_justification=severity_justification,
                severity=severity,
                references=references,
                dynamic_finding=True,
                static_finding=False,
                component_name=package_name,
                component_version=package_version,
                file_path=package_path or None,
                publish_date=published_date,
                cvssv3_score=cvss_score if cvss_major == "3" else None,
            )
            finding.unsaved_vulnerability_ids = [cve]
            finding.unsaved_endpoints = [Endpoint(host=agent_name)]
            findings.append(finding)
        return findings
```
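For quick experiments without a DefectDojo install, the per-hit extraction logic can be exercised on a minimal indexer document. The sample below is hand-written for illustration, not real Wazuh output:

```python
# Hand-written sample (illustrative, not real Wazuh output) exercising the
# severity normalization and CVSS-version gating described above.
sample = {
    "hits": {"hits": [{
        "_source": {
            "vulnerability": {
                "id": "CVE-2023-1234",
                "description": "Example vulnerability.",
                "severity": "Moderate",  # not in the allowed set -> "Info"
                "published_at": "2023-05-01T00:00:00",
                "score": {"base": 1.9, "version": "2.0"},
            },
            "agent": {"id": "001", "name": "web-01"},
            "package": {"name": "openssl", "version": "3.0.1"},
        },
    }]},
}

src = sample["hits"]["hits"][0]["_source"]
vuln = src["vulnerability"]
severity = vuln["severity"]
if severity not in {"Critical", "High", "Medium", "Low"}:
    severity = "Info"
score = vuln.get("score")
# cvssv3_score is only set when the unified score reports a 3.x version
cvssv3 = score["base"] if score and score["version"].split(".")[0] == "3" else None
print(severity, cvssv3)  # Info None
```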


All finding details can be found in the DryRun Security Dashboard.

@manuel-sommer
Contributor

Please close this one @mtesauro, as we are already on the way here: #12739

@9alexx3 9alexx3 closed this Jul 23, 2025