Skip to content

reuse Sarif base parser for snyk and mayhem parsers #12788

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

valentijnscholten
Copy link
Member

@valentijnscholten valentijnscholten commented Jul 15, 2025

Over time two Sarif based parsers were added because they needed some custom parser logic to fully parse their scan reports:

  • Mayhem
  • Snyk Code

This resulted in some code duplication. This PR changes the Sarif base parser to be extensible and uses this to simplify the Snyk Code and Mayhem parsers. I didn't go as far as to create extension hooks for each an every field. This PR now has some examples that can be used in future PRs if more fields need to be extendible.

Initially I had the CWE parsing logic for the Mayhem taxa based CWEs also in an extenstion hook. But then I found out that the "taxa" format is actually somewhat official Sarif formatting. Also there was already code in the base Sarif parser looking for CWEs in various places. So the PR adds the "taxa way" to the base Sarif parser. This way it's also beneficial for any other non-Mayem report that might use this format.

Additional things in this PR:

  • Reduce code duplication between the snyk parser and snyk_code parser.
  • Enhance test cases for the snyk_code parser

[sc-11544]

@valentijnscholten valentijnscholten added this to the 2.49.0 milestone Jul 15, 2025
@valentijnscholten valentijnscholten marked this pull request as ready for review July 15, 2025 22:18
Copy link

dryrunsecurity bot commented Jul 15, 2025

DryRun Security

This pull request contains a potential Denial of Service (DoS) vulnerability in the SARIF parser where an unhandled ValueError could cause the file import process to crash if a SARIF file lacks necessary title generation fields.

Unhandled Exception leading to DoS in dojo/tools/sarif/parser.py
Vulnerability Unhandled Exception leading to DoS
Description The get_finding_title method in dojo/tools/sarif/parser.py explicitly raises a ValueError if it cannot extract a title from the SARIF result or rule. The calling method, get_items_from_result, does not contain a try...except block to catch this specific ValueError or a general exception. This means that if a SARIF file is provided that lacks the necessary fields for title generation (e.g., message, shortDescription, fullDescription, name, or id in the rule), the ValueError will propagate up the call stack, likely causing the SARIF parser to crash and fail the file import process.

# if the data is here we try to convert it to datetime
return dateutil.parser.isoparse(raw_date)
# Extension points for subclasses
def get_items_from_result(self, result, rules, artifacts, run_date):
"""
Main method to extract findings from a SARIF result.
This method can be overridden by subclasses for custom behavior.
"""
# see
# https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html
# / 3.27.9
kind = result.get("kind", "fail")
if kind != "fail":
return None
# if finding is suppressed, mark it as False Positive
# Note: see
# https://docs.oasis-open.org/sarif/sarif/v2.0/csprd02/sarif-v2.0-csprd02.html#_Toc10127852
suppressed = False
if result.get("suppressions"):
suppressed = True
# if there is a location get all files into files list
files = []
if "locations" in result:
for location in result["locations"]:
file_path = None
line = None
if "physicalLocation" in location:
file_path = location["physicalLocation"]["artifactLocation"]["uri"]
# 'region' attribute is optionnal
if "region" in location["physicalLocation"]:
# https://docs.oasis-open.org/sarif/sarif/v2.0/csprd02/sarif-v2.0-csprd02.html / 3.30.1
# need to check whether it is byteOffset
if "byteOffset" in location["physicalLocation"]["region"]:
pass
else:
line = location["physicalLocation"]["region"]["startLine"]
files.append((file_path, line, location))
if not files:
files.append((None, None, None))
result_items = []
for file_path, line, location in files:
# test rule link
rule = rules.get(result.get("ruleId"))
# Get description from parser (uses inheritance)
description = self.get_finding_description(result, rule, location)
# Get title from parser (uses inheritance)
title = self.get_finding_title(result, rule, location)
# Get finding type (uses inheritance)
static_finding, dynamic_finding = self.get_finding_type()
finding = Finding(
title=title,
severity=get_severity(result, rule),
description=description,
static_finding=static_finding,
dynamic_finding=dynamic_finding,
false_p=suppressed,
active=not suppressed,
file_path=file_path,
line=line,
references=get_references(rule),
)
if "ruleId" in result:
finding.vuln_id_from_tool = result["ruleId"]
# for now we only support when the id of the rule is a CVE
if cve_try(result["ruleId"]):
finding.unsaved_vulnerability_ids = [cve_try(result["ruleId"])]
# some time the rule id is here but the tool doesn't define it
if rule is not None:
cwes_extracted = get_rule_cwes(rule)
if len(cwes_extracted) > 0:
finding.cwe = cwes_extracted[-1]
# Some tools such as GitHub or Grype return the severity in properties
# instead
if "properties" in rule and "security-severity" in rule["properties"]:
try:
cvss = float(rule["properties"]["security-severity"])
severity = cvss_to_severity(cvss)
finding.cvssv3_score = cvss
finding.severity = severity
except ValueError:
if rule["properties"]["security-severity"].lower().capitalize() in {"Info", "Low", "Medium", "High", "Critical"}:
finding.severity = rule["properties"]["security-severity"].lower().capitalize()
else:
finding.severity = "Info"
# manage the case that some tools produce CWE as properties of the result
cwes_properties_extracted = get_result_cwes_properties(result)
if len(cwes_properties_extracted) > 0:
finding.cwe = cwes_properties_extracted[-1]
# manage the case that some tools produce CWE using taxa (official SARIF approach)
cwes_taxa_extracted = get_result_cwes_taxa(result)
if len(cwes_taxa_extracted) > 0:
finding.cwe = cwes_taxa_extracted[-1]
# Get custom CWEs if available (uses inheritance)
custom_cwes = self.get_finding_cwes(result)
if custom_cwes:
finding.cwe = custom_cwes[-1] # Use the last CWE like other logic
# manage fixes provided in the report
if "fixes" in result:
finding.mitigation = "\n".join(
[fix.get("description", {}).get("text") for fix in result["fixes"]],
)
if run_date:
finding.date = run_date
# manage tags provided in the report and rule and remove duplicated
tags = list(set(get_properties_tags(rule) + get_properties_tags(result)))
tags = [s.removeprefix("external/cwe/") for s in tags]
finding.tags = tags
# manage fingerprints
# fingerprinting in SARIF is more complete than in current implementation
# SARIF standard make it possible to have multiple version in the same report
# for now we just take the first one and keep the format to be able to
# compare it
if result.get("fingerprints"):
hashes = get_fingerprints_hashes(result["fingerprints"])
first_item = next(iter(hashes.items()))
finding.unique_id_from_tool = first_item[1]["value"]
elif result.get("partialFingerprints"):
# for this one we keep an order to have id that could be compared
hashes = get_fingerprints_hashes(result["partialFingerprints"])
sorted_hashes = sorted(hashes.keys())
finding.unique_id_from_tool = "|".join(
[f'{key}:{hashes[key]["value"]}' for key in sorted_hashes],
)
# Allow subclasses to customize the finding (uses inheritance)
self.customize_finding(finding, result, rule, location)
result_items.append(finding)
return result_items
def get_finding_cwes(self, result):
"""
Hook method for subclasses to extract custom CWE values from result.
Override this method to add custom CWE extraction logic.
"""
return []
def get_finding_title(self, result, rule, location):
"""
Get title for the finding. Subclasses can override this method
to add custom title formatting. Use super() to get the base title.
"""
title = None
if "message" in result:
title = get_message_from_multiformatMessageString(
result["message"], rule,
)
if title is None and rule is not None:
if "shortDescription" in rule:
title = get_message_from_multiformatMessageString(
rule["shortDescription"], rule,
)
elif "fullDescription" in rule:
title = get_message_from_multiformatMessageString(
rule["fullDescription"], rule,
)
elif "name" in rule:
title = rule["name"]
elif "id" in rule:
title = rule["id"]
if title is None:
msg = "No information found to create a title"
raise ValueError(msg)
return textwrap.shorten(title, 150)
def get_finding_description(self, result, rule, location):
"""
Get description for the finding. Subclasses can override this method
to add custom description formatting. Use super() to get the base description.
"""
description = ""
message = ""
if "message" in result:
message = get_message_from_multiformatMessageString(
result["message"], rule,
)
description += f"**Result message:** {message}\n"
if get_snippet(location) is not None:
description += f"**Snippet:**\n```\n{get_snippet(location)}\n```\n"
if rule is not None:
if "name" in rule:
description += f"**{_('Rule name')}:** {rule.get('name')}\n"
shortDescription = ""
if "shortDescription" in rule:
shortDescription = get_message_from_multiformatMessageString(
rule["shortDescription"], rule,
)
if shortDescription != message:
description += f"**{_('Rule short description')}:** {shortDescription}\n"
if "fullDescription" in rule:
fullDescription = get_message_from_multiformatMessageString(
rule["fullDescription"], rule,
)
if (
fullDescription != message
and fullDescription != shortDescription
):
description += f"**{_('Rule full description')}:** {fullDescription}\n"
if len(result.get("codeFlows", [])) > 0:
description += get_codeFlowsDescription(result["codeFlows"])
return description.removesuffix("\n")
def get_finding_type(self):
"""
Hook method for subclasses to specify finding type.
Returns tuple of (static_finding, dynamic_finding).
"""
return (True, False) # SARIF is static by definition
def customize_finding(self, finding, result, rule, location):
"""
Hook method for subclasses to customize the finding after creation.
Override this method to add custom fields or modify the finding.
"""
def get_rules(run):
rules = {}


All finding details can be found in the DryRun Security Dashboard.

@paulOsinski
Copy link
Contributor

paulOsinski commented Jul 16, 2025

looks good to me / does what I need it to, anyway!

Copy link
Contributor

@mtesauro mtesauro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

@Maffooch Maffooch requested review from dogboat and blakeaowens July 21, 2025 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants