Skip to content

Split Github Vulnerability Scan into separate SCA & SAST parsers #12773

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: dev
Choose a base branch
from

Conversation

Logicmn
Copy link

@Logicmn Logicmn commented Jul 11, 2025

Description

Hello! The current parser implementation for GitHub code scanning results is baked into the "Github Vulnerability Scan" scan type, which is a parser originally meant to be used for GitHub SCA (Dependabot) vulnerabilities. Since these two scan types are exceptionally different, issues can arise especially around the fields used for deduplication in the hash code. This PR splits out GitHub code scanning into its own GithubSASTParser, with a scan-type string called ""Github SAST Scan." I have included documentation, unit tests, and a new list of fields for hash code deduplication.

I also included several improvements for the original Github Vulnerability Scan parser. These improvements include:

  • Add support for the cvssSeverities which will replace the cvss field in GitHub's graphql response in October, 2025.
  • Add the permalink from the dependabotUpdate field to the finding description
  • Add GitHub's now supported epss percentage and percentile to finding.epss_score and finding.epss_percentile finding fields
  • Set finding.url to GitHub Dependabot alert hyperlink for convenience
  • Improve vulnerability ID handling (now explicitly sets finding.cve and finding.vuln_id_from_tool fields before falling back to unsaved_vulnerability_ids)
  • Fix a bug where finding.component_version was only being set when the vulnerableRequirements str started with =
  • Improve defensive coding where applicable, like using .get() to access fields

Backward compatibility: existing users of the “Github Vulnerability Scan” scan type (driven by GithubVulnerabilityParser) for SCA imports will see no change. If you’d been using it to ingest SAST/code-scanning JSON, you’ll need to switch your import to the new “Github SAST Scan” scan type (driven by GithubSASTParser).

Ref links:

@github-actions github-actions bot added settings_changes Needs changes to settings.py based on changes in settings.dist.py included in this PR docs unittests parser labels Jul 11, 2025
Copy link

dryrunsecurity bot commented Jul 11, 2025

DryRun Security

This pull request contains a test JSON file with a sensitive data exposure vulnerability, highlighting a potential security risk of storing sensitive information without proper protection, though it is currently marked as non-blocking and within the passing risk threshold.

Sensitive Data Exposure in unittests/scans/github_sast/github_sast_one_vul.json
Vulnerability Sensitive Data Exposure
Description The test JSON file contains a clear-text description of a security vulnerability involving sensitive data storage. While this is a test fixture, it highlights a potential security risk of storing sensitive information without proper protection. The finding references CWE-312 (Cleartext Storage of Sensitive Information) and has a high severity level.

[
{
"number":35,
"created_at":"2024-01-19T14:11:18Z",
"updated_at":"2024-01-19T14:11:20Z",
"url":"https://api.github.com/repos/OWASP/test-repository/code-scanning/alerts/35",
"html_url":"https://github.com/OWASP/test-repository/security/code-scanning/35",
"state":"open",
"fixed_at":"None",
"dismissed_by":"None",
"dismissed_at":"None",
"dismissed_reason":"None",
"dismissed_comment":"None",
"rule":{
"id":"py/clear-text-storage-sensitive-data",
"severity":"error",
"description":"Clear-text storage of sensitive information",
"name":"py/clear-text-storage-sensitive-data",
"tags":[
"external/cwe/cwe-312",
"external/cwe/cwe-315",
"external/cwe/cwe-359",
"security"
],
"security_severity_level":"high"
},
"tool":{
"name":"CodeQL",
"guid":"None",
"version":"2.16.2"
},
"most_recent_instance":{
"ref":"refs/OWASP/test-repository",
"analysis_key":"dynamic/github-code-scanning/codeql:analyze",
"environment":"{\"language\":\"python\"}",
"category":"/language:python",
"state":"open",
"commit_sha":"XXX",
"message":{
"text":"This expression stores sensitive data (secret) as clear text."
},
"location":{
"path":"src/file.py",
"start_line":42,
"end_line":42,
"start_column":17,
"end_column":23
},
"classifications":[]
},
"instances_url":"https://api.github.com/repos/OWASP/test-repository/code-scanning/alerts/35/instances"
}
]


All finding details can be found in the DryRun Security Dashboard.

@Logicmn
Copy link
Author

Logicmn commented Jul 14, 2025

@Maffooch All linting errors should be fixed now, thanks for bearing with. :)


repo = data.get("data").get("repository", {})
repo_url = repo.get("url") or (
f"https://github.com/{repo.get('nameWithOwner')}" if repo.get("nameWithOwner") else None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this always valid, even when users are using a on premise / custom install of GitHub Enterprise?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to point users to the SAST parser and add some instructions to the upgrade notes

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Removed the edge case handling and just left it at repo_url = repo.get("url"). Also updated the docs to make sure it was marked as an optional field.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added specific instructions to docs/content/en/open_source/upgrading/2.49.md for upgrading.

Comment on lines 22 to 24
if not isinstance(data, dict) or "data" not in data:
error_msg = "Invalid report format, expected a GitHub RepositoryVulnerabilityAlert GraphQL query response."
raise ValueError(error_msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the error users will see that do not change their scan_type and keep uploading SAST reports?

Copy link
Author

@Logicmn Logicmn Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is. I have updated it to provide more guidance, let me know whatcha think:

Invalid report format, expected a GitHub RepositoryVulnerabilityAlert GraphQL query response. If you're trying to upload code scanning results, please use the Github SAST scan type.

@valentijnscholten valentijnscholten added this to the 2.49.0 milestone Jul 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs parser settings_changes Needs changes to settings.py based on changes in settings.dist.py included in this PR unittests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants