Split Github Vulnerability Scan into separate SCA & SAST parsers #12773

Logicmn · 2025-07-11T01:10:41Z

Description

Hello! The current parser implementation for GitHub code scanning results is baked into the "Github Vulnerability Scan" scan type, a parser originally meant for GitHub SCA (Dependabot) vulnerabilities. Since these two scan types are very different, issues can arise, especially around the fields used for deduplication in the hash code. This PR splits out GitHub code scanning into its own GithubSASTParser, with a scan-type string called "Github SAST Scan". I have included documentation, unit tests, and a new list of fields for hash code deduplication.

I also included several improvements for the original Github Vulnerability Scan parser. These improvements include:

Add support for the cvssSeverities field, which will replace the cvss field in GitHub's GraphQL response in October 2025
Add the permalink from the dependabotUpdate field to the finding description
Add GitHub's now-supported EPSS percentage and percentile to the finding.epss_score and finding.epss_percentile fields
Set finding.url to GitHub Dependabot alert hyperlink for convenience
Improve vulnerability ID handling (now explicitly sets finding.cve and finding.vuln_id_from_tool fields before falling back to unsaved_vulnerability_ids)
Fix a bug where finding.component_version was only being set when the vulnerableRequirements str started with =
Improve defensive coding where applicable, like using .get() to access fields
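
As an illustration of the component_version fix in the list above, here is a minimal sketch, not the PR's actual code. It assumes vulnerableRequirements is a string such as "= 1.2.3" or ">= 1.2.3"; the helper name extract_component_version and the set of operators it strips are hypothetical.

```python
import re

# Hypothetical helper: the name and the operator characters handled here are
# assumptions for illustration, not the parser's real implementation.
_LEADING_OPERATOR = re.compile(r"^\s*[<>=!~^]+\s*")


def extract_component_version(vulnerable_requirements):
    """Return the version part of a requirement string, or None if empty."""
    if not vulnerable_requirements:
        return None
    # Strip any leading comparison operator instead of requiring a literal "=".
    version = _LEADING_OPERATOR.sub("", vulnerable_requirements).strip()
    return version or None


print(extract_component_version("= 1.2.3"))
print(extract_component_version(">= 2.0"))
```

The point of the sketch is only that the version is extracted whether or not the string starts with "=", which is the behavior the bug fix describes.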

Backward compatibility: existing users of the “Github Vulnerability Scan” scan type (driven by GithubVulnerabilityParser) for SCA imports will see no change. If you’d been using it to ingest SAST/code-scanning JSON, you’ll need to switch your import to the new “Github SAST Scan” scan type (driven by GithubSASTParser).

Ref links:

Original impl of GitHub code scanning support: Fix github parser issue 9582 #9583
GitHub code scanning API reference: https://docs.github.com/en/rest/code-scanning/code-scanning

dryrunsecurity · 2025-07-11T01:12:06Z

This pull request contains a test JSON file with a sensitive data exposure vulnerability, highlighting a potential security risk of storing sensitive information without proper protection, though it is currently marked as non-blocking and within the passing risk threshold.

Sensitive Data Exposure in unittests/scans/github_sast/github_sast_one_vul.json

Vulnerability	Sensitive Data Exposure
Description	The test JSON file contains a clear-text description of a security vulnerability involving sensitive data storage. While this is a test fixture, it highlights a potential security risk of storing sensitive information without proper protection. The finding references CWE-312 (Cleartext Storage of Sensitive Information) and has a high severity level.

django-DefectDojo/unittests/scans/github_sast/github_sast_one_vul.json

Lines 1 to 53 in 84b0706

    [
       {
          "number":35,
          "created_at":"2024-01-19T14:11:18Z",
          "updated_at":"2024-01-19T14:11:20Z",
          "url":"https://api.github.com/repos/OWASP/test-repository/code-scanning/alerts/35",
          "html_url":"https://github.com/OWASP/test-repository/security/code-scanning/35",
          "state":"open",
          "fixed_at":"None",
          "dismissed_by":"None",
          "dismissed_at":"None",
          "dismissed_reason":"None",
          "dismissed_comment":"None",
          "rule":{
             "id":"py/clear-text-storage-sensitive-data",
             "severity":"error",
             "description":"Clear-text storage of sensitive information",
             "name":"py/clear-text-storage-sensitive-data",
             "tags":[
                "external/cwe/cwe-312",
                "external/cwe/cwe-315",
                "external/cwe/cwe-359",
                "security"
             ],
             "security_severity_level":"high"
          },
          "tool":{
             "name":"CodeQL",
             "guid":"None",
             "version":"2.16.2"
          },
          "most_recent_instance":{
             "ref":"refs/OWASP/test-repository",
             "analysis_key":"dynamic/github-code-scanning/codeql:analyze",
             "environment":"{\"language\":\"python\"}",
             "category":"/language:python",
             "state":"open",
             "commit_sha":"XXX",
             "message":{
                "text":"This expression stores sensitive data (secret) as clear text."
             },
             "location":{
                "path":"src/file.py",
                "start_line":42,
                "end_line":42,
                "start_column":17,
                "end_column":23
             },
             "classifications":[]
          },
          "instances_url":"https://api.github.com/repos/OWASP/test-repository/code-scanning/alerts/35/instances"
       }
    ]

All finding details can be found in the DryRun Security Dashboard.
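
For readers unfamiliar with the code scanning alert shape in the fixture above, the following sketch shows one way CWE numbers could be read from the rule tags. It is an assumption for illustration only: the helper cwes_from_rule_tags is hypothetical and is not taken from GithubSASTParser.

```python
import re

# Hypothetical helper, not taken from the PR: pull numeric CWE ids out of
# rule tags shaped like "external/cwe/cwe-312".
_CWE_TAG = re.compile(r"^external/cwe/cwe-(\d+)$")


def cwes_from_rule_tags(alert):
    """Return the CWE numbers found in alert["rule"]["tags"], in order."""
    # `or {}` / `or []` tolerate missing or null fields in the alert.
    tags = (alert.get("rule") or {}).get("tags") or []
    cwes = []
    for tag in tags:
        match = _CWE_TAG.match(tag)
        if match:
            cwes.append(int(match.group(1)))
    return cwes


alert = {
    "rule": {
        "tags": [
            "external/cwe/cwe-312",
            "external/cwe/cwe-315",
            "external/cwe/cwe-359",
            "security",
        ]
    }
}
print(cwes_from_rule_tags(alert))  # [312, 315, 359]
```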

Logicmn · 2025-07-14T15:30:02Z

@Maffooch All linting errors should be fixed now, thanks for bearing with me. :)

valentijnscholten · 2025-07-15T17:11:11Z

dojo/tools/github_vulnerability/parser.py

+
+        repo = data.get("data").get("repository", {})
+        repo_url = repo.get("url") or (
+            f"https://github.com/{repo.get('nameWithOwner')}" if repo.get("nameWithOwner") else None


Is this always valid, even when users are using an on-premises / custom install of GitHub Enterprise?

Might be good to point users to the SAST parser and add some instructions to the upgrade notes

Good point. Removed the edge case handling and just left it at repo_url = repo.get("url"). Also updated the docs to make sure it was marked as an optional field.

Added specific instructions to docs/content/en/open_source/upgrading/2.49.md for upgrading.
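
A minimal sketch of the simplified lookup described in the reply above, with no github.com fallback so that GitHub Enterprise hosts are not misreported. The function name get_repo_url is hypothetical, and the extra null guards are an assumption layered on top of the repo.get("url") call quoted in the thread.

```python
def get_repo_url(data):
    """Return the repository URL from a GraphQL response, or None if absent."""
    if not isinstance(data, dict):
        return None
    # `or {}` guards against keys that are present but null in the JSON,
    # which a chained .get("data").get(...) would not survive.
    repo = (data.get("data") or {}).get("repository") or {}
    # No URL is guessed from nameWithOwner: the field is treated as optional.
    return repo.get("url")


response = {"data": {"repository": {"url": "https://ghe.example.com/org/repo"}}}
print(get_repo_url(response))
```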

valentijnscholten · 2025-07-15T17:15:41Z

dojo/tools/github_vulnerability/parser.py

+        if not isinstance(data, dict) or "data" not in data:
+            error_msg = "Invalid report format, expected a GitHub RepositoryVulnerabilityAlert GraphQL query response."
+            raise ValueError(error_msg)


Is this the error that users will see if they do not change their scan_type and keep uploading SAST reports?

Yes, it is. I have updated it to provide more guidance, let me know whatcha think:

Invalid report format, expected a GitHub RepositoryVulnerabilityAlert GraphQL query response. If you're trying to upload code scanning results, please use the Github SAST scan type.
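
Putting the diff hunk and the updated message together, the check could look roughly like the sketch below. The wrapper function validate_sca_report is hypothetical; the condition and the message text come from the thread. A code scanning export is a JSON list of alerts rather than a dict with a "data" key, so it takes the error path.

```python
def validate_sca_report(data):
    """Raise ValueError unless data looks like a GraphQL alert response."""
    if not isinstance(data, dict) or "data" not in data:
        error_msg = (
            "Invalid report format, expected a GitHub RepositoryVulnerabilityAlert "
            "GraphQL query response. If you're trying to upload code scanning "
            "results, please use the Github SAST scan type."
        )
        raise ValueError(error_msg)
    return data


# A code scanning (SAST) export is a list, so it is rejected with guidance.
try:
    validate_sca_report([{"number": 35, "state": "open"}])
except ValueError as exc:
    print(exc)
```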

Logicmn added 9 commits July 10, 2025 12:18

Refactor GithubVulnerability parser and add GithubSAST parser

d17a879

More GithubVulnerability and GithubSAST parser improvements

b34a58c

Add documentation

c50be33

Add tests, update docs, and add hash code fields

2673001

Fix Github vulnerability parser unit test

3b6ee59

Unit tests and parser tweaks

6c6e697

Rm files pushed by mistake

edc4c7c

Revert certain removals from unit test

fd2c43e

Add EPSS field population and update unit tests

d6805c8

Logicmn requested review from Maffooch and mtesauro as code owners July 11, 2025 01:10

github-actions bot added settings_changes Needs changes to settings.py based on changes in settings.dist.py included in this PR docs unittests parser labels Jul 11, 2025

Logicmn added 3 commits July 10, 2025 21:19

Removed some unnecessary comments and formatting

106e769

Ruff formatting

7399641

Fix unit tests

8115ee3

Maffooch requested review from valentijnscholten and dogboat July 11, 2025 23:55

Ruff formatting

745dca2

Fix unit test

d698115

valentijnscholten reviewed Jul 15, 2025


valentijnscholten added this to the 2.49.0 milestone Jul 15, 2025

Logicmn added 3 commits July 16, 2025 15:08

Github Vulnerability parser and docs tweaks, and upgrade instructions

b7d143e

Politeness

9f9bc42

Fix dependabot update pr link parsing

84b0706
