
SCIO does not identify the codebase source (the path) of a license detection #902

@DennisClark

A recent scan of an FFmpeg project in SCIO returned a composite license expression that included "AND proprietary-license" alongside the other licenses. That was entirely incorrect, as nothing in the codebase is under any proprietary license. Refer to aboutcode-org/scancode-toolkit#3504 for a related problem.

The big issue here is that I could not find any way, either in the SCIO UI or in the exported scan results, to identify the actual file (its complete path) that triggered the erroneous detections. The exported scan results include only the following:

  {
    "score": 100.0,
    "matcher": "2-aho",
    "end_line": 4182,
    "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/proprietary-license_489.RULE",
    "start_line": 4182,
    "matched_text": "    license=\"nonfree and unredistributable\"",
    "match_coverage": 100.0,
    "matched_length": 4,
    "rule_relevance": 100,
    "rule_identifier": "proprietary-license_489.RULE",
    "license_expression": "proprietary-license"
  },

  {
    "score": 100.0,
    "matcher": "2-aho",
    "end_line": 101,
    "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/proprietary-license_490.RULE",
    "start_line": 101,
    "matched_text": "  --enable-nonfree         allow use of nonfree code, the resulting libs",
    "match_coverage": 100.0,
    "matched_length": 2,
    "rule_relevance": 100,
    "rule_identifier": "proprietary-license_490.RULE",
    "license_expression": "proprietary-license"
  }

There are problems with those rules that are addressed in the SCTK issue, but the only way I could investigate was to download the actual FFmpeg project and search it myself for the files containing the matched_text. That information should be in the scan results and presented in some logical way in the SCIO UI. Consider the simple use case of an analyst who sees a generated license expression in SCIO and wonders where in the code the associated licenses were actually detected.
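
The manual workaround described above (searching a local checkout for each matched_text) can be scripted. The sketch below is a hypothetical helper, not part of SCIO or SCTK: find_matched_text walks a source tree and reports every (path, line) whose line contains the matched text, optionally restricted to the start_line from the match. It assumes a single-line match whose matched_text appears verbatim in the file, with the JSON escapes (\") already decoded.

```python
import os


def find_matched_text(root, matched_text, start_line=None):
    """Return sorted (relative_path, line_number) pairs under `root`
    whose line contains `matched_text` (hypothetical helper).

    If `start_line` is given, only hits on that 1-based line are kept,
    which narrows results when the same text appears in many files.
    """
    needle = matched_text.strip()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets binary files be scanned without crashing.
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for line_no, line in enumerate(handle, start=1):
                        if start_line is not None and line_no != start_line:
                            continue
                        if needle in line:
                            hits.append((os.path.relpath(path, root), line_no))
            except OSError:
                # Unreadable files (permissions, broken symlinks) are skipped.
                continue
    return sorted(hits)
```

For the second match above, a call might look like find_matched_text("FFmpeg", "--enable-nonfree         allow use of nonfree code, the resulting libs", start_line=101). Filtering on start_line is what makes this practical on a large tree; without it, common phrases can match many unrelated files.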

I am assuming that SCTK actually has the path name but that it is not being captured by SCIO; if that is not the case, this issue needs to be raised upstream in SCTK as well.
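
If the raw SCTK JSON does keep the path, recovering it is a simple join. The sketch below assumes a ScanCode Toolkit style scan with a top-level "files" list whose entries carry a "path" and a "license_detections" list, each detection holding "matches" like the excerpts above. That is the layout of recent SCTK releases as I understand it; older releases and the SCIO export may use different field names. The function name locate_license_matches is hypothetical.

```python
def locate_license_matches(scan, license_key):
    """Return one record per match whose license_expression mentions
    `license_key`, tagged with the path of the file it came from.

    Assumes a ScanCode Toolkit style scan dict:
    {"files": [{"path": ..., "license_detections": [{"matches": [...]}]}]}
    """
    results = []
    for resource in scan.get("files", []):
        path = resource.get("path")
        # Files without detections may carry None or an empty list.
        for detection in resource.get("license_detections") or []:
            for match in detection.get("matches") or []:
                expression = match.get("license_expression", "")
                # Naive token check on the expression; a real tool would
                # parse it with a license-expression parser instead.
                tokens = expression.replace("(", " ").replace(")", " ").split()
                if license_key in tokens:
                    results.append({
                        "path": path,
                        "start_line": match.get("start_line"),
                        "end_line": match.get("end_line"),
                        "rule_identifier": match.get("rule_identifier"),
                    })
    return results
```

Calling locate_license_matches(scan, "proprietary-license") on a full scan would list each file and line range behind the spurious "AND proprietary-license", which is the information an analyst would want surfaced in the SCIO UI.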

Initially assigning this to @AyanSinhaMahapatra but feel free to re-assign if appropriate.
