
Commit ba4695a

JonoYang and tdruez authored
Upgrade ScanCode-toolkit to version v31 #411
* Upgrade scancode-toolkit to latest beta release #411
  Signed-off-by: Thomas Druez <tdruez@nexb.com>
* Add a test class to regen test data #411
  Signed-off-by: Thomas Druez <tdruez@nexb.com>
* Upgrade container_inspector to latest 31.0.0 version #411
  Signed-off-by: Thomas Druez <tdruez@nexb.com>
* Handle new scan format in scancode pipes #411
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Handle package_uids for DiscoveredPackages #411
    * Remove create_discovered_packages2 and create_codebase_resources2
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Update deprecated code #411
    * Normalize package_uids before comparing results in tests
    * Update expected test results
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Regenerate asgiref 3.3.0 test data #411
    * Mark ProjectCodebase tests with expectedFailure
    * We will revisit ProjectCodebase and update it to fit our current models
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Add asgiref-3.3.0_scancode_scan.json #411
    * We are using scancode scan results for tests since asgiref-3.3.0_scan.json is not exactly the same format as scancode's json output
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Add asgiref-3.3.0_walk_test_fixtures.json #411
    * Update regen_test_data.py to generate asgiref-3.3.0_walk_test_fixtures.json
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Signed-off-by: Jono Yang <jyang@nexb.com>
* Update make_results_summary() #411
    * No need to explicitly get license_clarity_score in make_results_summary()
    * Update expected test results
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Exclude system_environment from diff #411
    * Add .vscode to .gitignore
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Upgrade scancode-toolkit and extractcode to latest version #411
  Signed-off-by: Thomas Druez <tdruez@nexb.com>
* Update package_getter #434 #438
    * Adapt code from previous version of scancode-toolkit for use in the debian pipeline
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Allow packages to be created without versions #438
    * Update DiscoveredPackage.create_from_data to create packages without a version
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Update expected test results
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Report DiscoveredPackage correctly in summary #411
    * Ensure that DiscoveredPackages are reported one time in the scan_package pipeline summary
    * Add test to check key_file_packages field in the summary output
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Add test for docker pipeline for alpine #411
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Add docker pipeline test for rpm images #411
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Track package_uids in make_results_summary #435
    * Avoid checking if package_data dictionary is already in the key_files_packages list
    * Keep track of package_uids instead
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Add truncated ubuntu docker image for testing #435
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Bump scancode and commoncode versions #435
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Update docker pipeline #435
    * We now run scancode-toolkit on the docker image resources using the new --system-package option
    * This gives us the installed system packages in the returned scan
    * We use the scan to create the DiscoveredPackages and CodebaseResources
    * The rest of the pipeline is unchanged
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Fix code validity #411
  Signed-off-by: Thomas Druez <tdruez@nexb.com>
* Simplify the filtering of key_files_packages using a QuerySet #411
  Signed-off-by: Thomas Druez <tdruez@nexb.com>
* Remove copied code from docker.py #411 #435
    * Create Docker pipeline from combining the rootfs pipeline and scan_package pipeline
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Update alpine test image and results #411 #435
    * TODO: create smaller test images for ubuntu and redhat docker image tests
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Properly create multiple package instances #411
    * Do not attempt to combine multiple instances of the same package
    * Store package_uid in extra data by itself
    * Add test for multiple package instances
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Sort packages in JSON output by type and name #411
    * Normalize package_uid in extra_data fields
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Get file info and packages in initial scan #438
    * Remove step for scanning application packages
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Revert changes to docker pipes and pipeline #438
    * Check for existence of installed_file attribute before using it
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Use generic package_getter for all distros #438
    * Ensure both installed_file and codebase_resource have the same checksum field before comparing them
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Use get_path() with strip_root to get paths #438
    * Update mappings_keys_by_fieldname
    * Look for package data in package_data field instead of packages in save_scan_package_results
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Remove distro specific pipes #438
    * Move get_installed_packages to rootfs.py
    * Use get_package_data instead of get_package_info
    * Rename all instances of packages to package_data when scanning for application packages
    * Update test docker images and test results
    * Add test for basic rootfs
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Use list comprehension for key_file_packages #438
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Add package_uid field to DiscoveredPackage #411
    * Update expected test results
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Add test docker image for Ubuntu #438
    * Update expected test results
    * Remove old ubuntu.tar
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Update formatting #411 #438
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Use smaller rpm docker image for testing #438
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Replace ubuntu docker test image #438
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Use purl data in update_or_create_packages #438
    * Add package_uid to test package data
    * Update expected test result
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Bump scancode version to v31.0.0rc1 #438 #411
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Consider all PURL fields when ordering Packages #411 #438
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Create Packages before Resources #411 #438
    * In the LoadInventory pipeline, create the DiscoveredPackages from a scan before creating the CodebaseResources
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Add test for load_inventory pipeline #411
  Signed-off-by: Jono Yang <jyang@nexb.com>
* Code cleanups and formatting #411
  Signed-off-by: Thomas Druez <tdruez@nexb.com>
* Upgrade dependencies #411
  Signed-off-by: Thomas Druez <tdruez@nexb.com>
* Refactor create_inventory_from_scan to remove duplicated code #411
  Signed-off-by: Thomas Druez <tdruez@nexb.com>
* Add changelog entry #411
  Signed-off-by: Thomas Druez <tdruez@nexb.com>

Co-authored-by: Thomas Druez <tdruez@nexb.com>
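The commit message describes tracking package_uids in make_results_summary instead of checking whether a whole package_data dictionary is already in the key_files_packages list. A minimal sketch of that idea follows. It is not the code from this commit: the function name `collect_key_file_packages` and the shape of the input (a list of resource dicts that each carry a `package_data` list) are assumptions made for illustration.

```python
def collect_key_file_packages(key_files):
    """
    Return the package_data entries found on key files, keeping only the
    first entry seen for each package_uid.

    `key_files` is assumed (hypothetically) to be a list of dicts, each with
    an optional "package_data" list of package mappings.
    """
    seen_uids = set()
    unique_packages = []

    for resource in key_files:
        for package_data in resource.get("package_data", []):
            package_uid = package_data.get("package_uid")
            # Entries without a package_uid cannot be de-duplicated safely,
            # so they are always kept.
            if package_uid is not None:
                if package_uid in seen_uids:
                    continue
                seen_uids.add(package_uid)
            unique_packages.append(package_data)

    return unique_packages
```

Comparing short identifier strings through a set is cheaper than scanning a list of full dictionaries for equality, and it keeps two entries apart when they differ only in their identifier.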
1 parent 777167e commit ba4695a


48 files changed: +197306 -2773 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ local
 policies.yml
 *.rdb
 *.aof
+.vscode

 # This is only created when packaging for external redistribution
 /thirdparty/

CHANGELOG.rst

Lines changed: 4 additions & 0 deletions
@@ -7,6 +7,10 @@ v31.0.0 (next)
 - WARNING: Drop support for Python 3.6 and 3.7. Add support for Python 3.10.
   Upgrade Django to version 4.x series.

+- Upgrade ScanCode-toolkit to version v31.
+  See https://github.com/nexB/scancode-toolkit/blob/develop/CHANGELOG.rst for an
+  overview of the changes in v31 compared to v30.
+
 - Implement run status auto-refresh using the htmx JavaScript library.
   The statuses of queued and running pipeline are now automatically refreshed
   in the project list and project details views every 10 seconds.
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# Generated by Django 4.0.4 on 2022-06-09 18:26
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('scanpipe', '0015_alter_codebaseresource_project_and_more'),
+    ]
+
+    operations = [
+        migrations.AddField(
+            model_name='discoveredpackage',
+            name='package_uid',
+            field=models.CharField(blank=True, help_text='Unique identifier for this package.', max_length=1024),
+        ),
+    ]

scanpipe/models.py

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1726,6 +1726,11 @@ class DiscoveredPackage(
         blank=True,
         help_text=_("A list of dependencies for this package."),
     )
+    package_uid = models.CharField(
+        max_length=1024,
+        blank=True,
+        help_text=_("Unique identifier for this package."),
+    )

     # `AbstractPackage` model overrides:
     keywords = models.JSONField(default=list, blank=True)
@@ -1769,7 +1774,7 @@ def create_from_data(cls, project, package_data):
         If one of the values of the required fields is not available, a "ProjectError"
         is created instead of a new DiscoveredPackage instance.
         """
-        required_fields = ["type", "name", "version"]
+        required_fields = ["type", "name"]
         missing_values = [
             field_name
             for field_name in required_fields
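The hunk above is cut off before the filter condition of the `missing_values` comprehension, so the exact test is not visible here. As a rough sketch of the effect of dropping "version" from `required_fields`, the standalone function below assumes a field counts as missing when its value is absent or empty. The helper name `find_missing_required` is hypothetical and not part of the model.

```python
# Required fields after this commit, as shown in the diff.
REQUIRED_FIELDS = ["type", "name"]


def find_missing_required(package_data, required_fields=REQUIRED_FIELDS):
    """
    Return the names of required fields that have no usable value.

    Assumption: a field is "missing" when its value is absent or falsy
    (None, empty string). The real condition is elided in the diff.
    """
    return [
        field_name
        for field_name in required_fields
        if not package_data.get(field_name)
    ]
```

Under that assumption, package data with only a type and a name has no missing values, whereas the previous three-field list would have flagged the absent version.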

scanpipe/pipelines/docker.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -20,12 +20,12 @@
 # ScanCode.io is a free software code scanning tool from nexB Inc. and others.
 # Visit https://github.com/nexB/scancode.io for support and download.

-from scanpipe.pipelines import root_filesystems
+from scanpipe.pipelines.root_filesystems import RootFS
 from scanpipe.pipes import docker
 from scanpipe.pipes import rootfs


-class Docker(root_filesystems.RootFS):
+class Docker(RootFS):
     """
     A pipeline to analyze Docker images.
     """

scanpipe/pipelines/load_inventory.py

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -42,15 +42,14 @@ def get_scan_json_input(self):
         Locates a JSON scan input from a project's input/ directory.
         """
         inputs = list(self.project.inputs(pattern="*.json"))
+
         if len(inputs) != 1:
             raise Exception("Only 1 JSON input file supported")
+
         self.input_location = str(inputs[0].absolute())

     def build_inventory_from_scan(self):
         """
-        Processes a given JSON scan input to populate codebase resources and packages.
+        Processes a JSON Scan results file to populate codebase resources and packages.
         """
-        project = self.project
-        scanned_codebase = scancode.get_virtual_codebase(project, self.input_location)
-        scancode.create_codebase_resources(project, scanned_codebase)
-        scancode.create_discovered_packages(project, scanned_codebase)
+        scancode.create_inventory_from_scan(self.project, self.input_location)
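The `get_scan_json_input` step shown in this hunk accepts exactly one `*.json` input and stores its absolute location as a string. The same check can be sketched outside the pipeline with `pathlib` alone. The function name `find_single_json_input` and the use of a plain directory in place of `project.inputs()` are assumptions for illustration, and `ValueError` is used here in place of the bare `Exception` raised in the pipeline.

```python
from pathlib import Path


def find_single_json_input(input_dir):
    """
    Return the absolute location (as a string) of the only *.json file
    directly inside `input_dir`. Raise ValueError when there are zero or
    several candidates.
    """
    inputs = sorted(Path(input_dir).glob("*.json"))

    if len(inputs) != 1:
        raise ValueError(
            f"Only 1 JSON input file supported, found {len(inputs)}"
        )

    return str(inputs[0].absolute())
```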

scanpipe/pipelines/scan_codebase.py

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,6 @@

 from scanpipe import pipes
 from scanpipe.pipelines import Pipeline
-from scanpipe.pipes import output
 from scanpipe.pipes import rootfs
 from scanpipe.pipes import scancode
 from scanpipe.pipes.input import copy_inputs

scanpipe/pipelines/scan_package.py

Lines changed: 7 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -56,13 +56,9 @@ def steps(cls):
         "--license-text",
         "--package",
         "--url",
-    ] + [
         "--classify",
-        "--consolidate",
         "--is-license-text",
-        "--license-clarity-score",
         "--summary",
-        "--summary-key-files",
     ]

     def get_package_archive_input(self):
@@ -102,33 +98,31 @@ def run_scancode(self):
         """
         Scans extracted codebase/ content.
         """
-        self.scan_output = self.project.get_output_file_path("scancode", "json")
+        scan_output_path = self.project.get_output_file_path("scancode", "json")
+        self.scan_output_location = str(scan_output_path.absolute())

         with self.save_errors(scancode.ScancodeError):
             scancode.run_scancode(
                 location=str(self.project.codebase_path),
-                output_file=str(self.scan_output),
+                output_file=self.scan_output_location,
                 options=self.scancode_options,
                 raise_on_error=True,
             )

-        if not self.scan_output.exists():
+        if not scan_output_path.exists():
             raise FileNotFoundError("ScanCode output not available.")

     def build_inventory_from_scan(self):
         """
-        Processes the JSON scan results to determine resources and packages.
+        Processes a JSON Scan results file to populate codebase resources and packages.
         """
-        project = self.project
-        scanned_codebase = scancode.get_virtual_codebase(project, str(self.scan_output))
-        scancode.create_codebase_resources(project, scanned_codebase)
-        scancode.create_discovered_packages(project, scanned_codebase)
+        scancode.create_inventory_from_scan(self.project, self.scan_output_location)

     def make_summary_from_scan_results(self):
         """
         Builds a summary in JSON format from the generated scan results.
         """
-        summary = scancode.make_results_summary(self.project, str(self.scan_output))
+        summary = scancode.make_results_summary(self.project, self.scan_output_location)
         output_file = self.project.get_output_file_path("summary", "json")

         with output_file.open("w") as summary_file:

scanpipe/pipes/__init__.py

Lines changed: 9 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -29,8 +29,6 @@

 from django.db.models import Count

-from packageurl import normalize_qualifiers
-
 from scanpipe.models import CodebaseResource
 from scanpipe.models import DiscoveredPackage
 from scanpipe.pipes import scancode
@@ -73,12 +71,19 @@ def update_or_create_package(project, package_data, codebase_resource=None):
     """
     Gets, updates or creates a DiscoveredPackage then returns it.
     Uses the `project` and `package_data` mapping to lookup and creates the
-    DiscoveredPackage using its Package URL as a unique key.
+    DiscoveredPackage using its Package URL and package_uid as a unique key.
     """
     purl_data = DiscoveredPackage.extract_purl_data(package_data)
+    package_uid = package_data.get("package_uid")
+    purl_data_and_package_uid = {
+        **purl_data,
+        "package_uid": package_uid,
+    }

     try:
-        package = DiscoveredPackage.objects.get(project=project, **purl_data)
+        package = DiscoveredPackage.objects.get(
+            project=project, **purl_data_and_package_uid
+        )
     except DiscoveredPackage.DoesNotExist:
         package = None

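The lookup in `update_or_create_package` now combines the Package URL fields with `package_uid`, which lets two installed instances of the same package coexist as separate records. The sketch below mimics that behaviour with a plain dictionary in place of the Django ORM. The helpers `lookup_key` and `get_or_create`, and the in-memory `store`, are hypothetical stand-ins. The field list follows the standard Package URL components and may not match `DiscoveredPackage.extract_purl_data` exactly.

```python
# Standard Package URL components; assumed to mirror extract_purl_data.
PURL_FIELDS = ("type", "namespace", "name", "version", "qualifiers", "subpath")


def lookup_key(package_data):
    """Build a hashable key from the PURL fields plus package_uid."""
    purl_values = tuple(package_data.get(field) or "" for field in PURL_FIELDS)
    return purl_values + (package_data.get("package_uid") or "",)


def get_or_create(store, package_data):
    """
    Return (record, created) from an in-memory store keyed by lookup_key().
    This stands in for the ORM get()/create() pair and is illustrative only.
    """
    key = lookup_key(package_data)
    if key in store:
        return store[key], False
    store[key] = dict(package_data)
    return store[key], True


store = {}
first = {"type": "deb", "name": "bash", "version": "5.0", "package_uid": "uid-1"}
second = {**first, "package_uid": "uid-2"}  # same PURL, different instance

get_or_create(store, first)
get_or_create(store, second)
get_or_create(store, first)  # found again, not duplicated
print(len(store))
```

With the Package URL alone as the key, `first` and `second` would collapse into a single record. Adding `package_uid` to the key keeps them apart.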

scanpipe/pipes/alpine.py

Lines changed: 0 additions & 32 deletions
This file was deleted.
