Commit 73dc7c8

Merge branch 'develop' into v31.2.3-branch-hotfix
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
2 parents c14d1d9 + 0aa964e commit 73dc7c8

71,544 files changed: 1,193,592 additions & 356,978 deletions

.gitignore

Lines changed: 17 additions & 0 deletions
@@ -95,3 +95,20 @@ tcl
 /00-*.txt
 /z-todo-licenses-*
 
+# Extra ignores from licensedb
+*.pyc
+*.db
+.installed.cfg
+parts
+develop-eggs
+eggs
+downloads
+.settings
+TAGS
+Procfile
+local.cfg
+geckodriver.log
+var
+.metaflow
+selenium
+/dist/

CHANGELOG.rst

Lines changed: 148 additions & 16 deletions
@@ -1,19 +1,36 @@
 Changelog
 =========
 
+v33.0.0 (next next, roadmap)
 
+----------------------------
 
-v32.0.0 (next next, roadmap)
------------------------------------
-
-Package detection:
-~~~~~~~~~~~~~~~~~~
 
 - We now support new package manifest formats:
 
   - OpenWRT packages.
   - Yocto/BitBake .bb recipes.
 
+
+v32.0.0 (next, roadmap)
+-----------------------
+
+Important API changes:
+~~~~~~~~~~~~~~~~~~~~~~
+
+This is a major release with major API and output format changes and significant
+feature updates.
+
+In particular, we changed the output format for licenses and packages, and
+we changed some of the command line options.
+
+The output format version is now 3.0.0.
+
+
+
+Package detection:
+~~~~~~~~~~~~~~~~~~
+
 - Update ``GemfileLockParser`` to track the gem which the Gemfile.lock is for,
   which we assign to the new ``GemfileLockParser.primary_gem`` field. Update
   ``GemfileLockHandler.parse()`` to handle the case where there is a primary gem
@@ -39,20 +56,135 @@ Package detection:
 
 https://github.com/nexB/scancode-toolkit/issues/3081
 
-License detection:
-~~~~~~~~~~~~~~~~~~~
+- Code for parsing a Maven POM, npm package.json, FreeBSD manifest and haxelib
+  JSON has been separated into two functions: one that creates a PackageData
+  object from the parsed Resource, and another that calls the previous function
+  and yields the PackageData. This was done so that we can use the package
+  manifest data parsing code outside of the scancode-toolkit context in other
+  libraries.
 
-- There is a major update to license detection where we now combine one or
-  more matches in a larger license detection. This removes a larger number of false
-  positive or ambiguous license detections.
 
+License detection:
+~~~~~~~~~~~~~~~~~~~
 
-- The data structure of the JSON output has changed for licenses. We now
-  return match details once for each matched license expression rather than
-  once for each license in a matched expression. There is a new top-level
-  "license_references" attribute that contains the data details for each
-  detected license only once. This data can contain the reference license text
-  as an option.
+- The SPDX license list has been updated to the latest v3.19.
+
+- This is a major update to license detection where we now combine one or more
+  license matches in a larger license detection. This approach improves the
+  accuracy of license detection and removes a larger number of false positive
+  or ambiguous license detections. For details, see
+  https://github.com/nexB/scancode-toolkit/issues/2878
+
+- There is a new ``license_detections`` codebase level attribute with all the
+  unique license detections in the whole scan, both in resources and packages.
+  It has the same three attributes as package and resource level license
+  detections: ``license_expression``, ``matches`` and ``detection_log``, and has
+  two additional attributes:
+
+  - ``identifier``: the ``license_expression`` combined with a UUID created from
+    the detection contents; it is the same for identical detections.
+
+  - ``count``: the number of times this unique license detection was
+    encountered in the codebase.
+
+- The data structure of the JSON output has changed for licenses at file level:
+
+  - The ``licenses`` attribute is deleted.
+
+  - A new ``for_license_detections`` attribute is added which references the
+    codebase level unique license detections. It is a list of ``identifier``
+    strings from the codebase level license detections it references.
+
+  - A new ``license_detections`` attribute contains license detections in that file.
+    Each detection has three attributes: ``license_expression``, ``detection_log``
+    and ``matches``. ``matches`` is a list of license matches and is roughly
+    the same as ``licenses`` in the previous version, with additional structure
+    changes detailed below.
+
+  - A new attribute ``license_clues`` contains license matches with the
+    same data structure as the ``matches`` attribute in ``license_detections``.
+    It contains license matches that are mere clues and were not considered
+    to be a proper conclusive license detection.
+
+  - The ``license_expressions`` list of license expressions is deleted and
+    replaced by a single ``detected_license_expression`` expression.
+    Similarly, ``spdx_license_expressions`` was removed and replaced by
+    ``detected_license_expression_spdx``.
+
+  - See `license updates documentation <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-resource>`_
+    for examples and details.
+
+- The data structure of license attributes in ``package_data`` and the codebase
+  level ``packages`` has been updated accordingly:
+
+  - There is a new ``license_detections`` attribute for the primary, top-level
+    declared licenses of a package and an ``other_license_detections`` attribute
+    for the other secondary detections.
+
+  - The ``license_expression`` is replaced by the ``declared_license_expression``
+    and ``other_license_expression`` attributes with their SPDX counterparts
+    ``declared_license_expression_spdx`` and ``other_license_expression_spdx``.
+    These expressions are parallel to detections.
+
+  - The ``declared_license`` attribute is renamed ``extracted_license_statement``
+    and is now a YAML-encoded string.
+
+  See `license updates documentation <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-package>`_
+  for examples and details.
+
+- The license matches structure has changed: we used to report one match for each
+  license ``key`` of a matched license expression. We now instead report a
+  single match for each matched license expression, and list the license keys
+  as a ``licenses`` attribute. This avoids data duplication.
+  Inside each match, we list the match and matched rule attributes directly,
+  avoiding nesting. See `license updates doc <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#licensematch-result-data>`_
+  for examples and details.
+
+- There are new codebase level attributes, enabled by default with ``--licenses``,
+  to report reference license metadata and texts once for each license matched
+  across the scan: ``license_references`` and ``license_rule_references`` list
+  the unique detected licenses and license rules. This reference data is also
+  removed from license matches at all levels, i.e. from codebase, package and
+  resource level license detections and resource level license clues.
+  See `license updates documentation <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#comparision-before-after-license-references>`_
+  for examples and details.
+
+- We replaced the ``scancode --reindex-licenses`` command line option with a
+  new separate command named ``scancode-reindex-licenses``.
+
+- The ``--reindex-licenses-for-all-languages`` CLI option has also moved to
+  the ``scancode-reindex-licenses`` command as the ``--all-languages`` option.
+
+- We can now detect licenses using custom license texts and license rules
+  stored in a directory or packaged as a plugin for consistent reuse and deployment.
+
+  - There is an ``--additional-directory`` option for the ``scancode-reindex-licenses``
+    command to add the licenses from a directory.
+
+  - There is also an ``--only-builtin`` option to use only built-in licenses,
+    ignoring any additional license plugins.
+
+  - See https://github.com/nexB/scancode-toolkit/issues/480 for more details.
+
+- We combined the license data file and text file of each license in a single
+  file with a .LICENSE extension. The .yml data file is now included at the
+  top of each .LICENSE file as "YAML frontmatter". The same applies to license
+  rules and their .RULE and .yml files. This halves the number of data files
+  from about 60,000 to 30,000. Git line history is preserved for the combined
+  text + yml files.
+
+  - See https://github.com/nexB/scancode-toolkit/issues/3049
+
+- There is a new ``--get-license-data`` scancode command line option to export
+  license data in JSON, YAML and HTML, with indexes and a static website for use
+  in the licensedb web site. This is now the API way to get scancode license
+  data.
+
+  See https://github.com/nexB/scancode-toolkit/issues/2738
+
+- The deprecated ``--is-license-text`` option has been removed.
+  This is now built into the ``--license-text`` and ``--info`` options
+  and exposed with the ``percentage_of_license_text`` attribute.
 
 
 v31.2.4 - 2023-01-09
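The codebase level license_detections attribute described in this changelog deduplicates detections and counts them. Below is a minimal sketch of that bookkeeping under stated assumptions: detections are plain dicts with license_expression, matches and detection_log keys, and the UUID part of identifier is derived here from a SHA-256 digest of canonical JSON. ScanCode's actual hashing and identifier formatting may differ, and detection_identifier, unique_license_detections and SAMPLE_DETECTIONS are hypothetical names.

```python
import hashlib
import json
import re
import uuid
from typing import Any, Dict, Iterable, List


def detection_identifier(detection: Dict[str, Any]) -> str:
    """Return '<safe expression>-<uuid>' where the UUID is derived from content.

    Assumption: the UUID comes from a SHA-256 digest of the expression and
    matches serialized as canonical JSON; ScanCode's own scheme may differ.
    """
    expression = detection['license_expression']
    canonical = json.dumps(
        {'license_expression': expression, 'matches': detection.get('matches', [])},
        sort_keys=True,
    )
    digest = hashlib.sha256(canonical.encode('utf-8')).digest()
    content_uuid = uuid.UUID(bytes=digest[:16])
    # Make the expression safe for use in an identifier: 'apache-2.0' -> 'apache_2_0'.
    safe_expression = re.sub(r'[^a-z0-9]+', '_', expression.lower()).strip('_')
    return f'{safe_expression}-{content_uuid}'


def unique_license_detections(detections: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collapse per-resource detections into unique entries carrying a count."""
    unique: Dict[str, Dict[str, Any]] = {}
    for detection in detections:
        identifier = detection_identifier(detection)
        if identifier not in unique:
            unique[identifier] = {
                'identifier': identifier,
                'license_expression': detection['license_expression'],
                'matches': detection.get('matches', []),
                'detection_log': detection.get('detection_log', []),
                'count': 0,
            }
        # Identical detections share an identifier, so they bump the same counter.
        unique[identifier]['count'] += 1
    return list(unique.values())


# Hypothetical per-file detections; the match fields are illustrative only.
SAMPLE_DETECTIONS = [
    {'license_expression': 'mit', 'matches': [{'rule_identifier': 'mit.LICENSE', 'score': 100.0}]},
    {'license_expression': 'mit', 'matches': [{'rule_identifier': 'mit.LICENSE', 'score': 100.0}]},
    {'license_expression': 'apache-2.0', 'matches': [{'rule_identifier': 'apache-2.0.LICENSE', 'score': 100.0}]},
]

for entry in unique_license_detections(SAMPLE_DETECTIONS):
    print(entry['identifier'], entry['count'])
```

In this sketch, a resource's for_license_detections list would hold the detection_identifier() value of each of its detections, so the full detection data is stored once at the codebase level.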

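The combined .LICENSE and .RULE files described in the changelog carry their former .yml data as YAML frontmatter ahead of the text. Below is a minimal, stdlib-only sketch of splitting such a file, assuming the common convention of a frontmatter block delimited by --- lines. split_frontmatter, parse_flat_yaml and SAMPLE_LICENSE are hypothetical names, and parse_flat_yaml handles only flat key: value pairs; real license data with lists or multi-line values needs a proper YAML loader.

```python
from typing import Dict, Tuple


def split_frontmatter(content: str) -> Tuple[str, str]:
    """Split text into (frontmatter, body) using leading and closing '---' lines.

    Returns ('', content) unchanged when no complete frontmatter block is found.
    """
    lines = content.splitlines(keepends=True)
    if not lines or lines[0].strip() != '---':
        return '', content
    for index in range(1, len(lines)):
        if lines[index].strip() == '---':
            frontmatter = ''.join(lines[1:index])
            body = ''.join(lines[index + 1:])
            return frontmatter, body
    # An opening '---' without a closing one: treat the whole text as body.
    return '', content


def parse_flat_yaml(frontmatter: str) -> Dict[str, str]:
    """Parse flat 'key: value' lines only; nested YAML needs a real YAML loader."""
    data: Dict[str, str] = {}
    for line in frontmatter.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue
        # Skip indented lines and list items: they belong to nested values.
        if line[0].isspace() or stripped.startswith('- '):
            continue
        # Split on the first colon so values such as URLs keep their own colons.
        key, sep, value = stripped.partition(':')
        if sep:
            data[key.strip()] = value.strip()
    return data


# Hypothetical .LICENSE content: YAML frontmatter followed by the license text.
SAMPLE_LICENSE = (
    '---\n'
    'key: mit\n'
    'short_name: MIT License\n'
    '---\n'
    '\n'
    'Permission is hereby granted, free of charge, to any person obtaining a copy\n'
)

frontmatter, body = split_frontmatter(SAMPLE_LICENSE)
print(parse_flat_yaml(frontmatter))
print(body.strip())
```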
Dockerfile

Lines changed: 4 additions & 3 deletions
@@ -37,9 +37,10 @@ WORKDIR /scancode-toolkit
 # Copy sources into docker container
 COPY . /scancode-toolkit
 
-# Run scancode once for initial configuration, with
-# --reindex-licenses to create the base license index
-RUN ./scancode --reindex-licenses
+# Initial configuration: run ./configure, then scancode-reindex-licenses to
+# build the base license index
+RUN ./configure \
+    && ./venv/bin/scancode-reindex-licenses
 
 # Add scancode to path
 ENV PATH=/scancode-toolkit:$PATH

azure-pipelines.yml

Lines changed: 22 additions & 19 deletions
@@ -33,6 +33,7 @@ jobs:
                     --ignore=tests/licensedcode/test_detection_datadriven2.py \
                     --ignore=tests/licensedcode/test_detection_datadriven3.py \
                     --ignore=tests/licensedcode/test_detection_datadriven4.py \
+                    --ignore=tests/licensedcode/test_additional_license.py \
                     tests/licensedcode
 
             license_datadriven1_2: |
@@ -78,6 +79,18 @@ jobs:
                 venv/bin/pytest -n 3 -vvs --test-suite=all \
                     tests/licensedcode/test_zzzz_cache.py
 
+            # this test runs in isolation because it modifies the actual
+            # license index with additional licenses provided by a plugin,
+            # so we use the special --test-suite=plugins option to run
+            # these tests
+            additional_license_combined: |
+                venv/bin/pip install tests/licensedcode/data/additional_licenses/additional_plugin_1/
+                venv/bin/pip install tests/licensedcode/data/additional_licenses/additional_plugin_2/
+                venv/bin/scancode-reindex-licenses \
+                    --additional-directory tests/licensedcode/data/additional_licenses/additional_dir/
+                venv/bin/pytest -vvs --test-suite=plugins \
+                    tests/licensedcode/test_additional_license.py
+
     - template: etc/ci/azure-posix.yml
       parameters:
         job_name: ubuntu18_cpython
@@ -98,18 +111,9 @@ jobs:
 
     - template: etc/ci/azure-posix.yml
       parameters:
-        job_name: macos1015_cpython_1
-        image_name: macos-10.15
-        python_versions: ['3.8']
-        python_architecture: x64
-        test_suites:
-            all: venv/bin/pytest -n 2 -vvs tests/scancode/test_cli.py
-
-    - template: etc/ci/azure-posix.yml
-      parameters:
-        job_name: macos1015_cpython_2
-        image_name: macos-10.15
-        python_versions: ['3.9', '3.10']
+        job_name: ubuntu22_cpython
+        image_name: ubuntu-22.04
+        python_versions: ['3.8', '3.9', '3.10']
         python_architecture: x64
         test_suites:
             all: venv/bin/pytest -n 2 -vvs tests/scancode/test_cli.py
@@ -194,27 +198,26 @@ jobs:
 # Tests using a plain pip install to get the latest of all wheels
 ################################################################################
 
-
     - template: etc/ci/azure-posix.yml
       parameters:
-        job_name: ubuntu18_cpython_latest_from_pip
-        image_name: ubuntu-18.04
+        job_name: ubuntu22_cpython_latest_from_pip
+        image_name: ubuntu-22.04
         python_versions: ['3.8', '3.9', '3.10']
         test_suites:
             all: venv/bin/pip install --upgrade-strategy eager --force-reinstall --upgrade -e .[dev] && venv/bin/pytest -n 2 -vvs tests/scancode/test_cli.py
 
     - template: etc/ci/azure-posix.yml
       parameters:
-        job_name: ubuntu20_cpython_latest_from_pip
-        image_name: ubuntu-20.04
+        job_name: ubuntu18_cpython_latest_from_pip
+        image_name: ubuntu-18.04
         python_versions: ['3.8', '3.9', '3.10']
         test_suites:
             all: venv/bin/pip install --upgrade-strategy eager --force-reinstall --upgrade -e .[dev] && venv/bin/pytest -n 2 -vvs tests/scancode/test_cli.py
 
     - template: etc/ci/azure-posix.yml
       parameters:
-        job_name: macos1015_cpython_latest_from_pip
-        image_name: macos-10.15
+        job_name: ubuntu20_cpython_latest_from_pip
+        image_name: ubuntu-20.04
         python_versions: ['3.8', '3.9', '3.10']
         test_suites:
             all: venv/bin/pip install --upgrade-strategy eager --force-reinstall --upgrade -e .[dev] && venv/bin/pytest -n 2 -vvs tests/scancode/test_cli.py
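The new additional_license_combined job runs four steps in order: install two test plugins, rebuild the license index with an additional directory, then run the plugins test suite. Below is a minimal sketch for reproducing that sequence outside Azure, assuming a POSIX virtualenv at ./venv as in the job. plugin_test_commands and run_steps are hypothetical helper names; the commands, flags and paths are copied from the YAML. By default it only prints the commands (dry_run=True), because installing the plugins and reindexing modifies the local license index, which is why the job runs in isolation.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

# Data paths copied from the additional_license_combined job
# (pathlib drops the trailing slash used in the YAML).
ADDITIONAL_LICENSES = Path('tests/licensedcode/data/additional_licenses')
PLUGINS_TEST_FILE = 'tests/licensedcode/test_additional_license.py'


def plugin_test_commands(venv: Path = Path('venv')) -> List[List[str]]:
    """Return the argv of each CI step, in order: install plugins, reindex, test."""
    bin_dir = venv / 'bin'
    return [
        [str(bin_dir / 'pip'), 'install', str(ADDITIONAL_LICENSES / 'additional_plugin_1')],
        [str(bin_dir / 'pip'), 'install', str(ADDITIONAL_LICENSES / 'additional_plugin_2')],
        [
            str(bin_dir / 'scancode-reindex-licenses'),
            '--additional-directory',
            str(ADDITIONAL_LICENSES / 'additional_dir'),
        ],
        [str(bin_dir / 'pytest'), '-vvs', '--test-suite=plugins', PLUGINS_TEST_FILE],
    ]


def run_steps(commands: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each step; when dry_run is False, also execute it."""
    for argv in commands:
        print(' '.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(list(argv), check=True)


if __name__ == '__main__':
    run_steps(plugin_test_commands())
```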

conftest.py

Lines changed: 17 additions & 2 deletions
@@ -38,6 +38,7 @@
 ################################################################################
 SLOW_TEST = 'scanslow'
 VALIDATION_TEST = 'scanvalidate'
+PLUGINS_TEST = 'scanplugins'
 
 
 def pytest_configure(config):
@@ -53,8 +54,14 @@ def pytest_configure(config):
         ': Mark a ScanCode test as a validation test, super slow, long running test.',
     )
 
+    config.addinivalue_line(
+        'markers',
+        PLUGINS_TEST +
+        ': Mark a ScanCode test as a special CI test to test installing additional plugins.',
+    )
+
 
-TEST_SUITES = 'standard', 'all', 'validate'
+TEST_SUITES = ('standard', 'all', 'validate', 'plugins',)
 
 
 def pytest_addoption(parser):
@@ -72,9 +79,11 @@ def pytest_addoption(parser):
         help='Select which test suite to run: '
         '"standard" runs the standard test suite designed to run reasonably fast. '
         '"all" runs "standard" and "slow" (long running) tests. '
-        '"validate" runs all the tests. '
+        '"validate" runs all the tests, except the "plugins" tests. '
+        '"plugins" runs special plugins tests. Needs extra setup, and is used only in the CI. '
         'Use the @pytest.mark.scanslow marker to mark a test as "slow" test. '
         'Use the @pytest.mark.scanvalidate marker to mark a test as a "validate" test.'
+        'Use the @pytest.mark.scanplugins marker to mark a test as a "plugins" test.'
     )
 
 ################################################################################
@@ -87,13 +96,19 @@ def pytest_collection_modifyitems(config, items):
     test_suite = config.getvalue('test_suite')
     run_everything = test_suite == 'validate'
     run_slow_test = test_suite in ('all', 'validate')
+    run_only_plugins = test_suite == 'plugins'
 
     tests_to_run = []
     tests_to_skip = []
 
     for item in items:
         is_validate = bool(item.get_closest_marker(VALIDATION_TEST))
         is_slow = bool(item.get_closest_marker(SLOW_TEST))
+        is_plugins = bool(item.get_closest_marker(PLUGINS_TEST))
+
+        if is_plugins and not run_only_plugins:
+            tests_to_skip.append(item)
+            continue
 
         if is_validate and not run_everything:
             tests_to_skip.append(item)
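The suite selection rules in pytest_collection_modifyitems can be exercised without pytest. Below is a minimal, hypothetical re-implementation as a pure function over (test name, marker set) pairs; select_tests and SAMPLE_TESTS are illustrative names, not part of conftest.py. The slow-test branch is an assumption, since the hunk above stops before that part of the function.

```python
from typing import Iterable, List, Set, Tuple

# Marker names and suites as defined in conftest.py.
SLOW_TEST = 'scanslow'
VALIDATION_TEST = 'scanvalidate'
PLUGINS_TEST = 'scanplugins'
TEST_SUITES = ('standard', 'all', 'validate', 'plugins',)


def select_tests(
    test_suite: str,
    tests: Iterable[Tuple[str, Set[str]]],
) -> Tuple[List[str], List[str]]:
    """Split (name, markers) pairs into (tests_to_run, tests_to_skip)."""
    if test_suite not in TEST_SUITES:
        raise ValueError(f'Unknown test suite: {test_suite!r}')

    run_everything = test_suite == 'validate'
    run_slow_test = test_suite in ('all', 'validate')
    run_only_plugins = test_suite == 'plugins'

    tests_to_run: List[str] = []
    tests_to_skip: List[str] = []

    for name, markers in tests:
        is_validate = VALIDATION_TEST in markers
        is_slow = SLOW_TEST in markers
        is_plugins = PLUGINS_TEST in markers

        # Plugins tests need extra CI setup, so they run only in the "plugins" suite.
        if is_plugins and not run_only_plugins:
            tests_to_skip.append(name)
            continue

        if is_validate and not run_everything:
            tests_to_skip.append(name)
            continue

        # Assumed: slow tests run only in the "all" and "validate" suites.
        if is_slow and not run_slow_test:
            tests_to_skip.append(name)
            continue

        tests_to_run.append(name)

    return tests_to_run, tests_to_skip


# Hypothetical test names, each with the set of markers applied to it.
SAMPLE_TESTS = [
    ('test_cli', set()),
    ('test_big_scan', {SLOW_TEST}),
    ('test_datadriven', {VALIDATION_TEST}),
    ('test_additional_license', {PLUGINS_TEST}),
]

for suite in TEST_SUITES:
    print(suite, select_tests(suite, SAMPLE_TESTS))
```

In this sketch the "plugins" suite still runs unmarked tests; the CI job in azure-pipelines.yml limits the run by passing tests/licensedcode/test_additional_license.py to pytest explicitly.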
