Commit d264aec

Update CHANGELOG
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
1 parent 05d163a commit d264aec

1 file changed

+91
-85
lines changed

CHANGELOG.rst

Lines changed: 91 additions & 85 deletions
@@ -2,6 +2,7 @@ Changelog
22
=========
33

44
v33.0.0 (next next, roadmap)
5+
56
----------------------------
67

78

@@ -14,6 +15,19 @@ v33.0.0 (next next, roadmap)
1415
v32.0.0 (next, roadmap)
1516
-----------------------
1617

18+
Important API changes:
19+
~~~~~~~~~~~~~~~~~~~~~~
20+
21+
This is a major release with major API and output format changes and significant
22+
feature updates.
23+
24+
In particular, we changed the output format for licenses and packages, and
25+
we changed some of the command line options.
26+
27+
The output format version is now 3.0.0
28+
29+
30+
1731
Package detection:
1832
~~~~~~~~~~~~~~~~~~
1933

@@ -52,109 +66,101 @@ Package detection:
5266

5367
License detection:
5468
~~~~~~~~~~~~~~~~~~~
55-
56-
- There is a major update to license detection where we now combine one or
57-
matches in a larger license detecion. This remove a larger number of false
58-
positive or ambiguous license detections.
59-
69+
70+
- This is a major update to license detection where we now combine one or more
71+
license matches in a larger license detection. This approach improves the
72+
accuracy of license detection and removes a larger number of false positive
73+
or ambiguous license detections. For details, see
6074
https://github.com/nexB/scancode-toolkit/issues/2878
6175

6276
- The data structure of the JSON output has changed for licenses at file level:
6377

64-
- The previously used ``licenses`` attribute is deleted.
78+
- The ``licenses`` attribute is deleted.
6579

66-
- To replace the ``licenses`` attribute, a new ``license_detections`` attribute
67-
is added at the file-level with the license detections in that file.
68-
This has three data attributes: ``license_expression``, ``detection_log``
69-
and ``matches``. Here ``matches`` is similar to previous ``licenses``
70-
with some additional changes in data structure as detailed in the
71-
following sections.
72-
73-
- A new attribute ``license_clues`` is added, which has license matches with the
74-
same data structure as the ``matches`` field in ``license_detections``.
75-
This has license matches which are mere clues and not proper detections.
76-
77-
- The ``license_expressions`` field is removed, which was a list of license
78-
expressions and it is replaced with ``detected_license_expression`` which
79-
is a single license expression. Similarly ``spdx_license_expressions`` was
80-
removed and replaced by ``detected_license_expression_spdx``.
81-
82-
- See `license updates doc <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-resource>`_
83-
for examples and more details.
84-
85-
- Similarly the data structure of license fields in ``package_data`` and the
86-
codebase level ``packages`` has also changed:
87-
88-
- There is a ``license_detections`` attribute with the detections, same as the
89-
file ``license_detections`` attribute, and there is also a
90-
``other_license_detections`` attribute. Here ``license_detections`` has
91-
the detections for the primary/declared licenses, and the rest of the
92-
secondary detecions are at ``other_license_detections``.
93-
94-
- The ``license_expression`` field has been dropped, and instead we have
95-
``declared_license_expression`` and ``other_license_expression`` fields
96-
with their SPDX counterparts: ``declared_license_expression_spdx`` and
97-
``other_license_expression_spdx``.
98-
99-
- The ``declared_license`` field also has been renamed to
100-
``extracted_license_statement``, and previously this ``declared_license``
101-
field could be a list, a dict or a string, but now
102-
``extracted_license_statement`` is always a string.
103-
104-
See `license updates doc <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-package>`_
105-
for examples and more details.
106-
107-
- The data structure of License matches has also changed: for every license match
108-
we previously had the attribute ``key`` i.e. a license key, but now we have
109-
``license_expression`` instead. So we now return match details once for each
110-
matched license expression rather than once for each license in a matched expression.
111-
We also have a flat data structure inside ``matches`` instead of the ``matched_rule``
112-
data dictionary, and the ``licenses`` now contains data for all the licenses present in the
113-
license expression. See `license updates doc <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#licensematch-result-data>`_
114-
for examples and more details.
115-
116-
- There is a new command line option ``--licenses-reference`` which would add license
117-
data as reference for all the license detections. This option would add two
118-
codebase level attributes: ``license_references`` and ``rule_references``,
119-
which are lists of license and rules respectively. This also removes the corresponding
120-
fields from ``matches`` in ``license_detections`` as they are referenced in these
121-
two codebase level fields. This also removes duplication as license/rule data is
122-
given only once across the scan and not at every license match.
123-
See `license updates doc <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#comparision-before-after-license-references>`_
124-
for examples and more details.
125-
126-
- There is a new ``scancode-reindex-licenses`` command that replace the
127-
``scancode --reindex-licenses`` command line option.
80+
- A new ``license_detections`` attribute contains license detections in that file.
81+
This object has three attributes: ``license_expression``, ``detection_log``
82+
and ``matches``. ``matches`` is a list of license matches and is roughly
83+
the same as ``licenses`` in the previous version with additional structure
84+
changes detailed below.
85+
86+
- A new attribute ``license_clues`` contains license matches with the
87+
same data structure as the ``matches`` attribute in ``license_detections``.
88+
This contains license matches that are mere clues and were not considered
89+
to be a proper conclusive license detection.
90+
91+
- The ``license_expressions`` list of license expressions is deleted and
92+
replaced by a ``detected_license_expression`` single expression.
93+
Similarly ``spdx_license_expressions`` was removed and replaced by
94+
``detected_license_expression_spdx``.
95+
96+
- See `license updates documentation <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-resource>`_
97+
for examples and details.
98+
99+
- The data structure of license attributes in ``package_data`` and the codebase
100+
level ``packages`` has been updated accordingly:
101+
102+
- There is a new ``license_detections`` attribute for the primary, top-level
103+
declared licenses of a package and an ``other_license_detections`` attribute
104+
for the other secondary detections.
105+
106+
- The ``license_expression`` is replaced by the ``declared_license_expression``
107+
and ``other_license_expression`` attributes with their SPDX counterparts
108+
``declared_license_expression_spdx`` and ``other_license_expression_spdx``.
109+
These expressions are parallel to detections.
110+
111+
- The ``declared_license`` attribute is renamed ``extracted_license_statement``
112+
and is now a YAML-encoded string.
113+
114+
See `license updates documentation <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-package>`_
115+
for examples and details.
116+
117+
- The license matches structure has changed: we used to report one match for each
118+
license ``key`` of a matched license expression. We now report instead one
119+
single match for each matched license expression, and list the license keys
120+
as a ``licenses`` attribute. This avoids data duplication.
121+
Inside each match, we list the match and matched rule attributes directly,
122+
avoiding nesting. See `license updates doc <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#licensematch-result-data>`_
123+
for examples and details.
124+
125+
- There is a new ``--licenses-reference`` command line option to report
126+
reference license metadata and texts once for each license matched across the
127+
scan; we now have two codebase level attributes: ``license_references`` and
128+
``rule_references`` that list the unique detected licenses and license rules.
129+
See `license updates documentation <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#comparision-before-after-license-references>`_
130+
for examples and details.
131+
132+
- We replaced the ``scancode --reindex-licenses`` command line option with a
133+
new separate command named ``scancode-reindex-licenses``.
128134

129135
- The ``--reindex-licenses-for-all-languages`` CLI option is also moved to
130136
the ``scancode-reindex-licenses`` command as an option ``--all-languages``.
131137

132-
- We can now detect licenses using custom license texts and license rules.
133-
These can be provided as a one off in a directory or packaged as a plugin
134-
for consistent reuse and deployment.
138+
- We can now detect licenses using custom license texts and license rules
139+
stored in a directory or packaged as a plugin for consistent reuse and deployment.
135140

136141
- There is an ``--additional-directory`` option with the ``scancode-reindex-licenses``
137-
command to use the licenses from the directory.
142+
command to add the licenses from a directory.
138143

139-
- There is also a ``--only-builtin`` option added to only use the builtin
140-
licenses to build the cache, once there are plugins installed with
141-
additional licenses/rules.
144+
- There is also a ``--only-builtin`` option to use only builtin licenses,
145+
ignoring any additional license plugins.
142146

143147
- See https://github.com/nexB/scancode-toolkit/issues/480 for more details.
144148

145-
- Scancode LICENSE and RULE files now also contain their data as YAML frontmatter,
146-
which previously used to be in their respective YAML files. This reduces number of
147-
files in those directories, 'rules' and 'licenses' to half. Git line history is
148-
preserved for the files.
149+
- We combined the license data file and text file of each license in a single
150+
file with a .LICENSE extension. The .yml data file is now included at the
151+
top of each .LICENSE file as "YAML frontmatter". The same applies to license
152+
rules and their .RULE and .yml files. This halves the number of data files
153+
from about 60,000 to 30,000. Git line history is preserved for the combined
154+
text + yml files.
149155

150-
https://github.com/nexB/scancode-toolkit/issues/3049
156+
- See https://github.com/nexB/scancode-toolkit/issues/3049
151157

152-
- A new command line option ``--get-license-data`` is added to dump license data in
153-
JSON, YAML and HTML formats, and also generates a local index and a static website
154-
to view the data. This will essentially be an API/way to get scancode license data
155-
as opposed to just reading the files.
158+
- There is a new ``--get-license-data`` scancode command line option to export
159+
license data in JSON, YAML and HTML, with indexes and a static website for use
160+
in the licensedb web site. This becomes the API way to get scancode license
161+
data.
156162

157-
https://github.com/nexB/scancode-toolkit/issues/2738
163+
See https://github.com/nexB/scancode-toolkit/issues/2738
158164

159165

160166
v31.2.1 - 2022-10-05
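
The file-level changes described in this diff can be illustrated with a minimal Python sketch. The attribute names ``license_detections``, ``license_expression``, ``detection_log``, ``matches``, ``license_clues`` and ``detected_license_expression`` come from the changelog text above. The sample data, the top-level ``files`` list with a ``path`` key, and the helper name ``summarize_file_licenses`` are assumptions made for illustration only and are not actual ScanCode output.

```python
# Hypothetical sample shaped after the attribute names in the changelog.
# Real scan output carries many more fields; values here are invented.
sample_scan = {
    "files": [
        {
            "path": "src/foo.c",
            "detected_license_expression": "mit",
            "detected_license_expression_spdx": "MIT",
            "license_detections": [
                {
                    "license_expression": "mit",
                    "detection_log": [],
                    "matches": [{"license_expression": "mit"}],
                }
            ],
            "license_clues": [],
        }
    ]
}


def summarize_file_licenses(scan):
    """Map each file path to (detected expression, detection expressions, clue count)."""
    summary = {}
    for resource in scan.get("files", []):
        # Treat a missing or null attribute as an empty list.
        detections = resource.get("license_detections") or []
        clues = resource.get("license_clues") or []
        summary[resource["path"]] = (
            resource.get("detected_license_expression"),
            [detection["license_expression"] for detection in detections],
            len(clues),
        )
    return summary


print(summarize_file_licenses(sample_scan))
```

Code that previously read the deleted ``licenses`` list would, under these assumptions, read ``matches`` inside each entry of ``license_detections`` instead.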

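For the package-level renames in this diff, a consumer that has to read scans produced both before and after the change could use small accessors like the ones sketched below. The attribute names are taken from the changelog text. The two sample records and the helper names ``declared_expression`` and ``license_statement`` are hypothetical and shown only to make the renames concrete.

```python
# Hypothetical package records; only license-related attributes are shown.
old_package = {
    "license_expression": "apache-2.0",
    # Before the change this value could be a list, a dict or a string.
    "declared_license": {"license": "Apache-2.0"},
}
new_package = {
    "declared_license_expression": "apache-2.0",
    "declared_license_expression_spdx": "Apache-2.0",
    "other_license_expression": None,
    # After the change this value is always a string.
    "extracted_license_statement": "license: Apache-2.0",
}


def declared_expression(package):
    """Return the declared license expression from either layout."""
    if "declared_license_expression" in package:
        return package["declared_license_expression"]
    return package.get("license_expression")


def license_statement(package):
    """Return the license statement as found in the package manifest."""
    if "extracted_license_statement" in package:
        return package["extracted_license_statement"]
    return package.get("declared_license")


for package in (old_package, new_package):
    print(declared_expression(package), "|", license_statement(package))
```

Note that the old ``declared_license`` value is returned unchanged, so callers still need to handle a list or dict when reading older scans.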