 - The data structure of the JSON output has changed for licenses at file level:
 
-- The previously used ``licenses`` attribute is deleted.
+- The ``licenses`` attribute is deleted.
 
-- To replace the ``licenses`` attribute, a new ``license_detections`` attribute
-is added at the file-level with the license detections in that file.
-This has three data attributes: ``license_expression``, ``detection_log``
-and ``matches``. Here ``matches`` is similar to previous ``licenses``
-with some additional changes in data structure as detailed in the
-following sections.
-
-- A new attribute ``license_clues`` is added, which has license matches with the
-same data structure as the ``matches`` field in ``license_detections``.
-This has license matches which are mere clues and not proper detections.
-
-- The ``license_expressions`` field is removed, which was a list of license
-expressions and it is replaced with ``detected_license_expression`` which
-is a single license expression. Similarly ``spdx_license_expressions`` was
-removed and replaced by ``detected_license_expression_spdx``.
-
-- See `license updates doc <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-resource>`_
-for examples and more details.
-
-- Similarly the data structure of license fields in ``package_data`` and the
-codebase level ``packages`` has also changed:
-
-- There is a ``license_detections`` attribute with the detections, same as the
-file ``license_detections`` attribute, and there is also a
-``other_license_detections`` attribute. Here ``license_detections`` has
-the detections for the primary/declared licenses, and the rest of the
-secondary detecions are at ``other_license_detections``.
-
-- The ``license_expression`` field has been dropped, and instead we have
-``declared_license_expression`` and ``other_license_expression`` fields
-with their SPDX counterparts: ``declared_license_expression_spdx`` and
-``other_license_expression_spdx``.
-
-- The ``declared_license`` field also has been renamed to
-``extracted_license_statement``, and previously this ``declared_license``
-field could be a list, a dict or a string, but now
-``extracted_license_statement`` is always a string.
-
-See `license updates doc <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-package>`_
-for examples and more details.
-
-- The data structure of License matches has also changed: for every license match
-we previously had the attribute ``key`` i.e. a license key, but now we have
-``license_expression`` instead. So we now return match details once for each
-matched license expression rather than once for each license in a matched expression.
-We also have a flat data structure inside ``matches`` instead of the ``matched_rule``
-data dictionary, and the ``licenses`` now contains data for all the licenses present in the
-license expression. See `license updates doc <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#licensematch-result-data>`_
-for examples and more details.
-
-- There is a new command line option ``--licenses-reference`` which would add license
-data as reference for all the license detections. This option would add two
-codebase level attributes: ``license_references`` and ``rule_references``,
-which are lists of license and rules respectively. This also removes the corresponding
-fields from ``matches`` in ``license_detections`` as they are referenced in these
-two codebase level fields. This also removes duplication as license/rule data is
-given only once across the scan and not at every license match.
-See `license updates doc <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#comparision-before-after-license-references>`_
-for examples and more details.
-
-- There is a new ``scancode-reindex-licenses`` command that replace the
-``scancode --reindex-licenses`` command line option.
+- A new ``license_detections`` attribute contains license detections in that file.
+Each detection has three attributes: ``license_expression``, ``detection_log``
+and ``matches``. ``matches`` is a list of license matches and is roughly
+the same as ``licenses`` in the previous version, with additional structure
+changes detailed below.
+
+- A new ``license_clues`` attribute contains license matches with the
+same data structure as the ``matches`` attribute in ``license_detections``.
+These are license matches that are mere clues and were not considered
+to be proper, conclusive license detections.
+
+- The ``license_expressions`` list of license expressions is deleted and
+replaced by a single ``detected_license_expression`` expression.
+Similarly, ``spdx_license_expressions`` was removed and replaced by
+``detected_license_expression_spdx``.
+
+- See `license updates documentation <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-resource>`_
+for examples and details.
+
+- The data structure of license attributes in ``package_data`` and the
+codebase-level ``packages`` has been updated accordingly:
+
+- There is a new ``license_detections`` attribute for the primary, top-level
+declared licenses of a package and an ``other_license_detections`` attribute
+for all other, secondary detections.
+
+- The ``license_expression`` attribute is replaced by the ``declared_license_expression``
+and ``other_license_expression`` attributes with their SPDX counterparts
+``declared_license_expression_spdx`` and ``other_license_expression_spdx``.
+These expressions correspond to ``license_detections`` and
+``other_license_detections``, respectively.
+
+- The ``declared_license`` attribute is renamed ``extracted_license_statement``
+and is now a YAML-encoded string.
+
+See `license updates documentation <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-package>`_
+for examples and details.
+
+- The license matches structure has changed: we used to report one match for each
+license ``key`` of a matched license expression. We now report a single match
+for each matched license expression instead, and list its license keys
+in a ``licenses`` attribute. This avoids data duplication.
+Inside each match, the match and matched rule attributes are listed directly,
+avoiding nesting. See `license updates documentation <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#licensematch-result-data>`_
+for examples and details.
+
+- There is a new ``--licenses-reference`` command line option to report
+reference license metadata and texts only once for each license matched across
+the scan. With this option, two codebase-level attributes, ``license_references``
+and ``rule_references``, list the unique detected licenses and license rules.
+See `license updates documentation <https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#comparision-before-after-license-references>`_
+for examples and details.
+
+- We replaced the ``scancode --reindex-licenses`` command line option with a
+new separate command named ``scancode-reindex-licenses``.
 
 - The ``--reindex-licenses-for-all-languages`` CLI option is also moved to
 the ``scancode-reindex-licenses`` command as an option ``--all-languages``.
 
-- We can now detect licenses using custom license texts and license rules.
-These can be provided as a one off in a directory or packaged as a plugin
-for consistent reuse and deployment.
+- We can now detect licenses using custom license texts and license rules
+stored in a directory or packaged as a plugin for consistent reuse and deployment.
 
 - There is an ``--additional-directory`` option with the ``scancode-reindex-licenses``
-command to use the licenses from the directory.
+command to add the licenses from a directory.
 
-- There is also a ``--only-builtin`` option added to only use the builtin
-licenses to build the cache, once there are plugins installed with
-additional licenses/rules.
+- There is also a ``--only-builtin`` option to use only builtin licenses,
+ignoring any additional license plugins.
 
 - See https://github.com/nexB/scancode-toolkit/issues/480 for more details.
 
-- Scancode LICENSE and RULE files now also contain their data as YAML frontmatter,
-which previously used to be in their respective YAML files. This reduces number of
-files in those directories, 'rules' and 'licenses' to half. Git line history is
-preserved for the files.
+- We combined the license data file and text file of each license into a single
+file with a .LICENSE extension. The .yml data file is now included at the
+top of each .LICENSE file as "YAML frontmatter". The same applies to license
+rules and their .RULE and .yml files. This halves the number of data files
+from about 60,000 to 30,000. Git line history is preserved for the combined
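The file-level and package-level changes described in this diff can be made concrete with a short Python sketch that reads a scan result in the new layout. The ``SAMPLE_SCAN`` dictionary is hand-written and hypothetical: its paths and license values are made up, real scan output carries many more attributes, and the top-level ``files`` list with a ``path`` per file is assumed from the usual ScanCode JSON output rather than stated in this changelog. The helper functions are illustrative only and are not part of ScanCode.

```python
"""Read license results from a ScanCode-style scan in the new JSON layout.

SAMPLE_SCAN is hypothetical, hand-written data for illustration only.
"""

# Hypothetical scan output; only the attribute names follow the changelog.
SAMPLE_SCAN = {
    "files": [
        {
            "path": "project/setup.py",
            # A single expression replaces the old license_expressions list.
            "detected_license_expression": "mit",
            "detected_license_expression_spdx": "MIT",
            "license_detections": [
                {
                    "license_expression": "mit",
                    "detection_log": [],
                    # One match per matched expression, not per license key.
                    "matches": [{"license_expression": "mit"}],
                }
            ],
            "license_clues": [],
        },
        {
            "path": "project/README.md",
            "detected_license_expression": None,
            "detected_license_expression_spdx": None,
            "license_detections": [],
            # Matches that were only clues, not conclusive detections.
            "license_clues": [{"license_expression": "gpl-2.0"}],
        },
    ],
    "packages": [
        {
            "declared_license_expression": "apache-2.0",
            "declared_license_expression_spdx": "Apache-2.0",
            "other_license_expression": None,
            "other_license_expression_spdx": None,
            "license_detections": [],
            "other_license_detections": [],
            # Renamed from declared_license; now a string.
            "extracted_license_statement": "Apache-2.0",
        }
    ],
}


def summarize_files(scan):
    """Map each file path to (detected expression, list of clue expressions)."""
    summary = {}
    for resource in scan.get("files", []):
        detected = resource.get("detected_license_expression")
        clues = [m["license_expression"] for m in resource.get("license_clues", [])]
        summary[resource["path"]] = (detected, clues)
    return summary


def matched_expressions(scan):
    """List the expression of every match inside every file-level detection."""
    found = []
    for resource in scan.get("files", []):
        for detection in resource.get("license_detections", []):
            for match in detection.get("matches", []):
                found.append(match["license_expression"])
    return found


def package_expressions(scan):
    """Return (declared, other) license expression pairs for each package."""
    return [
        (pkg.get("declared_license_expression"), pkg.get("other_license_expression"))
        for pkg in scan.get("packages", [])
    ]


if __name__ == "__main__":
    print(summarize_files(SAMPLE_SCAN))
    print(matched_expressions(SAMPLE_SCAN))
    print(package_expressions(SAMPLE_SCAN))
```

For a real scan, one would load the JSON file written by ScanCode (for example with ``json.load``) and pass the resulting dictionary to the same helpers; which attributes are present depends on the scan options, so this is a starting point rather than a complete reader.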
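The combined .LICENSE and .RULE files can be split back into their data and text parts. The Python sketch below assumes the conventional frontmatter layout, where the YAML block sits between two ``---`` delimiter lines at the very top of the file; the sample file content and the helper names are hypothetical. The small ``parse_flat_yaml`` helper only handles flat ``key: value`` lines and is not a YAML parser; real license data can contain lists and multi-line values, so a proper YAML library (such as PyYAML's ``yaml.safe_load``) is the safer choice.

```python
"""Split a .LICENSE or .RULE file into its YAML frontmatter and its text.

Assumes '---' delimiter lines around the frontmatter; the sample is made up.
"""

# Hypothetical file content for illustration only.
SAMPLE_LICENSE_FILE = (
    "---\n"
    "key: example-license\n"
    "short_name: Example License\n"
    "---\n"
    "This is the license text.\n"
)


def split_frontmatter(text):
    """Return (frontmatter, body); frontmatter is '' when none is found."""
    lines = text.splitlines(keepends=True)
    # Frontmatter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return frontmatter, body
    # An opening delimiter with no closing one: treat everything as text.
    return "", text


def parse_flat_yaml(frontmatter):
    """Parse flat 'key: value' lines only; this is NOT a general YAML parser."""
    data = {}
    for line in frontmatter.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            data[key.strip()] = value.strip()
    return data


if __name__ == "__main__":
    frontmatter, body = split_frontmatter(SAMPLE_LICENSE_FILE)
    print(parse_flat_yaml(frontmatter))
    print(body)
```

Keeping the split and the parse in separate functions makes it easy to swap ``parse_flat_yaml`` for a real YAML loader without touching the delimiter handling.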