
Commit 05e6af0

Merge branch 'develop' into update-to-spdx-3.22
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
2 parents 68ab994 + 20430c4 commit 05e6af0


272 files changed (+4675, -309 lines)

ROADMAP-ABOUTCODE.rst

Lines changed: 282 additions & 0 deletions
@@ -0,0 +1,282 @@
AboutCode global Roadmap
========================

python-inspector
----------------

- Support all package manifests beyond requirements files and setup.py

SCIO: ScanCode.io, pipelines for SCA
-------------------------------------

Composition analysis of deployed binaries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Build pipelines for key tech stacks. For each of these, automate the end-to-end
analysis of a package's binaries, mapping them back to their sources and
matching them upstream to their PurlDB origin:

- for Java
- for JavaScript, CSS
- for C/C++ ELFs
- for C/C++ WinPE
- for C/C++ Mach-O
- for .Net, C#
- for Golang
- for Android apk
- for Python
- for Rust
- for Ruby


Matching pipeline
~~~~~~~~~~~~~~~~~~

Build a dedicated pipeline for matching (client side).


Scan TODO/Review app
~~~~~~~~~~~~~~~~~~~~~

- Build an app in SCIO to automate flagging scan items that need review or attention.
- Create a UI and backend to organize the scan review.
- Consider including and merging the "scantext" license detection review app


Pre-built container image(s)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Build and publish container images
- Consider building a single image for CLI deployments
- Consider publishing the app image for standalone CLI deployments

Package management
~~~~~~~~~~~~~~~~~~~~

- Adopt the two-level model of package manifests and package instances
- Refactor dependencies as deps and requirements


Deploy a free public analysis server
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Consider sponsorship from Amazon/Google/Azure

Create and document standard CI/CD integrations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- GitHub
- GitLab
- Azure


SCTK: ScanCode Toolkit
-----------------------

License detection quality improvements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Include automatic key phrases in license detection rules.
  Use important key phrases for license detection: https://github.com/nexB/scancode-toolkit/issues/2637

- Automatically add required phrases and unknown-license detection, with tests
- Fix more recently reported license detection bugs

- Detect a summary for all packages, and correctly populate more package fields such as copyright/holders

- We can report the declared license and other licenses in the license summary
  of a full scan. The primary license is based; next is to do the
  same across each package found nested in a scanned codebase, and also compute
  an individual license clarity score for each of these.


- License expression simplification and license expression categories
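
The goals of combining related matches into one detection, exposing a primary expression next to the full one, and logging how the primary was chosen can be illustrated with a minimal sketch. It is not ScanCode's implementation: the `Match` and `Detection` classes, the `max_gap` proximity rule, and the "most covered lines wins" primary-license rule are hypothetical assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    """Hypothetical, simplified license match: an expression and the lines it covers."""
    expression: str
    start_line: int
    end_line: int

    @property
    def length(self) -> int:
        return self.end_line - self.start_line + 1


@dataclass
class Detection:
    """One combined detection with its full and primary expressions and a log."""
    matches: list
    full_expression: str
    primary_expression: str
    log: list = field(default_factory=list)


def group_matches(matches, max_gap=4):
    """Group matches separated by at most `max_gap` lines (an assumed proximity rule)."""
    groups = []
    for match in sorted(matches, key=lambda m: m.start_line):
        if groups and match.start_line - max(m.end_line for m in groups[-1]) <= max_gap:
            groups[-1].append(match)
        else:
            groups.append([match])
    return groups


def build_detection(group):
    """Combine one group of matches and record each decision in a log."""
    log = []
    # Full expression: AND of the unique expressions, in order of first appearance.
    unique = list(dict.fromkeys(m.expression for m in group))
    full = " AND ".join(f"({e})" if " " in e else e for e in unique)
    log.append(f"combined {len(group)} matches into {full!r}")
    # Primary expression: the one covering the most lines (hypothetical rule).
    covered = {}
    for m in group:
        covered[m.expression] = covered.get(m.expression, 0) + m.length
    primary = max(unique, key=lambda e: covered[e])
    log.append(f"selected primary {primary!r} covering {covered[primary]} lines")
    return Detection(group, full, primary, log)
```

With matches for `mit` on lines 1-20 and 22-23 and `apache-2.0` on lines 25-27, the sketch yields one detection with the full expression `mit AND apache-2.0`, the primary expression `mit`, and two log entries explaining both steps.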


Improve package detection
~~~~~~~~~~~~~~~~~~~~~~~~~~

- Create synthetic, private packages from non-packaged files based on license and copyright
- Create simplified purl-only lightweight package detection
- Evolve the dependency model towards requirements and true dependencies
- Track private non-published packages

Primary copyright detection for packages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- This is closely tied to the primary license detection and should focus
  on package manifests and key files.
- Support copyright parsing from all package ecosystems.


Publish improved release packages/bundles/installers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Publish smaller wheels with a single focus for easier integration as a library

- Release self-contained app(s) for ease of use, bundled with a Python interpreter and everything needed to run them:

  - extractcode
  - scancode proper
  - packagedcode only
  - licensedcode only
  - cluecode only

- Adopt Python 3.12
- Adopt macOS and Linux on ARM


ABCTK: AboutCode Toolkit
----------------------------

- add support for patterns for documented resources
- add support for excludes for documented resources
- document the deployed resources for a development resource
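
Pattern and exclude support for documented resources could work roughly as below. This is a sketch under assumptions: it treats patterns as shell-style globs (as might be written in an ABOUT file's `about_resource` field), applies excludes after includes, and uses a hypothetical `documented_resources` helper rather than any existing AboutCode Toolkit function.

```python
from fnmatch import fnmatchcase


def documented_resources(paths, include_patterns, exclude_patterns=()):
    """Return paths matched by any include pattern and by no exclude pattern.

    Hypothetical helper: the glob and exclude semantics are assumptions,
    not current AboutCode Toolkit behavior.
    """
    selected = []
    for path in paths:
        # Keep only paths that at least one include pattern documents.
        if not any(fnmatchcase(path, pattern) for pattern in include_patterns):
            continue
        # Drop paths that an exclude pattern removes again.
        if any(fnmatchcase(path, pattern) for pattern in exclude_patterns):
            continue
        selected.append(path)
    return selected
```

Note that `fnmatch` lets `*` match across `/`, so `*.java` also matches `src/Main.java`; a real implementation might prefer path-aware globbing.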


PURLDB: PurlDB
----------------

- purl2all: On-demand indexing for all supported package ecosystems
- purl2sym: Collect source and binary symbols
- index-time matching to find the true origin
- implement multi-tier indexing: purl/metadata/archive/files
- MatchCode matching engine

  - embed a SCIO with a matching pipeline to match a whole codebase at once
  - expose a new endpoint for matching a whole codebase
  - support multiple SCIO workers for indexing
  - implement proper ranking of matched code results
  - refactor directory matching to be a pre-matching step to file matching
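
Directory matching as a pre-matching step can be sketched as a two-phase pass: match whole directories first, then send only the leftover files to file matching. The fingerprint (a SHA1 over the sorted SHA1s of all files below a directory), the in-memory index dictionaries and the `match_codebase` helper are hypothetical; MatchCode's actual fingerprints and indexes may differ.

```python
import hashlib
from collections import defaultdict
from pathlib import PurePosixPath


def directory_fingerprint(file_sha1s):
    """Hypothetical fingerprint: SHA1 of the sorted SHA1s of all files below a directory."""
    digest = hashlib.sha1()
    for sha1 in sorted(file_sha1s):
        digest.update(sha1.encode("ascii"))
    return digest.hexdigest()


def match_codebase(files, directory_index, file_index):
    """Match directories first, then only the files no directory match covered.

    files: POSIX path -> file SHA1
    directory_index: directory fingerprint -> purl
    file_index: file SHA1 -> purl
    """
    # Collect the SHA1s of every file below each ancestor directory.
    by_dir = defaultdict(list)
    for path, sha1 in files.items():
        for parent in PurePosixPath(path).parents:
            if str(parent) != ".":
                by_dir[str(parent)].append(sha1)

    matches = {}
    # Phase 1: shallowest directories first, so a matched parent claims its subtree.
    for directory in sorted(by_dir, key=lambda d: d.count("/")):
        purl = directory_index.get(directory_fingerprint(by_dir[directory]))
        if purl is None:
            continue
        prefix = directory + "/"
        for path in files:
            if path.startswith(prefix) and path not in matches:
                matches[path] = purl

    # Phase 2: exact file matching, only for files still unmatched.
    for path, sha1 in files.items():
        if path not in matches and sha1 in file_index:
            matches[path] = file_index[sha1]
    return matches
```

Because phase 2 skips files already claimed in phase 1, a whole-directory match takes precedence over a coincidental single-file match, and file lookups are only spent on files no directory match explained.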


VCIO: VulnerableCode.io
------------------------

- Adopt VulnTotal model throughout
- Log advisory history
- Add vulnerable code reachability
- Add vulnerable code required context/config
- Add more upstream resources
- Deploy purlsync public pilot


PURL: purl and vers specs
--------------------------

- Merge and advertise the vers spec.
- Standardize purl with ECMA
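
For readers new to purl, the sketch below follows the component parsing order described in the purl specification: subpath, then qualifiers, scheme, type, version, and finally namespace and name. It is deliberately simplified and is not the packageurl-python API: it assumes a canonical, percent-encoded purl (for example `%40angular` rather than `@angular` in an npm namespace) and skips type-specific normalization and validation.

```python
from urllib.parse import unquote


def _split_right(text: str, sep: str):
    """Split once from the right on `sep`; the right part is "" when `sep` is absent."""
    if sep in text:
        left, _, right = text.rpartition(sep)
        return left, right
    return text, ""


def parse_purl(purl: str) -> dict:
    """Parse a canonical Package URL into its components (simplified sketch)."""
    remainder, subpath = _split_right(purl, "#")
    remainder, qualifier_string = _split_right(remainder, "?")
    scheme, sep, remainder = remainder.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a purl: {purl!r}")
    remainder = remainder.strip("/")
    purl_type, _, remainder = remainder.partition("/")
    remainder, version = _split_right(remainder, "@")
    namespace, _, name = remainder.rpartition("/")
    if not purl_type or not name:
        raise ValueError(f"purl needs a type and a name: {purl!r}")

    qualifiers = {}
    for pair in filter(None, qualifier_string.split("&")):
        key, _, value = pair.partition("=")
        if value:  # qualifiers with an empty value are discarded
            qualifiers[key.lower()] = unquote(value)

    return {
        "type": purl_type.lower(),
        "namespace": "/".join(unquote(s) for s in namespace.split("/") if s) or None,
        "name": unquote(name),
        "version": unquote(version) or None,
        "qualifiers": qualifiers,
        # Drop empty, "." and ".." segments from the subpath.
        "subpath": "/".join(
            unquote(s) for s in subpath.split("/") if s not in ("", ".", "..")
        ) or None,
    }
```

For example, `pkg:npm/%40angular/core@16.0.0` parses to namespace `@angular`, name `core` and version `16.0.0`.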


INSPECTORS: misc package and technology inspectors
----------------------------------------------------

- Universal Inspector/DependentCode

  - Resolve any purl dependencies
  - Non-vulnerable dependency resolution

- Inspector for Java and Android DEX

  - Decompile and collect binary symbols.
  - Collect source symbols
  - Resolve dependencies for Gradle, SBT and Maven.

- Inspector for JavaScript, CSS

  - Decompile/deminify and collect bundled and minified symbols.
  - Analyze map files
  - Collect source symbols
  - Resolve dependencies for npm, yarn and pnpm.

- Inspector for C/C++

  - Collect source symbols

- Inspector for ELFs

  - Decompile and collect binary symbols.
  - Collect DWARF and ELF section symbols
  - Resolve dependencies for pkgconfig and ldd

- Inspector for WinPE

  - Decompile and collect binary symbols.
  - Collect winpdb symbols

- Inspector for Mach-O

  - Decompile and collect binary symbols.
  - Collect DWARF and Mach-O section symbols

- Inspector for .Net, C#

  - Decompile and collect binary symbols from assemblies (see also WinPE)
  - Collect source symbols
  - Resolve dependencies for nuget/dotnet (completed)

- Inspector for Golang

  - Decompile and collect binary symbols from pclntab
  - Collect source symbols
  - Resolve dependencies

- Inspector for Python

  - Decompile and collect binary symbols from bytecode
  - Collect source symbols
  - Resolve dependencies (completed)

- Inspector for Rust

  - Decompile and collect binary symbols
  - Collect source symbols
  - Resolve dependencies

- Inspector for Swift

  - Decompile and collect binary symbols
  - Collect source symbols
  - Resolve dependencies

- Inspector for Dart/Flutter

  - Decompile and collect binary symbols
  - Collect source symbols
  - Resolve dependencies

- Inspector for Ruby

  - Collect source symbols
  - Resolve dependencies

- Inspector for Debian

  - Parse Debian formats (completed)
  - Parse installed database (completed)
  - Compare versions (completed)
  - Resolve dependencies

- Inspector for Alpine

  - Parse Alpine formats (completed)
  - Parse installed database (completed)
  - Compare versions (completed)
  - Resolve dependencies

- Inspector for RPM

  - Parse RPM formats (partially completed)
  - Parse installed database (completed)
  - Compare versions (completed)
  - Resolve dependencies

- Inspector for containers

  - Parse container image formats and manifests (completed)
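
The Debian version comparison listed as completed follows dpkg's ordering rules. Below is a compact, self-contained sketch of those rules (epoch first, then upstream version, then Debian revision, each compared with alternating non-digit and numeric runs, where `~` sorts before everything). It is an illustration, not the code used by the Debian inspector, and it does not validate its input.

```python
def _is_digit(c: str) -> bool:
    return "0" <= c <= "9"


def _order(c: str) -> int:
    """Weight of one character in a non-digit run, following dpkg's rules."""
    if c == "~":
        return -1              # tilde sorts before everything, even the end of the string
    if c == "" or _is_digit(c):
        return 0               # end of string, or the start of a number
    if c.isascii() and c.isalpha():
        return ord(c)          # letters sort before other characters
    return ord(c) + 256        # punctuation such as "+" and "." sorts after letters


def _compare_part(a: str, b: str) -> int:
    """Compare two upstream versions or two revisions (dpkg's verrevcmp)."""
    i = j = 0
    while i < len(a) or j < len(b):
        # 1. Compare the non-digit runs character by character.
        while (i < len(a) and not _is_digit(a[i])) or (j < len(b) and not _is_digit(b[j])):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # 2. Skip leading zeros, then compare the numeric runs as numbers.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and j < len(b) and _is_digit(a[i]) and _is_digit(b[j]):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        # A number with more significant digits is larger.
        if i < len(a) and _is_digit(a[i]):
            return 1
        if j < len(b) and _is_digit(b[j]):
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str):
    """Split "[epoch:]upstream[-revision]" into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_debian_versions(v1: str, v2: str) -> int:
    """Return -1, 0 or 1 as v1 sorts before, equal to, or after v2."""
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return -1 if e1 < e2 else 1
    result = _compare_part(u1, u2) or _compare_part(r1, r2)
    return (result > 0) - (result < 0)
```

For example, `1.0~rc1` sorts before `1.0`, and `1.10` sorts after `1.9` because numeric runs are compared as numbers rather than character by character.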


Other libraries
-----------------

- FetchCode: support all supported package ecosystems, use in PurlDB and SCIO
- univers: support all supported package ecosystems
- license-expression: update to support the latest SPDX updates, auto-update bundled licenses

ROADMAP.rst

Lines changed: 19 additions & 7 deletions
@@ -15,23 +15,33 @@ to distinguish the forest from the trees. Therefore reporting the primary
 license detection is important: when we get scan results, we can often
 get 30 licenses for a single package and this volume is a problem
 even if it is correct and it is technically correct.
+
 The goal of this improvement is to:
 
-- combine multiple related license matches in a single license detection
+- Combine multiple related license matches in a single license detection.
 
-- in a license detection, expose a primary license expression in addition
+- In a license detection, expose a primary license expression in addition
 to the complete, full license expression.
-
-- make the logic of selection of the primary license visible, at the minimum
-with a log of combination and primary license selection operations
+
+- Make the logic of selection of the primary license visible, at the minimum
+with a log of combination and primary license selection operations.
 
 This is for SCTK first.
 
-Status: This has been completed in SCTK and also included in SCIO. We use
+Status:
+
+This has been completed in SCTK and also included in SCIO. We use
 an updated --summary option and a new license clarity score for this.
 We also have LicenseDetections for resources/packages and a top level
 unique license detections as a summary.
 
+Next steps:
+
+- We can report the declared license and other licenses in the license summary
+of a full scan. The primary license is based; next is to do the
+same across each package found nested in a scanned codebase. And also compute
+an individual license clarity score for each of these.
+
 
 2. Package files.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -71,6 +81,8 @@ This is completed in SCTK.
 
 This is the same issue as for primary license, but for holders
 
+This has not been completed. This is less critical to complete as the tracing
+is much simpler and can be done manually in the rare cases where this is needed.
 
 
 Roadmap
@@ -128,4 +140,4 @@ Roadmap
 - Revamp how common list of spurious licenses are detected (this is a bug)
 - Use important key phrases for license detection https://github.com/nexB/scancode-toolkit/issues/2637
 
-This is mostly completed, for follow up see https://github.com/nexB/scancode-toolkit/issues/2878.
+This is mostly completed, for follow up see https://github.com/nexB/scancode-toolkit/issues/2878

src/cluecode/copyrights.py

Lines changed: 2 additions & 0 deletions
@@ -886,6 +886,8 @@ def build_detection_from_node(
 # of a copyright statement
 (r'^neither$', 'JUNK'),
 (r'^nor$', 'JUNK'),
+
+(r'^data-.*$', 'JUNK'),
 
 (r'^providing$', 'JUNK'),
 (r'^Execute$', 'JUNK'),
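
The added `(r'^data-.*$', 'JUNK')` entry extends an ordered list of (regex, tag) pairs, presumably so that HTML `data-*` attribute names are not taken as part of a copyright statement. Here is a minimal sketch of how such a list can tag tokens, assuming the first matching pattern wins. The `tag_tokens` helper, the `COPY`, `YR` and `NNP` rules and the `NN` default are illustrative assumptions, not the cluecode lexer.

```python
import re

# Ordered (regex, tag) pairs in the style of the list above; the last three rules
# are illustrative assumptions added so the example has non-JUNK tags.
PATTERNS = [
    (r'^neither$', 'JUNK'),
    (r'^nor$', 'JUNK'),
    (r'^data-.*$', 'JUNK'),
    (r'^providing$', 'JUNK'),
    (r'^Execute$', 'JUNK'),
    (r'^\(c\)$', 'COPY'),
    (r'^[0-9]{4}$', 'YR'),
    (r'^[A-Z][a-z]+$', 'NNP'),
]
COMPILED = [(re.compile(pattern), tag) for pattern, tag in PATTERNS]


def tag_tokens(tokens, default="NN"):
    """Tag each token with the tag of the first pattern it matches, else `default`."""
    tagged = []
    for token in tokens:
        for regex, tag in COMPILED:
            if regex.match(token):
                tagged.append((token, tag))
                break
        else:
            tagged.append((token, default))
    return tagged
```

Because the first match wins, `Execute` is tagged `JUNK` even though it would also match the capitalized-word rule further down, so the position of a new entry in the list matters.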
