@@ -8,12 +8,13 @@ Project

A **project** encapsulates the analysis of software code:

- - it has a **workspace** which is a directory that contains the software code files under
-   analysis
- - it is related to one or more **code analysis pipelines** scripts to automate its analysis
- - it tracks ``Codebase Resources`` e.g. its **code files and directories**
- - it tracks ``Discovered Packages`` e.g. its the **system and application packages** origin and
-   license discovered in the codebase
+ - It has a **workspace**, which is a directory that contains the software code
+   files under analysis.
+ - It makes use of one or more **code analysis pipelines** scripts to automate
+   the code analysis process.
+ - It tracks ``Codebase Resources``, i.e. its **code files and directories**
+ - It tracks ``Discovered Packages``, i.e. **system and application packages**
+   origin and license discovered in the codebase.

In the database, **a project is identified by its unique name**.

@@ -25,78 +26,82 @@ In the database, **a project is identified by its unique name**.
Project workspace
-----------------

- A project workspace is the root directory where **all the project files are stored**.
+ A project workspace is the root directory where **a project's files are stored**.

- The following directories exists under this workspace directory:
+ The following directories exist under the workspace directory:

- - :guilabel:`input/` contains all the original uploaded and input files used of the project.
-   For instance, it could be a codebase archive.
- - :guilabel:`codebase/` contains the files and directories (aka. resources) tracked as
-   CodebaseResource records in the database.
- - :guilabel:`output/` contains all output files created by the pipelines: reports,
-   scan results, etc.
- - :guilabel:`tmp/` is a scratch pad for temporary files generated during the pipelines runs.
+ - :guilabel:`input/` contains all uploaded files used as the input of a project,
+   such as a codebase archive.
+ - :guilabel:`codebase/` contains files and directories - i.e. resources -
+   tracked as CodebaseResource records in the database.
+ - :guilabel:`output/` contains any output files created by the pipelines,
+   including reports, scan results, etc.
+ - :guilabel:`tmp/` is a scratch pad for temporary files generated during
+   pipeline runs.

.. _pipelines_concept:

Pipelines
---------

- A pipeline is a Python script that contains a series of steps from start to end
- to execute in order to **perform a code analysis**.
+ A pipeline is a Python script that contains a series of steps, which are
+ executed sequentially to **perform a code analysis**.

- It usually starts from the uploaded input files, and may extract these then
- generates ``CodebaseResource`` records in the database accordingly.
+ It usually starts with the uploaded input files, which might need to be
+ extracted first. Then, it generates ``CodebaseResource`` records in the database
+ accordingly.

Those resources can then be **analyzed, scanned, and matched** as needed.
Analysis results and reports are eventually posted at the end of a pipeline run.

- All pipelines are located in the ``scanpipe.pipelines`` module.
- Each pipeline consist of a Python script including one subclass of the ``Pipeline`` class.
+ All :ref:`built_in_pipelines` are located in the ``scanpipe.pipelines`` module.
+ Each pipeline consists of a Python script and includes one subclass of the
+ ``Pipeline`` class.
Each step is a method of the ``Pipeline`` class.
- The execution order of the steps is declared through the ``steps`` class attribute
- which is a sequence of steps to execute.
+ The execution order of the steps - or the sequence of steps execution - is
+ declared through the ``steps`` class attribute.
+
+ .. tip::
+     Refer to :ref:`custom_pipelines` for adding pipelines to ScanCode.io.

.. note::
    One or more pipelines can be assigned to a project as a sequence.

-
Codebase Resources
------------------

A project ``Codebase Resources`` are records of its **code files and directories**.
``CodebaseResource`` is a database model and each record is identified by its path
under the project workspace.

- Some of the ``CodebaseResource`` interesting attributes are:
+ The following are some of the ``CodebaseResource`` attributes:

- - a **status** used to track the analysis status for this resource.
- - a **type** (such as file, directory or symlink)
- - various attributes to track detected **copyrights**, **license expressions**,
+ - A **status**, which is used to track the analysis status for this resource.
+ - A **type**, such as a file, a directory or a symlink
+ - Various attributes to track detected **copyrights**, **license expressions**,
  **copyright holders**, and **related packages**.

.. note::
-     In general the attributes and their names are the same that are used in
-     `ScanCode-toolkit <https://github.com/nexB/scancode-toolkit>`_ for files.
-
+     Please note that `ScanCode-toolkit <https://github.com/nexB/scancode-toolkit>`_
+     uses the same attributes and attribute names for files.

Discovered Packages
-------------------

A project ``Discovered Packages`` are records of the **system and application packages**
- discovered in its code.
+ discovered in the code under analysis.
``DiscoveredPackage`` is a database model and each record is identified by its ``Package URL``.
- ``Package URL`` is a grassroot efforts to create informative identifiers for software
- packages such as Debian, RPM, npm, Maven, or PyPI packages.
- See https://github.com/package-url for details.
+ ``Package URL`` is a fundamental effort to create informative identifiers for
+ software packages, such as Debian, RPM, npm, Maven, or PyPI packages.
+ See https://github.com/package-url for more details.

- Some of the ``DiscoveredPackage`` interesting attributes are:
+ The following are some of the ``DiscoveredPackage`` attributes:

- - type, name, version (all Package URL attributes)
- - homepage_url, download_url and other URLs
- - checksums (such as SHA1, MD5)
- - copyright, license_expression, declared_license
+ - A type, name, version (all Package URL attributes)
+ - A homepage_url, download_url, and other URLs
+ - Checksums, such as SHA1, MD5
+ - Copyright, license_expression, and declared_license

.. note::
-     In general the attributes and their names are the same that are used in
-     `ScanCode-toolkit <https://github.com/nexB/scancode-toolkit>`_ for packages.
+     Please note that `ScanCode-toolkit <https://github.com/nexB/scancode-toolkit>`_
+     uses the same attributes and attribute names for packages.
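
The Pipelines section of the diff describes a pipeline as a subclass of ``Pipeline`` whose steps are methods, with the execution order declared through a ``steps`` class attribute. The following is a minimal, self-contained sketch of that idea only. The ``Pipeline`` base class here is a hypothetical stand-in written for illustration, not the real ``scanpipe.pipelines`` API, and the step names, resource paths, and ``execute`` method are assumptions.

```python
class Pipeline:
    """Hypothetical stand-in base class; NOT the real scanpipe API."""

    steps = ()  # ordered sequence of step functions, declared by subclasses

    def __init__(self, project_name):
        self.project_name = project_name
        self.log = []  # names of the steps that have run, in order

    def execute(self):
        # Entries in ``steps`` are plain functions taken from the class body,
        # so the instance is passed explicitly when each one is called.
        for step in self.steps:
            step(self)
            self.log.append(step.__name__)
        return self.log


class ExamplePipeline(Pipeline):
    """Illustrative pipeline: extract inputs, collect resources, report."""

    def extract_inputs(self):
        self.extracted = True

    def collect_codebase_resources(self):
        # Made-up paths, relative to an imagined project workspace.
        self.resources = ["codebase/setup.py", "codebase/src/"]

    def write_report(self):
        self.report = f"{self.project_name}: {len(self.resources)} resources"

    # The execution order is declared once, as a class attribute.
    steps = (extract_inputs, collect_codebase_resources, write_report)


pipeline = ExamplePipeline("demo")
executed = pipeline.execute()
print(executed)
print(pipeline.report)
```

Declaring the order as data keeps each step a small, independently readable method, and reordering or dropping a step only touches the ``steps`` tuple.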