Commit 4fbdb4c

hyounes4560 and tdruez authored
Update the custom pipeline file (#250)
* Update the custom pipeline file
* Updating the text content
* Updating the text content in the scanpipe-concepts file

Signed-off-by: Hanan Younes <hyounes4560@conestogac.on.ca>

* Add a minor update to the custom pipeline file
* Fixing a line too long issue

Signed-off-by: Hanan Younes <hyounes4560@conestogac.on.ca>

* Minor edit to the updated documentation #237

Co-authored-by: Thomas Druez <tdruez@nexb.com>
1 parent 97779b0 commit 4fbdb4c

3 files changed (+75, -63 lines)

docs/custom-pipelines.rst

Lines changed: 26 additions & 19 deletions
@@ -3,29 +3,35 @@
 Custom Pipelines
 ================
 
-- A pipeline is a **Python class** that lives in a Python module as a ``.py`` **file**.
+A Pipeline is a Python script that performs code analysis by executing a
+sequence of steps.
+
+- A pipeline is a **Python class** that lives in a Python module as a ``.py``
+  **file**.
 - A pipeline class **always inherits** from the ``Pipeline`` base class
-  :ref:`pipeline_base_class`, or from another existing pipeline class, such as the
-  :ref:`built_in_pipelines`.
-- It **defines steps** using the ``steps`` classmethod.
+  :ref:`pipeline_base_class`, or from other existing pipeline classes, such as
+  the :ref:`built_in_pipelines`.
+- It **defines steps** - execution order of the steps - using the ``steps``
+  classmethod.
 
 See :ref:`pipelines_concept` for more details.
 
 Pipeline registration
 ---------------------
 
 Built-in pipelines are located in :guilabel:`scanpipe/pipelines/` directory and
-registered during the ScanCode.io installation.
+are registered during the ScanCode.io installation.
 
-Custom pipelines can be added as Python files ``.py`` in the directories defined in
-the :ref:`scancodeio_settings_pipelines_dirs` setting and will be automatically
-registered at runtime.
+Whereas custom pipelines are added as Python files ``.py`` in the directories
+defined in the :ref:`scancodeio_settings_pipelines_dirs` setting. Custom
+pipelines are registered at runtime.
 
 Create a Pipeline
 -----------------
 
-Create a new Python file ``my_pipeline.py`` in the and make sure the directory is
-registered in the :ref:`scancodeio_settings_pipelines_dirs` setting.
+Create a new Python file ``my_pipeline.py``, and make sure to include the full
+path of the new pipeline directory in the :ref:`scancodeio_settings_pipelines_dirs`
+setting.
 
 .. code-block:: python
 
@@ -48,14 +54,15 @@ registered in the :ref:`scancodeio_settings_pipelines_dirs` setting.
 
 
 .. tip::
-    Have a look in the :guilabel:`scanpipe/pipelines/` directory for more pipeline
+    You can view the :guilabel:`scanpipe/pipelines/` directory for more pipeline
     examples.
 
 Modify existing Pipelines
 -------------------------
 
-Any existing pipeline can be reused as a base and customized.
-You may want to override existing steps, add new ones, and remove some.
+Existing pipelines are flexible and can be reused as a base for custom pipelines
+, i.e. be customized. For instance, you can override existing steps, add new
+ones, or remove any of them.
 
 .. code-block:: python
 
@@ -87,14 +94,14 @@ You may want to override existing steps, add new ones, and remove some.
             pass
 
 
-Report step example
--------------------
+Custom Pipeline example
+-----------------------
 
-Example of a custom pipeline based on the built-in :ref:`pipeline_scan_codebase` one
-with an extra reporting step.
+The example below shows a custom pipeline that is based on the built-in
+:ref:`pipeline_scan_codebase` pipeline with an extra reporting step.
 
-Add the following content to a Python file and register its directory in the
-:ref:`scancodeio_settings_pipelines_dirs`.
+Add the following code snippet to a Python file and register the path of
+the file's directory in the :ref:`scancodeio_settings_pipelines_dirs`.
 
 .. code-block:: python
 
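
The pattern described in ``docs/custom-pipelines.rst`` (a class that inherits from a base pipeline and declares its execution order through a ``steps`` classmethod) can be illustrated with the self-contained sketch below. It deliberately does not import ScanCode.io: ``ToyPipeline`` and its ``execute()`` method are hypothetical stand-ins for the real ``Pipeline`` base class, which does considerably more. Returning the steps as a tuple of ``cls.<method>`` references is assumed here for illustration.

```python
# Self-contained sketch: ``ToyPipeline`` is a hypothetical stand-in for
# ScanCode.io's ``Pipeline`` base class, not its real implementation.


class ToyPipeline:
    """Run the methods returned by ``steps()`` in the declared order."""

    @classmethod
    def steps(cls):
        # Subclasses must return their step methods, in execution order.
        raise NotImplementedError

    def execute(self):
        """Call each step with this instance and return the step names run."""
        executed = []
        for step in self.steps():
            # ``cls.step1`` is a plain function, so ``self`` is passed explicitly.
            step(self)
            executed.append(step.__name__)
        return executed


class MyPipeline(ToyPipeline):
    """A custom pipeline that declares two steps."""

    @classmethod
    def steps(cls):
        return (
            cls.step1,
            cls.step2,
        )

    def step1(self):
        self.messages = ["step1 done"]

    def step2(self):
        # Relies on step1 having run first, which ``steps()`` guarantees.
        self.messages.append("step2 done")


if __name__ == "__main__":
    pipeline = MyPipeline()
    print(pipeline.execute())  # ['step1', 'step2']
    print(pipeline.messages)   # ['step1 done', 'step2 done']
```

Because the whole execution order lives in one ``steps`` tuple, reordering or inserting a step is a one-line change, which is also what makes reusing an existing pipeline as a base practical.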

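The "Modify existing Pipelines" text says a subclass can override existing steps, add new ones, or remove some. The following minimal sketch shows all three at once, again with a hypothetical toy base class instead of ScanCode.io itself; ``BaseScan``, ``MyScan`` and their step names are made up for illustration and do not correspond to a built-in pipeline.

```python
# Hypothetical toy classes; they mimic the subclassing pattern only and do not
# use ScanCode.io's real ``Pipeline`` class or any built-in pipeline.


class ToyPipeline:
    @classmethod
    def steps(cls):
        raise NotImplementedError

    def execute(self):
        """Run the declared steps in order and return their names."""
        executed = []
        for step in self.steps():
            step(self)
            executed.append(step.__name__)
        return executed


class BaseScan(ToyPipeline):
    """Stands in for an existing pipeline that a custom one builds on."""

    @classmethod
    def steps(cls):
        return (cls.extract, cls.collect, cls.scan)

    def extract(self):
        self.extracted = True

    def collect(self):
        self.resources = ["a.py", "b.py"]

    def scan(self):
        self.scanner = "default"


class MyScan(BaseScan):
    """Reuses BaseScan: drops ``extract``, overrides ``scan``, adds ``report``."""

    @classmethod
    def steps(cls):
        # ``cls.scan`` resolves to the override below because ``cls`` is MyScan.
        # ``extract`` is removed simply by leaving it out of the tuple.
        return (cls.collect, cls.scan, cls.report)

    def scan(self):
        self.scanner = "custom"

    def report(self):
        self.summary = f"{len(self.resources)} resources scanned with {self.scanner}"


if __name__ == "__main__":
    print(BaseScan().execute())  # ['extract', 'collect', 'scan']
    custom = MyScan()
    print(custom.execute())      # ['collect', 'scan', 'report']
    print(custom.summary)        # 2 resources scanned with custom
```

Building the tuple from ``cls`` attributes matters: calling ``BaseScan.steps()`` from the subclass would bind ``cls`` to ``BaseScan`` and silently return the non-overridden ``scan``, whereas ``super().steps()`` inside the classmethod keeps ``cls`` bound to the subclass.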
docs/scanpipe-command-line.rst

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 .. _scanpipe_command_line:
 
-Management Commands
-===================
+Command Line Interface
+======================
 
 The main entry point is the :guilabel:`scanpipe` command which is available
 directly when you are in the activated virtualenv or at this path:

docs/scanpipe-concepts.rst

Lines changed: 47 additions & 42 deletions
@@ -8,12 +8,13 @@ Project
 
 A **project** encapsulates the analysis of software code:
 
-- it has a **workspace** which is a directory that contains the software code files under
-  analysis
-- it is related to one or more **code analysis pipelines** scripts to automate its analysis
-- it tracks ``Codebase Resources`` e.g. its **code files and directories**
-- it tracks ``Discovered Packages`` e.g. its the **system and application packages** origin and
-  license discovered in the codebase
+- It has a **workspace**, which is a directory that contains the software code
+  files under analysis.
+- It makes use of one or more **code analysis pipelines** scripts to automate
+  the code analysis process.
+- It tracks ``Codebase Resources``, i.e. its **code files and directories**
+- It tracks ``Discovered Packages``, i.e. **system and application packages**
+  origin and license discovered in the codebase.
 
 In the database, **a project is identified by its unique name**.
 
@@ -25,78 +26,82 @@ In the database, **a project is identified by its unique name**.
 Project workspace
 -----------------
 
-A project workspace is the root directory where **all the project files are stored**.
+A project workspace is the root directory where **a project's files are stored**.
 
-The following directories exists under this workspace directory:
+The following directories exist under the workspace directory:
 
-- :guilabel:`input/` contains all the original uploaded and input files used of the project.
-  For instance, it could be a codebase archive.
-- :guilabel:`codebase/` contains the files and directories (aka. resources) tracked as
-  CodebaseResource records in the database.
-- :guilabel:`output/` contains all output files created by the pipelines: reports,
-  scan results, etc.
-- :guilabel:`tmp/` is a scratch pad for temporary files generated during the pipelines runs.
+- :guilabel:`input/` contains all uploaded files used as the input of a project,
+  such as a codebase archive.
+- :guilabel:`codebase/` contains files and directories - i.e. resources -
+  tracked as CodebaseResource records in the database.
+- :guilabel:`output/` contains any output files created by the pipelines,
+  including reports, scan results, etc.
+- :guilabel:`tmp/` is a scratch pad for temporary files generated during
+  pipelines runs.
 
 .. _pipelines_concept:
 
 Pipelines
 ---------
 
-A pipeline is a Python script that contains a series of steps from start to end
-to execute in order to **perform a code analysis**.
+A pipeline is a Python script that contains a series of steps, which are
+executed sequentially to **perform a code analysis**.
 
-It usually starts from the uploaded input files, and may extract these then
-generates ``CodebaseResource`` records in the database accordingly.
+It usually starts with the uploaded input files, which might need to be
+extracted first. Then, it generates ``CodebaseResource`` records in the database
+accordingly.
 
 Those resources can then be **analyzed, scanned, and matched** as needed.
 Analysis results and reports are eventually posted at the end of a pipeline run.
 
-All pipelines are located in the ``scanpipe.pipelines`` module.
-Each pipeline consist of a Python script including one subclass of the ``Pipeline`` class.
+All :ref:`built_in_pipelines` are located in the ``scanpipe.pipelines`` module.
+Each pipeline consists of a Python script and includes one subclass of the
+``Pipeline`` class.
 Each step is a method of the ``Pipeline`` class.
-The execution order of the steps is declared through the ``steps`` class attribute
-which is a sequence of steps to execute.
+The execution order of the steps - or the sequence of steps execution - is
+declared through the ``steps`` class attribute.
+
+.. tip::
+    Refer to :ref:`custom_pipelines` for adding pipelines to ScanCode.io.
 
 .. note::
     One or more pipelines can be assigned to a project as a sequence.
 
-
 Codebase Resources
 ------------------
 
 A project ``Codebase Resources`` are records of its **code files and directories**.
 ``CodebaseResource`` is a database model and each record is identified by its path
 under the project workspace.
 
-Some of the ``CodebaseResource`` interesting attributes are:
+The following are some of the ``CodebaseResource`` attributes:
 
-- a **status** used to track the analysis status for this resource.
-- a **type** (such as file, directory or symlink)
-- various attributes to track detected **copyrights**, **license expressions**,
+- A **status**, which is used to track the analysis status for this resource.
+- A **type**, such as a file, a directory or a symlink
+- Various attributes to track detected **copyrights**, **license expressions**,
   **copyright holders**, and **related packages**.
 
 .. note::
-    In general the attributes and their names are the same that are used in
-    `ScanCode-toolkit <https://github.com/nexB/scancode-toolkit>`_ for files.
-
+    Please note that `ScanCode-toolkit <https://github.com/nexB/scancode-toolkit>`_
+    use the same attributes and attribute names for files.
 
 Discovered Packages
 -------------------
 
 A project ``Discovered Packages`` are records of the **system and application packages**
-discovered in its code.
+discovered in the code unedr analysis.
 ``DiscoveredPackage`` is a database model and each record is identified by its ``Package URL``.
-``Package URL`` is a grassroot efforts to create informative identifiers for software
-packages such as Debian, RPM, npm, Maven, or PyPI packages.
-See https://github.com/package-url for details.
+``Package URL`` is a fundamental effort to create informative identifiers for
+software packages, such as Debian, RPM, npm, Maven, or PyPI packages.
+See https://github.com/package-url for more details.
 
-Some of the ``DiscoveredPackage`` interesting attributes are:
+The following are some of the ``DiscoveredPackage`` attributes:
 
-- type, name, version (all Package URL attributes)
-- homepage_url, download_url and other URLs
-- checksums (such as SHA1, MD5)
-- copyright, license_expression, declared_license
+- A type, name, version (all Package URL attributes)
+- A homepage_url, download_url, and other URLs
+- Checksums, such as SHA1, MD5
+- Copyright, license_expression, and declared_license
 
 .. note::
-    In general the attributes and their names are the same that are used in
-    `ScanCode-toolkit <https://github.com/nexB/scancode-toolkit>`_ for packages.
+    Please note that `ScanCode-toolkit <https://github.com/nexB/scancode-toolkit>`_
+    use the same attributes and attribute names for packages.
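
The workspace layout described in the updated "Project workspace" section (``input/``, ``codebase/``, ``output/`` and ``tmp/`` under one root) can be sketched with Python's standard ``pathlib`` module. ``create_workspace`` is a hypothetical helper written only for this illustration; ScanCode.io creates and manages these directories itself, and its actual code may differ.

```python
import tempfile
from pathlib import Path

# Sub-directories listed in the "Project workspace" section.
WORKSPACE_SUBDIRS = ("input", "codebase", "output", "tmp")


def create_workspace(root):
    """Hypothetical helper: create a workspace root and its sub-directories."""
    root = Path(root)
    for name in WORKSPACE_SUBDIRS:
        # parents=True also creates ``root``; exist_ok=True makes reruns safe.
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        workspace = create_workspace(Path(scratch) / "my-project")
        print(sorted(p.name for p in workspace.iterdir()))
        # ['codebase', 'input', 'output', 'tmp']
```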

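
``DiscoveredPackage`` records are identified by a Package URL, which has the general shape ``pkg:type/namespace/name@version?qualifiers#subpath``. The sketch below splits out only the type, namespace, name and version and discards qualifiers and subpath; ``parse_simple_purl`` and ``SimplePurl`` are hypothetical names, and this is not a spec-complete parser. For real use, the ``packageurl-python`` library published under https://github.com/package-url is the better choice.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class SimplePurl(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_simple_purl(purl: str) -> SimplePurl:
    """Split a Package URL into type, namespace, name and version.

    Simplified sketch: qualifiers (``?...``) and subpath (``#...``) are
    discarded, and only basic percent-decoding is applied.
    """
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a Package URL: {purl!r}")
    # Drop the subpath, then the qualifiers, then any surrounding slashes.
    rest = rest.split("#", 1)[0].split("?", 1)[0].strip("/")
    version = None
    if "@" in rest:
        # An "@" inside the namespace is percent-encoded (%40), so the
        # last "@" separates the version.
        rest, version = rest.rsplit("@", 1)
        version = unquote(version)
    type_, _, remainder = rest.partition("/")
    # The name is the last path segment; everything before it is the namespace.
    namespace, _, name = remainder.rpartition("/")
    if not type_ or not name:
        raise ValueError(f"missing type or name: {purl!r}")
    return SimplePurl(type_.lower(), unquote(namespace) or None, unquote(name), version)


if __name__ == "__main__":
    print(parse_simple_purl("pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie"))
    # SimplePurl(type='deb', namespace='debian', name='curl', version='7.50.3-1')
```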