Commit 381b2a7

Refine the Custom pipeline documentation #237
Signed-off-by: Thomas Druez <tdruez@nexb.com>
1 parent cdf6d00 commit 381b2a7

5 files changed: 87 additions and 13 deletions

docs/scanpipe-pipelines.rst renamed to docs/built-in-pipelines.rst

Lines changed: 5 additions & 3 deletions
@@ -1,7 +1,7 @@
-.. _scanpipe_pipelines:
+.. _built_in_pipelines:

-Pipelines
-=========
+Built-in Pipelines
+==================

 .. _pipeline_base_class:

@@ -25,6 +25,8 @@ Root Filesystem Analysis
 .. autoclass:: scanpipe.pipelines.root_filesystems.RootFS()
     :members:

+.. _pipeline_scan_codebase:
+
 Scan Codebase
 -------------
 .. autoclass:: scanpipe.pipelines.scan_codebase.ScanCodebase()

docs/custom-pipelines.rst

Lines changed: 78 additions & 9 deletions
@@ -3,22 +3,29 @@
 Custom Pipelines
 ================

-A pipeline always inherits from the ``Pipeline`` base class :ref:`pipeline_base_class`
-It define steps using the ``steps`` class method.
+- A pipeline is a **Python class** that lives in a Python module as a ``.py`` **file**.
+- A pipeline class **always inherits** from the ``Pipeline`` base class
+  :ref:`pipeline_base_class`, or from another existing pipeline class, such as the
+  :ref:`built_in_pipelines`.
+- It **defines steps** using the ``steps`` classmethod.
+
+See :ref:`pipelines_concept` for more details.

 Pipeline registration
 ---------------------

-Built-in pipelines are located in scanpipe/pipelines/ and registered during the
-ScanCode.io installation.
+Built-in pipelines are located in the :guilabel:`scanpipe/pipelines/` directory and
+registered during the ScanCode.io installation.

-Custom pipelines can be added as python files in the TBD/ directory and will be
-automatically registered at runtime.
+Custom pipelines can be added as Python ``.py`` files in the directories defined in
+the :ref:`scancodeio_settings_pipelines_dirs` setting and will be automatically
+registered at runtime.

 Create a Pipeline
 -----------------

-Create a new Python file ``my_pipeline.py`` in the TBD/ directory.
+Create a new Python file ``my_pipeline.py`` and make sure its directory is
+registered in the :ref:`scancodeio_settings_pipelines_dirs` setting.

 .. code-block:: python

@@ -41,7 +48,8 @@ Create a new Python file ``my_pipeline.py`` in the TBD/ directory.


 .. tip::
-    Have a look in the scanpipe/pipelines/ directory for more pipeline examples.
+    Have a look in the :guilabel:`scanpipe/pipelines/` directory for more pipeline
+    examples.

 Modify existing Pipelines
 -------------------------
@@ -64,7 +72,7 @@ You may want to override existing steps, add new ones, and remove some.
                 cls.run_scancode,
                 cls.build_inventory_from_scan,

-                # Commented-out as I'm not interested in a csv output
+                # Commented-out as not interested in a csv output
                 # cls.csv_output,

                 # My extra steps
@@ -77,3 +85,64 @@ You may want to override existing steps, add new ones, and remove some.

         def extra_step2(self):
             pass
+
+
+Report step example
+-------------------
+
+Example of a custom pipeline based on the built-in :ref:`pipeline_scan_codebase` one
+with an extra reporting step.
+
+Add the following content to a Python file and register its directory in the
+:ref:`scancodeio_settings_pipelines_dirs`.
+
+.. code-block:: python
+
+    from collections import defaultdict
+
+    from jinja2 import Template
+
+    from scanpipe.pipelines.scan_codebase import ScanCodebase
+
+
+    class ScanAndReport(ScanCodebase):
+        """
+        Run the ScanCodebase built-in pipeline steps and generate a licenses report.
+        """
+
+        @classmethod
+        def steps(cls):
+            return ScanCodebase.steps() + (
+                cls.report_licenses_with_resources,
+            )
+
+        # See https://jinja.palletsprojects.com/en/3.0.x/templates/ for documentation
+        report_template = """
+        {% for matched_text, paths in resources.items() -%}
+            {{ matched_text }}
+
+            {% for path in paths -%}
+                {{ path }}
+            {% endfor %}
+
+        {% endfor %}
+        """
+
+        def report_licenses_with_resources(self):
+            """
+            Retrieve codebase resources filtered by license categories,
+            Generate a licenses report file from a template.
+            """
+            categories = ["Commercial", "Copyleft"]
+            resources = self.project.codebaseresources.licenses_categories(categories)
+
+            resources_by_licenses = defaultdict(list)
+            for resource in resources:
+                for license_data in resource.licenses:
+                    matched_text = license_data.get("matched_text")
+                    resources_by_licenses[matched_text].append(resource.path)
+
+            template = Template(self.report_template, lstrip_blocks=True, trim_blocks=True)
+            report_stream = template.stream(resources=resources_by_licenses)
+            report_file = self.project.get_output_file_path("license-report", "txt")
+            report_stream.dump(str(report_file))
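
A note on the ``steps`` override in ``ScanAndReport``: it relies on ``steps()`` returning a tuple, so a subclass can append to the parent's steps with ``+``. The sketch below reproduces that pattern with stand-in classes. ``BasePipeline``, ``ScanLike``, ``ScanAndReportLike`` and their step names are hypothetical and are not part of scanpipe. This commit does not show how the real ``Pipeline`` base class runs its steps, so the ``execute`` loop is only an assumption for illustration.

```python
class BasePipeline:
    """Hypothetical stand-in for a pipeline base class (not the scanpipe API)."""

    def __init__(self):
        self.log = []

    @classmethod
    def steps(cls):
        raise NotImplementedError

    def execute(self):
        # Assumed runner: call each step function in order, passing the instance.
        for step in self.steps():
            step(self)
        return self.log


class ScanLike(BasePipeline):
    @classmethod
    def steps(cls):
        return (cls.collect, cls.scan)

    def collect(self):
        self.log.append("collect")

    def scan(self):
        self.log.append("scan")


class ScanAndReportLike(ScanLike):
    @classmethod
    def steps(cls):
        # Reuse the parent's steps and append one more, mirroring the diff.
        return ScanLike.steps() + (cls.report,)

    def report(self):
        self.log.append("report")


executed = ScanAndReportLike().execute()
print(executed)
```

Calling the parent by name, as the diff does, binds ``cls`` to the parent class while its steps are collected. Using ``super().steps()`` instead would keep ``cls`` bound to the subclass, so a step method overridden in the subclass would be picked up. Which behaviour is preferable depends on the pipeline.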

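The grouping performed in ``report_licenses_with_resources`` can be tried without a ScanCode.io project. This sketch replaces the queryset with plain dictionaries. The sample paths and matched texts are invented, ``group_paths_by_matched_text`` is a hypothetical helper name, and real ``resource.licenses`` entries may carry more keys than the ``matched_text`` one used here.

```python
from collections import defaultdict

# Invented sample data standing in for codebase resources.
sample_resources = [
    {"path": "src/a.c", "licenses": [{"matched_text": "GPL-2.0 notice"}]},
    {
        "path": "src/b.c",
        "licenses": [
            {"matched_text": "GPL-2.0 notice"},
            {"matched_text": "Proprietary notice"},
        ],
    },
    {"path": "README", "licenses": []},
]


def group_paths_by_matched_text(resources):
    """Map each matched license text to the list of paths where it was found."""
    grouped = defaultdict(list)
    for resource in resources:
        for license_data in resource["licenses"]:
            matched_text = license_data.get("matched_text")
            grouped[matched_text].append(resource["path"])
    return dict(grouped)


resources_by_licenses = group_paths_by_matched_text(sample_resources)
for matched_text, paths in resources_by_licenses.items():
    print(matched_text)
    for path in paths:
        print("   ", path)
```

A license entry without a ``matched_text`` key would be grouped under ``None``, since ``dict.get`` returns ``None`` for a missing key. The step in the diff uses the same call.
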
docs/index.rst

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ you’ll find information on:
    :caption: Reference Documentation

    scanpipe-concepts
-   scanpipe-pipelines
+   built-in-pipelines
    custom-pipelines
    scanpipe-pipes
    scanpipe-output

docs/scancodeio-settings.rst

Lines changed: 2 additions & 0 deletions
@@ -74,6 +74,8 @@ of parallel processes to 4::

     SCANCODE_DEFAULT_OPTIONS=--processes 4,--timeout 120

+.. _scancodeio_settings_pipelines_dirs:
+
 SCANCODEIO_PIPELINES_DIRS
 -------------------------

docs/scanpipe-concepts.rst

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ The following directories exists under this workspace directory:
   scan results, etc.
 - :guilabel:`tmp/` is a scratch pad for temporary files generated during the pipelines runs.

+.. _pipelines_concept:

 Pipelines
 ---------
