Custom Pipelines
================

- A pipeline is a **Python class** that lives in a Python module as a ``.py`` **file**.
- A pipeline class **always inherits** from the ``Pipeline`` base class
  (:ref:`pipeline_base_class`), or from another existing pipeline class, such as one of
  the :ref:`built_in_pipelines`.
- It **defines steps** using the ``steps`` classmethod.

See :ref:`pipelines_concept` for more details.
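
The ``steps`` contract can be illustrated without installing ScanCode.io. The following
is a minimal sketch only: the ``Pipeline`` class in it is a hypothetical, simplified
stand-in for the real base class, and its ``execute`` method assumes that a pipeline
runs each callable returned by ``steps`` in the declared order, passing the pipeline
instance.

.. code-block:: python

    class Pipeline:
        """Hypothetical stand-in for the ScanCode.io ``Pipeline`` base class."""

        @classmethod
        def steps(cls):
            raise NotImplementedError

        def execute(self):
            # Run each step in the declared order. Steps are plain functions
            # looked up on the class, so the instance is passed explicitly.
            for step in self.steps():
                step(self)


    class ExamplePipeline(Pipeline):
        """Record the name of each step to show the execution order."""

        def __init__(self):
            self.log = []

        @classmethod
        def steps(cls):
            return (
                cls.step1,
                cls.step2,
            )

        def step1(self):
            self.log.append("step1")

        def step2(self):
            self.log.append("step2")


    pipeline = ExamplePipeline()
    pipeline.execute()
    print(pipeline.log)  # ['step1', 'step2']

Because ``steps`` is a classmethod returning functions looked up on ``cls``, a subclass
can reuse or replace the sequence without instantiating anything.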

Pipeline registration
---------------------

Built-in pipelines are located in the :guilabel:`scanpipe/pipelines/` directory and are
registered during the ScanCode.io installation.

Custom pipelines can be added as Python (``.py``) files in the directories defined in
the :ref:`scancodeio_settings_pipelines_dirs` setting; they are automatically
registered at runtime.
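
Registration of custom pipelines is handled by ScanCode.io itself. Purely to illustrate
the idea of runtime discovery, here is a sketch of a hypothetical ``discover_pipelines``
helper; it is not the ScanCode.io implementation. It assumes that a pipeline is any
class, defined in a ``.py`` file of a listed directory, that subclasses a given base
class (again a hypothetical stand-in ``Pipeline``).

.. code-block:: python

    import importlib.util
    import inspect
    from pathlib import Path


    class Pipeline:
        """Hypothetical stand-in for the ScanCode.io ``Pipeline`` base class."""


    def discover_pipelines(directories, base_class=Pipeline):
        """
        Return a ``{class_name: class}`` mapping of the ``base_class`` subclasses
        defined in the ``.py`` files of ``directories``. Hypothetical helper.
        """
        pipelines = {}
        for directory in directories:
            for path in sorted(Path(directory).glob("*.py")):
                # Load each file as a standalone module named after the file.
                spec = importlib.util.spec_from_file_location(path.stem, str(path))
                module = importlib.util.module_from_spec(spec)
                spec.loader.exec_module(module)

                for name, obj in inspect.getmembers(module, inspect.isclass):
                    # Keep subclasses defined in this file, not the imported ones.
                    defined_here = obj.__module__ == module.__name__
                    if defined_here and issubclass(obj, base_class):
                        pipelines[name] = obj
        return pipelines

Checking ``obj.__module__`` skips classes that a file merely imports, such as a built-in
pipeline used as a parent class, so they are not registered a second time under the
custom file.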

Create a Pipeline
-----------------

Create a new Python file ``my_pipeline.py`` and make sure its directory is
registered in the :ref:`scancodeio_settings_pipelines_dirs` setting.

.. code-block:: python

    ...

.. tip::
    Have a look in the :guilabel:`scanpipe/pipelines/` directory for more pipeline
    examples.

Modify existing Pipelines
-------------------------

You may want to override existing steps, add new ones, and remove some.

.. code-block:: python

    ...
                cls.run_scancode,
                cls.build_inventory_from_scan,

                # Commented-out as not interested in a csv output
                # cls.csv_output,

                # My extra steps
    ...

        def extra_step2(self):
            pass
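
The parent's step sequence can also be derived rather than retyped. In the sketch below,
``ParentPipeline`` and its step names are hypothetical stand-ins for a built-in pipeline
(nothing is imported from ScanCode.io), and ``skipped_steps`` is a hypothetical attribute
introduced for the example. Calling ``super().steps()`` from the classmethod runs the
parent's ``steps`` with ``cls`` bound to the subclass, so an overridden step such as
``run_scancode`` is picked up in place of the parent's version.

.. code-block:: python

    class ParentPipeline:
        """Hypothetical stand-in for a built-in pipeline."""

        @classmethod
        def steps(cls):
            return (
                cls.run_scancode,
                cls.build_inventory_from_scan,
                cls.csv_output,
            )

        def run_scancode(self):
            pass

        def build_inventory_from_scan(self):
            pass

        def csv_output(self):
            pass


    class MyCustomPipeline(ParentPipeline):
        # Names of the inherited steps to remove from the sequence (hypothetical).
        skipped_steps = {"csv_output"}

        @classmethod
        def steps(cls):
            # super().steps() looks the steps up on cls, so overrides are used.
            parent_steps = super().steps()
            inherited = tuple(
                step for step in parent_steps if step.__name__ not in cls.skipped_steps
            )
            # Append the extra steps after the inherited ones.
            return inherited + (cls.extra_step1, cls.extra_step2)

        def run_scancode(self):
            # Overrides the parent's step with custom behavior.
            pass

        def extra_step1(self):
            pass

        def extra_step2(self):
            pass


    print([step.__name__ for step in MyCustomPipeline.steps()])
    # ['run_scancode', 'build_inventory_from_scan', 'extra_step1', 'extra_step2']

By contrast, calling ``ParentPipeline.steps()`` directly binds ``cls`` to the parent
class, so the parent's own methods are returned even when the subclass overrides them.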

Report step example
-------------------

Example of a custom pipeline based on the built-in :ref:`pipeline_scan_codebase` one,
with an extra reporting step.

Add the following content to a Python file and register its directory in the
:ref:`scancodeio_settings_pipelines_dirs` setting.

.. code-block:: python

    from collections import defaultdict

    from jinja2 import Template

    from scanpipe.pipelines.scan_codebase import ScanCodebase


    class ScanAndReport(ScanCodebase):
        """
        Run the ScanCodebase built-in pipeline steps and generate a licenses report.
        """

        @classmethod
        def steps(cls):
            return ScanCodebase.steps() + (
                cls.report_licenses_with_resources,
            )

        # See https://jinja.palletsprojects.com/en/3.0.x/templates/ for documentation
        report_template = """
        {% for matched_text, paths in resources.items() -%}
        {{ matched_text }}

        {% for path in paths -%}
        {{ path }}
        {% endfor %}

        {% endfor %}
        """

        def report_licenses_with_resources(self):
            """
            Retrieve codebase resources filtered by license categories and
            generate a licenses report file from a template.
            """
            categories = ["Commercial", "Copyleft"]
            resources = self.project.codebaseresources.licenses_categories(categories)

            # Group the resource paths by the license text matched in each resource.
            resources_by_licenses = defaultdict(list)
            for resource in resources:
                for license_data in resource.licenses:
                    matched_text = license_data.get("matched_text")
                    resources_by_licenses[matched_text].append(resource.path)

            template = Template(self.report_template, lstrip_blocks=True, trim_blocks=True)
            report_stream = template.stream(resources=resources_by_licenses)
            report_file = self.project.get_output_file_path("license-report", "txt")
            report_stream.dump(str(report_file))
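
The grouping done by ``report_licenses_with_resources`` can be checked in isolation.
This sketch makes several assumptions: ``SampleResource`` and its values are made-up
stand-ins for codebase resources, each resource is assumed to expose ``path`` and a
``licenses`` list of dicts carrying a ``"matched_text"`` key (as read by the step
above), and the hypothetical ``render_report`` uses plain string formatting to
approximate the template's layout so that no third-party package is needed.

.. code-block:: python

    from collections import defaultdict
    from dataclasses import dataclass, field


    @dataclass
    class SampleResource:
        """Hypothetical stand-in for a codebase resource."""

        path: str
        licenses: list = field(default_factory=list)


    def group_paths_by_matched_text(resources):
        """Map each license matched text to the paths where it was found."""
        resources_by_licenses = defaultdict(list)
        for resource in resources:
            for license_data in resource.licenses:
                matched_text = license_data.get("matched_text")
                resources_by_licenses[matched_text].append(resource.path)
        return resources_by_licenses


    def render_report(resources_by_licenses):
        """Render each matched text, a blank line, then one path per line."""
        blocks = []
        for matched_text, paths in resources_by_licenses.items():
            blocks.append("\n".join([str(matched_text), ""] + paths))
        return "\n\n".join(blocks) + "\n"


    # Made-up sample data for illustration only.
    resources = [
        SampleResource("src/a.c", [{"matched_text": "Licensed under the GPL"}]),
        SampleResource("src/b.c", [{"matched_text": "Licensed under the GPL"}]),
        SampleResource("lib/c.js", [{"matched_text": "Proprietary and confidential"}]),
    ]
    print(render_report(group_paths_by_matched_text(resources)))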