Commit e48c509

GSOC 2024 Report - Pranay Das (#144)
Signed-off-by: 404-geek <pranayd61@gmail.com> Co-authored-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
1 parent ca99049 commit e48c509

2 files changed: +230 -0 lines changed

docs/source/archive/gsoc-toc.rst

Lines changed: 1 addition & 0 deletions

@@ -15,6 +15,7 @@ GSoC 2024
 :maxdepth: 2

 gsoc/reports/2024/scancode_toolkit_swastkk
+gsoc/reports/2024/scancodeio_scorecode_pranay

 GSoC 2022
 ---------
Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
==================================================
Enrich SBOM data based on OSSF Security Score Card
==================================================


**Organization:** `AboutCode <https://aboutcode.org>`_

**Projects:**

- `Scancode.io <https://github.com/aboutcode-org/scancode.io>`_
- `Scorecode <https://github.com/aboutcode-org/scorecode>`_

**Mentee:** `Pranay Das (404-geek) <https://github.com/404-geek>`_

**Mentors:**

- `Philippe Ombredanne <https://github.com/pombredanne>`_
- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_
- `Thomas Druez <https://github.com/thomasdruez>`_
- `Jonathan Yang <https://github.com/JonoYang>`_
- `Tushar Goel <https://github.com/tushar-goel>`_


--------------------------------------------------------------------------------

Overview
--------

The primary objective of this project was to fetch OpenSSF Scorecard data for all
detected packages and integrate it into the Scancode.io platform, thereby enhancing its
capabilities for analyzing security and community health metrics. The project
involved work on two key repositories: `Scorecode`, which was developed as a PyPI
package, and `Scancode.io`, where the integration with Scorecard data was implemented
within scanning pipelines.

**Scorecode**

The `scorecode package <https://pypi.org/project/scorecode/>`_ is a PyPI package
that provides functions to fetch and store OpenSSF Scorecard data using the OpenSSF
public API (`https://api.securityscorecards.dev/ <https://api.securityscorecards.dev/>`_).
It also includes Django mixin models that can be extended and integrated
into other platforms with databases, such as Scancode.io and PurlDB, so that
Scorecard data can be reused across projects.



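
As an illustration of the kind of request involved, the sketch below queries the
public API using only the Python standard library. It is not the `scorecode`
package's own code: the helper names (``scorecard_url``, ``fetch_scorecard``,
``summarize``) are hypothetical, and the ``/projects/{platform}/{org}/{repo}`` path
and the ``date``, ``score`` and ``checks`` response keys are assumptions about the
public API's response shape.

```python
import json
from urllib.request import urlopen

# Base URL of the public OpenSSF Scorecard API named in the report.
API_BASE = "https://api.securityscorecards.dev"


def scorecard_url(platform, org, repo):
    """Build the project endpoint URL (path layout is an assumption)."""
    return f"{API_BASE}/projects/{platform}/{org}/{repo}"


def fetch_scorecard(platform, org, repo, timeout=30):
    """Fetch the raw Scorecard JSON for one repository (network call)."""
    with urlopen(scorecard_url(platform, org, repo), timeout=timeout) as response:
        return json.load(response)


def summarize(payload):
    """Reduce a raw payload to the overall score and per-check scores."""
    return {
        "score": payload.get("score"),
        "score_date": payload.get("date"),
        "checks": {
            check["name"]: check.get("score")
            for check in payload.get("checks", [])
        },
    }
```

A call such as ``summarize(fetch_scorecard("github.com", "nexB", "scancode-toolkit"))``
would then yield a small dictionary that is easy to store or display.
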
**Scancode.io**

In the `Scancode.io` project, I added a pipeline that uses the `scorecode`
package to fetch Scorecard data and store it in the Scancode.io database. The data can then
be exported in Software Bill of Materials (SBOM) outputs, currently in the
CycloneDX format (with SPDX planned for the future), providing insights into security and
community health.

--------------------------------------------------------------------------------

Implementation
--------------

**1. Scorecode Repository:**

- Developed a PyPI package that interacts with the OpenSSF API to fetch Scorecard data
  and store it in appropriate objects for use in other software packages.
- Created Django mixin models to enable easy extension and integration of Scorecard
  data into platforms with databases like Scancode.io.

For more information, you can visit the
`scorecode package on PyPI <https://pypi.org/project/scorecode/>`_.


**2. Scancode.io Integration:**

- Developed a pipeline within Scancode.io to call `Scorecode` functions, retrieve
  Scorecard data, and save it in the Scancode.io database.
- Enhanced the existing SBOM export functionality to include Scorecard data, allowing
  for detailed security posture and community health metrics analysis in CycloneDX
  format.

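
One way to picture the export step is as attaching Scorecard values to a CycloneDX
component through its ``properties`` list of name/value pairs. The sketch below is
only an illustration under that assumption: the function name and the
``scorecard:`` property names are hypothetical, and the actual Scancode.io export
may represent the data differently.

```python
def add_scorecard_properties(component, scorecard):
    """Return a copy of a CycloneDX component dict with Scorecard properties.

    The "scorecard:..." property names are hypothetical, chosen for this sketch.
    """
    enriched = dict(component)
    # Copy the list so the caller's component is left untouched.
    properties = list(component.get("properties", []))
    # CycloneDX property values are strings, so scores are converted.
    properties.append({"name": "scorecard:score", "value": str(scorecard["score"])})
    for check in scorecard.get("checks", []):
        properties.append(
            {
                "name": f"scorecard:check:{check['name']}",
                "value": str(check["score"]),
            }
        )
    enriched["properties"] = properties
    return enriched
```

Keeping the function free of side effects makes it straightforward to apply to
every package in a project before the SBOM is serialized.
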
**3. Testing:**

- Conducted comprehensive testing on repositories hosted on two platforms, GitHub
  and GitLab, to ensure accurate fetching, storage, and export of Scorecard data:

  - **GitHub**:

    - `nexB/scancode-toolkit <https://github.com/nexB/scancode-toolkit>`_
    - `tensorflow/tensorflow <https://github.com/tensorflow/tensorflow>`_
    - `apache/spark <https://github.com/apache/spark>`_

  - **GitLab**: `gitlab-org/gitlab <https://gitlab.com/gitlab-org/gitlab>`_

- Verified seamless integration and accurate data retrieval across different package
  ecosystems supported by Scancode.io, ensuring that the Scorecard data aligns with
  the expected structure and content.

- Implemented and executed automated test cases using `pytest`, which include:

  - Validation of key fields such as ``scoring_tool``, ``scoring_tool_version``,
    ``score_date``, ``score``, ``scoring_tool_documentation_url``, and ``checks``.

  - Type checks for each field to ensure data integrity.

  - URL validation to confirm that the documentation links are correctly formatted
    and point to the expected resources.

- Added further test cases for edge scenarios such as non-existent repositories,
  private repositories, and invalid input formats to ensure robustness and
  reliability.


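
The field, type, and URL checks described above can be sketched roughly as follows.
This is not the project's test suite: ``validate_scorecard`` is a hypothetical
helper, and the expected type of each field is an assumption made for the example.

```python
from urllib.parse import urlparse

# Field names come from the report; the expected types are assumptions.
EXPECTED_TYPES = {
    "scoring_tool": str,
    "scoring_tool_version": str,
    "score_date": str,
    "score": (int, float, str),
    "scoring_tool_documentation_url": str,
    "checks": list,
}


def validate_scorecard(data):
    """Return a list of problems found in a Scorecard record (empty if valid)."""
    errors = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            errors.append(f"wrong type for {field}")
    # The documentation link must be an absolute http(s) URL.
    url = data.get("scoring_tool_documentation_url")
    if isinstance(url, str):
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            errors.append("invalid documentation URL")
    return errors
```

In a `pytest` test, a check such as ``assert validate_scorecard(record) == []``
would cover all three kinds of validation at once.
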
--------------------------------------------------------------------------------

Linked Pull Requests
--------------------

.. list-table::
   :widths: 10 40 20 30
   :header-rows: 1

   * - Sr. no
     - Name
     - Link
     - Status
   * - 1
     - Scorecard Integration
     - `aboutcode-org/scancode.io#1294 <https://github.com/aboutcode-org/scancode.io/pull/1294>`_
     - Open
   * - 2
     - Models integration
     - `aboutcode-org/scorecode#5 <https://github.com/aboutcode-org/scorecode/pull/5>`_
     - Merged
   * - 3
     - Scorecard API call integration
     - `aboutcode-org/scorecode#1 <https://github.com/aboutcode-org/scorecode/pull/1>`_
     - Merged
   * - 4
     - Mixin models for storing scorecard data
     - `aboutcode-org/scorecode#4 <https://github.com/aboutcode-org/scorecode/pull/4>`_
     - Merged



Related Issues
--------------

.. list-table::
   :widths: 10 60 30
   :header-rows: 1

   * - Sr. no
     - Name
     - Link
   * - 1
     - Store OSSF scorecard data in scancode.io models
     - `aboutcode-org/scancode.io#1283 <https://github.com/aboutcode-org/scancode.io/issues/1283>`_
   * - 2
     - Show OSSF scorecard data in the UI as quality data
     - `aboutcode-org/scancode.io#1284 <https://github.com/aboutcode-org/scancode.io/issues/1284>`_
   * - 3
     - Export OSSF scorecard data in SBOMs
     - `aboutcode-org/scancode.io#1285 <https://github.com/aboutcode-org/scancode.io/issues/1285>`_
   * - 4
     - Compute summary and clarity for EACH package in a codebase
     - `aboutcode-org/scorecode#3 <https://github.com/aboutcode-org/scorecode/issues/3>`_
   * - 5
     - Provide data values in scan results to correspond with license_clarity_score
       elements
     - `aboutcode-org/scorecode#2 <https://github.com/aboutcode-org/scorecode/issues/2>`_


Project Reference Links
-----------------------

* `Project Idea <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2024-Project-Ideas#purldbscancodeio-enrich-an-sbom-based-on-ossf-security-score-card>`_

* `Official GSoC project page <https://summerofcode.withgoogle.com/programs/2024/projects/kB8HkEli>`_

* `GSoC Proposal <https://docs.google.com/document/d/10EiGjTGR_eZExMjcxEmwmMQPt7B9i6lHc_osW4Ogm6c/edit?usp=sharing>`_

* `Project Board <https://github.com/orgs/nexB/projects/60/views/6>`_


Pre GSoC Work
-------------

Before GSoC officially started, I had the opportunity to contribute to the
`ScanCode.io <https://github.com/aboutcode-org/scancode.io>`_ and
`purldb.io <https://github.com/aboutcode-org/purldb>`_ projects. During this
period, I focused on enhancing various functionalities and laying the groundwork for
the upcoming integration of the OpenSSF Scorecard. Below is a list of the key pull
requests I made:

- `Add endpoint to create or update a package set <https://github.com/aboutcode-org/purldb/pull/350>`_
- `Fixes Github Mapper route <https://github.com/aboutcode-org/purldb/pull/370>`_
- `removed redundant PackageViewSet class code and added history field into package API nexB#389 nexB#221 <https://github.com/aboutcode-org/purldb/pull/390>`_
- `alpine url bug fix and AGPL License version issue <https://github.com/aboutcode-org/scancode-toolkit/pull/3744>`_

These contributions were essential in building a solid foundation for the integration of
the ScoreCode repository during GSoC.

Post GSoC
---------

After GSoC, the goal is to merge the outstanding pull requests into their respective
repositories, enabling users to leverage the OpenSSF Scorecard integration for enhanced
security analysis in Scancode.io. Future work includes extending this integration to
other platforms like PurlDB.

--------------------------------------------------------------------------------

Acknowledgements
----------------

This project wouldn't have been possible without the incredible support and mentorship
of an outstanding team:

- `Philippe Ombredanne <https://github.com/pombredanne>`_
- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_
- `Thomas Druez <https://github.com/thomasdruez>`_
- `Jonathan Yang <https://github.com/JonoYang>`_
- `Tushar Goel <https://github.com/tushar-goel>`_

The weekly status calls were more than just updates; they were a source of inspiration,
ideas, and camaraderie. And the 1:1 calls with `Ayan Sinha Mahapatra`_ and
`Philippe Ombredanne`_ were like mini-masterclasses in software development.

To my mentors: Thank you for not just teaching me the ropes but for showing me how to
swing from them! This journey was as much about learning as it was about having fun,
and I couldn't have asked for a better crew to sail with.
