| 1 | +================================================== |
| 2 | +Enrich SBOM data based on OSSF Security Score Card |
| 3 | +================================================== |
| 4 | + |
| 5 | + |
| 6 | +**Organization:** `AboutCode <https://aboutcode.org>`_ |
| 7 | + |
| 8 | +**Projects:** |
| 9 | + |
| 10 | +- `Scancode.io <https://github.com/aboutcode-org/scancode.io>`_ |
| 11 | +- `Scorecode <https://github.com/aboutcode-org/scorecode>`_ |
| 12 | + |
| 13 | +**Mentee:** `Pranay Das (404-geek) <https://github.com/404-geek>`_ |
| 14 | + |
| 15 | +**Mentors:** |
| 16 | + |
| 17 | +- `Philippe Ombredanne <https://github.com/pombredanne>`_ |
| 18 | +- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_ |
| 19 | +- `Thomas Druez <https://github.com/thomasdruez>`_ |
| 20 | +- `Jonathan Yang <https://github.com/JonoYang>`_ |
| 21 | +- `Tushar Goel <https://github.com/tushar-goel>`_ |
| 22 | + |
| 23 | + |
| 24 | +-------------------------------------------------------------------------------- |
| 25 | + |
| 26 | +Overview |
| 27 | +-------- |
| 28 | + |
| 29 | +The primary objective of this project was to fetch and integrate the OpenSSF Scorecard |
| 30 | +data into the Scancode.io platform for all detected packages, thereby enhancing its |
| 31 | +capabilities for security and community health metrics analysis. The project |
| 32 | +involved work on two key repositories: `Scorecode`, which was developed as a PyPI |
| 33 | +package, and `Scancode.io`, where the integration with Scorecard data was implemented |
| 34 | +within scanning pipelines. |
| 35 | + |
| 36 | +**Scorecode** |
| 37 | + |
| 38 | +The `scorecode package <https://pypi.org/project/scorecode/>`_ is a PyPI package |
| 39 | +that provides functions to fetch and store OpenSSF Scorecard data using the OpenSSF |
| 40 | +public API (`https://api.securityscorecards.dev/ <https://api.securityscorecards.dev/>`_). |
| 41 | +It also includes Django mixin models that can be extended and integrated |
| 42 | +into other platforms with databases, such as Scancode.io and PurlDB, so that |
| 43 | +Scorecard data can be reused across projects. |
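As a rough illustration of the kind of API call described above, the sketch below builds a project URL for the public OpenSSF Scorecard API and fetches the raw JSON. The function names are hypothetical and are not the `scorecode` package's actual interface; the ``/projects/{platform}/{org}/{repo}`` path is an assumption about the public endpoint layout.

```python
import json
from urllib.request import urlopen

# Base URL of the public OpenSSF Scorecard API mentioned in the text.
API_BASE = "https://api.securityscorecards.dev"


def scorecard_url(platform, org, repo):
    """Build the project endpoint URL (assumed path layout, see lead-in)."""
    return f"{API_BASE}/projects/{platform}/{org}/{repo}"


def fetch_scorecard(platform, org, repo, timeout=10):
    """Fetch the raw Scorecard JSON for one repository (network call)."""
    with urlopen(scorecard_url(platform, org, repo), timeout=timeout) as response:
        return json.load(response)
```

For example, ``fetch_scorecard("github.com", "nexB", "scancode-toolkit")`` would request the Scorecard for one of the repositories used in testing; it needs network access and raises an error if the project has no published Scorecard.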
| 44 | + |
| 45 | + |
| 46 | + |
| 47 | +**Scancode.io** |
| 48 | + |
| 49 | +In the `Scancode.io` project, I added a pipeline that uses the `scorecode` |
| 50 | +package to fetch Scorecard data and store it in the Scancode.io database. The data can |
| 51 | +then be exported in Software Bill of Materials (SBOM) outputs such as the CycloneDX |
| 52 | +format (with SPDX planned for the future), providing insights into security and |
| 53 | +community health. |
| 54 | + |
| 55 | +-------------------------------------------------------------------------------- |
| 56 | + |
| 57 | +Implementation |
| 58 | +-------------- |
| 59 | + |
| 60 | +**1. Scorecode Repository:** |
| 61 | + |
| 62 | + - Developed a PyPI package that queries the OpenSSF API, fetches Scorecard data, and |
| 63 | + stores it in appropriate objects for use in other software packages. |
| 64 | + - Created Django mixin models to enable easy extension and integration of Scorecard |
| 65 | + data into platforms with databases like Scancode.io. |
| 66 | + |
| 67 | +For more information, you can visit the |
| 68 | +`scorecode package on PyPI <https://pypi.org/project/scorecode/>`_. |
| 69 | + |
| 70 | + |
| 71 | +**2. Scancode.io Integration:** |
| 72 | + |
| 73 | + - Developed a pipeline within Scancode.io to call `Scorecode` functions, retrieve |
| 74 | + Scorecard data, and save it in the Scancode.io database. |
| 75 | + - Enhanced the existing SBOM export functionality to include Scorecard data, allowing |
| 76 | + for detailed security posture and community health metrics analysis in CycloneDX |
| 77 | + format. |
| 78 | + |
| 79 | +**3. Testing:** |
| 80 | + |
| 81 | + - Tested against several repositories hosted on GitHub and GitLab to ensure |
| 82 | + accurate fetching, storage, and export of Scorecard data: |
| 83 | + |
| 84 | + - **GitHub**: |
| 85 | + |
| 86 | + - `nexB/scancode-toolkit <https://github.com/nexB/scancode-toolkit>`_ |
| 87 | + - `tensorflow/tensorflow <https://github.com/tensorflow/tensorflow>`_ |
| 88 | + - `apache/spark <https://github.com/apache/spark>`_ |
| 89 | + |
| 90 | + - **GitLab**: `gitlab-org/gitlab <https://gitlab.com/gitlab-org/gitlab>`_ |
| 91 | + |
| 92 | + - Verified seamless integration and accurate data retrieval across different package |
| 93 | + ecosystems supported by Scancode.io, ensuring that the Scorecard data aligns with |
| 94 | + the expected structure and content. |
| 95 | + |
| 96 | + - Implemented and executed automated test cases using `pytest`, which include: |
| 97 | + |
| 98 | + - Validation of key fields such as ``scoring_tool``, ``scoring_tool_version``, |
| 99 | + ``score_date``, ``score``, ``scoring_tool_documentation_url``, and ``checks``. |
| 100 | + |
| 101 | + - Type checks for each field to ensure data integrity. |
| 102 | + |
| 103 | + - URL validation to confirm that the documentation links are correctly formatted |
| 104 | + and point to the expected resources. |
| 105 | + |
| 106 | + - Added test cases for edge scenarios such as non-existent repositories, |
| 107 | + private repositories, and invalid input formats to ensure robustness and |
| 108 | + reliability. |
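The field, type, and URL checks listed above could look roughly like the following sketch. It is not the project's actual pytest suite: the helper name ``validate_scorecard`` is hypothetical, and the expected types (for example, ``score`` as a string) are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Field names come from the report; the expected types are assumptions.
EXPECTED_FIELDS = {
    "scoring_tool": str,
    "scoring_tool_version": str,
    "score_date": str,
    "score": str,
    "scoring_tool_documentation_url": str,
    "checks": list,
}


def validate_scorecard(data):
    """Return a list of problems found in a Scorecard record (empty if none)."""
    errors = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            errors.append(f"wrong type for field: {name}")

    # URL validation: require an http(s) scheme and a host part.
    url = data.get("scoring_tool_documentation_url")
    if isinstance(url, str):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("malformed documentation URL")
    return errors
```

In a pytest test, a well-formed record would be asserted with ``assert validate_scorecard(record) == []``, while records for edge cases such as missing fields should yield a non-empty list.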
| 109 | + |
| 110 | + |
| 111 | +-------------------------------------------------------------------------------- |
| 112 | + |
| 113 | +Linked Pull Requests |
| 114 | +-------------------- |
| 115 | + |
| 116 | +.. list-table:: |
| 117 | + :widths: 10 40 20 30 |
| 118 | + :header-rows: 1 |
| 119 | + |
| 120 | + * - Sr. no |
| 121 | + - Name |
| 122 | + - Link |
| 123 | + - Status |
| 124 | + * - 1 |
| 125 | + - Scorecard Integration |
| 126 | + - `aboutcode-org/scancode.io#1294 <https://github.com/aboutcode-org/scancode.io/pull/1294>`_ |
| 127 | + - Open |
| 128 | + * - 2 |
| 129 | + - Models integration |
| 130 | + - `aboutcode-org/scorecode#5 <https://github.com/aboutcode-org/scorecode/pull/5>`_ |
| 131 | + - Merged |
| 132 | + * - 3 |
| 133 | + - Scorecard API call integration |
| 134 | + - `aboutcode-org/scorecode#1 <https://github.com/aboutcode-org/scorecode/pull/1>`_ |
| 135 | + - Merged |
| 136 | + * - 4 |
| 137 | + - Mixin models for storing scorecard data |
| 138 | + - `aboutcode-org/scorecode#4 <https://github.com/aboutcode-org/scorecode/pull/4>`_ |
| 139 | + - Merged |
| 140 | + |
| 141 | + |
| 142 | + |
| 143 | +Related Issues |
| 144 | +-------------- |
| 145 | + |
| 146 | +.. list-table:: |
| 147 | + :widths: 10 60 30 |
| 148 | + :header-rows: 1 |
| 149 | + |
| 150 | + * - Sr. no |
| 151 | + - Name |
| 152 | + - Link |
| 153 | + * - 1 |
| 154 | + - Store OSSF scorecard data in scancode.io models |
| 155 | + - `aboutcode-org/scancode.io#1283 <https://github.com/aboutcode-org/scancode.io/issues/1283>`_ |
| 156 | + * - 2 |
| 157 | + - Show OSSF scorecard data in the UI as quality data |
| 158 | + - `aboutcode-org/scancode.io#1284 <https://github.com/aboutcode-org/scancode.io/issues/1284>`_ |
| 159 | + * - 3 |
| 160 | + - Export OSSF scorecard data in SBOMs |
| 161 | + - `aboutcode-org/scancode.io#1285 <https://github.com/aboutcode-org/scancode.io/issues/1285>`_ |
| 162 | + * - 4 |
| 163 | + - Compute summary and clarity for EACH package in a codebase |
| 164 | + - `aboutcode-org/scorecode#3 <https://github.com/aboutcode-org/scorecode/issues/3>`_ |
| 165 | + * - 5 |
| 166 | + - Provide data values in scan results to correspond with license_clarity_score |
| 167 | + elements |
| 168 | + - `aboutcode-org/scorecode#2 <https://github.com/aboutcode-org/scorecode/issues/2>`_ |
| 169 | + |
| 170 | + |
| 171 | +Project Reference Links |
| 172 | +----------------------- |
| 173 | + |
| 174 | +* `Project Idea <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2024-Project-Ideas#purldbscancodeio-enrich-an-sbom-based-on-ossf-security-score-card>`_ |
| 175 | + |
| 176 | +* `Official GSoC project page <https://summerofcode.withgoogle.com/programs/2024/projects/kB8HkEli>`_ |
| 177 | + |
| 178 | +* `GSoC Proposal <https://docs.google.com/document/d/10EiGjTGR_eZExMjcxEmwmMQPt7B9i6lHc_osW4Ogm6c/edit?usp=sharing>`_ |
| 179 | + |
| 180 | +* `Project Board <https://github.com/orgs/nexB/projects/60/views/6>`_ |
| 181 | + |
| 182 | + |
| 183 | +Pre-GSoC Work |
| 184 | +------------- |
| 185 | + |
| 186 | +Before GSoC officially started, I had the opportunity to contribute to the |
| 187 | +`ScanCode.io <https://github.com/aboutcode-org/scancode.io>`_ and |
| 188 | +`PurlDB <https://github.com/aboutcode-org/purldb>`_ projects. During this |
| 189 | +period, I focused on enhancing various functionalities and laying the groundwork for |
| 190 | +the upcoming integration of the OpenSSF Scorecard. Below is a list of key pull requests |
| 191 | +I made: |
| 192 | + |
| 193 | +- `Add endpoint to create or update a package set <https://github.com/aboutcode-org/purldb/pull/350>`_ |
| 194 | +- `Fixes Github Mapper route <https://github.com/aboutcode-org/purldb/pull/370>`_ |
| 195 | +- `removed redundant PackageViewSet class code and added history field into package API nexB#389 nexB#221 <https://github.com/aboutcode-org/purldb/pull/390>`_ |
| 196 | +- `alpine url bug fix and AGPL License version issue <https://github.com/aboutcode-org/scancode-toolkit/pull/3744>`_ |
| 197 | + |
| 198 | +These contributions were essential in building a solid foundation for the integration of |
| 199 | +the ScoreCode repository during GSoC. |
| 200 | + |
| 201 | +Post GSoC |
| 202 | +--------- |
| 203 | + |
| 204 | +After GSoC, the goal is to merge the pull requests into their respective repositories, |
| 205 | +enabling users to leverage the OpenSSF Scorecard integration for enhanced vulnerability |
| 206 | +analysis in Scancode.io. Future work includes extending this integration to other |
| 207 | +platforms like PurlDB. |
| 208 | + |
| 209 | +-------------------------------------------------------------------------------- |
| 210 | + |
| 211 | +Acknowledgements |
| 212 | +---------------- |
| 213 | + |
| 214 | +This project wouldn't have been possible without the incredible support and mentorship |
| 215 | +of an outstanding team: |
| 216 | + |
| 217 | +- `Philippe Ombredanne <https://github.com/pombredanne>`_ |
| 218 | +- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_ |
| 219 | +- `Thomas Druez <https://github.com/thomasdruez>`_ |
| 220 | +- `Jonathan Yang <https://github.com/JonoYang>`_ |
| 221 | +- `Tushar Goel <https://github.com/tushar-goel>`_ |
| 222 | + |
| 223 | +The weekly status calls were more than just updates; they were a source of inspiration, |
| 224 | +ideas, and camaraderie. And the 1:1 calls with `Ayan Sinha Mahapatra`_ and |
| 225 | +`Philippe Ombredanne`_ were like mini-masterclasses in software development. |
| 226 | + |
| 227 | +To my mentors: Thank you for not just teaching me the ropes but for showing me how to |
| 228 | +swing from them! This journey was as much about learning as it was about having fun, |
| 229 | +and I couldn't have asked for a better crew to sail with. |