From 907308c71bf2e7675482ee526e63654c14ccabab Mon Sep 17 00:00:00 2001 From: Arthit Suriyawongkul Date: Tue, 8 Oct 2024 18:43:46 +0700 Subject: [PATCH 1/7] Fix Markdown warnings - Remove trailing whitespaces - Ensure one blank line before and after headings/lists - Enforce one H1 heading in a document, the rest will be H2, H3, etc. - Standardize the use of bullet markup (using all `-`, instead of mixing `-` and `*`) Signed-off-by: Arthit Suriyawongkul --- CONTRIBUTING.md | 31 +++++--- DOCUMENTATION.md | 22 +++++- README.md | 197 +++++++++++++++++++++++++---------------------- 3 files changed, 143 insertions(+), 107 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 4c4a04341..88638e70e 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -15,7 +15,7 @@ intention prior to creating a patch. ## Development process -We use the GitHub flow that is described here: https://guides.github.com/introduction/flow/ +We use the GitHub flow that is described here: <https://guides.github.com/introduction/flow/> Here's the process to make changes to the codebase: @@ -30,15 +30,19 @@ Here's the process to make changes to the codebase: and optionally follow the further steps described to sync your fork and the original repository. 4. Create a new branch in your fork and set up environment: + ```sh git checkout -b fix-or-improve-something python -m venv ./venv ./venv/bin/activate pip install -e ".[development]" ``` - Note: By using the group `[development]` for the installation, all dependencies (including optional ones) will be - installed. This way we make sure that all tests are executed. + + Note: By using the group `[development]` for the installation, all dependencies (including optional ones) will be + installed. This way we make sure that all tests are executed. + 5. 
Make some changes and commit them to the branch: + ```sh git commit --signoff -m 'description of my changes' ``` @@ -49,27 +53,33 @@ Here's the process to make changes to the codebase: of [the Developer Certificate of Origin](https://developercertificate.org/). Git has utilities for signing off on commits: `git commit -s` or `--signoff` signs a current commit, and `git rebase --signoff ` retroactively signs a range of past commits. + 6. Test your changes: + ```sh pytest -vvs # in the repo root ``` -7. Check your code style. When opening a pull request, your changes will automatically be checked with `isort`, `black` - and `flake8` to make sure your changes fit with the rest of the code style. +7. Check your code style. When opening a pull request, your changes will automatically be checked with `isort`, `black` + and `flake8` to make sure your changes fit with the rest of the code style. + ```sh # run the following commands in the repo root - isort src tests + isort src tests black src tests - flake8 src tests + flake8 src tests ``` - `black` and `isort` will automatically format the code and sort the imports. The configuration for these linters + + `black` and `isort` will automatically format the code and sort the imports. The configuration for these linters can be found in the `pyproject.toml`. `flake8` lists all problems found which then need to be resolved manually. The configuration for the linter can be found in the `.flake8` file. 8. Push the branch to your fork on GitHub: + ```sh git push origin fix-or-improve-something ``` + 9. Make a pull request on GitHub. 10. Continue making more changes and commits on the branch, with `git commit --signoff` and `git push`. 11. When done, write a comment on the PR asking for a code review. @@ -77,6 +87,7 @@ Here's the process to make changes to the codebase: possible, or with `squash`. 13. The temporary branch on GitHub should be deleted (there is a button for deleting it). 14. 
Delete the local branch as well: + ```sh git checkout master git pull -p @@ -84,11 +95,11 @@ Here's the process to make changes to the codebase: git branch -d fix-or-improve-something ``` -# How to run tests +## How to run tests The tests framework is using pytest: -``` +```sh pip install pytest pytest -vvs ``` diff --git a/DOCUMENTATION.md b/DOCUMENTATION.md index f8c7e274b..5119dd467 100644 --- a/DOCUMENTATION.md +++ b/DOCUMENTATION.md @@ -1,12 +1,15 @@ # Code architecture documentation ## Package Overview + Beneath the top-level package `spdx_tools` you will find three sub-packages: + - `spdx`, which contains the code to create, parse, write and validate SPDX documents of versions 2.2 and 2.3 - `spdx3`, which will contain the same feature set for versions 3.x once they are released - `common`, which contains code that is shared between the different versions, such as type-checking and `spdx_licensing`. ## `spdx` + The `spdx` package contains the code dealing with SPDX-2 documents. The subpackages serve the purpose to divide the code into logically independent chunks. Shared code can be found in the top-level modules here. `model`, `parser`, `validation` and `writer` constitute the four main components of this library and are further described below. @@ -14,9 +17,11 @@ The subpackages serve the purpose to divide the code into logically independent `jsonschema` and `rdfschema` contain code specific to the corresponding serialization format. ### `model` + The internal data model closely follows the [official SPDX-2.3 specification](https://spdx.github.io/spdx-spec/v2.3/). 
Entrypoint to the model is the `Document` class, which has the following attributes: + - `creation_info`: a single instance of the `CreationInfo` class - `packages`: a list of `Package` objects - `files`: a list of `File` objects @@ -35,6 +40,7 @@ A custom extension of the `@dataclass` annotation is used that is called `@datac Apart from all the usual `dataclass` functionality, this implements fields of a class as properties with their own getter and setter methods. This is used in particular to implement type checking when properties are set. Source of truth for these checks are the attribute definitions at the start of the respective class that must specify the correct type hint. + The `beartype` library is used to check type conformity (`typeguard` was used in the past but has been replaced since due to performance issues). In case of a type mismatch a `TypeError` is raised. To ensure that all possible type errors are found during the construction of an object, a custom `__init__()` that calls `check_types_and_set_values()` is part of every class. @@ -43,26 +49,31 @@ This function tries to set all values provided by the constructor and collects a For the SPDX values `NONE` and `NOASSERTION` the classes `SpdxNone` and `SpdxNoAssertion` are used, respectively. Both can be instantiated without any arguments. ### `parser` + The parsing and writing modules are split into subpackages according to the serialization formats: `json`, `yaml`, `xml`, `tagvalue` and `rdf`. As the first three share the same tree structure that can be parsed into a dictionary, their shared logic is contained in the `jsonlikedict` package. One overarching concept of all parsers is the goal of dealing with parsing errors (like faulty types or missing mandatory fields) as long as possible before failing. Thus, the `SPDXParsingError` that is finally raised collects as much information as possible about all parsing errors that occurred. 
#### `tagvalue` + Since Tag-Value is an SPDX-specific format, there exist no readily available parsers for it. -This library implements its own deserialization code using the `ply` library's `lex` module for lexing and the `yacc` module for parsing. +This library implements its own deserialization code using the `ply` library's `lex` module for lexing and the `yacc` module for parsing. #### `rdf` + The `rdflib` library is used to deserialize RDF graphs from XML format. -The graph is then being parsed and translated into the internal data model. +The graph is then being parsed and translated into the internal data model. #### `json`, `yaml`, `xml` + In a first step, all three of JSON, YAML and XML formats are deserialized into a dictionary representing their tree structure. This is achieved via the `json`, `yaml` and `xmltodict` packages, respectively. Special note has to be taken in the XML case which does not support lists and numbers. The logic concerning the translation from these dicts to the internal data model can be found in the `jsonlikedict` package. ### `writer` + For serialization purposes, only non-null fields are written out. All writers expect a valid SPDX document from the internal model as input. To ensure this is actually the case, the standard behaviour of every writer function is to call validation before the writing process. @@ -71,18 +82,21 @@ Also by default, all list properties in the model are scanned for duplicates whi This can be disabled by setting the `drop_duplicates` boolean to false. #### `tagvalue` + The ordering of the tags follows the [example in the official specification](https://github.com/spdx/spdx-spec/blob/development/v2.3.1/examples/SPDXTagExample-v2.3.spdx). #### `rdf` + The RDF graph is constructed from the internal data model and serialized to XML format afterward, using the `rdflib` library. 
#### `json`, `yaml`, `xml` + As all three of JSON, YAML and XML formats share the same tree structure, the first step is to generate the dictionary representing that tree. This is achieved by the `DocumentConverter` class in the `jsonschema` package. Subsequently, the dictionary is serialized using the `json`, `yaml` and `xmltodict` packages, respectively. - ### `validation` + The `validation` package takes care of all nonconformities with the SPDX specification that are not due to incorrect typing. This mainly includes checks for correctly formatted strings or the actual existence of references SPDXIDs. Entrypoint is the `document_validator` module with the `validate_full_spdx_document()` function. @@ -93,6 +107,7 @@ Validation and reference checking of SPDXIDs (and possibly external document ref For the validation of license expressions we utilise the `license-expression` library's `validate` and `parse` functions, which take care of checking license symbols against the [SPDX license list](https://spdx.org/licenses/). Invalidities are captured in instances of a custom `ValidationMessage` class. This has two attributes: + - `validation_message` is a string that describes the actual problem - `validation_context` is a `ValidationContext` object that helps to pinpoint the source of the problem by providing the faulty element's SPDXID (if it has one), the parent SPDXID (if that is known), the element's type and finally the full element itself. It is left open to the implementer which of this information to use in the following evaluation of the validation process. @@ -101,6 +116,7 @@ Every validation function returns a list of `ValidationMessage` objects, which a That is, if an empty list is returned, the document is valid. ## `spdx3` + Due to the SPDX-3 model still being in development, this package is still a work in progress. 
However, as the basic building blocks of parsing, writing, creation and validation are still important in the new version, the `spdx3` package is planned to be structured similarly to the `spdx` package. diff --git a/README.md b/README.md index 99112ea02..c9665394c 100644 --- a/README.md +++ b/README.md @@ -3,11 +3,9 @@ CI status (Linux, macOS and Windows): [![Install and Test][1]][2] [1]: https://github.com/spdx/tools-python/actions/workflows/install_and_test.yml/badge.svg - [2]: https://github.com/spdx/tools-python/actions/workflows/install_and_test.yml - -# Breaking changes v0.7 -> v0.8 +## Breaking changes v0.7 -> v0.8 Please be aware that the upcoming 0.8 release has undergone a significant refactoring in preparation for the upcoming SPDX v3.0 release, leading to breaking changes in the API. @@ -15,124 +13,134 @@ Please refer to the [migration guide](https://github.com/spdx/tools-python/wiki/ to update your existing code. The main features of v0.8 are: + - full validation of SPDX documents against the v2.2 and v2.3 specification - support for SPDX's RDF format with all v2.3 features -- experimental support for the upcoming SPDX v3 specification. Note, however, that support is neither complete nor - stable at this point, as the spec is still evolving. SPDX3-related code is contained in a separate subpackage "spdx3" +- experimental support for the upcoming SPDX v3 specification. Note, however, that support is neither complete nor + stable at this point, as the spec is still evolving. SPDX3-related code is contained in a separate subpackage "spdx3" and its use is optional. We do not recommend using it in production code yet. - -# Information +## Information This library implements SPDX parsers, convertors, validators and handlers in Python. 
-- Home: https://github.com/spdx/tools-python -- Issues: https://github.com/spdx/tools-python/issues -- PyPI: https://pypi.python.org/pypi/spdx-tools -- Browse the API: https://spdx.github.io/tools-python - -Important updates regarding this library are shared via the SPDX tech mailing list: https://lists.spdx.org/g/Spdx-tech. +- Home: <https://github.com/spdx/tools-python> +- Issues: <https://github.com/spdx/tools-python/issues> +- PyPI: <https://pypi.python.org/pypi/spdx-tools> +- Browse the API: <https://spdx.github.io/tools-python> +Important updates regarding this library are shared via the SPDX tech mailing list: <https://lists.spdx.org/g/Spdx-tech>. -# License +## License [Apache-2.0](LICENSE) -# Features +## Features -* API to create and manipulate SPDX v2.2 and v2.3 documents -* Parse, convert, create and validate SPDX files -* supported formats: Tag/Value, RDF, JSON, YAML, XML -* visualize the structure of a SPDX document by creating an `AGraph`. Note: This is an optional feature and requires +- API to create and manipulate SPDX v2.2 and v2.3 documents +- Parse, convert, create and validate SPDX files +- supported formats: Tag/Value, RDF, JSON, YAML, XML +- visualize the structure of a SPDX document by creating an `AGraph`. Note: This is an optional feature and requires additional installation of optional dependencies ## Experimental support for SPDX 3.0 -* Create v3.0 elements and payloads -* Convert v2.2/v2.3 documents to v3.0 -* Serialize to JSON-LD -See [Quickstart to SPDX 3.0](#quickstart-to-spdx-30) below. -The implementation is based on the descriptive markdown files in the repository https://github.com/spdx/spdx-3-model (latest commit: a5372a3c145dbdfc1381fc1f791c68889aafc7ff). +- Create v3.0 elements and payloads +- Convert v2.2/v2.3 documents to v3.0 +- Serialize to JSON-LD +See [Quickstart to SPDX 3.0](#quickstart-to-spdx-30) below. +The implementation is based on the descriptive Markdown files in the repository <https://github.com/spdx/spdx-3-model> (latest commit: a5372a3c145dbdfc1381fc1f791c68889aafc7ff). -# Installation +## Installation As always you should work in a virtualenv (venv). 
You can install a local clone -of this repo with `yourenv/bin/pip install .` or install it from PyPI +of this repo with `yourenv/bin/pip install .` or install it from PyPI (check for the [newest release](https://pypi.org/project/spdx-tools/#history) and install it like `yourenv/bin/pip install spdx-tools==0.8.0a2`). Note that on Windows it would be `Scripts` instead of `bin`. -# How to use +## How to use -## Command-line usage +### Command-line usage 1. **PARSING/VALIDATING** (for parsing any format): -* Use `pyspdxtools -i ` where `` is the location of the file. The input format is inferred automatically from the file ending. + - Use `pyspdxtools -i ` where `` is the location of the file. The input format is inferred automatically from the file ending. -* If you are using a source distribution, try running: - `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json` + - If you are using a source distribution, try running: + `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json` 2. **CONVERTING** (for converting one format to another): -* Use `pyspdxtools -i -o ` where `` is the location of the file to be converted - and `` is the location of the output file. The input and output formats are inferred automatically from the file endings. + - Use `pyspdxtools -i -o ` where `` is the location of the file to be converted + and `` is the location of the output file. The input and output formats are inferred automatically from the file endings. 
-* If you are using a source distribution, try running: - `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json -o output.tag` + - If you are using a source distribution, try running: + `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json -o output.tag` -* If you want to skip the validation process, provide the `--novalidation` flag, like so: - `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json -o output.tag --novalidation` + - If you want to skip the validation process, provide the `--novalidation` flag, like so: + `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json -o output.tag --novalidation` (use this with caution: note that undetected invalid documents may lead to unexpected behavior of the tool) - -* For help use `pyspdxtools --help` + + - For help use `pyspdxtools --help` 3. **GRAPH GENERATION** (optional feature) -* This feature generates a graph representing all elements in the SPDX document and their connections based on the provided - relationships. The graph can be rendered to a picture. Below is an example for the file `tests/data/SPDXJSONExample-v2.3.spdx.json`: -![SPDXJSONExample-v2.3.spdx.png](assets/SPDXJSONExample-v2.3.spdx.png) -* Make sure you install the optional dependencies `networkx` and `pygraphviz`. To do so run `pip install ".[graph_generation]"`. -* Use `pyspdxtools -i --graph -o ` where `` is an output file name with valid format for `pygraphviz` (check - the documentation [here](https://pygraphviz.github.io/documentation/stable/reference/agraph.html#pygraphviz.AGraph.draw)). -* If you are using a source distribution, try running - `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json --graph -o SPDXJSONExample-v2.3.spdx.png` to generate - a png with an overview of the structure of the example file. - -## Library usage + - This feature generates a graph representing all elements in the SPDX document and their connections based on the provided + relationships. The graph can be rendered to a picture. 
Below is an example for the file `tests/data/SPDXJSONExample-v2.3.spdx.json`: + ![SPDXJSONExample-v2.3.spdx.png](assets/SPDXJSONExample-v2.3.spdx.png) + + - Make sure you install the optional dependencies `networkx` and `pygraphviz`. To do so run `pip install ".[graph_generation]"`. + - Use `pyspdxtools -i --graph -o ` where `` is an output file name with valid format for `pygraphviz` (check + the documentation [here](https://pygraphviz.github.io/documentation/stable/reference/agraph.html#pygraphviz.AGraph.draw)). + - If you are using a source distribution, try running + `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json --graph -o SPDXJSONExample-v2.3.spdx.png` to generate + a png with an overview of the structure of the example file. + +### Library usage + 1. **DATA MODEL** - * The `spdx_tools.spdx.model` package constitutes the internal SPDX v2.3 data model (v2.2 is simply a subset of this). All relevant classes for SPDX document creation are exposed in the `__init__.py` found [here](src%2Fspdx_tools%2Fspdx%2Fmodel%2F__init__.py). - * SPDX objects are implemented via `@dataclass_with_properties`, a custom extension of `@dataclass`. - * Each class starts with a list of its properties and their possible types. When no default value is provided, the property is mandatory and must be set during initialization. - * Using the type hints, type checking is enforced when initializing a new instance or setting/getting a property on an instance + + - The `spdx_tools.spdx.model` package constitutes the internal SPDX v2.3 data model (v2.2 is simply a subset of this). All relevant classes for SPDX document creation are exposed in the `__init__.py` found [here](src%2Fspdx_tools%2Fspdx%2Fmodel%2F__init__.py). + - SPDX objects are implemented via `@dataclass_with_properties`, a custom extension of `@dataclass`. + - Each class starts with a list of its properties and their possible types. 
When no default value is provided, the property is mandatory and must be set during initialization. + - Using the type hints, type checking is enforced when initializing a new instance or setting/getting a property on an instance (wrong types will raise `ConstructorTypeError` or `TypeError`, respectively). This makes it easy to catch invalid properties early and only construct valid documents. - * Note: in-place manipulations like `list.append(item)` will circumvent the type checking (a `TypeError` will still be raised when reading `list` again). We recommend using `list = list + [item]` instead. - * The main entry point of an SPDX document is the `Document` class from the [document.py](src%2Fspdx_tools%2Fspdx%2Fmodel%2Fdocument.py) module, which links to all other classes. - * For license handling, the [license_expression](https://github.com/nexB/license-expression) library is used. - * Note on `documentDescribes` and `hasFiles`: These fields will be converted to relationships in the internal data model. As they are deprecated, these fields will not be written in the output. + - Note: in-place manipulations like `list.append(item)` will circumvent the type checking (a `TypeError` will still be raised when reading `list` again). We recommend using `list = list + [item]` instead. + - The main entry point of an SPDX document is the `Document` class from the [document.py](src%2Fspdx_tools%2Fspdx%2Fmodel%2Fdocument.py) module, which links to all other classes. + - For license handling, the [license_expression](https://github.com/nexB/license-expression) library is used. + - Note on `documentDescribes` and `hasFiles`: These fields will be converted to relationships in the internal data model. As they are deprecated, these fields will not be written in the output. + 2. **PARSING** - * Use `parse_file(file_name)` from the `parse_anything.py` module to parse an arbitrary file with one of the supported file endings. - * Successful parsing will return a `Document` instance. 
Unsuccessful parsing will raise `SPDXParsingError` with a list of all encountered problems. + + - Use `parse_file(file_name)` from the `parse_anything.py` module to parse an arbitrary file with one of the supported file endings. + - Successful parsing will return a `Document` instance. Unsuccessful parsing will raise `SPDXParsingError` with a list of all encountered problems. + 3. **VALIDATING** - * Use `validate_full_spdx_document(document)` to validate an instance of the `Document` class. - * This will return a list of `ValidationMessage` objects, each consisting of a String describing the invalidity and a `ValidationContext` to pinpoint the source of the validation error. - * Validation depends on the SPDX version of the document. Note that only versions `SPDX-2.2` and `SPDX-2.3` are supported by this tool. + + - Use `validate_full_spdx_document(document)` to validate an instance of the `Document` class. + - This will return a list of `ValidationMessage` objects, each consisting of a String describing the invalidity and a `ValidationContext` to pinpoint the source of the validation error. + - Validation depends on the SPDX version of the document. Note that only versions `SPDX-2.2` and `SPDX-2.3` are supported by this tool. + 4. **WRITING** - * Use `write_file(document, file_name)` from the `write_anything.py` module to write a `Document` instance to the specified file. + + - Use `write_file(document, file_name)` from the `write_anything.py` module to write a `Document` instance to the specified file. The serialization format is determined from the filename ending. - * Validation is performed per default prior to the writing process, which is cancelled if the document is invalid. You can skip the validation via `write_file(document, file_name, validate=False)`. + - Validation is performed per default prior to the writing process, which is cancelled if the document is invalid. You can skip the validation via `write_file(document, file_name, validate=False)`. 
Caution: Only valid documents can be serialized reliably; serialization of invalid documents is not supported. -## Example +### Example + Here are some examples of possible use cases to quickly get you started with the spdx-tools. If you want more examples, like how to create an SPDX document from scratch, have a look [at the examples folder](examples). + ```python import logging from license_expression import get_spdx_licensing -from spdx_tools.spdx.model import (Checksum, ChecksumAlgorithm, File, +from spdx_tools.spdx.model import (Checksum, ChecksumAlgorithm, File, FileType, Relationship, RelationshipType) from spdx_tools.spdx.parser.parse_anything import parse_file from spdx_tools.spdx.validation.document_validator import validate_full_spdx_document @@ -147,14 +155,14 @@ document.creation_info.name = "new document name" # define a file and a DESCRIBES relationship between the file and the document checksum = Checksum(ChecksumAlgorithm.SHA1, "71c4025dd9897b364f3ebbb42c484ff43d00791c") -file = File(name="./fileName.py", spdx_id="SPDXRef-File", checksums=[checksum], - file_types=[FileType.TEXT], +file = File(name="./fileName.py", spdx_id="SPDXRef-File", checksums=[checksum], + file_types=[FileType.TEXT], license_concluded=get_spdx_licensing().parse("MIT and GPL-2.0"), license_comment="licenseComment", copyright_text="copyrightText") relationship = Relationship("SPDXRef-DOCUMENT", RelationshipType.DESCRIBES, "SPDXRef-File") -# add the file and the relationship to the document +# add the file and the relationship to the document # (note that we do not use "document.files.append(file)" as that would circumvent the type checking) document.files = document.files + [file] document.relationships = document.relationships + [relationship] @@ -165,51 +173,52 @@ validation_messages = validate_full_spdx_document(document) for validation_message in validation_messages: logging.warning(validation_message.validation_message) -# if there are no validation messages, the document 
is valid +# if there are no validation messages, the document is valid # and we can safely serialize it without validating again if not validation_messages: write_file(document, "new_spdx_document.rdf", validate=False) ``` -# Quickstart to SPDX 3.0 +## Quickstart to SPDX 3.0 + In contrast to SPDX v2, all elements are now subclasses of the central `Element` class. -This includes packages, files, snippets, relationships, annotations, but also SBOMs, SpdxDocuments, and more. +This includes packages, files, snippets, relationships, annotations, but also SBOMs, SpdxDocuments, and more. For serialization purposes, all Elements that are to be serialized into the same file are collected in a `Payload`. This is just a dictionary that maps each Element's SpdxId to itself. Use the `write_payload()` functions to serialize a payload. -There currently are two options: -* The `spdx_tools.spdx3.writer.json_ld.json_ld_writer` module generates a JSON-LD file of the payload. -* The `spdx_tools.spdx3.writer.console.payload_writer` module prints a debug output to console. Note that this is not an official part of the SPDX specification and will probably be dropped as soon as a better standard emerges. +There currently are two options: + +- The `spdx_tools.spdx3.writer.json_ld.json_ld_writer` module generates a JSON-LD file of the payload. +- The `spdx_tools.spdx3.writer.console.payload_writer` module prints a debug output to console. Note that this is not an official part of the SPDX specification and will probably be dropped as soon as a better standard emerges. You can convert an SPDX v2 document to v3 via the `spdx_tools.spdx3.bump_from_spdx2.spdx_document` module. The `bump_spdx_document()` function will return a payload containing an `SpdxDocument` Element and one Element for each package, file, snippet, relationship, or annotation contained in the v2 document. +## Dependencies -# Dependencies - -* PyYAML: https://pypi.org/project/PyYAML/ for handling YAML. 
-* xmltodict: https://pypi.org/project/xmltodict/ for handling XML. -* rdflib: https://pypi.python.org/pypi/rdflib/ for handling RDF. -* ply: https://pypi.org/project/ply/ for handling tag-value. -* click: https://pypi.org/project/click/ for creating the CLI interface. -* beartype: https://pypi.org/project/beartype/ for type checking. -* uritools: https://pypi.org/project/uritools/ for validation of URIs. -* license-expression: https://pypi.org/project/license-expression/ for handling SPDX license expressions. +- PyYAML: <https://pypi.org/project/PyYAML/> for handling YAML. +- xmltodict: <https://pypi.org/project/xmltodict/> for handling XML. +- rdflib: <https://pypi.python.org/pypi/rdflib/> for handling RDF. +- ply: <https://pypi.org/project/ply/> for handling tag-value. +- click: <https://pypi.org/project/click/> for creating the CLI interface. +- beartype: <https://pypi.org/project/beartype/> for type checking. +- uritools: <https://pypi.org/project/uritools/> for validation of URIs. +- license-expression: <https://pypi.org/project/license-expression/> for handling SPDX license expressions. -# Support +## Support -* Submit issues, questions or feedback at https://github.com/spdx/tools-python/issues -* Join the chat at https://gitter.im/spdx-org/Lobby -* Join the discussion on https://lists.spdx.org/g/spdx-tech and - https://spdx.dev/participate/tech/ +- Submit issues, questions or feedback at <https://github.com/spdx/tools-python/issues> +- Join the chat at <https://gitter.im/spdx-org/Lobby> +- Join the discussion on <https://lists.spdx.org/g/spdx-tech> and + <https://spdx.dev/participate/tech/> -# Contributing +## Contributing Contributions are very welcome! See [CONTRIBUTING.md](./CONTRIBUTING.md) for instructions on how to contribute to the codebase. -# History +## History This is the result of an initial GSoC contribution by @[ah450](https://github.com/ah450) -(or https://github.com/a-h-i) and is maintained by a community of SPDX adopters and enthusiasts. +(or <https://github.com/a-h-i>) and is maintained by a community of SPDX adopters and enthusiasts. In order to prepare for the release of SPDX v3.0, the repository has undergone a major refactoring during the time from 11/2022 to 07/2023. 
From e7a342433b851b1866b803f87c80c0a2c441600e Mon Sep 17 00:00:00 2001 From: Arthit Suriyawongkul Date: Tue, 8 Oct 2024 19:44:31 +0700 Subject: [PATCH 2/7] Fix process.md Signed-off-by: Arthit Suriyawongkul --- .../spdx3/writer/json_ld/process.md | 30 ++++++++++--------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/src/spdx_tools/spdx3/writer/json_ld/process.md b/src/spdx_tools/spdx3/writer/json_ld/process.md index 7d04d5ccd..1968e6e48 100644 --- a/src/spdx_tools/spdx3/writer/json_ld/process.md +++ b/src/spdx_tools/spdx3/writer/json_ld/process.md @@ -1,27 +1,29 @@ -### Workflow +# Workflow Process to produce context file and a serialization example: -1. Run -``` -spec-parser --gen-md --gen-refs --gen-rdf ../spdx-3-model/model -``` -- spdx-3-model (commit: 6cb4316, last commit where spec-parser is able to run)
-- spec-parser (main with commits from PR 44, 45) +1. Run -2. Convert the generated `spec-parser/md_generated/model.ttl` to a json-ld file using https://frogcat.github.io/ttl2jsonld/demo/. + ```sh + spec-parser --gen-md --gen-refs --gen-rdf ../spdx-3-model/model + ``` + + - spdx-3-model (commit: 6cb4316, last commit where spec-parser is able to run) + - spec-parser (main with commits from PR 44, 45) + +2. Convert the generated `spec-parser/md_generated/model.ttl` to a json-ld file using <https://frogcat.github.io/ttl2jsonld/demo/>. 3. Convert owl to context using `convert_spdx_owl_to_jsonld_context("SPDX_OWL.json")`. 4. Place the generated `context.json` in `spdx_tools/spdx3/writer/jsonld/`. 5. To generate the jsonld from the testfile run -``` -pyspdxtools3 -i ./tests/spdx/data/SPDXJSONExample-v2.3.spdx.json -o example_with_context -``` + ```sh + pyspdxtools3 -i ./tests/spdx/data/SPDXJSONExample-v2.3.spdx.json -o example_with_context + ``` -### Manually +## Manually +## Known limitations -### Known limitations - Validation of enums does not work - Additional keys seem to be ignored in validation -- inherited properties aren't validated +- Inherited properties aren't validated From 56a7df87ab35feb5f8ce7b747d27b97713275040 Mon Sep 17 00:00:00 2001 From: Arthit Suriyawongkul Date: Tue, 8 Oct 2024 21:47:00 +0700 Subject: [PATCH 3/7] Fix tests/spdx/data path Signed-off-by: Arthit Suriyawongkul --- README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index c9665394c..54fbe21d0 100644 --- a/README.md +++ b/README.md @@ -57,7 +57,7 @@ The implementation is based on the descriptive Markdown files in the repository As always you should work in a virtualenv (venv). You can install a local clone of this repo with `yourenv/bin/pip install .` or install it from PyPI (check for the [newest release](https://pypi.org/project/spdx-tools/#history) and install it like -`yourenv/bin/pip install spdx-tools==0.8.0a2`). 
Note that on Windows it would be `Scripts`
+`yourenv/bin/pip install spdx-tools==0.8.3`). Note that on Windows it would be `Scripts`
 instead of `bin`.

 ## How to use
@@ -69,7 +69,7 @@ instead of `bin`.
 - Use `pyspdxtools -i ` where `` is the location of the file. The input format is inferred automatically from the file ending.

 - If you are using a source distribution, try running:
- `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json`
+ `pyspdxtools -i tests/spdx/data/SPDXJSONExample-v2.3.spdx.json`

 2. **CONVERTING** (for converting one format to another):

@@ -77,10 +77,10 @@ instead of `bin`.
 and `` is the location of the output file. The input and output formats are inferred automatically from the file endings.

 - If you are using a source distribution, try running:
- `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json -o output.tag`
+ `pyspdxtools -i tests/spdx/data/SPDXJSONExample-v2.3.spdx.json -o output.tag`

 - If you want to skip the validation process, provide the `--novalidation` flag, like so:
- `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json -o output.tag --novalidation`
+ `pyspdxtools -i tests/spdx/data/SPDXJSONExample-v2.3.spdx.json -o output.tag --novalidation`
 (use this with caution: note that undetected invalid documents may lead to unexpected behavior of the tool)

 - For help use `pyspdxtools --help`
@@ -88,14 +88,14 @@ instead of `bin`.
 3. **GRAPH GENERATION** (optional feature)

 - This feature generates a graph representing all elements in the SPDX document and their connections based on the provided
- relationships. The graph can be rendered to a picture. Below is an example for the file `tests/data/SPDXJSONExample-v2.3.spdx.json`:
+ relationships. The graph can be rendered to a picture. Below is an example for the file `tests/spdx/data/SPDXJSONExample-v2.3.spdx.json`:
 ![SPDXJSONExample-v2.3.spdx.png](assets/SPDXJSONExample-v2.3.spdx.png)
 - Make sure you install the optional dependencies `networkx` and `pygraphviz`.
To do so run `pip install ".[graph_generation]"`.
 - Use `pyspdxtools -i --graph -o ` where `` is an output file name with valid format for `pygraphviz` (check the documentation [here](https://pygraphviz.github.io/documentation/stable/reference/agraph.html#pygraphviz.AGraph.draw)).
 - If you are using a source distribution, try running
- `pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json --graph -o SPDXJSONExample-v2.3.spdx.png` to generate
+ `pyspdxtools -i tests/spdx/data/SPDXJSONExample-v2.3.spdx.json --graph -o SPDXJSONExample-v2.3.spdx.png` to generate
 a png with an overview of the structure of the example file.

 ### Library usage

From 6078b56a9bbc640b4b7a6d5c433daf0f339d7d3e Mon Sep 17 00:00:00 2001
From: Arthit Suriyawongkul
Date: Thu, 10 Oct 2024 15:13:44 +0700
Subject: [PATCH 4/7] Update spec-parser parameters

Signed-off-by: Arthit Suriyawongkul
---
 src/spdx_tools/spdx3/writer/json_ld/process.md | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/spdx_tools/spdx3/writer/json_ld/process.md b/src/spdx_tools/spdx3/writer/json_ld/process.md
index 1968e6e48..485df8805 100644
--- a/src/spdx_tools/spdx3/writer/json_ld/process.md
+++ b/src/spdx_tools/spdx3/writer/json_ld/process.md
@@ -5,16 +5,17 @@ Process to produce context file and a serialization example:
 1. Run

 ```sh
- spec-parser --gen-md --gen-refs --gen-rdf ../spdx-3-model/model
+ python spec-parser/main.py spdx-3-model/model parser_output
 ```

- - spdx-3-model (commit: 6cb4316, last commit where spec-parser is able to run)
- - spec-parser (main with commits from PR 44, 45)
+ - spdx-3-model (main; where v3.0.1 development happens)
+ - spec-parser (main)

-2. Convert the generated `spec-parser/md_generated/model.ttl` to a json-ld file using .
-3. Convert owl to context using `convert_spdx_owl_to_jsonld_context("SPDX_OWL.json")`.
-4. Place the generated `context.json` in `spdx_tools/spdx3/writer/jsonld/`.
-5. To generate the jsonld from the testfile run
+2.
Convert the generated `parser_output/rdf/spdx-model.ttl` to a JSON-LD file
+ using .
+3. Convert OWL to context using `owl_to_context.py`.
+4. Place the generated `context.json` in `src/spdx_tools/spdx3/writer/json_ld/`.
+5. To generate the JSON-LD from the test file, run:

 ```sh
 pyspdxtools3 -i ./tests/spdx/data/SPDXJSONExample-v2.3.spdx.json -o example_with_context

From 4f611819ff5211760a3be6861a0d7876773fb9f6 Mon Sep 17 00:00:00 2001
From: Arthit Suriyawongkul
Date: Thu, 10 Oct 2024 19:03:39 +0700
Subject: [PATCH 5/7] Add link to official serialization doc

Signed-off-by: Arthit Suriyawongkul
---
 src/spdx_tools/spdx3/writer/json_ld/process.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/spdx_tools/spdx3/writer/json_ld/process.md b/src/spdx_tools/spdx3/writer/json_ld/process.md
index 485df8805..02dc7b715 100644
--- a/src/spdx_tools/spdx3/writer/json_ld/process.md
+++ b/src/spdx_tools/spdx3/writer/json_ld/process.md
@@ -1,5 +1,10 @@
 # Workflow

+Official SPDX v3.0 serialization documentation and context file
+are available at:
+
+## Manually generate context file
+
 Process to produce context file and a serialization example:

 1. Run
@@ -21,8 +26,6 @@ Process to produce context file and a serialization example:
 pyspdxtools3 -i ./tests/spdx/data/SPDXJSONExample-v2.3.spdx.json -o example_with_context
 ```

-## Manually
-
 ## Known limitations

 - Validation of enums does not work

From 64f117c23f165cc2d261ef3001ff24699cf0c3a2 Mon Sep 17 00:00:00 2001
From: Arthit Suriyawongkul
Date: Thu, 10 Oct 2024 19:16:19 +0700
Subject: [PATCH 6/7] Add link to SPDX 3.0 model

Signed-off-by: Arthit Suriyawongkul
---
 CHANGELOG.md | 4 ++--
 README.md | 16 +++++++++++-----
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index f17ee022c..56126e898 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -141,12 +141,12 @@ Starting a Changelog.

 * Dropped Python 2 support. Python >= 3.6 is now required.
 * Added `pyspdxtools_convertor` and `pyspdxtools_parser` CLI scripts. See [the readme](README.md) for usage instructions.
-* Updated the tools to support SPDX versions up to 2.3 and to conform with the specification. Apart from many bugfixes
+* Updated the tools to support SPDX versions up to 2.3 and to conform with the specification. Apart from many bugfixes
 and new properties, some of the more significant changes include:
 * Support for multiple packages per document
 * Support for multiple checksums for packages and files
 * Support for files outside a package
-* **Note**: Validation was updated to follow the 2.3 specification. Since there is currently no support for
+* **Note**: Validation was updated to follow the 2.3 specification. Since there is currently no support for
 version-specific handling, some details may be handled incorrectly for documents using lower versions. The changes are mostly restricted to properties becoming optional and new property values becoming available, and should be of limited impact. See https://spdx.github.io/spdx-spec/v2.3/diffs-from-previous-editions/
diff --git a/README.md b/README.md
index 54fbe21d0..1f8eb12e3 100644
--- a/README.md
+++ b/README.md
@@ -29,7 +29,8 @@ This library implements SPDX parsers, convertors, validators and handlers in Pyt
 - PyPI:
 - Browse the API:

-Important updates regarding this library are shared via the SPDX tech mailing list: .
+Important updates regarding this library are shared via
+the SPDX tech mailing list: .

 ## License

@@ -39,9 +40,10 @@ Important updates regarding this library are shared via the SPDX tech mailing li

 - API to create and manipulate SPDX v2.2 and v2.3 documents
 - Parse, convert, create and validate SPDX files
-- supported formats: Tag/Value, RDF, JSON, YAML, XML
-- visualize the structure of a SPDX document by creating an `AGraph`.
Note: This is an optional feature and requires
-additional installation of optional dependencies
+- Supported formats: Tag/Value, RDF, JSON, YAML, XML
+- Visualize the structure of a SPDX document by creating an `AGraph`.
+ Note: This is an optional feature and requires
+ additional installation of optional dependencies

 ## Experimental support for SPDX 3.0

@@ -50,7 +52,11 @@ additional installation of optional dependencies
 - Serialize to JSON-LD

 See [Quickstart to SPDX 3.0](#quickstart-to-spdx-30) below.
-The implementation is based on the descriptive Markdown files in the repository (latest commit: a5372a3c145dbdfc1381fc1f791c68889aafc7ff).
+The implementation is based on the descriptive Markdown files in the repository
+
+(commit: a5372a3c145dbdfc1381fc1f791c68889aafc7ff).
+The latest SPDX 3.0 model is available at
+.

 ## Installation


From 3418c2f1e1f497b66ebc1bc538212f5335d5e98a Mon Sep 17 00:00:00 2001
From: Arthit Suriyawongkul
Date: Thu, 10 Oct 2024 22:12:22 +0700
Subject: [PATCH 7/7] Remove mentions that SPDX 3 is not released yet

Signed-off-by: Arthit Suriyawongkul
---
 CHANGELOG.md | 6 +++---
 DOCUMENTATION.md | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 56126e898..e44b33018 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -143,9 +143,9 @@ Starting a Changelog.
 * Added `pyspdxtools_convertor` and `pyspdxtools_parser` CLI scripts. See [the readme](README.md) for usage instructions.
 * Updated the tools to support SPDX versions up to 2.3 and to conform with the specification.
Apart from many bugfixes
 and new properties, some of the more significant changes include:
- * Support for multiple packages per document
- * Support for multiple checksums for packages and files
- * Support for files outside a package
+ * Support for multiple packages per document
+ * Support for multiple checksums for packages and files
+ * Support for files outside a package
 * **Note**: Validation was updated to follow the 2.3 specification. Since there is currently no support for
 version-specific handling, some details may be handled incorrectly for documents using lower versions. The changes are mostly restricted to properties becoming optional and new property values becoming
diff --git a/DOCUMENTATION.md b/DOCUMENTATION.md
index 5119dd467..6a9bcb2a5 100644
--- a/DOCUMENTATION.md
+++ b/DOCUMENTATION.md
@@ -5,7 +5,7 @@
 Beneath the top-level package `spdx_tools` you will find three sub-packages:

 - `spdx`, which contains the code to create, parse, write and validate SPDX documents of versions 2.2 and 2.3
-- `spdx3`, which will contain the same feature set for versions 3.x once they are released
+- `spdx3`, which will contain the same feature set for versions 3.x
 - `common`, which contains code that is shared between the different versions, such as type-checking and `spdx_licensing`.

 ## `spdx`
@@ -117,7 +117,7 @@ That is, if an empty list is returned, the document is valid.

 ## `spdx3`

-Due to the SPDX-3 model still being in development, this package is still a work in progress.
+This package is still a work in progress.
 However, as the basic building blocks of parsing, writing, creation and validation are still important in the new version,
 the `spdx3` package is planned to be structured similarly to the `spdx` package.
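
Note on PATCH 3/7: the path fix replaces `tests/data/` with `tests/spdx/data/` in several README commands. A rough way to check that no stale references remain in a Markdown file is sketched below. This is not part of the series or of spdx-tools; the helper name `find_stale_paths` and the sample text are assumptions made for illustration only.

```python
import re

# Hypothetical helper, not part of spdx-tools: flags lines that still use the
# old `tests/data/` prefix instead of `tests/spdx/data/`.
# The lookbehind avoids matching longer paths such as `mytests/data/`.
STALE_PATH = re.compile(r"(?<![\w/])tests/data/")


def find_stale_paths(markdown_text):
    """Return (line_number, line) pairs that mention the old test-data path."""
    return [
        (number, line)
        for number, line in enumerate(markdown_text.splitlines(), start=1)
        if STALE_PATH.search(line)
    ]


# Illustrative sample: one line with the old prefix, one with the new one.
sample = "\n".join(
    [
        "`pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json`",
        "`pyspdxtools -i tests/spdx/data/SPDXJSONExample-v2.3.spdx.json`",
    ]
)

for number, line in find_stale_paths(sample):
    print(f"line {number}: {line}")
```

Running the same function over the contents of `README.md` after applying the series should, under these assumptions, report no lines.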