add support for PEP 639 License Clarity #870

Open, wants to merge 1 commit into base: main

@radoering (Member) commented May 18, 2025

Related to: python-poetry/poetry#9670
Downstream tests will be fixed in python-poetry/poetry#10413 after merging this one.

  • Added tests for changed code.
  • Updated documentation for changed code.

Summary by Sourcery

Implement PEP 639 license clarity support by adding parsing and validation for project.license-files and SPDX license expressions, bump core metadata version to 2.4, include license data in distributions, and deprecate legacy license tables and classifiers.

New Features:

  • Support the [project].license-files key with glob patterns for specifying license files.
  • Support SPDX license expressions via [project].license as a top-level string instead of the legacy table format.

Bug Fixes:

  • Raise errors when mixing legacy [project].license table definitions with license-files or invalid glob patterns.
  • Provide clear exceptions for missing or unreadable license files during package creation and metadata export.
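
The first kind of check described above can be sketched roughly as follows. `check_license_config` is a hypothetical helper written for this illustration, not poetry-core's actual code, and the pattern rules shown (no leading slash, no `..` segment) are only a subset of what PEP 639 requires.

```python
from __future__ import annotations


def check_license_config(project: dict) -> tuple[str, ...]:
    """Return the license-files globs, rejecting unsupported combinations.

    Hypothetical helper for illustration only.
    """
    license_value = project.get("license")
    patterns = project.get("license-files")

    if patterns is None:
        return ()

    # The legacy table form must not be combined with license-files.
    if isinstance(license_value, dict):
        raise ValueError(
            "[project].license must be a string when license-files is set"
        )

    for pattern in patterns:
        # Patterns are relative to the project root and must not escape it.
        if pattern.startswith("/") or ".." in pattern.split("/"):
            raise ValueError(f"invalid glob pattern: {pattern!r}")

    return tuple(patterns)
```

Raising early at configuration time keeps the error close to the offending `pyproject.toml` entry instead of surfacing it later during a build.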

Enhancements:

  • Bump core metadata version to 2.4 and emit License-Expression and License-File fields in built distributions.
  • Emit deprecation warnings for legacy [project].license table subkeys and deprecated license classifiers during strict validation.
  • Place license files under dist-info/licenses in built wheels and include them in source distributions.
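
To make the new metadata fields concrete, here is a minimal sketch of how the license-related header lines might be rendered. `render_license_metadata` is a hypothetical function for illustration; the real builder writes many more fields and its code may be structured differently.

```python
from __future__ import annotations


def render_license_metadata(
    license_expression: str | None, license_files: tuple[str, ...]
) -> str:
    """Render only the license-related core metadata lines (illustrative)."""
    lines = ["Metadata-Version: 2.4"]
    if license_expression:
        lines.append(f"License-Expression: {license_expression}")
    # License-File may appear multiple times, once per included file.
    for path in license_files:
        lines.append(f"License-File: {path}")
    return "\n".join(lines) + "\n"
```

For example, `render_license_metadata("MIT", ("LICENSE",))` yields three lines: the metadata version, one `License-Expression` line, and one `License-File` line.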

Tests:

  • Add extensive tests for license parsing scenarios, invalid glob patterns, validation warnings, metadata generation, and builder inclusion of license files.

sourcery-ai bot commented May 18, 2025

Reviewer's Guide

This PR implements full support for PEP 639 “License Clarity” by extending the core factory to parse new [project].license-files globs and SPDX license expressions, enhancing strict validation, bumping core metadata to 2.4, updating masonry builders to emit License-Expression and License-File fields and embed license files under dist-info/licenses, and updating/adding tests to cover all new scenarios.
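
The dist-info/licenses layout mentioned above can be sketched as a simple path computation. `wheel_license_path` is a hypothetical helper, and the sketch assumes the distribution name has already been normalized for use in the `.dist-info` directory name.

```python
from pathlib import PurePosixPath


def wheel_license_path(name: str, version: str, license_file: str) -> str:
    """Return the archive path of a license file inside a wheel (illustrative).

    The file's path relative to the project root is preserved below
    the "licenses" directory.
    """
    dist_info = PurePosixPath(f"{name}-{version}.dist-info")
    return str(dist_info / "licenses" / license_file)
```

With this layout, a project file `licenses/MIT.txt` would end up at `example_package-1.0.0.dist-info/licenses/licenses/MIT.txt` for a hypothetical `example_package` 1.0.0.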

Sequence Diagram: PEP 639 License Processing

sequenceDiagram
    actor Developer
    participant P as PyProjectTOML
    participant F as Factory
    participant PP as ProjectPackage
    participant M as Metadata
    participant B as Builder

    Developer->>P: Defines [project].license (SPDX)\nand [project].license-files
    F->>P: Reads pyproject.toml data
    F->>PP: _configure_package_metadata(package, project_data)
    activate F
        F->>F: canonicalize_license_expression(project_data["license"])
        PP->>PP: set package.license_expression
        F->>F: Validate license-files globs
        PP->>PP: set package.license_files (globs)
    deactivate F

    F->>F: validate(toml_data)
    activate F
        F->>F: _validate_project(project_data, result) // Validates SPDX, warns on legacy
    deactivate F

    M->>PP: from_package(package)
    activate M
        M->>M: set meta.metadata_version = "2.4"
        M->>M: set meta.license_expression (from package.license_expression)
        M->>PP: Processes package.license_files (globs from package.root_dir)
        activate PP
            PP->>PP: package.root_dir.glob(pattern)
        deactivate PP
        opt Globs match no files
            M->>M: Raise RuntimeError
        end
        M->>M: set meta.license_files (resolved relative paths)
    deactivate M

    B->>M: get_metadata_content()
    activate B
        B->>B: Writes Metadata-Version: 2.4
        opt meta.license_expression is set
            B->>B: Writes License-Expression: ...
        end
        loop for each license_file in meta.license_files
            B->>B: Writes License-File: ...
        end
    deactivate B

    B->>M: _get_legal_files()
    activate B
        B->>B: Returns files based on meta.license_files
    deactivate B

    B->>B: Includes license files (e.g., in dist-info/licenses/)
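
The glob-resolution step in the diagram above (patterns expanded from the package root, with a RuntimeError when nothing matches) could look roughly like this. `resolve_license_files` is a hypothetical name, and raising per unmatched pattern is an assumption of this sketch.

```python
from __future__ import annotations

from pathlib import Path


def resolve_license_files(root: Path, patterns: tuple[str, ...]) -> tuple[str, ...]:
    """Expand license-files globs into sorted, root-relative POSIX paths."""
    resolved: set[str] = set()
    for pattern in patterns:
        # Only regular files count; matching directories are ignored.
        matches = [path for path in root.glob(pattern) if path.is_file()]
        if not matches:
            raise RuntimeError(
                f"license-files pattern {pattern!r} did not match any files"
            )
        resolved.update(path.relative_to(root).as_posix() for path in matches)
    return tuple(sorted(resolved))
```

Returning relative POSIX paths keeps the result usable both for `License-File` metadata entries and for locating the files in the build tree.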

Class Diagram: PEP 639 License Handling Changes

classDiagram
    direction LR
    class Factory {
        +String _configure_package_metadata(ProjectPackage package, dict project, dict tool_poetry, Path root)
        +None _validate_project(dict project, dict result)
    }
    class ProjectPackage {
        +license_expression: NormalizedLicenseExpression
        +license_files: LicenseFileConfig
        +List~String~ all_classifiers()
    }
    class Metadata {
        +String metadata_version = "2.4"
        +String license_expression
        +Tuple~String~ license_files
        +Metadata from_package(ProjectPackage package)
    }
    class Builder {
        #Metadata _meta
        +String get_metadata_content()
        #Set~Path~ _get_legal_files()
    }
    class WheelBuilder {
        +Path prepare_metadata(Path metadata_directory)
    }

    Factory ..> ProjectPackage : configures
    Metadata ..> ProjectPackage : generated from
    Builder ..> Metadata : uses
    WheelBuilder --|> Builder

File-Level Changes

Change: Enhance factory parsing of license and license-files
Details:
  • Support new [project].license-files key: validate glob syntax and store as tuple
  • Parse string license as SPDX expression with canonicalization, fallback and warnings for invalid formats
  • Handle deprecated table-form [project].license subkeys: error when mixing with license-files, read license.text and license.file
  • Populate package.license_expression and package.license_files accordingly
Files:
  • src/poetry/core/factory.py

Change: Add strict validation rules for project license and classifiers
Details:
  • Introduce _validate_project to warn on deprecated license tables and invalid SPDX expressions
  • Warn on deprecated License :: classifiers in project.classifiers
Files:
  • src/poetry/core/factory.py

Change: Bump metadata version and compute license data
Details:
  • Update Metadata.metadata_version to 2.4
  • Add license_expression and license_files fields
  • Populate Metadata.from_package: prefer license_expression, assemble license_files via user globs or default patterns, error on unmatched globs
Files:
  • src/poetry/core/masonry/metadata.py

Change: Update builders to emit license fields and include files
Details:
  • Parameterize METADATA_BASE version and append License-Expression and multiple License-File entries
  • Modify wheel builder to copy license files into dist-info/licenses
  • Replace legacy legal file scan with explicit meta.license_files usage
Files:
  • src/poetry/core/masonry/builders/builder.py
  • src/poetry/core/masonry/builders/wheel.py

Change: Revise tests and fixtures for PEP 639 support
Details:
  • Extend factory tests for license expression, license-files handling, invalid globs
  • Add validation tests for license tables and classifiers
  • Cover Metadata.from_package license scenarios and file patterns
  • Adjust builder and wheel tests to expect new metadata entries and license file paths
  • Update fixtures to use new [project].license and license-files syntax
Files:
  • tests/test_factory.py
  • tests/masonry/test_metadata.py
  • tests/masonry/builders/*
  • tests/masonry/test_api.py
  • tests/packages/test_package.py
  • tests/fixtures

Change: Bump packaging requirement for PEP 639 support
Details:
  • Upgrade packaging dependency to >=24.2 for license expression support
Files:
  • vendors/pyproject.toml
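
For context on the packaging requirement, the sketch below shows how SPDX canonicalization with `packaging.licenses` (available from packaging 24.2) might be wrapped. `normalize_license` is a hypothetical helper; returning `None` for invalid input is a simplification of this sketch, whereas the PR describes a fallback with warnings.

```python
from __future__ import annotations

try:
    from packaging.licenses import (
        InvalidLicenseExpression,
        canonicalize_license_expression,
    )
except ImportError:  # packaging is missing or older than 24.2
    canonicalize_license_expression = None


def normalize_license(value: str) -> str | None:
    """Return the canonical SPDX expression, or None if it cannot be validated."""
    if canonicalize_license_expression is None:
        return None
    try:
        return canonicalize_license_expression(value)
    except InvalidLicenseExpression:
        return None
```

Canonicalization normalizes the case of license identifiers and operators, so differently cased spellings of the same expression compare equal after normalization.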

@reneleonhardt

Wow, all this code only for parsing new license fields?

@radoering (Member, Author)

> Wow, all this code only for parsing new license fields?

And we don't even do the SPDX parsing ourselves; we use packaging for that. However, the standard contains many MUSTs and SHOULDs, which requires a lot of error handling. Apart from that, I added (quite long) comments with extracts from the standard to explain possibly unintuitive parts of the implementation. And of course, more than half of the new code is tests.

@reneleonhardt

I noticed, thanks for all the work!

@radoering marked this pull request as ready for review May 31, 2025 04:47

@sourcery-ai sourcery-ai bot left a comment

Hey @radoering - I've reviewed your changes - here's some feedback:

  • The new license parsing logic in Factory._configure_package_metadata is deeply nested—consider breaking it into smaller helper methods to improve readability and maintainability.
  • Metadata.from_package and Builder._get_legal_files share very similar license‐file discovery code—extract that into a common utility to avoid duplication.
  • The parameterized license tests are quite verbose and repetitive—introduce helper fixtures or functions to encapsulate common setup and assertions.

Here's what I looked at during the review:
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟡 Testing: 1 issue found
  • 🟢 Documentation: all looks good
