Package Specification and Ecosystem #700
claymcleod
started this conversation in
Requests for Comment
Note
The state of this RFC is a work in progress.
Overview
This RFC proposes the introduction of a package specification and ecosystem that is formally supported within any compliant WDL implementation. It accomplishes this by defining the structure of WDL packages and outlining the mechanisms by which those packages may be imported within consumer WDL workflows.
The following is an example of what importing a task for `fastp` might look like in a WDL workflow of the future. Here, we'll use the existing BioWDL implementation of `fastp`.
Motivation
Importing in WDL today is a relatively limited operation, far too limited to meet the needs of modern, distributed software development and a package ecosystem. Only local file references or references to concrete URLs are supported during import, neither of which is sufficient for properly supporting a distributed package ecosystem.
`main`/`master` branch. Since no conventions or backwards compatibility guarantees are made, documents can easily change out from under you at any time.
`https://raw.githubusercontent.com`), one can implement some level of versioned importing. The result is still undesirable though, as the version of dependencies is baked into the source code itself. This means that a change from `v2.0.1` to `v2.0.2` of a dependency requires updating every URL that points to that dependency. This is unnecessary work and added noise in the git commit history.
All of this is made a bit easier through the use of relative imports (e.g., `import "../task.wdl"`), which ensure that imports are resolved relative to the current document location. This means that, whether you run a workflow from a local document (`sprocket run ./workflow.wdl`) or a remote document (`sprocket run https://example.com/workflow.wdl`), imports will be sourced correctly. That being said, this does not solve the issue of importing from external packages.
A more ideal state
We consider the following requirements of a solid foundation upon which to build a package ecosystem.
Goals
Given the above ideal state, this RFC sets out the following goals.
Antigoals
Prior Art
#226
As far as I can tell, this is the earliest official proposal to add some level of package management/versioning to WDL. The issue is rather short and describes a syntax like the following.
As previously stated, I'm not in support of stopping at the proposed mechanism because it assumes a centralized package system (something I don't think is feasible with our relatively small contributor team, nor advisable given our emphasis on enabling distributed development and maintenance).
#493
A revival of this idea in early 2022, #493 similarly focuses on the details of a centralized package repository. For the pieces that overlap with this proposal (essentially just the syntax of how this importing would work), the two are quite similar, differing only in the order of the "import" and "from" clauses.
#499
A few months later, some work was done to outline what the package format might look like concretely. Some inspiration was taken from this work regarding the information that should be collected (name, author, version, license). Further, the concerns regarding reproducibility were included in the goals of this proposal where needed. For the remainder of text in that proposal, the fact that we are relying so heavily on distributed package management with git rather than a centralized package store makes much of the conversation irrelevant to this proposal (e.g., tar vs zip).
#698
#698 proposes that we relax the constraints around what document versions can be imported, essentially advocating that any WDL v1.x version must be able to load any WDL documents with versions v1.x or lower. This adheres to common expectations regarding backwards compatibility of software with the same major version and would be required for this proposal to be manageable (else, the entire ecosystem could become deadlocked waiting for root packages in the ecosystem to update their WDL version when a new minor revision of WDL is released).
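The compatibility rule described for #698 can be sketched in a few lines of Python. This is only an illustration of the rule as summarized above, not text from #698 itself: the helper names `parse_wdl_version` and `can_import` are hypothetical, and treating imports across different major versions as unsupported is an assumption, since the summary only covers v1.x.

```python
import re


def parse_wdl_version(text):
    """Parse a version string such as "1.2" into a (major, minor) tuple."""
    match = re.fullmatch(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def can_import(importer_version, imported_version):
    """Return True when the importer may load the imported document.

    Rule sketched here: same major version, and the imported document's
    minor version is less than or equal to the importer's minor version.
    Cross-major imports return False (an assumption, not part of the rule).
    """
    importer_major, importer_minor = parse_wdl_version(importer_version)
    imported_major, imported_minor = parse_wdl_version(imported_version)
    if importer_major != imported_major:
        return False
    return imported_minor <= importer_minor
```

Under this sketch, a v1.2 workflow could import v1.0, v1.1, and v1.2 documents, while a v1.1 workflow could not import a v1.2 document.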
Proposal
Note
This section specifically uses the `sprocket` engine that my team is writing as an example. This is because it's hard to make a concrete case for what a solution might look like outside of both (a) the WDL specification and (b) a concrete execution engine.
Overview
This RFC proposes the following actions, all of which get their own sections below.
Definition of a WDL package
The ability to create a dependency tree of packages requires defining a manifest file that describes the content of each package and all of its dependencies. To that end, a WDL package is defined as a directory containing a manifest file and zero or more WDL files.
Manifest file
The manifest file for a package always lives at the root of the directory with the name `wdl-package.json`. The manifest file contains the following keys.
- `authors`. The authors field is an array of strings where each string is an author description with no enforced format. The convention for an individual author should be `"First Last <first.last@example.com>"`, but this is not a requirement.
- `version`. The version field is a string that adheres to the SemVer v2.0.0 specification with respect to backwards compatibility guarantees and versioning expectations.
- `license`. The license field is a string that defines the license requirements of the package. It contains a list of SPDX license identifiers concatenated with `AND` and `OR`. In more complex cases, parentheses can be used to group items and establish associativity.
- `dependencies`. The dependencies field is a JSON object that contains one key per import for that package. Each key must be a valid WDL identifier. The value of the field can be one of the following.
Here would be an example of a `wdl-package.json` file.
These can then be imported using the following syntax.
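Separately from the import syntax, the field rules listed for the manifest lend themselves to a mechanical check. The following is a minimal Python sketch of such a check, not part of the proposal. Several things here are assumptions: the function names, treating `authors`, `version`, and `license` as required and `dependencies` as optional, and the identifier pattern (a letter followed by letters, digits, or underscores). The license check only looks at the `AND`/`OR`/parenthesis structure and does not verify that each identifier is a real SPDX identifier. Dependency values are not inspected at all.

```python
import re

# SemVer 2.0.0 pattern (the regular expression suggested on semver.org).
SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)
# Assumed identifier shape: a letter, then letters, digits, or underscores.
IDENTIFIER = re.compile(r"^[a-zA-Z][a-zA-Z0-9_]*$")
LICENSE_TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9.+-]+)")


def license_is_well_formed(expression):
    """Check the AND/OR/parenthesis structure only, not the identifiers."""
    text = expression.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = LICENSE_TOKEN.match(text, pos)
        if match is None:
            return False
        tokens.append(match.group(1))
        pos = match.end()
    index = 0

    def parse_term():
        nonlocal index
        if index >= len(tokens) or tokens[index] in (")", "AND", "OR"):
            return False
        if tokens[index] == "(":
            index += 1
            if not parse_expr():
                return False
            if index >= len(tokens) or tokens[index] != ")":
                return False
        index += 1  # consume the identifier or the closing parenthesis
        return True

    def parse_expr():
        nonlocal index
        if not parse_term():
            return False
        while index < len(tokens) and tokens[index] in ("AND", "OR"):
            index += 1
            if not parse_term():
                return False
        return True

    return parse_expr() and index == len(tokens)


def validate_manifest(manifest):
    """Return a list of human-readable problems (empty when none are found)."""
    problems = []
    authors = manifest.get("authors")
    if not isinstance(authors, list) or not all(isinstance(a, str) for a in authors):
        problems.append("authors must be an array of strings")
    version = manifest.get("version")
    if not isinstance(version, str) or SEMVER.match(version) is None:
        problems.append("version must be a SemVer 2.0.0 string")
    license_text = manifest.get("license")
    if not isinstance(license_text, str) or not license_is_well_formed(license_text):
        problems.append("license must be a well-formed license expression")
    dependencies = manifest.get("dependencies", {})
    if not isinstance(dependencies, dict):
        problems.append("dependencies must be a JSON object")
    else:
        for key in dependencies:
            if IDENTIFIER.match(key) is None:
                problems.append(f"dependency key {key!r} is not a valid identifier")
    return problems
```

Calling `validate_manifest(json.load(f))` on a parsed `wdl-package.json` would then yield an empty list when these particular rules hold. An engine would likely want stricter checks, for example against the published SPDX identifier list.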
Name field
Notably, the name field is missing from the manifest file. This is an intentional decision, as packages are imported directly from source and not from a package management server where they have an identifier. This fact removes the need to capture a name, as the importer simply names the package whatever they want to refer to it by (as long as it's a valid WDL identifier) in their `wdl-package.json`.
Other notes
`wdl-package.json` was chosen to ensure there is no conflict with the
Warning
This RFC is being rewritten and I've stopped here. Content below this point is currently out of date.
Introduce syntax for symbolic imports
First, let's address the relatively straightforward question of what form the imports would take. Because imports today must be enclosed in quotes, a backwards compatible change that seems sufficient would be that any non-quoted import is assumed to be a symbolic import. For example,
Next, there are a number of different ways you can structure the identifier that symbolic imports link into, particularly as it relates to conflicts in the identifiers. To start, we suggest the simplest approach, which is an identifier conforming to a word pattern (`[a-zA-Z0-9_]+`). The lack of namespacing means that everything must be explicitly handled by the user.
Consider a `Sprocket.toml` file that contains a dependency section like so:
Effectively, we're empowering (and requiring) users to solve name conflicts themselves. The authors feel this is a reasonable first step in this direction rather than, say, a complicated set of name conflict requirements.
Concretely, the following import modalities would be required (as shown through examples in the fictional `Sprocket.toml` file).
Notably, the specifics of how these imports would be linked in across the various execution engines are left up to the execution engine. There is no requirement from the specification other than (a) these modalities must be supported and (b) they must be clearly documented for users to be able to take advantage of.
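The quoted versus non-quoted distinction and the word pattern described in this section can be illustrated with a short Python sketch. It follows the section as written and is not a parser for any real engine: the function name `classify_import_target` and the `"literal"`/`"symbolic"` labels are hypothetical, and how a symbolic name is then resolved is deliberately left out, since that is up to the execution engine.

```python
import re

# Word pattern quoted in the text above for symbolic import identifiers.
SYMBOLIC = re.compile(r"[a-zA-Z0-9_]+")


def classify_import_target(target):
    """Classify the text that follows the `import` keyword.

    Quoted targets are treated as today's path or URL imports; unquoted
    targets must match the word pattern and are treated as symbolic.
    """
    target = target.strip()
    is_quoted = (
        len(target) >= 2
        and target[0] == target[-1]
        and target[0] in ('"', "'")
    )
    if is_quoted:
        return "literal", target[1:-1]
    if SYMBOLIC.fullmatch(target):
        return "symbolic", target
    raise ValueError(f"not a quoted string or a word-pattern identifier: {target!r}")
```

Because existing imports are always quoted, every document that is valid today falls into the first branch, which is what makes the change backwards compatible.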
Concerns left to address.