Interface version canonicalization #536

lann · 2025-06-25T19:34:14Z

lann · 2025-06-25T21:28:14Z

For the binary encoding the most straightforward option from a quick review would seem to be adding variants of importname' / exportname' along the lines of:

importname' ::= 0x00 len:<u32> in:<importname>                       => in  (if len = |in|)
              | 0x01 len:<u32> in:<importname> fullverlen:<u16> fullver:<valid semver>

I suppose if we wanted to optimize the binary a bit this extra field could contain just the part of the original version that got lopped off by canonicalization.

On this field width:

fullverlen:<u16>

https://semver.org/#does-semver-have-a-size-limit-on-the-version-string

> No, but use good judgment. A 255 character version string is probably overkill, for example. Also, specific systems may impose their own limits on the size of the string.

🤷

lukewagner · 2025-06-26T21:02:44Z

@lann Thanks for starting this! For the binary encoding question: yes, taking over the 0x00 byte and using it as a discriminant is a nice coincidence we can take advantage of (and could you update the corresponding bullet in the "Warts" section at the end)?

> I suppose if we wanted to optimize the binary a bit this extra field could contain just the part of the original version that got lopped off by canonicalization.

Is there a simplicity argument to be made that requiring the concatenation of the version and the fullversion to match <valid semver> is simpler than allowing the fullversion to be <valid semver> and then adding the additional validation requirement (which I assume we want) that the fullversion has to "match" the version? If so, that could be a second argument in favor in addition to size.

lukewagner

Looking good! A few drive-by comments:

design/mvp/Explainer.md

lukewagner

(oops, meant to "comment" not approve before it's even ready to review 🙃 )

alexcrichton · 2025-06-27T14:41:51Z

For the binary encoding, here's another possible encoding:

importname' ::= 0x00 len:<u32> in:<importname>                       => in  (if len = |in|)
              | 0x01 len:<u32> in:<importname>                       => "${in.name}@N"  (if len = |in|,  in.version = N.*)
              | 0x02 len:<u32> in:<importname>                       => "${in.name}@0.N"  (if len = |in|,  in.version = 0.N.*)
              | 0x03 len:<u32> in:<importname>                       => "${in.name}@0.0.N"  (if len = |in|,  in.version = 0.0.N.*)

maybe with affordances for rc/etc.; I'm unsure. The basic idea though is that the actual import name would always be foo:bar/baz@0.1.2 in the binary format but the semantic meaning (e.g. the text format) would be a subslice of such a string. This codifies that in the binary format it's always a valid semver and the discriminant byte says basically how to shorten it. The goal here would be to make the binary format still pretty clear about what it can be without changing the meaning at the parsed layer.
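A minimal sketch of how a decoder might apply the discriminant in this proposed encoding. The function name `semantic_import_name` and the error handling are hypothetical, the version check covers only the `MAJOR.MINOR.PATCH` core (not full semver validation), and the rc/prerelease handling is left open just as in the comment above.

```python
import re

# Matches the MAJOR.MINOR.PATCH core and tolerates any "-..." or "+..." tail.
# This is NOT full semver validation; it is only enough for the sketch.
_CORE = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(?:[-+].*)?")


def semantic_import_name(discriminant: int, full_name: str) -> str:
    """Map (discriminant, name as stored in the binary) to the semantic name."""
    if discriminant == 0x00:
        return full_name  # name is used as-is
    if discriminant not in (0x01, 0x02, 0x03):
        raise ValueError(f"unknown discriminant {discriminant:#04x}")
    name, sep, version = full_name.partition("@")
    match = _CORE.fullmatch(version) if sep else None
    if match is None:
        raise ValueError("expected a name of the form name@MAJOR.MINOR.PATCH")
    # 0x01 keeps N, 0x02 keeps 0.N, 0x03 keeps 0.0.N.
    kept = match.groups()[:discriminant]
    if any(part != "0" for part in kept[:-1]):
        raise ValueError("discriminant does not match the shape of the version")
    return name + "@" + ".".join(kept)
```

For example, `semantic_import_name(0x02, "foo:bar/baz@0.1.2")` yields `"foo:bar/baz@0.1"`, while the same discriminant with `foo:bar/baz@1.2.3` is rejected.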

> fullverlen:<u16>
>
> https://semver.org/#does-semver-have-a-size-limit-on-the-version-string
>
> No, but use good judgment. A 255 character version string is probably overkill, for example. Also, specific systems may impose their own limits on the size of the string.

For this I'd recommend using <u32> regardless. We already limit many strings far below the theoretical 4G limit with a 32-bit length and keeping <u32> makes it more consistent with the rest of the decoding process. Otherwise when implementing a decoder you'd have to implement a specific function for decoding a 16-bit LEB which is otherwise not required when parsing WebAssembly today. Basically while I agree that >255 characters for a version is silly, I'd say that for consistency with the rest of the binary format this'd want to be <u32> if we go with this variant.
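For context on the consistency argument, here is a minimal sketch of the unsigned LEB128 `<u32>` decoding that a WebAssembly binary decoder already needs. The helper name `decode_u32_leb128` is hypothetical and not taken from any particular parser library.

```python
def decode_u32_leb128(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode an unsigned LEB128 u32 starting at `offset`.

    Returns (value, next_offset). Raises ValueError on malformed input.
    """
    result = 0
    shift = 0
    for i in range(5):  # a u32 needs at most ceil(32 / 7) = 5 bytes
        if offset + i >= len(data):
            raise ValueError("unexpected end of input")
        byte = data[offset + i]
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if byte & 0x80 == 0:  # high bit clear marks the last byte
            if result > 0xFFFFFFFF:
                raise ValueError("value does not fit in u32")
            return result, offset + i + 1
        shift += 7
    raise ValueError("u32 LEB128 encoding is too long")
```

A `<u16>` field would need the same loop with different bounds, which is the extra decoder function the comment above wants to avoid.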

lukewagner · 2025-06-30T17:28:10Z

@alexcrichton Good idea; that cleanly answers some of the questions above. My only light concern is that tools might just treat the <importname> as the name and miss the nuance of chopping off parts of the versions. I suppose tests and common low-level tools could catch/factor-out most of this though. But if we go this direction: I suppose technically we don't even need the {0, 1, 2, 3} opcode; it could just be derived from the full <valid semver> string, making version canonicalization a binary encoding detail. Thoughts?

alexcrichton · 2025-06-30T18:03:27Z

I agree yeah there's risk since the name in the binary format is "so simple", but yeah that's also where I'd hope that tests could weed things out. It'd be pretty simple in parser libraries I'd imagine to avoid exposing the full name as the import name if the discriminant was present.

My thinking though was that the name always has a full and valid semver, as defined by semver itself. That way the discriminant says what the "real" import name is (e.g. chopping off other stuff) for linking/semantic purposes. Although I may be misunderstanding what you're thinking about how to drop the discriminant?

lann · 2025-06-30T18:25:16Z

I think @lukewagner is suggesting that the differences between 1/2/3 can be derived from the string itself. The algo would be something like:

starting at @:
- if the string between @ and the first . isn't 0, trim before the first .
- if the string between @ and the second . isn't 0.0, trim before the second .
- otherwise, trim immediately after any digits after the second . (the next character, if any, should only be a - or +)
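The trimming steps above can be sketched as follows. This assumes the part after `@` is already a valid semver string (no validation is done here), and the function name `canonicalize_name` is made up for illustration.

```python
def canonicalize_name(full_name: str) -> str:
    """Trim the version in `name@version` following the steps described above."""
    base, sep, version = full_name.partition("@")
    if not sep:
        return full_name  # unversioned names are left alone (assumption)

    major, _, rest = version.partition(".")
    if major != "0":
        return f"{base}@{major}"  # trim before the first '.'

    minor, _, rest = rest.partition(".")
    if minor != "0":
        return f"{base}@0.{minor}"  # trim before the second '.'

    # 0.0.N: keep only the digits that follow the second '.'
    digits = 0
    while digits < len(rest) and rest[digits] in "0123456789":
        digits += 1
    return f"{base}@0.0.{rest[:digits]}"
```

With this sketch, `foo:bar/baz@1.2.3` becomes `foo:bar/baz@1`, `foo:bar/baz@0.1.2` becomes `foo:bar/baz@0.1`, and `foo:bar/baz@0.0.3-rc.1` becomes `foo:bar/baz@0.0.3`.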

alexcrichton · 2025-06-30T18:30:52Z

Ah I see! So something like (as a transition to the future):

importname' ::= 0x00 len:<u32> in:<importname>  => in  (if len = |in|)
              | 0x01 len:<u32> in:<importname>  => "${in.name}@${in.canonver}"  (if len = |in|)

where in the future we'd drop 0x00 entirely (and possibly rename 0x01 to 0x00). The <importname> is always required to have a full and valid semver too?

lann · 2025-06-30T18:41:36Z

I think of the "semver-aware" options I prefer @lukewagner's 1 (extra) discriminant option; if you are parsing semver anyway then the logic is only marginally more complex than the 3 discriminant option.

I'm more ambivalent on whether the parser should be semver-aware. I like the conceptual simplicity of "the name is the name" but this is a binary encoding and if we're going to require validation of semver then we're probably already committing to most of that code complexity anyway.

lann · 2025-07-01T18:45:23Z

We discussed this in a meeting today and decided to simplify a bit:

- In the text format, fullversion will change to versionsuffix and hold just the part of the full version that is removed by canonicalization.
- The binary format will use two strings: the canonicalized import name and the versionsuffix.
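A rough sketch of the two-string representation described above: one string for the canonicalized name and one for the suffix, which concatenate back to the original full name. The `VersionedName` type and `split_full_name` helper are hypothetical, the byte-level layout is deliberately not modeled, and treating an unversioned name as having an empty suffix is an assumption.

```python
import re
from dataclasses import dataclass

# Prefix of a version that survives canonicalization: N, 0.N, or 0.0.N.
# Accepting 0.0.0 in the last alternative is an assumption of this sketch.
_CANON_PREFIX = re.compile(r"[1-9][0-9]*|0\.[1-9][0-9]*|0\.0\.[0-9]+")


@dataclass(frozen=True)
class VersionedName:
    canonical_name: str  # e.g. "foo:bar/baz@0.1"
    version_suffix: str  # e.g. ".2" (may be empty)

    def full_name(self) -> str:
        # The suffix is exactly what canonicalization removed.
        return self.canonical_name + self.version_suffix


def split_full_name(full_name: str) -> VersionedName:
    base, sep, version = full_name.partition("@")
    if not sep:
        return VersionedName(full_name, "")
    match = _CANON_PREFIX.match(version)
    if match is None:
        raise ValueError(f"cannot canonicalize version {version!r}")
    return VersionedName(f"{base}@{match.group()}", version[match.end():])
```

Because the suffix holds only the removed part, `split_full_name(name).full_name() == name` holds for every input the sketch accepts.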

lann · 2025-07-03T15:38:45Z

design/mvp/Binary.md

@@ -399,7 +402,10 @@ Notes:
  `(result (own $R))`, where `$R` is the resource labeled `r`.
 * Validation of `[method]` names requires the first parameter of the function
  to be `(param "self" (borrow $R))`, where `$R` is the resource labeled `r`.
-* `<valid semver>` is as defined by [https://semver.org](https://semver.org/)


<valid semver> wasn't referenced in this file.

design/mvp/Explainer.md

lann · 2025-07-03T16:42:10Z

design/mvp/Explainer.md

 namespace     ::= <words> ':'
 words         ::= <word>
                | <words> '-' <word>
 projection    ::= '/' <label>
-version       ::= '@' <valid semver>
+# FIXME: surrounding alignment


TODO after final review

lukewagner

Great, thanks! Just a few small comments:

design/mvp/Explainer.md

design/mvp/Binary.md

lann · 2025-07-10T18:48:33Z

design/mvp/Explainer.md

+canonversion      ::= [1-9] [0-9]*
+                    | '0.' [1-9] [0-9]*
+                    | '0.0.' [1-9] [0-9]*
+semversuffix      ::= [0-9A-Za-z.+-]*


This just checks for valid semver characters. The "real" validation is covered in prose below, i.e. <canonversion><semversuffix> must match valid semver.

I had a version of this as [.+-][0-9A-Za-z.+-]+ which was less ambiguous but also implied that versionsuffix and the binary 0x01 variant couldn't be used with an empty suffix like you'd get with full version 0.0.1. Given that we're saying we'll remove non-canonical interface versions I think allowing the empty suffix will ultimately be simpler.
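A sketch of the two-level check described above: the grammar only constrains characters, and the real validation is that `<canonversion><semversuffix>` matches valid semver. The `SEMVER` pattern is adapted from the regular expression suggested on semver.org (with `\d` spelled as `[0-9]`); `is_valid_pair` is a hypothetical helper name.

```python
import re

# Grammar from the diff above.
CANONVERSION = re.compile(r"[1-9][0-9]*|0\.[1-9][0-9]*|0\.0\.[1-9][0-9]*")
SEMVERSUFFIX = re.compile(r"[0-9A-Za-z.+-]*")

# Adapted from the regex suggested in the semver.org FAQ.
SEMVER = re.compile(
    r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(?:-((?:0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
)


def is_valid_pair(canonversion: str, semversuffix: str) -> bool:
    """Check both grammar rules, then the concatenation against semver."""
    return (
        CANONVERSION.fullmatch(canonversion) is not None
        and SEMVERSUFFIX.fullmatch(semversuffix) is not None
        and SEMVER.fullmatch(canonversion + semversuffix) is not None
    )
```

Here `is_valid_pair("0.0.1", "")` is accepted, which is the empty-suffix case mentioned above. With only these checks, `is_valid_pair("1", "0.2.3")` is also accepted because the concatenation is the valid semver `10.2.3`; that appears to be the kind of ambiguity the `[.+-]`-prefixed form would have ruled out.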

lann force-pushed the truncated-versions branch 3 times, most recently from 7b6bd7d to 2f8eda8 June 25, 2025 20:46

lann changed the title ~~WIP: Truncated interface versions~~ Interface version canonicalization Jun 25, 2025

lann mentioned this pull request Jun 25, 2025

Interface version / compatibilty changes #534

Open

lukewagner approved these changes Jun 26, 2025

lukewagner reviewed Jun 26, 2025

lann force-pushed the truncated-versions branch from 2f8eda8 to 6d56eaf June 30, 2025 19:27

lann force-pushed the truncated-versions branch from 6d56eaf to d3efc82 July 1, 2025 22:49

lann force-pushed the truncated-versions branch from d3efc82 to 22fbfc5 July 2, 2025 21:50

lann requested a review from lukewagner July 2, 2025 21:51

lann force-pushed the truncated-versions branch from 22fbfc5 to 8573d55 July 2, 2025 21:55

lann force-pushed the truncated-versions branch 2 times, most recently from 28b07b3 to 531cb92 July 3, 2025 16:35

lann marked this pull request as ready for review July 3, 2025 16:37

lukewagner reviewed Jul 3, 2025

lann force-pushed the truncated-versions branch from 531cb92 to 2e073ce July 10, 2025 18:20

Add canonical interface name

08b2717

lann force-pushed the truncated-versions branch from 2e073ce to 08b2717 July 10, 2025 18:42

lann requested a review from lukewagner July 11, 2025 19:55
