Skip to content

Make more types jl_static_show readably #58512

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 29, 2025

Conversation

xal-0
Copy link
Member

@xal-0 xal-0 commented May 23, 2025

Makes more types survive jl_static_show:

  • Symbols
    • Symbols printed in the :var"foo" form use raw string escaping, fixing :var"a\b", :var"a\\", :var"$a", etc.
    • Symbols that require parens use parens (:(=), ...)
  • Signed integers: Except for Int, signed integers print like Int8(1).
  • Floats: floats are printed in a naive but reversible (TODO: double check) way. Inf(16|32|) and NaN(16|32|) are printed, and Float16/Float32 print the type (Float32(1.5)). Float64s are printed with a trailing .0 if it is necessary to disambiguate from Int.

Fixes #52677, #58484 (comment), #58484 (comment), and the specific case mentioned in #58484. Improves the situation for #38902 but does not close it, because a few cases still do not round-trip (inexhaustive list):

  • Non-canonical NaNs
  • BFloat16
  • User-defined primitive types. This one is tricky, because they can have a size different from any type we have literals for.
  • Without don't print field names in static_show of structs #40652 or something similar, precompile statements will still fail with field names.

@xal-0 xal-0 requested a review from topolarity May 23, 2025 19:39
@xal-0 xal-0 added the bugfix This change fixes an existing bug label May 23, 2025
@KristofferC
Copy link
Member

For #40652, can we thread something through to not show the field name in precompile printing contexts? Seems unnecessary to have the correctness of the printing be bike shredded on how people want it to print in the debugger.

@nickrobinson251
Copy link
Contributor

Fixes ... #58484 (comment),

Please can you add tests for Val(Inf) (and maybe Val(-Inf) and Inf and -Inf)?

@topolarity topolarity added backport 1.10 Change should be backported to the 1.10 release backport 1.11 Change should be backported to release-1.11 backport 1.12 Change should be backported to release-1.12 labels May 23, 2025
@topolarity
Copy link
Member

It might be worth printing these floating-point numbers as hexadecimal literals (specifically in a type context):

julia> 0x1p0
1.0

julia> 0x1.8p3
12.0

julia> x = 0x.4p-1
0.125

These can be easy to parse / print bit-accurately, if we have trouble finding floating point <-> decimal conversion routines that we trust to be digit-/bit-accurate.

Of course, if we trust the decimal conversion on all platforms, that's just as good and more human-readable.

n backslashes followed by " => 2n+1 backslashes and "
n backslahes at the end     => 2n backslashes
@KristofferC KristofferC mentioned this pull request May 28, 2025
49 tasks
@xal-0
Copy link
Member Author

xal-0 commented May 28, 2025

Unfortunately we have no hex Float32 literals, and I would like to preserve readability. That's also why I prefer %.17g (with trailing .0 when required) over %.17e: 123.0 is nicer than 1.23000000000000000e+02, 0.5 vs 5.00000000000000000e-01, etc. It's not as nice as Ryu (we still print 0.10000000000000001 instead of 0.1), but it's the best we can do with libc.

xal-0 and others added 3 commits May 28, 2025 11:09
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
@topolarity
Copy link
Member

Unfortunately we have no hex Float32 literals

No, but it losslessly casts to Float32 in the same way that Int64 -> Int32 does

That said, I'm all in favor of the readability

@@ -985,11 +1069,14 @@ static size_t jl_static_show_x_(JL_STREAM *out, jl_value_t *v, jl_datatype_t *vt
n += jl_printf(out, "0x%08" PRIx32, *(uint32_t*)v);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you fix-up Pointer types as well?

julia> ccall(:jl_, Cvoid, (Any,), Val(Ptr{Cvoid}(1)))
Base.Val{0x0000000000000001}()

It's a little awkward, because their repr doesn't round-trip either:

julia> Val(Ptr{Cvoid}(1))
Val{Ptr{Nothing} @0x0000000000000001}()

julia> Val{Ptr{Nothing} @0x0000000000000001}()
ERROR: ParseError:
# Error @ REPL[10]:1:18
Val{Ptr{Nothing} @0x0000000000000001}()
#                └─────────────────┘ ── Expected `}`

but regardless of whether it parses properly, I don't think we should be throwing out the type information

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also consider fixing the jl_tvar_type case also. Right now it only prints the name after checking that the parent UnionAll is present, but we should also check that that name is unique, and if not, to find some way to make it unique in printing.

@vtjnash vtjnash merged commit b03ef6b into JuliaLang:master May 29, 2025
5 of 7 checks passed
else if (vt == jl_long_type) {
// Avoid unnecessary Int64(x)/Int32(x)
n += jl_printf(out, "%" PRIdPTR, *(intptr_t*)v);
}
else if (vt == jl_int64_type) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like negative Int128 values are still problematic:

julia> ccall(:jl_, Cvoid, (Any,), typemin(Int128))
Int128(0x80000000000000000000000000000000)
julia> ccall(:jl_, Cvoid, (Any,), Int128(-1))
Int128(0xffffffffffffffffffffffffffffffff)

And UInt128 prints with an explicit conversion it doesn't need:

julia> ccall(:jl_, Cvoid, (Any,), UInt128(0))
UInt128(0x00000000000000000000000000000000)
julia> UInt128(0x00000000000000000000000000000000) === 0x00000000000000000000000000000000
true

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't problematic, since that Int128() printing means a bitcast, not a function call

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - we can fix this as part of the bit-casting fix-ups, but regardless it doesn't round-trip:

julia> Int128(0x80000000000000000000000000000000)
ERROR: InexactError: convert(Int128, 0x80000000000000000000000000000000)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works just fine: you're just making the mistake of calling eval, when the semantics of this printer have never been to be usable with eval, but rather is a representation for reinterpret.

julia> reinterpret(Int128, 0x80000000000000000000000000000000)
-170141183460469231731687303715884105728

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we interested in an "almost" eval-able human-readable serialization format?

The only trade-off that I can understand is that it's a bit more compact than a proper repr, but more importantly it needs a custom de-serializer which afaict isn't implemented

Since we need a programmatic round-trip, it's not enough IMO to appeal to a de-serializer that "should" be used if it doesn't actually exist

Copy link
Member

@vtjnash vtjnash May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is human readable, and thus a useful debugging tool. Someone should implement a de-serializer if they want to use this for generating precompile statements. Just blinding executing code will never give the correct answers anyways, so these little gotcha are just reminders that using eval is a dead-end approach and not fixable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a disingenuous argument @vtjnash -- nobody reads this output for debugging (and we've tried); on the contrary, it is actively used for generating precompile statements. It would be better to remove all the "little gotchas".

Copy link
Member

@topolarity topolarity May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just blinding executing code will never give the correct answers anyways, so these little gotcha are just reminders that using eval is a dead-end approach and not fixable.

I feel like this is lacking justification - all of the recent serialization problems we've enumerated so far (floating point, integer types, bfloat, restricted inner constructors, symbol escaping, etc.) are basic problems around making sure that we print the correct literal (or the correct literal + a direct, loss-less conversion to a built-in datatype). Of course, the other major issue (still unsolved) is around custom constructors making a reinterpret required to round-trip, but the point stands that most of those are direct fixes to the de-parse (print "1.0" not "1") and not problems inherent to eval.

I'm very open to more fundamental arguments about why eval is broken for this, but the only caveat that I'm aware of is that eval may not map type names (Vector, String, Foo) correctly depending on the evaluation context, or that these may have a different meaning (esp. anonymized typenames like #foo##2) in different executions.

It is true that Serialization does this better by using the PkgId / UUID, but is that the only fundamental thing that we gain by moving away from eval? Is there a more fundamental reason why eval cannot be made to work here? Do we trust Serialization to do all of this correctly, or does that have outstanding correct-ness issues too?

Copy link
Member

@KristofferC KristofferC May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever there is an alternative serialization format to describe a set of methods to compile that is better I'm sure we will all happily switch to that. Until that exists it seems reasonable to do incremental improvements to the existing one, even if it is doomed to never be 100% correct in the end.

kpamnany pushed a commit to RelationalAI/julia that referenced this pull request May 29, 2025
Makes more types survive `jl_static_show` unambiguously:
- Symbols
- Symbols printed in the `:var"foo"` form use raw string escaping,
fixing `:var"a\b"`, `:var"a\\"`, `:var"$a"`, etc.
  - Symbols that require parens use parens (`:(=)`, ...)
- Signed integers: Except for `Int`, signed integers print like
`Int8(1)`.
- Floats: floats are printed in a naive but reversible (TODO: double
check) way. `Inf(16|32|)` and `NaN(16|32|)` are printed, and
`Float16`/`Float32` print the type (`Float32(1.5)`). `Float64`s are
printed with a trailing `.0` if it is necessary to disambiguate from
`Int`.

Fixes JuliaLang#52677,
JuliaLang#58484 (comment),
JuliaLang#58484 (comment),
and the specific case mentioned in JuliaLang#58484. Improves the situation for
round-trip (inexhaustive list):
- Non-canonical NaNs
- BFloat16
- User-defined primitive types. This one is tricky, because they can
have a size different from any type we have literals for.
kpamnany pushed a commit to RelationalAI/julia that referenced this pull request May 29, 2025
Makes more types survive `jl_static_show` unambiguously:
- Symbols
- Symbols printed in the `:var"foo"` form use raw string escaping,
fixing `:var"a\b"`, `:var"a\\"`, `:var"$a"`, etc.
  - Symbols that require parens use parens (`:(=)`, ...)
- Signed integers: Except for `Int`, signed integers print like
`Int8(1)`.
- Floats: floats are printed in a naive but reversible (TODO: double
check) way. `Inf(16|32|)` and `NaN(16|32|)` are printed, and
`Float16`/`Float32` print the type (`Float32(1.5)`). `Float64`s are
printed with a trailing `.0` if it is necessary to disambiguate from
`Int`.

Fixes JuliaLang#52677,
JuliaLang#58484 (comment),
JuliaLang#58484 (comment),
and the specific case mentioned in JuliaLang#58484. Improves the situation for
round-trip (inexhaustive list):
- Non-canonical NaNs
- BFloat16
- User-defined primitive types. This one is tricky, because they can
have a size different from any type we have literals for.
kpamnany pushed a commit to RelationalAI/julia that referenced this pull request May 29, 2025
Makes more types survive `jl_static_show` unambiguously:
- Symbols
- Symbols printed in the `:var"foo"` form use raw string escaping,
fixing `:var"a\b"`, `:var"a\\"`, `:var"$a"`, etc.
  - Symbols that require parens use parens (`:(=)`, ...)
- Signed integers: Except for `Int`, signed integers print like
`Int8(1)`.
- Floats: floats are printed in a naive but reversible (TODO: double
check) way. `Inf(16|32|)` and `NaN(16|32|)` are printed, and
`Float16`/`Float32` print the type (`Float32(1.5)`). `Float64`s are
printed with a trailing `.0` if it is necessary to disambiguate from
`Int`.

Fixes JuliaLang#52677,
JuliaLang#58484 (comment),
JuliaLang#58484 (comment),
and the specific case mentioned in JuliaLang#58484. Improves the situation for
round-trip (inexhaustive list):
- Non-canonical NaNs
- BFloat16
- User-defined primitive types. This one is tricky, because they can
have a size different from any type we have literals for.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.10 Change should be backported to the 1.10 release backport 1.11 Change should be backported to release-1.11 backport 1.12 Change should be backported to release-1.12 bugfix This change fixes an existing bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

trace-compile precompile statements incorrect for non-Int signed integer types
6 participants