Make more types jl_static_show readably #58512

xal-0 · 2025-05-23T19:39:47Z

Makes more types survive jl_static_show:

Symbols
- Symbols printed in the :var"foo" form use raw string escaping, fixing :var"a\b", :var"a\\", :var"$a", etc.
- Symbols that require parens use parens (:(=), ...)
Signed integers: Except for Int, signed integers print like Int8(1).
Floats: floats are printed in a naive but reversible (TODO: double check) way. Inf(16|32|) and NaN(16|32|) are printed, and Float16/Float32 print the type (Float32(1.5)). Float64s are printed with a trailing .0 if it is necessary to disambiguate from Int.

Fixes #52677, #58484 (comment), #58484 (comment), and the specific case mentioned in #58484. Improves the situation for #38902 but does not close it, because a few cases still do not round-trip (inexhaustive list):

Non-canonical NaNs
BFloat16
User-defined primitive types. This one is tricky, because they can have a size different from any type we have literals for.
Without don't print field names in static_show of structs #40652 or something similar, precompile statements will still fail with field names.

KristofferC · 2025-05-23T19:58:15Z

For #40652, can we thread something through to not show the field name in precompile printing contexts? Seems unnecessary to have the correctness of the printing be bike shredded on how people want it to print in the debugger.

nickrobinson251 · 2025-05-23T20:11:12Z

Fixes ... #58484 (comment),

Please can you add tests for Val(Inf) (and maybe Val(-Inf) and Inf and -Inf)?

src/rtutils.c

topolarity · 2025-05-27T18:37:38Z

It might be worth printing these floating-point numbers as hexadecimal literals (specifically in a type context):

julia> 0x1p0
1.0

julia> 0x1.8p3
12.0

julia> x = 0x.4p-1
0.125

These can be easy to parse / print bit-accurately, if we have trouble finding floating point <-> decimal conversion routines that we trust to be digit-/bit-accurate.

Of course, if we trust the decimal conversion on all platforms, that's just as good and more human-readable.

n backslashes followed by " => 2n+1 backslashes and " n backslahes at the end => 2n backslashes

xal-0 · 2025-05-28T18:00:57Z

Unfortunately we have no hex Float32 literals, and I would like to preserve readability. That's also why I prefer %.17g (with trailing .0 when required) over %.17e: 123.0 is nicer than 1.23000000000000000e+02, 0.5 vs 5.00000000000000000e-01, etc. It's not as nice as Ryu (we still print 0.10000000000000001 instead of 0.1), but it's the best we can do with libc.

src/rtutils.c

Co-authored-by: Jameson Nash <vtjnash@gmail.com>

topolarity · 2025-05-28T18:32:12Z

Unfortunately we have no hex Float32 literals

No, but it losslessly casts to Float32 in the same way that Int64 -> Int32 does

That said, I'm all in favor of the readability

topolarity · 2025-05-29T14:40:33Z

src/rtutils.c

@@ -985,11 +1069,14 @@ static size_t jl_static_show_x_(JL_STREAM *out, jl_value_t *v, jl_datatype_t *vt
        n += jl_printf(out, "0x%08" PRIx32, *(uint32_t*)v);


Can you fix-up Pointer types as well?

julia> ccall(:jl_, Cvoid, (Any,), Val(Ptr{Cvoid}(1))) Base.Val{0x0000000000000001}()

It's a little awkward, because their repr doesn't round-trip either:

julia> Val(Ptr{Cvoid}(1)) Val{Ptr{Nothing} @0x0000000000000001}() julia> Val{Ptr{Nothing} @0x0000000000000001}() ERROR: ParseError: # Error @ REPL[10]:1:18 Val{Ptr{Nothing} @0x0000000000000001}() # └─────────────────┘ ── Expected `}`

but regardless of whether it parses properly, I don't think we should be throwing out the type information

We should also consider fixing the jl_tvar_type case also. Right now it only prints the name after checking that the parent UnionAll is present, but we should also check that that name is unique, and if not, to find some way to make it unique in printing.

topolarity · 2025-05-29T15:22:52Z

src/rtutils.c

+    else if (vt == jl_long_type) {
+        // Avoid unnecessary Int64(x)/Int32(x)
+        n += jl_printf(out, "%" PRIdPTR, *(intptr_t*)v);
+    }
    else if (vt == jl_int64_type) {


Looks like negative Int128 values are still problematic:

julia> ccall(:jl_, Cvoid, (Any,), typemin(Int128)) Int128(0x80000000000000000000000000000000) julia> ccall(:jl_, Cvoid, (Any,), Int128(-1)) Int128(0xffffffffffffffffffffffffffffffff)

And UInt128 prints with an explicit conversion it doesn't need:

julia> ccall(:jl_, Cvoid, (Any,), UInt128(0)) UInt128(0x00000000000000000000000000000000) julia> UInt128(0x00000000000000000000000000000000) === 0x00000000000000000000000000000000 true

It isn't problematic, since that Int128() printing means a bitcast, not a function call

Sure - we can fix this as part of the bit-casting fix-ups, but regardless it doesn't round-trip:

julia> Int128(0x80000000000000000000000000000000) ERROR: InexactError: convert(Int128, 0x80000000000000000000000000000000)

It works just fine: you're just making the mistake of calling eval, when the semantics of this printer have never been to be usable with eval, but rather is a representation for reinterpret.

julia> reinterpret(Int128, 0x80000000000000000000000000000000) -170141183460469231731687303715884105728

Why are we interested in an "almost" eval-able human-readable serialization format?

The only trade-off that I can understand is that it's a bit more compact than a proper repr, but more importantly it needs a custom de-serializer which afaict isn't implemented

Since we need a programmatic round-trip, it's not enough IMO to appeal to a de-serializer that "should" be used if it doesn't actually exist

Because it is human readable, and thus a useful debugging tool. Someone should implement a de-serializer if they want to use this for generating precompile statements. Just blinding executing code will never give the correct answers anyways, so these little gotcha are just reminders that using eval is a dead-end approach and not fixable.

This is a disingenuous argument @vtjnash -- nobody reads this output for debugging (and we've tried); on the contrary, it is actively used for generating precompile statements. It would be better to remove all the "little gotchas".

Just blinding executing code will never give the correct answers anyways, so these little gotcha are just reminders that using eval is a dead-end approach and not fixable.

I feel like this is lacking justification - all of the recent serialization problems we've enumerated so far (floating point, integer types, bfloat, restricted inner constructors, symbol escaping, etc.) are basic problems around making sure that we print the correct literal (or the correct literal + a direct, loss-less conversion to a built-in datatype). Of course, the other major issue (still unsolved) is around custom constructors making a reinterpret required to round-trip, but the point stands that most of those are direct fixes to the de-parse (print "1.0" not "1") and not problems inherent to eval.

I'm very open to more fundamental arguments about why eval is broken for this, but the only caveat that I'm aware of is that eval may not map type names (Vector, String, Foo) correctly depending on the evaluation context, or that these may have a different meaning (esp. anonymized typenames like #foo##2) in different executions.

It is true that Serialization does this better by using the PkgId / UUID, but is that the only fundamental thing that we gain by moving away from eval? Is there a more fundamental reason why eval cannot be made to work here? Do we trust Serialization to do all of this correctly, or does that have outstanding correct-ness issues too?

Whenever there is an alternative serialization format to describe a set of methods to compile that is better I'm sure we will all happily switch to that. Until that exists it seems reasonable to do incremental improvements to the existing one, even if it is doomed to never be 100% correct in the end.

Makes more types survive `jl_static_show` unambiguously: - Symbols - Symbols printed in the `:var"foo"` form use raw string escaping, fixing `:var"a\b"`, `:var"a\\"`, `:var"$a"`, etc. - Symbols that require parens use parens (`:(=)`, ...) - Signed integers: Except for `Int`, signed integers print like `Int8(1)`. - Floats: floats are printed in a naive but reversible (TODO: double check) way. `Inf(16|32|)` and `NaN(16|32|)` are printed, and `Float16`/`Float32` print the type (`Float32(1.5)`). `Float64`s are printed with a trailing `.0` if it is necessary to disambiguate from `Int`. Fixes JuliaLang#52677, JuliaLang#58484 (comment), JuliaLang#58484 (comment), and the specific case mentioned in JuliaLang#58484. Improves the situation for round-trip (inexhaustive list): - Non-canonical NaNs - BFloat16 - User-defined primitive types. This one is tricky, because they can have a size different from any type we have literals for.

Make more types jl_static_show readably

b6996d2

xal-0 requested a review from topolarity May 23, 2025 19:39

xal-0 added the bugfix This change fixes an existing bug label May 23, 2025

xal-0 added 2 commits May 23, 2025 12:40

Typo fix

e73b687

Typo fix

02f54e4

Fix jl_static_show for negative floats with no fractional part

8a9e425

topolarity added backport 1.10 Change should be backported to the 1.10 release backport 1.11 Change should be backported to release-1.11 backport 1.12 Change should be backported to release-1.12 labels May 23, 2025

kpamnany mentioned this pull request May 23, 2025

Improvements to static-show RelationalAI/julia#239

Draft

3 tasks

vtjnash reviewed May 27, 2025

View reviewed changes