text/0000-symbol-name-mangling-v2.md
96 additions & 13 deletions
@@ -644,11 +644,15 @@ Why should we *not* do this?
644
644
645
645
The alternatives considered are:
646
646
647
-
- Keeping the current scheme. It does meet the minimum requirements after all. It also has pretty big downsides.
648
-
- Keeping the current scheme but cleaning it up by making the non-hash part more consistent and more expressive. Keep the hash part as a safeguard against symbol conflicts and the rest as something just for demangling. The downside of this is that the hash would still not be predictable, and symbols would get rather long if they should contain more human-readable information about generic arguments.
649
-
- Define a standardized pretty-printing format for things that end up as symbols, and then encode that via Punycode in order to meet the character set restrictions. This would be rather simple. Symbol names would remain somewhat human-readable (but not very, because all separators would be stripped out). But without some kind of additional compression, symbol names would become rather long.
650
-
- Use the scheme from the previous bullet point but apply the compression scheme described above. We could do this but it wouldn't really be less complex than the Itanium inspired scheme proposed above.
651
-
- Define a standardized pretty-printing format for things that end up as symbols, compress with zstd (specially trained for Rust symbols) and encode the result as base63. This is rather simple but loses all human-readability. It's unclear how well this would compress. It would pull the zstd specification into the mangling scheme specification, as well as the pre-training dictionary.
647
+
1. Keeping the current scheme. It does meet the minimum requirements after all. However, the general consensus seems to be that it is confusing and leads to situations where people are unpleasantly surprised when they come across (demangled) symbol names in backtraces or profilers.
648
+
649
+
2. Keeping the current scheme but cleaning it up by making the non-hash part more consistent and more expressive. Keep the hash part as a safeguard against symbol conflicts and the rest as something just for demangling. The downside of this is that the hash would still not be predictable, and symbols would get rather long if they should contain more human-readable information about generic arguments.
650
+
651
+
3. Define a standardized pretty-printing format for things that end up as symbols, and then encode that via Punycode in order to meet the character set restrictions. This would be rather simple. Symbol names would remain somewhat human-readable (but not very, because all separators would be stripped out). But without some kind of additional compression, symbol names would become rather long.
652
+
653
+
4. Use the scheme from the previous item but apply the compression scheme described above. We could do this but it wouldn't really be less complex than the Itanium-inspired scheme proposed above.
654
+
655
+
5. Define a standardized pretty-printing format for things that end up as symbols, compress with zstd (specially trained for Rust symbols) and encode the result as base63. This is rather simple but loses all human-readability. It's unclear how well this would compress. It would pull the zstd specification into the mangling scheme specification, as well as the pre-trained dictionary.
652
656
653
657
The Itanium mangling (and by extension the scheme proposed here) could be considered somewhat arcane. But it is well-known from C++ and provides a good trade-off between readability, complexity, and length of generated symbols.
654
658
@@ -677,7 +681,7 @@ Itanium mangling).
677
681
678
682
# Appendix A - Suggested Demangling
679
683
680
-
This RFC suggests that names are demangling to a form that matches Rust syntax as it is used in source code and compiler error messages:
684
+
This RFC suggests that names are demangled to a form that matches Rust syntax as it is used in source code, compiler error messages and `rustdoc`:
681
685
682
686
- Path components should be separated by `::`.
683
687
@@ -687,14 +691,93 @@ This RFC suggests that names are demangling to a form that matches Rust syntax a
687
691
688
692
- The list of generic arguments should be demangled as `<T1, T2, T3>`.
689
693
690
-
- Identifiers and trait impl path roots can have a numeric disambiguator (the `<disambiguator>` production). The syntactic version of the numeric disambiguator maps to a numeric index. If the disambiguator is not present, this index is 0. If it is of the form `s_` then the index is 1. If it is of the form `s<hex-digit>_` then the index is `<hex-digit> + 2`. The suggested demangling of a disambiguator is `'<index>`. However, for better readability, these disambiguators should usually be omitted in the demangling altogether. Disambiguators with index zero can always emitted.
691
-
The exception here are closures. Since these do not have a name, the disambiguator is the only thing identifying them. The suggested demangling for closures is thus `{closure}'<index>`.
694
+
- Identifiers and trait impl path roots can have a numeric disambiguator (the `<disambiguator>` production). The syntactic version of the numeric disambiguator maps to a numeric index. If the disambiguator is not present, this index is 0. If it is of the form `s_` then the index is 1. If it is of the form `s<base-62-digit>_` then the index is `<base-62-digit> + 2`. The suggested demangling of a disambiguator is `[<index>]`. However, for better readability, these disambiguators should usually be omitted in the demangling altogether. Disambiguators with index zero can always be omitted.
695
+
696
+
Closures are the exception here. Since they do not have a name, the disambiguator is the only thing identifying them. The suggested demangling for closures is thus `{closure}[<index>]`.
692
697
698
+
- In a lossless demangling, identifiers from the value namespace should be marked with a `'` suffix in order to avoid conflicts with identifiers from the type namespace. In a user-facing demangling, where such conflicts are acceptable, the suffix can be omitted.
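The disambiguator-to-index mapping described above can be sketched in a few lines of Rust. This is only an illustration, not part of the proposal: the function names are hypothetical, the digit order `0-9`, `a-z`, `A-Z` is an assumption (the text here does not fix it), and only the single-digit form `s<base-62-digit>_` named above is handled.

```rust
/// Value of one base-62 digit. The order 0-9, a-z, A-Z is an assumption
/// of this sketch; the text above does not specify it.
fn base62_digit_value(c: char) -> Option<u64> {
    match c {
        '0'..='9' => Some(c as u64 - '0' as u64),
        'a'..='z' => Some(c as u64 - 'a' as u64 + 10),
        'A'..='Z' => Some(c as u64 - 'A' as u64 + 36),
        _ => None,
    }
}

/// Maps a syntactic disambiguator to its numeric index.
/// `None` means the disambiguator is absent from the mangled name.
/// Returns `None` for input that does not match the forms described above.
fn disambiguator_index(disambiguator: Option<&str>) -> Option<u64> {
    let text = match disambiguator {
        None => return Some(0), // not present: index 0
        Some(text) => text,
    };
    // Strip the leading `s` and the trailing `_`.
    let inner = text.strip_prefix('s')?.strip_suffix('_')?;
    let mut chars = inner.chars();
    match (chars.next(), chars.next()) {
        (None, _) => Some(1), // `s_`: index 1
        (Some(d), None) => base62_digit_value(d).map(|v| v + 2), // digit + 2
        _ => None, // longer forms are outside this sketch
    }
}

/// Suggested demangling of a closure, which always keeps its index.
fn demangled_closure(index: u64) -> String {
    format!("{{closure}}[{}]", index)
}
```

Under these assumptions, an absent disambiguator yields index 0, `s_` yields 1, and `s3_` yields 5, so a closure carrying `s3_` would be rendered as `{closure}[5]`.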
693
699
694
700
# Appendix B - Interesting Examples
695
701
696
-
TODO
697
-
- specializing impls
698
-
- impl Trait
699
-
- closure environment as a type parameter
700
-
- various examples of compression
702
+
We assume that all examples are defined in a crate named `mycrate[xxx]`.
703
+
704
+
705
+
### Free-standing Item
706
+
707
+
```rust
mod foo {
    mod bar {
        fn baz() {}
    }
}
```
714
+
- unmangled: `mycrate::foo::bar::baz`
715
+
- mangled: `_RN3foo3bar3bazVE`
716
+
717
+
718
+
### Item Defined In Inherent Method
719
+
720
+
```rust
struct Foo<T>(T);

impl<T> Foo<T> {
    pub fn bar<U>(_: U) {
        static QUUX: u32 = 0;
        // ...
    }
}
```
730
+
- unmangled: `mycrate::Foo::bar::QUUX`
731
+
- mangled: `_RNNM11mycrate_xxx3FooE3barV4QUUXVE`
732
+
733
+
734
+
### Item Defined In Trait Method
735
+
736
+
```rust
struct Foo<T>(T);

impl<T> Clone for Foo<T> {
    fn clone<U>(_: U) {
        static QUUX: u32 = 0;
        // ...
    }
}
```
746
+
- unmangled: `<mycrate::Foo as std::clone::Clone>::clone::QUUX`