Skip to content

Hex-encode defmt symbols to avoid compatibility issues. #878

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Dirbaio
Copy link
Contributor

@Dirbaio Dirbaio commented Oct 28, 2024

Hex-encode defmt symbols to avoid compatibility issues.

defmt currently puts json as-is in symbol names, which can contain special
characters not normally found in symbol names like quotes ", braces {}
and spaces .

This can cause compatibility issues with language features or tools that don't
expect this. For example it breaks sym in asm!.

This is a breaking change of the wire format, so this PR increases the
format numbre. defmt-decoder is updated to be able to decode the new format,
while keeping ability to decode older formats.

  • decoder: improve version mismatch error message, don't mention probe-run.
  • Hex-encode defmt symbols to avoid compatibility issues.

@Dirbaio
Copy link
Contributor Author

Dirbaio commented Oct 28, 2024

  • the cargo-semver-check ci error seems bogus? it's from a previous already-merged PR
  • How to fix backcompat CI tests? it's expected that they fail, this is a wire format version bump.

@jonathanpallant
Copy link
Contributor

I've pinged the team internally so we can work out how to fix those tests. We're on version 4 of the wire format already, so we must have a process for bumping it.

@@ -38,7 +41,11 @@ pub(super) enum SymbolTag {

impl Symbol {
pub fn demangle(raw: &str) -> anyhow::Result<Self> {
serde_json::from_str(raw)
let mut raw = Cow::from(raw);
if let Some(s) = raw.strip_prefix("__defmt_") {
Copy link
Contributor

@jonathanpallant jonathanpallant Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This magic string exists twice (once here, once in the macro), and it doesn't say a lot about the encoding used.

  • Perhaps it should be __defmt_hex_, allowing for some future __defmt_base64_ or similar.
  • Is there somewhere it can be defined once and imported in the other place? If not, we should document the canonical form in a third place (like the book) and have the other two places refer to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it should be _defmt_hex, allowing for some future _defmt_base64 or similar.

changed!

Is there somewhere it can be defined once and imported in the other place? If not, we should document the canonical form in a third place (like the book) and have the other two places refer to it.

there's no crate that's a shared dep of the macro and the decoder. I've added a section to the book.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no crate that's a shared dep of the macro and the decoder. I've added a section to the book.

There is! It's the defmt-parser. I think it makes sense to put the magic string there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh.

defmt git:(main) cargo tree -p defmt-macros | grep parser
├── defmt-parser v0.3.4 (/Users/jonathan/Documents/knurling-rs/defmt/parser)defmt git:(main) cargo tree -p defmt-decoder | grep parser
├── defmt-parser v0.3.4 (/Users/jonathan/Documents/knurling-rs/defmt/parser)

defmt currently puts json as-is in symbol names, which can contain special
characters not normally found in symbol names like quotes `"`, braces `{}`
and spaces ` `.

This can cause compatibility issues with language features or tools that don't
expect this. For example it breaks `sym` in `asm!`.

This is a *breaking change* of the wire format, so this PR increases the
format numbre. `defmt-decoder` is updated to be able to decode the new format,
while keeping ability to decode older formats.
@Urhengulas Urhengulas added the breaking change fix / feature / improvement involves a breaking change and needs to wait until next minor version label Nov 5, 2024
@Urhengulas
Copy link
Member

  • the cargo-semver-check ci error seems bogus? it's from a previous already-merged PR

Yes, that's a known problem. I marked the PR as breaking change, which is not true, but skips the check.

* How to fix backcompat CI tests? it's expected that they fail, this is a wire format version bump.

Just bump the commit hash

const REVISION_UNDER_TEST: &str = "0e92d3a88aa472377b964979f522829d961d8986";
.

Copy link
Contributor

@jonathanpallant jonathanpallant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit, and one thought.

By hex encoding the symbols, you stop people from scanning the output of objdump -t and seeing the log strings, modules, etc.

Is there some other encoding which would solve the problem of invalid characters (", {, @, etc), but which retains the ability for a human to read the text? I'm thinking of something like C trigraphs perhaps. Or (shudder) C++/Rust symbol mangling.

- The double-quote character `'"'` causes issues with escaping if you use it with `sym` inside an `asm!()` call.

To deal with this, as of Wire Format Version 5, strings are encoded to bytes as UTF-8, and then the bytes are hex-encoded.
The symbol is prefixed with `__defmt_hex_` to denote it's is hex-encoded, and to allow for future expansion.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The symbol is prefixed with `__defmt_hex_` to denote it's is hex-encoded, and to allow for future expansion.
The symbol is prefixed with `__defmt_hex_` to denote that it's hex-encoded, and to allow for future expansion.

@jonathanpallant
Copy link
Contributor

jonathanpallant commented Nov 14, 2024

We should probably fix the input section name at the same time (i.e. the link_section = "...."). Currently this is JSON, unless you are on macOS when we hex encode it. We should probably always hex encode it, or hash it, or use the disambiguator. I think we do still need to put somewhere there (i.e. don't throw them all into .defmt.error et al) so that the garbage collection can throw out symbols that aren't referenced (which happens at the input section level).

@jonathanpallant
Copy link
Contributor

@Dirbaio did you have any thoughts on whether there's a more readable encoding than plain hex encoding here?

@jonathanpallant
Copy link
Contributor

I gave it some thought. What about mangling::mangle

main.rs:

use std::io::Read;

fn main() {
    let mut stdin = std::io::stdin();
    let mut buffer = Vec::new();
    stdin.read_to_end(&mut buffer).expect("reading stdin");
    let result = mangling::mangle(buffer);
    println!("{}", result);
}
mangler git:(master) ✗ echo '{ name: "Smith" }' | cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/mangler`
_02_7b204name03_3a20225Smith04_22207d0a

@Urhengulas Urhengulas added the pr waits on: author Pull Request requires changes from the author label Jan 8, 2025
@jonathanpallant jonathanpallant added this to the Wire Format Version 5 milestone Mar 5, 2025
@chrysn
Copy link
Contributor

chrysn commented Jun 3, 2025

I just tried out this branch exploring how well defmt can be Just Another Dependency that libraries can impl trait for without too much feature gating ado. My go-to test for "will anything break" is having Python library crates compile for various platforms, and so far, adding defmt populates the content of -Wl,--version-script=/tmp/…/list with data that trips up a cc based linker. With this branch, it still trips at the same stage, but later in the file, where it says _defmt_encoding_ = rzcobs (or any of the other equals signs). So, would it make sense to apply the same encoding there? (It's a new wire format anyway, I'd guess that the absence of a _defmt_version_ = %d encoded wire format indicator works just as well to trip up old tools).


Unrelatedly: If we want something else than hex (IMO, why not hex), going the shuddering route of Rust mangling has the advantage that users who pipe their symbol names through a demangler will also see fully demangled defmt data.

@jonathanpallant
Copy link
Contributor

Yes, good point, we'll need to Mangle All The Things.

pipe their symbol names through a demangler will also see fully demangled defmt data

I'm mentally uprating my idea from "bizarre" to "huh, actually..."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change fix / feature / improvement involves a breaking change and needs to wait until next minor version pr waits on: author Pull Request requires changes from the author
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants