
Should the doc comment string literal be raw? #502

@lmmx

Description


I have been scratching my head over something that works in one part of a codebase even though my unit test seemed to show it was not possible. I think it comes down to the interplay of three sources of tokens: quote, proc-macro2, and rustc.

I realised what might be going on when I read this GitHub issue on syn: dtolnay/syn#771

syn user issue report:

master/src/attr.rs#L119-L150
The above code tells me that doc comments like /// foobar are transformed into #[doc = r"foobar"].

However, The Rust Reference says:

/// Foo turns into #[doc="Foo"] and /** Bar */ turns into #[doc="Bar"].

Why does this crate parse doc comments as raw string, unlike the rust compiler?

dtolnay:

The reference is wrong. ;)
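To make the raw-string behaviour concrete, here is a std-only sketch of how I understand the macro_rules-side desugaring (the path `quote!` sees): the doc text, including its leading space, is wrapped in a raw string with just enough `#`s that no `"` followed by `#`s inside the text can end the literal early. The function name `desugar_doc_raw` is hypothetical, and this is my reading of rustc's behaviour rather than its actual code.

```rust
/// Hypothetical helper: render `/// <text>` the way a macro_rules matcher
/// (and therefore `quote!`) is assumed to see it, i.e. `#[doc = r"<text>"]`.
/// Assumption: the raw string gets the fewest `#`s that stop any `"#...`
/// sequence inside the text from closing the literal.
fn desugar_doc_raw(text: &str) -> String {
    // `run` is the length of the current `"` + `#`* sequence; the longest
    // such run is the number of hashes the raw string needs.
    let mut hashes = 0;
    let mut run = 0;
    for ch in text.chars() {
        run = match ch {
            '"' => 1,
            '#' if run > 0 => run + 1,
            _ => 0,
        };
        hashes = hashes.max(run);
    }
    let fence = "#".repeat(hashes);
    format!("#[doc = r{fence}\"{text}\"{fence}]")
}
```

For `/// foobar` the text is `" foobar"` and no hashes are needed, giving `#[doc = r" foobar"]`; text containing a `"` gets `r#"..."#`.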

If the reference is indeed wrong, it seems like proc-macro2 is also wrong?

  • edit: I guess this was meant in the sense that "the reference does not accurately describe the Rust compiler's implementation", so I should phrase it more precisely as: "it seems like proc-macro2 does not conform to the Rust compiler's behaviour"

For instance, in #242 the following debug output of a doc comment is shown:

Attribute {
    pound_token: Pound,
    style: Outer,
    bracket_token: Bracket,
    path: Path {
        leading_colon: None,
        segments: [
            PathSegment {
                ident: Ident(
                    doc,
                ),
                arguments: None,
            },
        ],
    },
    tokens: TokenStream [
        Punct {
            op: '=',
            spacing: Alone,
        },
        Literal {
            lit: " foo ", // <------------ (*)
        },
    ],

This appears to be a regular string (i.e. it is " foo ", not r" foo ").
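Although `" foo "` and `r" foo "` are different token spellings, they denote the same string value, so a test that compares the value rather than the token text would not care which spelling it received. Below is a minimal std-only sketch; `doc_value` is a hypothetical name, and it only handles a few common escapes (`\\`, `\"`, `\'`, `\n`, `\t`, `\r`, `\0`), not `\x..`, `\u{..}` or line continuations. In real code, parsing the literal with syn and calling `LitStr::value()` does this properly.

```rust
/// Hypothetical helper: turn the token text of a string literal, either
/// `"..."` or `r#*"..."#*`, into the string value it denotes.
/// Simplification: only a few common escapes are handled in the non-raw case.
fn doc_value(lit: &str) -> Option<String> {
    if let Some(rest) = lit.strip_prefix('r') {
        // Raw string: strip the matching `#`s and quotes; no escapes apply.
        let hashes = rest.len() - rest.trim_start_matches('#').len();
        let fence = "#".repeat(hashes);
        let inner = rest
            .strip_prefix(fence.as_str())?
            .strip_prefix('"')?
            .strip_suffix(fence.as_str())?
            .strip_suffix('"')?;
        return Some(inner.to_string());
    }
    // Regular string: strip the quotes, then resolve backslash escapes.
    let inner = lit.strip_prefix('"')?.strip_suffix('"')?;
    let mut out = String::new();
    let mut chars = inner.chars();
    while let Some(ch) = chars.next() {
        if ch != '\\' {
            out.push(ch);
            continue;
        }
        let unescaped = match chars.next()? {
            'n' => '\n',
            't' => '\t',
            'r' => '\r',
            '0' => '\0',
            c @ ('\\' | '"' | '\'') => c,
            _ => return None, // \x.., \u{..}, line continuations: not handled
        };
        out.push(unescaped);
    }
    Some(out)
}
```

With this, `doc_value("\" foo \"")` and `doc_value("r\" foo \"")` both yield `Some(" foo ")`.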

        Literal {
            lit: " foo ", // <------------ (*)
        },

This seems to explain the discrepancy with the debug output from my unit test (where I use a quote! value, which produces an r"" raw string literal):

tests::test_struct_with_field_doc_comments
The attr is ... Any([Ident { sym: doc }, Punct { char: '=', spacing: Alone }, Literal { lit: r" The user's unique identifier", span: bytes(1..33) }])) }
  • The lit has a r"..." string literal value.
  • This was produced from quote! { ... } with a /// The user's unique identifier inside a struct (doc comment on a struct field)

My intuition was that this was an error in the implementation of quote!, but as pointed out in that syn issue, this matches the behaviour of the Rust compiler (playground).

Update

I debugged my issue further and confirmed that I am looking at a distinction between r"" and "" strings in #[doc = ...] attributes. They show up as different 'variants' (not sure what to call them) of the TokenTree::Literal type in the TokenStream, and this appears to be what is causing my issue:

Macro usage:

  Raw TokenStream: TokenStream [Literal { kind: Str, symbol: " I have a docstring", suffix: None, span: #0 bytes(381..403) }]
  lit_content type: proc_macro2::TokenStream
    Token 0 type: proc_macro2::TokenTree
    Token 0 debug: Literal { kind: Str, symbol: " I have a docstring", suffix: None, span: #0 bytes(381..403) }

Unit test usage:

  Raw TokenStream: TokenStream [Literal { lit: r" Hello world", span: bytes(1..16) }]
  lit_content type: proc_macro2::TokenStream
    Token 0 type: proc_macro2::TokenTree
    Token 0 debug: Literal { lit: r" Hello world", span: bytes(1..16) }
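The macro-usage output above shows a non-raw `Str` literal. My understanding (not verified against current rustc source) is that when a doc comment is handed to a proc macro, the compiler escapes the text and emits an ordinary string literal, whereas the macro_rules path used by `quote!` keeps a raw string. Here is a std-only sketch of that escaped form using `char::escape_debug`; `desugar_doc_escaped` is a hypothetical name, and the choice of escaping function is an assumption.

```rust
/// Hypothetical helper: render `/// <text>` as the non-raw literal that a
/// proc macro is assumed to receive, e.g. `#[doc = " I have a docstring"]`.
/// Assumption: each char is escaped with `char::escape_debug`.
fn desugar_doc_escaped(text: &str) -> String {
    let escaped: String = text.chars().flat_map(char::escape_debug).collect();
    format!("#[doc = \"{escaped}\"]")
}
```

If this assumption holds, it would explain why the same `///` line appears as `"..."` inside a real macro invocation but as `r"..."` in a `quote!`-based unit test.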

I can get my unit test to match real-world usage by writing the docstring explicitly as a #[doc = "..."] attribute (which is how I'm going to resolve this), but I felt I should at least report this issue upstream 🙂
