| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Procedural Macros in Rust 2018" |
| 4 | +author: Alex Crichton |
| 5 | +--- |
| 6 | + |
| 7 | +Perhaps my favorite feature in the Rust 2018 edition is [procedural macros]. |
| 8 | +Procedural macros have had a long and storied history in Rust (and will continue |
| 9 | +to have a storied future!), and now is perhaps one of the best times to get |
| 10 | +involved with them because the 2018 edition has so dramatically improved the |
| 11 | +experience of both defining and using them. |
| 12 | + |
| 13 | +Here I'd like to explore what procedural macros are, what they're capable of, |
| 14 | +notable new features, and some fun use cases of procedural macros. I might even |
| 15 | +convince you that this is Rust 2018's best feature as well! |
| 16 | + |
| 17 | +### What is a procedural macro? |
| 18 | + |
| 19 | +First defined over two years ago in [RFC 1566], a procedural macro is, in |
| 20 | +layman's terms, a function that takes a piece of syntax at compile time and |
| 21 | +produces a new bit of syntax. Procedural macros in Rust 2018 come in one of |
| 22 | +three flavors: |
| 23 | + |
| 24 | +* **`#[derive]` mode macros** have actually been stable since [Rust 1.15] |
| 25 | + and bring all the goodness and ease of use of `#[derive(Debug)]` to |
| 26 | + user-defined traits as well, such as [Serde]'s `#[derive(Deserialize)]`. |
| 27 | + |
| 28 | +* **Function-like macros** are newly stable in the 2018 edition and allow |
| 29 | + defining macros like `env!("FOO")` or `format_args!("...")` in a |
| 30 | + crates.io-based library. You can think of these as sort of "`macro_rules!` |
| 31 | + macros" on steroids. |
| 32 | + |
| 33 | +* **Attribute macros**, my favorite, are also new in the 2018 edition |
| 34 | + and allow you to provide lightweight annotations on Rust functions which |
| 35 | + perform syntactical transformations over the code at compile time. |
| 36 | + |
| 37 | +Each of these flavors of macros can be defined in a crate with `proc-macro = |
| 38 | +true` [specified in its manifest][manifest]. When used, a procedural macro is |
| 39 | +loaded by the Rust compiler and executed as the invocation is expanded. This |
| 40 | +means that Cargo's in control of versioning for procedural macros and you can |
| 41 | +use them with all the same ease of use you'd expect from other Cargo dependencies! |
| 42 | + |
| 43 | +### Defining a procedural macro |
| 44 | + |
| 45 | +Each of the three types of procedural macros is [defined in a slightly different |
| 46 | +fashion][proc-ref], and here we'll single out attribute macros. First we'll flag |
| 47 | +`Cargo.toml`: |
| 48 | + |
| 49 | +```toml |
| 50 | +[lib] |
| 51 | +proc-macro = true |
| 52 | +``` |
| 53 | + |
| 54 | +and then in `src/lib.rs` we can write our macro: |
| 55 | + |
| 56 | +```rust |
| 57 | +extern crate proc_macro; |
| 58 | +use proc_macro::TokenStream; |
| 59 | + |
| 60 | +#[proc_macro_attribute] |
| 61 | +pub fn hello(attr: TokenStream, item: TokenStream) -> TokenStream { |
| 62 | + // ... |
| 63 | +} |
| 64 | +``` |
| 65 | + |
| 66 | +We can then write some unit tests in `tests/smoke.rs`: |
| 67 | + |
| 68 | +```rust |
| 69 | +#[my_crate::hello] |
| 70 | +fn wrapped_function() {} |
| 71 | + |
| 72 | +#[test] |
| 73 | +fn works() { |
| 74 | + wrapped_function(); |
| 75 | +} |
| 76 | +``` |
| 77 | + |
| 78 | +... and that's it! When we execute `cargo test` Cargo will compile our |
| 79 | +procedural macro. Afterwards it will compile our unit test which loads the macro |
| 80 | +at compile time, executing the `hello` function and compiling the resulting |
| 81 | +syntax. |
| 82 | + |
| 83 | +Right off the bat we can see a few important properties of procedural macros: |
| 84 | + |
| 85 | +* The input/output is this fancy `TokenStream` type we'll talk about more in a |
| 86 | + bit |
| 87 | +* We're *executing arbitrary code* at compile time, which means we can do just |
| 88 | + about anything! |
| 89 | +* Procedural macros are incorporated with the module system, meaning |
| 90 | + they can be imported just like any other name. |
| 91 | + |
| 92 | +Before we take a look at implementing a procedural macro, let's first dive into |
| 93 | +some of these points. |
| 94 | + |
| 95 | +### Macros and the module system |
| 96 | + |
| 97 | +First stabilized in [Rust 1.30] \(noticing a trend with 1.15?\) macros are now |
| 98 | +integrated with the module system in Rust. This mainly means that you no longer |
| 99 | +need the clunky `#[macro_use]` attribute when importing macros! Instead of this: |
| 100 | + |
| 101 | +```rust |
| 102 | +#[macro_use] |
| 103 | +extern crate log; |
| 104 | + |
| 105 | +fn main() { |
| 106 | + debug!("hello, "); |
| 107 | + info!("world!"); |
| 108 | +} |
| 109 | +``` |
| 110 | + |
| 111 | +you can do: |
| 112 | + |
| 113 | +```rust |
| 114 | +use log::info; |
| 115 | + |
| 116 | +fn main() { |
| 117 | + log::debug!("hello, "); |
| 118 | + info!("world!"); |
| 119 | +} |
| 120 | +``` |
| 121 | + |
| 122 | +Integration with the module system solves one of the most confusing parts about |
| 123 | +macros historically. They're now imported and namespaced just as you would any |
| 124 | +other item in Rust! |
| 125 | + |
| 126 | +The benefits are not only limited to bang-style `macro_rules` macros, as you can |
| 127 | +now transform code that looks like this: |
| 128 | + |
| 129 | +```rust |
| 130 | +#[macro_use] |
| 131 | +extern crate serde_derive; |
| 132 | + |
| 133 | +#[derive(Deserialize)] |
| 134 | +struct Foo { |
| 135 | + // ... |
| 136 | +} |
| 137 | +``` |
| 138 | + |
| 139 | +into |
| 140 | + |
| 141 | +```rust |
| 142 | +use serde::Deserialize; |
| 143 | + |
| 144 | +#[derive(Deserialize)] |
| 145 | +struct Foo { |
| 146 | + // ... |
| 147 | +} |
| 148 | +``` |
| 149 | + |
| 150 | +and you don't even need to explicitly depend on `serde_derive` in `Cargo.toml`! |
| 151 | +All you need is: |
| 152 | + |
| 153 | +```toml |
| 154 | +[dependencies] |
| 155 | +serde = { version = '1.0.82', features = ['derive'] } |
| 156 | +``` |
| 157 | + |
| 158 | +### What's inside a `TokenStream`? |
| 159 | + |
| 160 | +This mysterious `TokenStream` type comes from the [compiler-provided |
| 161 | +`proc_macro` crate][pm]. When it was first added all you could do with a |
| 162 | +[`TokenStream`] was convert it to or from a string using `to_string()` or `parse()`. |
| 163 | +As of Rust 2018, you can act on the tokens in a [`TokenStream`] directly. |
| 164 | + |
| 165 | +A [`TokenStream`] is effectively "just" an iterator over [`TokenTree`]. All |
| 166 | +syntax in Rust falls into one of these four categories, the four variants of |
| 167 | +[`TokenTree`]: |
| 168 | + |
| 169 | +* `Ident` is any identifier like `foo` or `bar`. This also contains keywords |
| 170 | + such as `self` and `super`. |
| 171 | +* `Literal` includes things like `1`, `"foo"`, and `'b'`. All literals are one |
| 172 | + token and represent constant values in a program. |
| 173 | +* `Punct` represents some form of punctuation that's not a delimiter. For |
| 174 | + example `.` is a `Punct` token in the field access of `foo.bar`. |
| 175 | + Multi-character punctuation like `=>` is represented as two `Punct` tokens, |
| 176 | + one for `=` and one for `>`, and the `Spacing` enum says that the `=` is |
| 177 | + adjacent to the `>`. |
| 178 | +* `Group` is where the term "tree" is most relevant, as `Group` represents a |
| 179 | + delimited sub-token-stream. For example `(a, b)` is a `Group` with parentheses |
| 180 | + as delimiters, and the internal token stream is `a, b`. |
| 181 | + |
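To make the four categories concrete, here is a toy, std-only model of a token tree. The `Tree`, `Spacing`, and `Delimiter` types and the `render`, `count_leaves`, and `sample` helpers are hypothetical names written for this sketch. They only mirror the shape of the real `proc_macro` types, which are usable solely inside a procedural macro crate.

```rust
// Toy model of the four token categories; NOT the real `proc_macro` API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Spacing {
    Joint, // immediately followed by another punctuation character
    Alone, // followed by whitespace or a non-punctuation token
}

#[allow(dead_code)] // only parentheses are used in the sample below
#[derive(Debug, Clone, Copy, PartialEq)]
enum Delimiter {
    Parenthesis,
    Brace,
    Bracket,
}

#[derive(Debug, Clone, PartialEq)]
enum Tree {
    Ident(String),
    Literal(String),
    Punct(char, Spacing),
    Group(Delimiter, Vec<Tree>),
}

/// Counts leaf tokens, descending into groups (this is the "tree" part).
fn count_leaves(tokens: &[Tree]) -> usize {
    tokens
        .iter()
        .map(|t| match t {
            Tree::Group(_, inner) => count_leaves(inner),
            _ => 1,
        })
        .sum()
}

/// Renders tokens back to text, gluing `Joint` punctuation to what follows.
fn render(tokens: &[Tree]) -> String {
    let mut out = String::new();
    for t in tokens {
        match t {
            Tree::Ident(s) | Tree::Literal(s) => {
                out.push_str(s);
                out.push(' ');
            }
            Tree::Punct(c, Spacing::Joint) => out.push(*c),
            Tree::Punct(c, Spacing::Alone) => {
                out.push(*c);
                out.push(' ');
            }
            Tree::Group(d, inner) => {
                let (open, close) = match d {
                    Delimiter::Parenthesis => ('(', ')'),
                    Delimiter::Brace => ('{', '}'),
                    Delimiter::Bracket => ('[', ']'),
                };
                out.push(open);
                out.push_str(&render(inner));
                out.push(close);
                out.push(' ');
            }
        }
    }
    out.trim_end().to_string()
}

/// Hand-built tokens for the source text `x => (a, 1)`.
fn sample() -> Vec<Tree> {
    vec![
        Tree::Ident("x".to_string()),
        Tree::Punct('=', Spacing::Joint), // `=` is adjacent to `>`
        Tree::Punct('>', Spacing::Alone),
        Tree::Group(
            Delimiter::Parenthesis,
            vec![
                Tree::Ident("a".to_string()),
                Tree::Punct(',', Spacing::Alone),
                Tree::Literal("1".to_string()),
            ],
        ),
    ]
}

fn main() {
    let tokens = sample();
    println!("{}", render(&tokens));
    println!("{} top-level trees, {} leaf tokens", tokens.len(), count_leaves(&tokens));
}
```

The sample has four top-level trees but six leaf tokens, because the parenthesized group carries its own nested stream.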
| 182 | +While this is conceptually simple, it may sound like there's not much we can |
| 183 | +do with it! It's unclear, for example, how we might parse a function from a |
| 184 | +`TokenStream`. The minimality of `TokenTree` is crucial, however, for |
| 185 | +stabilization. It would be infeasible to stabilize the Rust AST because that |
| 186 | +means we could never change it. (Imagine if we couldn't have added the `?` |
| 187 | +operator!) |
| 188 | + |
| 189 | +By using `TokenStream` to communicate with procedural macros, the compiler is |
| 190 | +able to add new language syntax while also being able to compile |
| 191 | +and work with older procedural macros. Let's see now, though, how we can |
| 192 | +actually get useful information out of a `TokenStream`. |
| 193 | + |
| 194 | +### Parsing a `TokenStream` |
| 195 | + |
| 196 | +If `TokenStream` is just a simple iterator, then we've got a long way to go from |
| 197 | +that to an actual parsed function. Although the code is already lexed for us |
| 198 | +we still need to write a whole Rust parser! Thankfully though the community has |
| 199 | +been hard at work to make sure writing procedural macros in Rust is as smooth as |
| 200 | +can be, so you need look no further than the [`syn` crate][syn]. |
| 201 | + |
| 202 | +With the [`syn`][syn] crate we can parse any Rust AST as a one-liner: |
| 203 | + |
| 204 | +```rust |
| 205 | +#[proc_macro_attribute] |
| 206 | +pub fn hello(attr: TokenStream, item: TokenStream) -> TokenStream { |
| 207 | + let input = syn::parse_macro_input!(item as syn::ItemFn); |
| 208 | + let name = &input.ident; |
| 209 | + let abi = &input.abi; |
| 210 | + // ... |
| 211 | +} |
| 212 | +``` |
| 213 | + |
| 214 | +The [`syn`][syn] crate not only comes with the ability to parse built-in syntax |
| 215 | +but you can also easily write a recursive descent parser for your own syntax. |
| 216 | +The [`syn::parse` module][spm] has more information about this capability. |
| 217 | + |
| 218 | +### Producing a `TokenStream` |
| 219 | + |
| 220 | +Not only do we take a `TokenStream` as input with a procedural macro, but we |
| 221 | +also need to produce a `TokenStream` as output. This output is typically |
| 222 | +required to be valid Rust syntax, but like the input it's just a list of tokens |
| 223 | +that we need to build somehow. |
| 224 | + |
| 225 | +Short of parsing a string, the way to create a `TokenStream` is via its `FromIterator` |
| 226 | +implementation, which means we'd have to create each token one-by-one and |
| 227 | +collect it into a `TokenStream`. This is quite tedious, though, so let's take a |
| 228 | +look at [`syn`][syn]'s sibling crate: [`quote`]. |
| 229 | + |
| 230 | +The [`quote`] crate is a quasi-quoting implementation for Rust which primarily |
| 231 | +provides a convenient macro for us to use: |
| 232 | + |
| 233 | +```rust |
| 234 | +use quote::quote; |
| 235 | + |
| 236 | +#[proc_macro_attribute] |
| 237 | +pub fn hello(attr: TokenStream, item: TokenStream) -> TokenStream { |
| 238 | + let input = syn::parse_macro_input!(item as syn::ItemFn); |
| 239 | + let name = &input.ident; |
| 240 | + |
| 241 | + // Our input function is always equivalent to returning 42, right? |
| 242 | + let result = quote! { |
| 243 | + fn #name() -> u32 { 42 } |
| 244 | + }; |
| 245 | + result.into() |
| 246 | +} |
| 247 | +``` |
| 248 | + |
| 249 | +The [`quote!` macro] allows you to write mostly-Rust syntax and interpolate |
| 250 | +variables quickly from the environment with `#foo`. This removes much of the |
| 251 | +tedium of creating a `TokenStream` token-by-token and allows quickly cobbling |
| 252 | +together various pieces of syntax into one return value. |
| 253 | + |
| 254 | +### Tokens and `Span` |
| 255 | + |
| 256 | +Perhaps the greatest feature of procedural macros in Rust 2018 is the ability to |
| 257 | +customize and use [`Span`] information on each token, giving us the ability to produce |
| 258 | +amazing syntactical error messages from procedural macros: |
| 259 | + |
| 260 | +``` |
| 261 | +error: expected `fn` |
| 262 | + --> src/main.rs:3:14 |
| 263 | + | |
| 264 | +3 | my_annotate!(not_fn foo() {}); |
| 265 | + | ^^^^^^ |
| 266 | +``` |
| 267 | + |
| 268 | +as well as completely custom error messages: |
| 269 | + |
| 270 | +``` |
| 271 | +error: imported methods must have at least one argument |
| 272 | + --> invalid-imports.rs:12:5 |
| 273 | + | |
| 274 | +12 | fn f1(); |
| 275 | + | ^^^^^^^^ |
| 276 | +``` |
| 277 | + |
| 278 | +A [`Span`] can be thought of as a pointer back into an original source file, |
| 279 | +typically saying something like "the `Ident` token `foo` came from file |
| 280 | +`bar.rs`, line 4, column 5, and was 3 bytes long". This information is |
| 281 | +primarily used by the compiler's diagnostics with warnings and error messages. |
| 282 | + |
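The "pointer back into the source" idea can be sketched with a toy, std-only model. The `Span` struct below is a hypothetical stand-in with public line, column, and length fields (the real `proc_macro::Span` is opaque), and `snippet`, `underline`, `SOURCE`, and `NOT_FN` are illustrative names assumed for this example:

```rust
/// Toy stand-in for a span: a location inside some source text.
/// Hypothetical type for illustration; NOT the real `proc_macro::Span`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    line: usize,   // 1-based line number
    column: usize, // 1-based column of the first byte
    len: usize,    // length in bytes
}

/// Example source file contents; the macro invocation sits on line 3.
const SOURCE: &str = "fn main() {}\n\nmy_annotate!(not_fn foo() {});\n";

/// Span of the `not_fn` token: line 3, column 14, 6 bytes long.
const NOT_FN: Span = Span { line: 3, column: 14, len: 6 };

/// Returns the text a span points at, if the span lies within `source`.
fn snippet(source: &str, span: Span) -> Option<&str> {
    let line = source.lines().nth(span.line.checked_sub(1)?)?;
    let start = span.column.checked_sub(1)?;
    line.get(start..start + span.len)
}

/// Builds a compiler-style caret underline beneath the offending line.
fn underline(source: &str, span: Span) -> Option<String> {
    let line = source.lines().nth(span.line.checked_sub(1)?)?;
    snippet(source, span)?; // reject spans that fall outside the line
    let marker = format!("{}{}", " ".repeat(span.column - 1), "^".repeat(span.len));
    Some(format!("{}\n{}", line, marker))
}

fn main() {
    if let Some(text) = snippet(SOURCE, NOT_FN) {
        println!("span points at: {}", text);
    }
    if let Some(diagnostic) = underline(SOURCE, NOT_FN) {
        println!("{}", diagnostic);
    }
}
```

Because the span records only a position, a diagnostic can underline the original text even after the token has been moved into newly generated syntax.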
| 283 | +In Rust 2018 each [`TokenTree`] has a [`Span`] associated with it. This means that |
| 284 | +if you preserve the [`Span`] of all input tokens into the output then even |
| 285 | +though you're producing brand new syntax the compiler's error messages are still |
| 286 | +accurate! |
| 287 | + |
| 288 | +For example, a small macro like: |
| 289 | + |
| 290 | +```rust |
| 291 | +#[proc_macro] |
| 292 | +pub fn make_pub(item: TokenStream) -> TokenStream { |
| 293 | + let result = quote! { |
| 294 | + pub #item |
| 295 | + }; |
| 296 | + result.into() |
| 297 | +} |
| 298 | +``` |
| 299 | + |
| 300 | +when invoked as: |
| 301 | + |
| 302 | +```rust |
| 303 | +my_macro::make_pub! { |
| 304 | + static X: u32 = "foo"; |
| 305 | +} |
| 306 | +``` |
| 307 | + |
| 308 | +is invalid because we're initializing a `static` declared as `u32` with a string |
| 309 | +literal, and the compiler will helpfully diagnose the problem as: |
| 310 | + |
| 311 | +``` |
| 312 | +error[E0308]: mismatched types |
| 313 | + --> src/main.rs:1:37 |
| 314 | + | |
| 315 | +1 | my_macro::make_pub!(static X: u32 = "foo"); |
| 316 | + | ^^^^^ expected u32, found reference |
| 317 | + | |
| 318 | + = note: expected type `u32` |
| 319 | + found type `&'static str` |
| 320 | + |
| 321 | +error: aborting due to previous error |
| 322 | + |
| 323 | +``` |
| 324 | + |
| 325 | +And we can see here that although we're generating brand new syntax, the |
| 326 | +compiler can preserve span information to continue to provide targeted |
| 327 | +diagnostics about code that we've written. |
| 328 | + |
| 329 | +### Procedural Macros in the Wild |
| 330 | + |
| 331 | +OK, up to this point we've got a pretty good idea about what procedural macros |
| 332 | +can do and the various capabilities they have in the 2018 edition. Because this is such a |
| 333 | +long-awaited feature, the ecosystem is already making use of these new |
| 334 | +capabilities! If you're interested, some projects to keep your eyes on are: |
| 335 | + |
| 336 | +* [`syn`][syn], [`quote`], and [`proc-macro2`] are your go-to libraries for |
| 337 | + writing procedural macros. They make it easy to define custom parsers, parse |
| 338 | + existing syntax, create new syntax, work with older versions of Rust, and much |
| 339 | + more! |
| 340 | + |
| 341 | +* [Serde] and its derive macros for `Serialize` and `Deserialize` are likely the |
| 342 | + most used macros in the ecosystem. They sport an [impressive amount of |
| 343 | + configuration][serde-attr] and are a great example of how small annotations |
| 344 | + can be so powerful. |
| 345 | + |
| 346 | +* The [`wasm-bindgen` project][wbg] uses attribute macros to easily define |
| 347 | + interfaces in Rust and import interfaces from JS. The `#[wasm_bindgen]` |
| 348 | + lightweight annotation makes it easy to understand what's coming in and out, |
| 349 | + as well as removing lots of conversion boilerplate. |
| 350 | + |
| 351 | +* The [`gobject_gen!` macro][gnome-class] is an experimental IDL for the GNOME |
| 352 | + project to define GObject objects safely in Rust, eschewing manually writing |
| 353 | + all the glue necessary to talk to C and interface with other GObject |
| 354 | + instances in Rust. |
| 355 | + |
| 356 | +* The [Rocket framework][rocket] has recently switched over to procedural |
| 357 | + macros, and showcases some of the nightly-only features of procedural macros like |
| 358 | + custom diagnostics, custom span creation, and more. Expect to see these |
| 359 | + features stabilize in 2019! |
| 360 | + |
| 361 | +That's just a *taste* of the power of procedural macros and some example usage |
| 362 | +throughout the ecosystem today. We're only 6 weeks out from the original release |
| 363 | +of procedural macros on stable, so we've surely only scratched the surface as |
| 364 | +well! I'm really excited to see where we can take Rust with procedural macros by |
| 365 | +empowering all kinds of lightweight additions and extensions to the language! |
| 366 | + |
| 367 | +[procedural macros]: https://doc.rust-lang.org/reference/procedural-macros.html |
| 368 | +[RFC 1566]: https://github.com/rust-lang/rfcs/blob/master/text/1566-proc-macros.md |
| 369 | +[Rust 1.15]: https://blog.rust-lang.org/2017/02/02/Rust-1.15.html |
| 370 | +[Serde]: https://serde.rs |
| 371 | +[manifest]: https://doc.rust-lang.org/cargo/reference/manifest.html |
| 372 | +[proc-ref]: https://doc.rust-lang.org/stable/reference/procedural-macros.html |
| 373 | +[pm]: https://doc.rust-lang.org/proc_macro/ |
| 374 | +[`TokenStream`]: https://doc.rust-lang.org/stable/proc_macro/struct.TokenStream.html |
| 375 | +[`TokenTree`]: https://doc.rust-lang.org/stable/proc_macro/enum.TokenTree.html |
| 376 | +[Rust 1.30]: https://blog.rust-lang.org/2018/10/25/Rust-1.30.0.html |
| 377 | +[syn]: https://crates.io/crates/syn |
| 378 | +[spm]: https://docs.rs/syn/0.15/syn/parse/index.html |
| 379 | +[`quote`]: https://docs.rs/quote/0.6/quote/ |
| 380 | +[`quote!` macro]: https://docs.rs/quote/0.6/quote/macro.quote.html |
| 381 | +[`Span`]: https://doc.rust-lang.org/proc_macro/struct.Span.html |
| 382 | +[`proc-macro2`]: https://docs.rs/proc-macro2/0.4/proc_macro2/ |
| 383 | +[serde-attr]: https://serde.rs/attributes.html |
| 384 | +[wbg]: https://github.com/rustwasm/wasm-bindgen |
| 385 | +[gnome-class]: https://gitlab.gnome.org/federico/gnome-class |
| 386 | +[rocket]: https://rocket.rs/ |