---
layout: post
title: "Procedural Macros in Rust 2018"
author: Alex Crichton
---

Perhaps my favorite feature in the Rust 2018 edition is [procedural macros].
Procedural macros have had a long and storied history in Rust (and will continue
to have a storied future!), and now is perhaps one of the best times to get
involved with them because the 2018 edition has so dramatically improved the
experience of both defining and using them.

Here I'd like to explore what procedural macros are, what they're capable of,
notable new features, and some fun use cases of procedural macros. I might even
convince you that this is Rust 2018's best feature as well!

### What is a procedural macro?

First defined over two years ago in [RFC 1566], procedural macros are, in
layman's terms, functions that take a piece of syntax at compile time and
produce a new bit of syntax. Procedural macros in Rust 2018 come in one of
three flavors:

* **`#[derive]` mode macros** have actually been stable since [Rust 1.15]
  and bring all the goodness and ease of use of `#[derive(Debug)]` to
  user-defined traits as well, such as [Serde]'s `#[derive(Deserialize)]`.

* **Function-like macros** are newly stable in the 2018 edition and allow
  defining macros like `env!("FOO")` or `format_args!("...")` in a
  crates.io-based library. You can think of these as sort of "`macro_rules!`
  macros" on steroids.

* **Attribute macros**, my favorite, are also new in the 2018 edition
  and allow you to provide lightweight annotations on Rust functions which
  perform syntactical transformations over the code at compile time.

Each of these flavors of macros can be defined in a crate with `proc-macro = true`
[specified in its manifest][manifest]. When used, a procedural macro is
loaded by the Rust compiler and executed as the invocation is expanded. This
means that Cargo's in control of versioning for procedural macros, and you can
use them with all the same ease of use you'd expect from other Cargo dependencies!

### Defining a procedural macro

Each of the three types of procedural macros is [defined in a slightly different
fashion][proc-ref], and here we'll single out attribute macros. First we'll flag
the crate as a procedural macro crate in `Cargo.toml`:

```toml
[lib]
proc-macro = true
```

and then in `src/lib.rs` we can write our macro:

```rust
extern crate proc_macro;
use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn hello(attr: TokenStream, item: TokenStream) -> TokenStream {
    // ... a real macro would transform `item` here; returning it
    // unchanged is the simplest body that compiles.
    item
}
```

We can then write some integration tests in `tests/smoke.rs`:

```rust
#[my_crate::hello]
fn wrapped_function() {}

#[test]
fn works() {
    wrapped_function();
}
```

... and that's it! When we execute `cargo test`, Cargo will compile our
procedural macro. Afterwards it will compile our integration test, which loads
the macro at compile time, executes the `hello` function, and compiles the
resulting syntax.

Right off the bat we can see a few important properties of procedural macros:

* The input/output is this fancy `TokenStream` type we'll talk about more in a
  bit.
* We're *executing arbitrary code* at compile time, which means we can do just
  about anything!
* Procedural macros are integrated with the module system, meaning they can be
  imported just like any other name.

Before we take a look at implementing a procedural macro, let's first dive into
some of these points.

### Macros and the module system

First stabilized in [Rust 1.30] \(noticing a trend with 1.15?\), macros are now
integrated with the module system in Rust. This mainly means that you no longer
need the clunky `#[macro_use]` attribute when importing macros! Instead of this:

```rust
#[macro_use]
extern crate log;

fn main() {
    debug!("hello, ");
    info!("world!");
}
```

you can do:

```rust
use log::info;

fn main() {
    log::debug!("hello, ");
    info!("world!");
}
```

Integration with the module system solves one of the most confusing parts about
macros historically. You now import and namespace them just as you would any
other item in Rust!
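The same path-based lookup also works inside a single crate. Here is a small std-only sketch. The `square!` macro and `area` function are hypothetical names chosen for illustration. A `#[macro_export]` macro is placed at the crate root, so it can be invoked by path from anywhere in the crate without `#[macro_use]`:

```rust
mod util {
    // `#[macro_export]` places the macro at the crate root, so it can be
    // named as `crate::square!` even though it is written in this module.
    #[macro_export]
    macro_rules! square {
        ($x:expr) => {
            $x * $x
        };
    }
}

/// Invokes the macro by its path instead of relying on `#[macro_use]`.
pub fn area(side: u32) -> u32 {
    crate::square!(side)
}
```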

The benefits aren't limited to bang-style `macro_rules!` macros, as you can
now transform code that looks like this:

```rust
#[macro_use]
extern crate serde_derive;

#[derive(Deserialize)]
struct Foo {
    // ...
}
```

into

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct Foo {
    // ...
}
```

and you don't even need to explicitly depend on `serde_derive` in `Cargo.toml`!
All you need is:

```toml
[dependencies]
serde = { version = '1.0.82', features = ['derive'] }
```

### What's inside a `TokenStream`?

This mysterious `TokenStream` type comes from the [compiler-provided
`proc_macro` crate][pm]. When it was first added, all you could do with a
[`TokenStream`] was convert it to or from a string using `to_string()` or `parse()`.
As of Rust 2018, you can act on the tokens in a [`TokenStream`] directly.
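To get a feel for that string-only workflow, here is a std-only sketch that works on source text the way the old API forced macros to. It takes the item as a string, picks out what it needs by hand, and emits new source text to be parsed back. The `Hello` trait and `derive_hello_text` function are hypothetical. The naive word-splitting is exactly the kind of fragile ad hoc parsing that direct token access (and `syn`) replaced:

```rust
/// Hypothetical string-level "derive": finds the struct name and emits the
/// source text of an `impl Hello for Name` block. Returns `None` if the
/// input contains no `struct` keyword.
pub fn derive_hello_text(input: &str) -> Option<String> {
    let mut words = input.split_whitespace();
    while let Some(word) = words.next() {
        if word == "struct" {
            // The next word is the name; keep only identifier characters,
            // silently dropping generics like `<T>` (a typical string-hack bug).
            let name: String = words
                .next()?
                .chars()
                .take_while(|c| c.is_alphanumeric() || *c == '_')
                .collect();
            return Some(format!(
                "impl Hello for {} {{ fn hello() {{ println!(\"hello from {}\"); }} }}",
                name, name
            ));
        }
    }
    None
}
```

Note how `struct Wrapper<T>(T);` yields an impl for plain `Wrapper`. The text looks plausible, but it would fail to compile once parsed back.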

A [`TokenStream`] is effectively "just" an iterator over [`TokenTree`]. All
syntax in Rust falls into one of these four categories, the four variants of
[`TokenTree`]:

* `Ident` is any identifier like `foo` or `bar`. This also contains keywords
  such as `self` and `super`.
* `Literal` includes things like `1`, `"foo"`, and `'b'`. All literals are one
  token and represent constant values in a program.
* `Punct` represents some form of punctuation that's not a delimiter. For
  example, `.` is a `Punct` token in the field access of `foo.bar`.
  Multi-character punctuation like `=>` is represented as two `Punct` tokens,
  one for `=` and one for `>`, and the `Spacing` enum (`Joint` on the `=`) says
  that the `=` is adjacent to the `>`.
* `Group` is where the term "tree" is most relevant, as `Group` represents a
  delimited sub-token-stream. For example, `(a, b)` is a `Group` with parentheses
  as delimiters, and the internal token stream is `a, b`.
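The four variants can be mirrored in a small std-only model. This is not the real `proc_macro` API, whose types only work while the compiler is running a macro. The enums below are hypothetical stand-ins with the same shape, used here to spell out the tokens of `x => (a, b)`:

```rust
/// Mirrors `proc_macro::Spacing`: whether a `Punct` is immediately followed
/// by another `Punct` (as `=` is in `=>`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Spacing {
    Alone,
    Joint,
}

/// Simplified stand-in for `proc_macro::Delimiter`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Delimiter {
    Parenthesis,
    Brace,
    Bracket,
}

/// Toy version of the four `TokenTree` variants.
#[derive(Debug, Clone, PartialEq)]
pub enum TokenTree {
    Ident(String),
    Literal(String),
    Punct(char, Spacing),
    Group(Delimiter, Vec<TokenTree>),
}

/// Counts leaf tokens (everything except `Group`), descending into groups.
pub fn count_leaves(stream: &[TokenTree]) -> usize {
    stream
        .iter()
        .map(|tt| match tt {
            TokenTree::Group(_, inner) => count_leaves(inner),
            _ => 1,
        })
        .sum()
}

/// The tokens of `x => (a, b)`: `=>` is two `Punct`s, and the parentheses
/// become a single `Group` wrapping the inner stream `a, b`.
pub fn arrow_example() -> Vec<TokenTree> {
    vec![
        TokenTree::Ident("x".to_string()),
        TokenTree::Punct('=', Spacing::Joint),
        TokenTree::Punct('>', Spacing::Alone),
        TokenTree::Group(
            Delimiter::Parenthesis,
            vec![
                TokenTree::Ident("a".to_string()),
                TokenTree::Punct(',', Spacing::Alone),
                TokenTree::Ident("b".to_string()),
            ],
        ),
    ]
}
```

At the top level there are only four trees. The comma and both identifiers inside the parentheses are nested one level down in the `Group`.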

While this is conceptually simple, it may sound like there's not much we can
do with it! It's unclear, for example, how we might parse a function from a
`TokenStream`. The minimality of `TokenTree` is crucial, however, for
stabilization. It would be infeasible to stabilize the Rust AST because that
would mean we could never change it. (Imagine if we couldn't have added the `?`
operator!)

By using `TokenStream` to communicate with procedural macros, the compiler is
able to add new language syntax while still being able to compile
and work with older procedural macros. Let's see now, though, how we can
actually get useful information out of a `TokenStream`.

### Parsing a `TokenStream`

If `TokenStream` is just a simple iterator, then we've got a long way to go from
that to an actual parsed function. Although the code is already lexed for us,
we still need to write a whole Rust parser! Thankfully, though, the community has
been hard at work to make sure writing procedural macros in Rust is as smooth as
can be, so you need look no further than the [`syn` crate][syn].

With the [`syn`][syn] crate we can parse any Rust syntax into its AST with a one-liner:

```rust
#[proc_macro_attribute]
pub fn hello(attr: TokenStream, item: TokenStream) -> TokenStream {
    let input = syn::parse_macro_input!(item as syn::ItemFn);
    // `ident` and `abi` are top-level fields of `ItemFn` in syn 0.15;
    // in syn 1.0 and later they live under `input.sig`.
    let name = &input.ident;
    let abi = &input.abi;
    // ...
}
```

Not only can the [`syn`][syn] crate parse built-in syntax, it also makes it easy
to write a recursive descent parser for your own syntax.
The [`syn::parse` module][spm] has more information about this capability.
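As a rough std-only illustration of what such a parser looks like, here is a recursive descent parser for a hypothetical attribute argument syntax like `name = "x", level = "debug"`. It runs over a simplified token type rather than `syn`'s real `ParseStream`. The grammar and all names are assumptions made for this sketch. Each grammar rule becomes one method:

```rust
/// Simplified token type (hypothetical; not `proc_macro::TokenTree`).
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Ident(String),
    Str(String),
    Punct(char),
}

/// Recursive descent parser for the hypothetical grammar:
///   args := (pair (',' pair)* ','?)?
///   pair := IDENT '=' STRING
pub struct Parser {
    toks: Vec<Tok>,
    pos: usize,
}

impl Parser {
    pub fn new(toks: Vec<Tok>) -> Self {
        Parser { toks, pos: 0 }
    }

    /// Looks at the next token without consuming it.
    fn peek(&self) -> Option<&Tok> {
        self.toks.get(self.pos)
    }

    /// Consumes and returns the next token, if any.
    fn bump(&mut self) -> Option<Tok> {
        let tok = self.toks.get(self.pos).cloned();
        self.pos += 1;
        tok
    }

    fn expect_punct(&mut self, c: char) -> Result<(), String> {
        match self.bump() {
            Some(Tok::Punct(p)) if p == c => Ok(()),
            other => Err(format!("expected `{}`, found {:?}", c, other)),
        }
    }

    /// pair := IDENT '=' STRING
    fn pair(&mut self) -> Result<(String, String), String> {
        let key = match self.bump() {
            Some(Tok::Ident(name)) => name,
            other => return Err(format!("expected identifier, found {:?}", other)),
        };
        self.expect_punct('=')?;
        match self.bump() {
            Some(Tok::Str(value)) => Ok((key, value)),
            other => Err(format!("expected string literal, found {:?}", other)),
        }
    }

    /// args := (pair (',' pair)* ','?)?
    pub fn args(&mut self) -> Result<Vec<(String, String)>, String> {
        let mut out = Vec::new();
        while self.peek().is_some() {
            out.push(self.pair()?);
            // Either the input ends here or a comma separates the next pair.
            if self.peek().is_some() {
                self.expect_punct(',')?;
            }
        }
        Ok(out)
    }
}
```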

### Producing a `TokenStream`

Not only do we take a `TokenStream` as input with a procedural macro, but we
also need to produce a `TokenStream` as output. This output is typically
required to be valid Rust syntax, but like the input it's just a list of tokens
that we need to build somehow.

Apart from parsing a string, the main way to create a `TokenStream` is via its
`FromIterator` implementation, which means we'd have to create each token
one-by-one and collect them into a `TokenStream`. This is quite tedious, though,
so let's take a look at [`syn`][syn]'s sibling crate: [`quote`].
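To see why building output token-by-token gets tedious, here is a std-only model of that approach. `Token` and `TokenStream` are hypothetical simplified types, not `proc_macro`'s. For brevity, delimiters are flattened into `Punct` tokens instead of being wrapped in a `Group`. Even so, spelling out `fn answer() -> u32 { 42 }` takes ten hand-written tokens:

```rust
use std::fmt;
use std::iter::FromIterator;

/// Hypothetical simplified token (not `proc_macro::TokenTree`).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String),
    Literal(String),
    Punct(char),
}

/// Toy token stream that, like the real one, can be `collect`ed from tokens.
#[derive(Debug)]
pub struct TokenStream(Vec<Token>);

impl FromIterator<Token> for TokenStream {
    fn from_iter<I: IntoIterator<Item = Token>>(iter: I) -> Self {
        TokenStream(iter.into_iter().collect())
    }
}

impl fmt::Display for TokenStream {
    // Prints tokens separated by single spaces (real spacing rules are ignored).
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let words: Vec<String> = self
            .0
            .iter()
            .map(|tok| match tok {
                Token::Ident(s) | Token::Literal(s) => s.clone(),
                Token::Punct(c) => c.to_string(),
            })
            .collect();
        write!(f, "{}", words.join(" "))
    }
}

/// Spells out `fn answer() -> u32 { 42 }` one token at a time.
pub fn build_answer_fn() -> TokenStream {
    vec![
        Token::Ident("fn".to_string()),
        Token::Ident("answer".to_string()),
        Token::Punct('('),
        Token::Punct(')'),
        Token::Punct('-'),
        Token::Punct('>'),
        Token::Ident("u32".to_string()),
        Token::Punct('{'),
        Token::Literal("42".to_string()),
        Token::Punct('}'),
    ]
    .into_iter()
    .collect()
}
```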

The [`quote`] crate is a quasi-quoting implementation for Rust which primarily
provides a convenient macro for us to use:

```rust
extern crate proc_macro;
use proc_macro::TokenStream;
use quote::quote;

#[proc_macro_attribute]
pub fn hello(attr: TokenStream, item: TokenStream) -> TokenStream {
    let input = syn::parse_macro_input!(item as syn::ItemFn);
    let name = &input.ident;

    // Our input function is always equivalent to returning 42, right?
    let result = quote! {
        fn #name() -> u32 { 42 }
    };
    result.into()
}
```

The [`quote!` macro] allows you to write mostly-Rust syntax and interpolate
variables quickly from the environment with `#foo`. This removes much of the
tedium of creating a `TokenStream` token-by-token and allows quickly cobbling
together various pieces of syntax into one return value.
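The core idea of quasi-quoting is a template with holes filled in from local variables. It can be sketched at the string level with std alone. This is only an analogy: the real `quote!` works on tokens and converts interpolated values through its `ToTokens` trait. The hypothetical `quasi_quote` function below simply splices text in for each `#name` placeholder:

```rust
/// Replaces each `#ident` placeholder in `template` with its bound text.
/// Unbound placeholders are left as-is. (A string-level toy, not `quote!`.)
pub fn quasi_quote(template: &str, bindings: &[(&str, &str)]) -> String {
    let mut out = String::new();
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '#' {
            out.push(c);
            continue;
        }
        // Read the placeholder name: identifier characters after `#`.
        let mut var = String::new();
        while let Some(&next) = chars.peek() {
            if next.is_alphanumeric() || next == '_' {
                var.push(next);
                chars.next();
            } else {
                break;
            }
        }
        match bindings.iter().find(|(key, _)| *key == var) {
            Some((_, value)) => out.push_str(value),
            None => {
                out.push('#');
                out.push_str(&var);
            }
        }
    }
    out
}
```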

### Tokens and `Span`

Perhaps the greatest feature of procedural macros in Rust 2018 is the ability to
customize and use [`Span`] information on each token, which lets procedural
macros produce amazing, precisely targeted syntax error messages:

```
error: expected `fn`
 --> src/main.rs:3:14
  |
3 | my_annotate!(not_fn foo() {});
  |              ^^^^^^
```

as well as completely custom error messages:

```
error: imported methods must have at least one argument
  --> invalid-imports.rs:12:5
   |
12 |     fn f1();
   |     ^^^^^^^^
```

A [`Span`] can be thought of as a pointer back into an original source file,
typically saying something like "the `Ident` token `foo` came from file
`bar.rs`, line 4, column 5, and was 3 bytes long". This information is
primarily used by the compiler's diagnostics with warnings and error messages.
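To make the "pointer back into the source" idea concrete, here is a std-only sketch. It turns a simplified span (line, column, length) into a rustc-style snippet like the ones above. The `Span` struct and `render` function are hypothetical. rustc's real renderer also handles multi-line spans, tabs and Unicode widths, which this sketch ignores by assuming ASCII source:

```rust
/// Simplified, hypothetical span: 1-based line and column plus a length.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub line: usize,
    pub column: usize,
    pub len: usize,
}

/// Renders a rustc-style error snippet with carets under the spanned text.
/// Assumes ASCII source so byte columns match character columns.
pub fn render(msg: &str, file: &str, source: &str, span: Span) -> String {
    // The source line the span points into (empty if out of range).
    let line_text = source
        .lines()
        .nth(span.line.saturating_sub(1))
        .unwrap_or("");
    // The gutter shows the line number; the other rows pad it with spaces.
    let gutter = span.line.to_string();
    let pad = " ".repeat(gutter.len());
    let carets = format!(
        "{}{}",
        " ".repeat(span.column.saturating_sub(1)),
        "^".repeat(span.len)
    );
    format!(
        "error: {}\n{}--> {}:{}:{}\n{} |\n{} | {}\n{} | {}",
        msg, pad, file, span.line, span.column, pad, gutter, line_text, pad, carets
    )
}
```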

In Rust 2018 each [`TokenTree`] has a [`Span`] associated with it. This means that
if you preserve the [`Span`] of all input tokens into the output, then even
though you're producing brand new syntax, the compiler's error messages are still
accurate!
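The following std-only model shows why passing tokens through untouched keeps diagnostics accurate. It uses hypothetical `Token` and `Span` types, not the real API. The toy `prepend_pub` adds one new token with the call-site span. Every original token keeps the span it had in the input, so an error about the string literal can still point at its original column:

```rust
/// Hypothetical simplified span: 1-based line and column in the source.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub line: usize,
    pub column: usize,
}

/// A toy token that remembers where it came from.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub text: String,
    pub span: Span,
}

/// Toy transformation: add a brand-new `pub` token (given the call-site span)
/// and pass every input token through untouched, so each keeps its own span.
pub fn prepend_pub(call_site: Span, item: Vec<Token>) -> Vec<Token> {
    let mut out = vec![Token { text: "pub".to_string(), span: call_site }];
    out.extend(item);
    out
}

/// Toy "type check": report where the first string literal came from.
pub fn string_literal_span(tokens: &[Token]) -> Option<Span> {
    tokens
        .iter()
        .find(|t| t.text.starts_with('"'))
        .map(|t| t.span)
}
```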

For example, a small macro like:

```rust
extern crate proc_macro;
use proc_macro::TokenStream;
use quote::quote;

#[proc_macro]
pub fn make_pub(item: TokenStream) -> TokenStream {
    // `quote!` interpolates `proc_macro2` types, so convert the input first;
    // the conversion keeps each token's original span.
    let item = proc_macro2::TokenStream::from(item);
    let result = quote! {
        pub #item
    };
    result.into()
}
```

when invoked as:

```rust
my_macro::make_pub! {
    static X: u32 = "foo";
}
```

is invalid because we're initializing a static of type `u32` with a string, and
the compiler will helpfully diagnose the problem as:

```
error[E0308]: mismatched types
 --> src/main.rs:1:37
  |
1 | my_macro::make_pub!(static X: u32 = "foo");
  |                                     ^^^^^ expected u32, found reference
  |
  = note: expected type `u32`
             found type `&'static str`

error: aborting due to previous error
```

And we can see here that although we're generating brand new syntax, the
compiler can preserve span information to continue to provide targeted
diagnostics about code that we've written.

### Procedural Macros in the Wild

OK, up to this point we've got a pretty good idea about what procedural macros
can do and the various capabilities they have in the 2018 edition. As such a
long-awaited feature, the ecosystem is already making use of these new
capabilities! If you're interested, some projects to keep your eyes on are:

* [`syn`][syn], [`quote`], and [`proc-macro2`] are your go-to libraries for
  writing procedural macros. They make it easy to define custom parsers, parse
  existing syntax, create new syntax, work with older versions of Rust, and much
  more!

* [Serde] and its derive macros for `Serialize` and `Deserialize` are likely the
  most used macros in the ecosystem. They sport an [impressive amount of
  configuration][serde-attr] and are a great example of how small annotations
  can be so powerful.

* The [`wasm-bindgen` project][wbg] uses attribute macros to easily define
  interfaces in Rust and import interfaces from JS. The lightweight
  `#[wasm_bindgen]` annotation makes it easy to understand what's coming in and
  out, and it removes lots of conversion boilerplate.

* The [`gobject_gen!` macro][gnome-class] is an experimental IDL for the GNOME
  project to define GObject objects safely in Rust, eschewing manually writing
  all the glue necessary to talk to C and interface with other GObject
  instances in Rust.

* The [Rocket framework][rocket] has recently switched over to procedural
  macros, and showcases some of the nightly-only features of procedural macros
  like custom diagnostics, custom span creation, and more. Expect to see these
  features stabilize in 2019!

That's just a *taste* of the power of procedural macros and some example usage
throughout the ecosystem today. We're only 6 weeks out from the original release
of procedural macros on stable, so we've surely only scratched the surface as
well! I'm really excited to see where we can take Rust with procedural macros by
empowering all kinds of lightweight additions and extensions to the language!

[procedural macros]: https://doc.rust-lang.org/reference/procedural-macros.html
[RFC 1566]: https://github.com/rust-lang/rfcs/blob/master/text/1566-proc-macros.md
[Rust 1.15]: https://blog.rust-lang.org/2017/02/02/Rust-1.15.html
[Serde]: https://serde.rs
[manifest]: https://doc.rust-lang.org/cargo/reference/manifest.html
[proc-ref]: https://doc.rust-lang.org/stable/reference/procedural-macros.html
[pm]: https://doc.rust-lang.org/proc_macro/
[`TokenStream`]: https://doc.rust-lang.org/stable/proc_macro/struct.TokenStream.html
[`TokenTree`]: https://doc.rust-lang.org/stable/proc_macro/enum.TokenTree.html
[Rust 1.30]: https://blog.rust-lang.org/2018/10/25/Rust-1.30.0.html
[syn]: https://crates.io/crates/syn
[spm]: https://docs.rs/syn/0.15/syn/parse/index.html
[`quote`]: https://docs.rs/quote/0.6/quote/
[`quote!` macro]: https://docs.rs/quote/0.6/quote/macro.quote.html
[`Span`]: https://doc.rust-lang.org/proc_macro/struct.Span.html
[`proc-macro2`]: https://docs.rs/proc-macro2/0.4/proc_macro2/
[serde-attr]: https://serde.rs/attributes.html
[wbg]: https://github.com/rustwasm/wasm-bindgen
[gnome-class]: https://gitlab.gnome.org/federico/gnome-class
[rocket]: https://rocket.rs/
