RFC: proc macro include! #3200

175 changes: 175 additions & 0 deletions text/0000-proc-macro-include.md
- Feature Name: `proc_macro_include`
- Start Date: 2021-11-24
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Proc macros can now effectively `include!` other files and process their contents.
This allows proc macros both to communicate that they read external files
and to maintain spans into those files for more useful error messages.

# Motivation
[motivation]: #motivation

- `include!` and `include_str!` are no longer required to be compiler built-ins,
and could be implemented as proc macros.
- Help incremental builds and build determinism by having proc macros tell rustc which files they read.
- Improve proc macro sandboxability and cacheability, by offering a way to implement this class of
file-reading macros without using OS APIs directly.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

## For users of proc macros

Nothing changes! You'll just see nicer errors and fewer rebuilds
from procedural macros which read external files.

## For writers of proc macros

Three new functions are provided in the `proc_macro` interface crate:

```rust
/// Read the contents of a file as a `TokenStream` and add it to build dependency info.
///
/// The build system executing the compiler will know that the file was accessed during compilation,
/// and will be able to rerun the build when the contents of the file change.
///
/// May fail for a number of reasons, for example, if the file contains unbalanced delimiters
/// or characters that do not exist in the language.
///
/// If the file fails to be read, this is not automatically a fatal error. The proc macro may
/// gracefully handle the missing file, or emit a compile error noting the missing dependency.
///
/// Source spans are constructed for the read file. If you use the spans of this token stream,
/// any resulting errors will correctly point at the tokens in the read file.
///
/// NOTE: some errors may cause panics instead of returning `io::Error`.
/// We reserve the right to change these errors into `io::Error`s later.
fn include<P: AsRef<str>>(path: P) -> Result<TokenStream, std::io::Error>;

/// Read the contents of a file as a string and add it to build dependency info.
///
/// The build system executing the compiler will know that the file was accessed during compilation,
/// and will be able to rerun the build when the contents of the file change.
///
/// If the file fails to be read, this is not automatically a fatal error. The proc macro may
/// gracefully handle the missing file, or emit a compile error noting the missing dependency.
///
/// NOTE: some errors may cause panics instead of returning `io::Error`.
/// We reserve the right to change these errors into `io::Error`s later.
fn include_str<P: AsRef<str>>(path: P) -> Result<String, std::io::Error>;

/// Read the contents of a file as raw bytes and add it to build dependency info.
///
/// The build system executing the compiler will know that the file was accessed during compilation,
/// and will be able to rerun the build when the contents of the file change.
///
/// If the file fails to be read, this is not automatically a fatal error. The proc macro may
/// gracefully handle the missing file, or emit a compile error noting the missing dependency.
///
/// NOTE: some errors may cause panics instead of returning `io::Error`.
/// We reserve the right to change these errors into `io::Error`s later.
fn include_bytes<P: AsRef<str>>(path: P) -> Result<Vec<u8>, std::io::Error>;
```

Member: Would it make sense for `include_bytes` to return `Literal` as well, or would that not be possible?

Member: I think it should work because `Literal` can be a byte string.

Author: Actually, yeah, I overlooked that possibility.

The main limitation is that the only current interface for getting the contents out of a `Literal` is to `ToString` it. `syn` does have a `.value()` for `LitByteStr` as well as `LitStr`, though, so I guess it's workable.

It's probably not good to short term require debug escaping a binary file to reparse the byte string literal if a proc macro is going to post process the file... but if it's just including the literal, it can put the `Literal` in the token stream, and we can offer ways to extract (byte) string literals without printing the string literal in the future.

Author: The one limitation which needs to be solved is how do spans work. Do we just say that the byte string literal contains the raw bytes of the file (even though that would be illegal in a normal byte string, and invalid UTF-8), maybe as a new "kind" of byte string, so span offsets are mapped directly with the source file? Or are there multiple span positions (representing a `\xNN` in the byte string) which map to a single byte in the source file?

Member: Hmm, what bytes are not allowed in byte string literals? Does the literal itself have to be valid UTF-8?

Author: A Rust source file must be valid UTF-8. Thus, the contents of a byte string literal in the source must be valid UTF-8.

Bytes that are not < 0x80 thus must be escaped to appear in a byte string literal.

Author: And then another question that's worth making explicit: what does it even mean for rustc to report a span into a binary file?

I think binary includes are better served by a different API that lets rustc point into generated code, rather than trying to point into an opaque binary file.
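
To make the escaping point from the review thread concrete, here is a minimal sketch of turning raw file bytes into the source text of a byte string literal. The helper name `to_byte_string_literal` is hypothetical and not part of this RFC; it relies only on `std::ascii::escape_default`, which leaves printable ASCII alone and writes other bytes as `\xNN`.

```rust
use std::ascii;

/// Hypothetical helper (not part of the proposed API): renders raw bytes
/// as the source text of a Rust byte string literal.
fn to_byte_string_literal(bytes: &[u8]) -> String {
    let mut out = String::from("b\"");
    for &byte in bytes {
        // `escape_default` yields ASCII bytes only, so the `as char` cast is lossless.
        out.extend(ascii::escape_default(byte).map(|b| b as char));
    }
    out.push('"');
    out
}
```

Because one non-ASCII byte in the file becomes four characters in the literal, offsets in the literal no longer line up with offsets in the file, which is the span-mapping question raised in the thread.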

As an example, consider a potential implementation of [`core::include`](https://doc.rust-lang.org/stable/core/macro.include.html):

```rust
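use proc_macro::{Diagnostic, Level, Span, TokenStream, TokenTree};

#[proc_macro]
pub fn include(input: TokenStream) -> TokenStream {
    let mut iter = input.into_iter();

    // A labeled block lets the error paths bail out early with an empty stream.
    let result = 'main: {
        let Some(tt) = iter.next() else {
            Diagnostic::spanned(Span::call_site(), Level::Error, "include! takes 1 argument").emit();
            break 'main TokenStream::new();
        };

        // `LiteralValue` and `Literal::value` are not implemented; see the RFC note below.
        // Borrow `tt` so its span is still available in the error path.
        let TokenTree::Literal(lit) = &tt &&
            let LiteralValue::Str(path) = lit.value() else {
            Diagnostic::spanned(tt.span(), Level::Error, "argument must be a string literal").emit();
            break 'main TokenStream::new();
        };

        match proc_macro::include(&path) {
            Ok(token_stream) => token_stream,
            Err(err) => {
                // `Diagnostic::spanned` needs `Into<String>`, so use `format!` rather than `format_args!`.
                Diagnostic::spanned(Span::call_site(), Level::Error, format!("couldn't read {path}: {err}")).emit();
                TokenStream::new()
            }
        }
    };

    // Any token after the first argument is an error.
    if iter.next().is_some() {
        Diagnostic::spanned(Span::call_site(), Level::Error, "include! takes 1 argument").emit();
    }

    result
}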
```

(RFC note: this example uses unstable and even unimplemented features for clarity.
However, this RFC in no way requires these features to be useful on its own.)

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

If a file read is unsuccessful, an encoding of the responsible `io::Error` is passed over the RPC bridge.
If a file is successfully read but fails to lex, `ErrorKind::Other` is returned.

None of these three APIs should ever cause compilation to fail.
It is the responsibility of the proc macro to fail compilation if a failed file read is fatal.

The author is unsure of the technical details required to implement this in the compiler.
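
The error behavior described in this section can be modeled outside the compiler. The following is only a rough sketch under stated assumptions: `include_str_model`, `include_model`, and `check_delimiters` are hypothetical names, the `deps` vector stands in for the build dependency info, recording the path even when the read fails is an assumption rather than something this RFC specifies, and the delimiter check is a crude stand-in for real lexing (it ignores strings and comments).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical model of `include_str`: record the dependency, then read the file.
fn include_str_model(path: &str, deps: &mut Vec<PathBuf>) -> io::Result<String> {
    // Assumption: the path is recorded even if the read fails,
    // so that creating the file later can trigger a rebuild.
    deps.push(Path::new(path).to_path_buf());
    // A failed read is returned to the caller, never turned into a hard error here.
    fs::read_to_string(path)
}

/// Crude stand-in for lexing: only checks that (), [] and {} are balanced.
fn check_delimiters(src: &str) -> io::Result<()> {
    let mut stack = Vec::new();
    for c in src.chars() {
        match c {
            '(' | '[' | '{' => stack.push(c),
            ')' | ']' | '}' => {
                let open = match c {
                    ')' => '(',
                    ']' => '[',
                    _ => '{',
                };
                if stack.pop() != Some(open) {
                    return Err(io::Error::new(io::ErrorKind::Other, "unbalanced delimiter"));
                }
            }
            _ => {}
        }
    }
    if stack.is_empty() {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::Other, "unclosed delimiter"))
    }
}

/// Hypothetical model of `include`: a read that succeeds but fails to "lex"
/// is reported as `ErrorKind::Other`.
fn include_model(path: &str, deps: &mut Vec<PathBuf>) -> io::Result<String> {
    let src = include_str_model(path, deps)?;
    check_delimiters(&src)?;
    Ok(src)
}
```

In this model the caller always gets an `io::Result` back and decides for itself whether a missing or malformed file should fail compilation.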

# Drawbacks
[drawbacks]: #drawbacks

This is more API surface for the `proc_macro` crate, and the `proc_macro` bridge is already complicated.
Additionally, this is likely to lead to more proc macros which read external files.
Moving the handling of `include!`-like macros later in the compiler pipeline
(read: dependent on name resolution)
is also likely to be significantly more complicated than the current `include!` implementation.

# Alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

- [`proc_macro::tracked_path`](https://doc.rust-lang.org/stable/proc_macro/tracked_path/fn.path.html) (unstable)

This just tells the proc_macro driver that the proc macro has a dependency on the given path.
This is sufficient for tracking the file, since the proc macro can also read the file itself,
but it cannot require that the proc macro go through this API, and it cannot provide spans for errors.

In particular, it would be nice to be able to sandbox proc macros in wasm à la [watt](https://crates.io/crates/watt)
while still having proc macros capable of reading the filesystem (in a proc_macro driver controlled manner).

- Status quo

Proc macros can continue to read files and use `include_str!` to indicate a build dependency.
This is error prone, easy to forget to do, and all around not a great experience.
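
For reference, the status quo workaround is typically done by emitting a dummy item that mentions the file. The sketch below only builds the source text of such an item; `tracking_item` is a hypothetical helper name, and the path is assumed to already be in a form that `include_str!` resolves correctly from the invoking file (for example, an absolute path).

```rust
/// Hypothetical helper: builds the source text of a dummy item whose only
/// purpose is to make rustc treat `path` as a build dependency.
fn tracking_item(path: &str) -> String {
    // `{:?}` prints the path as a quoted, escaped Rust string literal.
    format!("const _: &str = include_str!({:?});", path)
}
```

A proc macro would parse this text into a `TokenStream` and append it to its output. Forgetting that step gives no error, only stale builds, which is why this approach is easy to get wrong.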

# Prior art
[prior-art]: #prior-art

No known prior art.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- It would be nice for `include` to allow emitting a useful lexer error directly.
This is not currently provided for by the proposed API.
- Unknown unknowns.

# Future possibilities
[future-possibilities]: #future-possibilities

Future expansions of the proc macro APIs are almost entirely orthogonal to this feature.
As such, here is a small list of potential uses for this API:

- Processing a Rust-lexer-compatible DSL
- Multi-file parser specifications for toolchains like LALRPOP or pest
- Larger-scale Rust syntax experiments
- Pre-processing `include!`ed assets
- Embedding compiled-at-rustc-time shaders
- Escaping text at compile time for embedding in a document format