-
Notifications
You must be signed in to change notification settings - Fork 18
Description
See also #69 (comment)
We have this CI failures on Windows https://github.com/posit-dev/air/actions/runs/12042018483/job/33574939950?pr=69
With this test
---- formatter::r::specs::r::value::string_value_r stdout ----
thread 'formatter::r::specs::r::value::string_value_r' panicked at C:\Users\runneradmin\.cargo\git\checkouts\biome-d49ce8898b420a50\2648fa4\crates\biome_formatter\src\builders.rs:394:5:
The content 'r"("some raw string
business")"' contains an unsupported '\r' line terminator character but text must only use line feeds '\n' as line separator. Use '\n' instead of '\r' and '\r\n' to insert a line break in strings.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Which formats a file containing this string
r"("some raw string
business")"
Note that this is a multiline string. The raw string there isn't critical to the reprex, it's just what we happened to have there.
This fails because:
- On windows, git checks the file out with CRLF line endings. Note this puts a
\r\n
inside the string contents. - The formatter fails inside
located_token_text()
for a string because of this no-\r assertion
Both biome and ruff use this assertion, so I imagine it is for a good reason.
The way both of them handle this is by using variants of normalize_string()
here:
https://github.com/biomejs/biome/blob/cd1c8ec4249e8df8d221393586d664537c9fddb2/crates/biome_formatter/src/token/string.rs#L48
Note that the docs say one of the things it does is replace \r\n
with just \n
.
They call this at fmt()
time for a string node, before it actually drops through to located_token_text()
, so they get a chance to "fix up" the string here. We could probably create our own version of this and strip out everything except the \r\n
conversion, until we decide to do more normalization.
Why do they even do this?
I imagine the idea is that typically the line ending doesn't really matter, it becomes part of the Trivia set, but is not part of an actual node. But here, the line ending gets parsed into the actual string contents. I assume they want to ensure that the actual nodes of the tree look the same across OSes regardless of line endings, and standardizing on \n
inside multiline strings is the way to achieve that.