-
Notifications
You must be signed in to change notification settings - Fork 13
Add newline to the parsing grammar to resolve some bugs #56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
5760850
c70543a
e3e5371
8ec331c
2107449
3fd5a6f
6185b90
f52e6f6
cec678b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,6 +5,7 @@ pub(crate) enum TokenType { | |
Letter, | ||
Number, | ||
Comment, | ||
Newline, | ||
Unknown, | ||
} | ||
|
||
|
@@ -16,6 +17,8 @@ impl From<char> for TokenType { | |
TokenType::Number | ||
} else if c == '(' || c == ';' || c == ')' { | ||
TokenType::Comment | ||
} else if c == '\n' { | ||
TokenType::Newline | ||
} else { | ||
TokenType::Unknown | ||
} | ||
|
@@ -53,14 +56,14 @@ impl<'input> Lexer<'input> { | |
{ | ||
let start = self.current_position; | ||
let mut end = start; | ||
let mut line_endings = 0; | ||
|
||
for letter in self.rest().chars() { | ||
if !predicate(letter) { | ||
break; | ||
} | ||
if letter == '\n' { | ||
line_endings += 1; | ||
// Newline defines the command to be complete. | ||
break; | ||
} | ||
end += letter.len_utf8(); | ||
} | ||
|
@@ -69,7 +72,6 @@ impl<'input> Lexer<'input> { | |
None | ||
} else { | ||
self.current_position = end; | ||
self.current_line += line_endings; | ||
Some(&self.src[start..end]) | ||
} | ||
} | ||
|
@@ -175,6 +177,23 @@ impl<'input> Lexer<'input> { | |
}, | ||
}) | ||
} | ||
|
||
fn tokenize_newline(&mut self) -> Option<Token<'input>> { | ||
let start = self.current_position; | ||
let line = self.current_line; | ||
let value = "\n"; | ||
self.current_position += 1; | ||
self.current_line += 1; | ||
Some(Token { | ||
kind: TokenType::Newline, | ||
value, | ||
span: Span { | ||
start, | ||
line, | ||
end: start + 1, | ||
}, | ||
}) | ||
} | ||
|
||
fn finished(&self) -> bool { self.current_position >= self.src.len() } | ||
|
||
|
@@ -219,6 +238,9 @@ impl<'input> Iterator for Lexer<'input> { | |
TokenType::Number => { | ||
return Some(self.tokenize_number().expect(MSG)) | ||
}, | ||
TokenType::Newline => { | ||
return Some(self.tokenize_newline().expect(MSG)) | ||
}, | ||
TokenType::Unknown => self.current_position += 1, | ||
} | ||
} | ||
|
@@ -253,11 +275,22 @@ mod tests { | |
|
||
#[test] | ||
fn skip_whitespace() { | ||
let mut lexer = Lexer::new(" \n\r\t "); | ||
let mut lexer = Lexer::new(" \r\t "); | ||
|
||
lexer.skip_whitespace(); | ||
|
||
assert_eq!(lexer.current_position, lexer.src.len()); | ||
assert_eq!(lexer.current_line, 0); | ||
} | ||
|
||
#[test] | ||
fn respect_newlines() { | ||
let mut lexer = Lexer::new("\n\rM30garbage"); | ||
|
||
let token = lexer.tokenize_newline().unwrap(); | ||
|
||
assert_eq!(token.kind, TokenType::Newline); | ||
assert_eq!(lexer.current_position, 1); | ||
assert_eq!(lexer.current_line, 1); | ||
} | ||
|
||
|
@@ -371,4 +404,44 @@ mod tests { | |
|
||
assert_eq!(got.value, "+3.14"); | ||
} | ||
|
||
#[test] | ||
fn two_multi() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would probably be more readable if we rewrite the test to be something like this: let expected = vec![
("G", 0),
("0", 0),
("X", 0),
...
];
let actual: Vec<_> = Lexer::new("...")
.map(|tok| (tok.value, tok.span.line))
.collect();
assert_eq!(actual, expected); There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For long g-code example tests, that's probably the way to go. Maybe we could create some helper methods that make testing easier. We're really missing these long-form g-code verification tests, so I took this one from the pull request @dr0ps made. How about we keep this test as is for now, and rework these kinds of tests in the future? |
||
let mut lexer = Lexer::new("G0 X1\nG1 Y2"); | ||
|
||
let got = lexer.next().unwrap(); | ||
assert_eq!(got.value, "G"); | ||
assert_eq!(got.span.line, 0); | ||
|
||
let got = lexer.next().unwrap(); | ||
assert_eq!(got.value, "0"); | ||
assert_eq!(got.span.line, 0); | ||
|
||
let got = lexer.next().unwrap(); | ||
assert_eq!(got.value, "X"); | ||
assert_eq!(got.span.line, 0); | ||
|
||
let got = lexer.next().unwrap(); | ||
assert_eq!(got.value, "1"); | ||
assert_eq!(got.span.line, 0); | ||
|
||
let got = lexer.next().unwrap(); | ||
assert_eq!(got.value, "\n"); | ||
|
||
let got = lexer.next().unwrap(); | ||
assert_eq!(got.value, "G"); | ||
assert_eq!(got.span.line, 1); | ||
|
||
let got = lexer.next().unwrap(); | ||
assert_eq!(got.value, "1"); | ||
assert_eq!(got.span.line, 1); | ||
|
||
let got = lexer.next().unwrap(); | ||
assert_eq!(got.value, "Y"); | ||
assert_eq!(got.span.line, 1); | ||
|
||
let got = lexer.next().unwrap(); | ||
assert_eq!(got.value, "2"); | ||
assert_eq!(got.span.line, 1); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the only way this function can be called is when we're looking at a
\n
character, would it be enough to add something likedebug_assert!(self.rest().starts_with("\n"))
and return aToken<'input>
?Otherwise, we should actually check for the newline character and return
None
if it isn't found.Either way, I think we should try to use the newline characters from the original stream instead of writing
let value = "\n"
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you're getting at is "this code has a weird smell", and I think you're right. 😁 But I'm honestly not sure how to construct this better. If we don't explicitly set
value = "\n"
, we'd have tochomp()
or somehow otherwise pull the newline from the string, which is a computational expense that seems superfluous.As for your suggestion
I don't think I understand. The
debug_assert
is something I could add; seems reasonable. But how would you return theToken<'input>
?