My Last Resort - I Need Help Debugging my Program
I can't figure out what is going on in my program. I'm having a hard time wrapping my mind around what steps it's taking to get to the wrong result.
I have this function which should make my parser ignore C-style comments, i.e. /* */. When I type /* * */, or put any number of * inside the comment, my program detects the * characters inside the comment itself, and when it detects a * it also detects the last /:
// Handle C-style comments
private void HandleComment() {
    // Consume until closing characters, multi-line comments are supported
    while (Peek() != '*' && PeekAhead() != '/' && !IsAtEnd()) {
        if (Peek() == '\n') line++;
        Advance();
    }
    Advance();
    Advance();
}
I figured I would need to Advance() twice in order to consume both the characters that denote the end of the comment.
Peek(), PeekAhead(), Advance() and IsAtEnd() are functions from the book Crafting Interpreters. Here they are:
// Peek at the next char
private char Peek() {
    if (IsAtEnd()) return '\0';
    return source[current];
}
// Peek after next char
private char PeekAhead() {
    if (current + 1 >= source.Length) return '\0';
    return source[current+1];
}
private char Advance() {
    return source[current++];
}
private bool IsAtEnd() {
    return current >= source.Length;
}
The source variable is just a string that contains the text contents of a file or the input from Console.ReadLine().
I based the comment logic on the string logic here:
// Handle string lexemes
private void HandleString() {
    // Consume until closing quote, multi-line strings are supported
    while (Peek() != '"' && !IsAtEnd()) {
        if (Peek() == '\n') line++;
        Advance();
    }
    // Handle unterminated strings
    if (IsAtEnd()) {
        DotLox.Error(line, "Unterminated string.");
        return;
    }
    // Consume
    Advance();
    // Tokenize string and store value without quotes
    string value = source[(start+1)..(current-1)];
    AddToken(TokenType.STRING, value);
}
I almost forgot to share this crucial bit of logic here:
private void ScanToken() {
    char c = Advance();
    // Match characters to tokens
    switch(c) {
        // Division and comments
        case '/':
            if (Match('/')) {
                // Consume comment but don't turn it into a token
                while (Peek() != '\n' && !IsAtEnd()) Advance();
            } else if (Match('*')) {
                // Handle C-style comments
                HandleComment();
            } else {
                // Turn lone slash into a token
                AddToken(TokenType.SLASH);
            }
            break;
    }
}
The problem is fairly simple:
You are breaking the loop if either the current character is a * or if the next character is a /. In other words, you don't require the full */ token.
Instead, you'll want something like
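A minimal sketch of such a loop, assuming the helpers from the question: the two comparisons are grouped and negated so the loop only stops on the full */ pair. The CommentScanner class, the bool return value and the Rest() helper are hypothetical names added here so the snippet can stand on its own; they are not part of the original scanner.

```csharp
using System;

// Hypothetical stand-in for the scanner in the question; only the parts
// needed to exercise HandleComment are reproduced.
public class CommentScanner {
    private readonly string source;
    private int current = 0;
    public int Line { get; private set; } = 1;

    // 'source' is assumed to be the text that follows the opening "/*"
    public CommentScanner(string source) { this.source = source; }

    private bool IsAtEnd() => current >= source.Length;
    private char Peek() => IsAtEnd() ? '\0' : source[current];
    private char PeekAhead() => current + 1 >= source.Length ? '\0' : source[current + 1];
    private char Advance() => source[current++];

    // Returns false when the comment is never closed (assumed convention;
    // the real scanner could report the error the way HandleString does)
    public bool HandleComment() {
        // Keep consuming until the current char is '*' AND the next one is '/'
        while (!(Peek() == '*' && PeekAhead() == '/') && !IsAtEnd()) {
            if (Peek() == '\n') Line++;
            Advance();
        }
        if (IsAtEnd()) return false;  // unterminated comment
        Advance();  // consume '*'
        Advance();  // consume '/'
        return true;
    }

    // Text left over after the comment (illustration only)
    public string Rest() => source.Substring(current);
}
```

With this version, new CommentScanner(" * a / b ** */ x") returns true from HandleComment() and leaves " x" in Rest(): the stray * and / characters inside the comment no longer end the loop. The IsAtEnd() check before the two Advance() calls also keeps an unterminated comment from reading past the end of source.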
That is so weird, I thought everything in the while statement would have to be true at the same time.
That is the case. But if you are at a *, then the original while statement reads while (false && …) { and immediately breaks. The remaining checks are not evaluated, since they cannot affect the outcome when the first check fails.
Oh right, forgot && did that. Thank you, it looks like your solution worked!
This would also work:
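One variant that is equivalent by De Morgan's law, and this is an assumption about which alternative was meant, joins the two negated comparisons with || instead of &&. The sketch below puts the three conditions side by side; LoopConditions and its method names are hypothetical, and the IsAtEnd() check is left out to keep the comparison focused.

```csharp
using System;

public static class LoopConditions {
    // Condition from the question: the loop continues only while BOTH
    // comparisons hold, so a lone '*' or a '/' in next position stops it
    public static bool Original(char cur, char next) => cur != '*' && next != '/';

    // Continue unless the pair is exactly "*/"
    public static bool Negated(char cur, char next) => !(cur == '*' && next == '/');

    // De Morgan: !(A && B) is the same as (!A || !B)
    public static bool WithOr(char cur, char next) => cur != '*' || next != '/';
}
```

For a * followed by a space inside a comment, Original('*', ' ') is false, so the loop ends early, while Negated and WithOr are both true and the loop keeps going. For the real terminator, all three return false for ('*', '/').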
This is wrong. At least as I would expect advance to behave. While you do advance, you are returning the old value, not the new one. ++current would return the new value.
Also, it would help if you said what the wrong behaviour was.
Thank you, I'll see if that change improves the logic. The wrong behaviour is right here:
Probably this guy.
If you encounter an asterisk or a slash, then it stops. You want to stop when you encounter an asterisk and a slash, so:
I also find your detection of when to remove the comment a bit scary, but that might be because you haven't shown the Match method.
Oh yes I forgot about that one, Match() compares source[current] to the parameter and then consumes it if true, so if the program encounters /* it'll consume the / with the Match() method and then it will also consume the * before going into the HandleComment() logic.
It's not strictly speaking a bug, but the behavior of Advance is a bit confusing. Based on how you use it in the code snippets, I would suggest changing it to
So that it only does what it says, and then changing the place where you read the next char to
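A rough sketch of what those two changes could look like, under stated assumptions: Advance() becomes void and only moves the cursor, and ScanToken() reads the character with Peek() before stepping past it. Match() is written from the description given earlier in the thread. SplitScanner, ScanAll() and the string token names are hypothetical stand-ins for the real scanner, AddToken() and TokenType.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, trimmed-down scanner used only to illustrate the split
public class SplitScanner {
    private readonly string source;
    private int current = 0;
    public List<string> Tokens { get; } = new List<string>();

    public SplitScanner(string source) { this.source = source; }

    private bool IsAtEnd() => current >= source.Length;
    private char Peek() => IsAtEnd() ? '\0' : source[current];

    // Only advances the cursor; it no longer returns the character
    private void Advance() { current++; }

    // Compares source[current] to the expected char and consumes it on a match
    private bool Match(char expected) {
        if (IsAtEnd() || source[current] != expected) return false;
        Advance();
        return true;
    }

    private void ScanToken() {
        char c = Peek();  // read the current char...
        Advance();        // ...then step past it
        switch (c) {
            case '/':
                if (Match('/')) {
                    // Line comment: skip to end of line without emitting a token
                    while (Peek() != '\n' && !IsAtEnd()) Advance();
                } else {
                    Tokens.Add("SLASH");
                }
                break;
            // every other character is ignored in this sketch
        }
    }

    public void ScanAll() {
        while (!IsAtEnd()) ScanToken();
    }
}
```

Scanning "/ // note" with ScanAll() leaves a single "SLASH" entry in Tokens. The trade-off is one extra line at each call site in exchange for a method name that matches what the method does.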
Alternatively, you could rename it to something like ReadChar, which more closely matches its current behavior, since a read is expected to return the data (character) at the current position, and then advance the position.
It's designed to consume and increment. The naming is a bit weird, but it's what the author used in the book I was following. I could also rename it to Consume().