lexical analysis


Bahul Jain edited this page Jan 26, 2016 · 7 revisions

#Notes

###Function

Classify program substrings (lexemes) according to their role (token class)
Communicate the tokens to the parser

###Token Class

Identifier: string of letters or digits, starting with a letter
Integer: a non-empty sequence of digits
Keyword: "else" or "if" or "begin" or...
White space: a non-empty sequence of blanks, newlines, and tabs
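
The four definitions above can be read as a classification rule for a single lexeme. The following is a minimal sketch, not the project's actual lexer: the function name `classify`, the class-name strings, and the restriction to ASCII letters and digits are assumptions, and `KEYWORDS` holds only the three keywords the notes name.

```python
import string

LETTERS = set(string.ascii_letters)   # assumption: ASCII letters only
DIGITS = set(string.digits)
BLANKS = set(" \n\t")                 # blank, newline, tab
KEYWORDS = {"else", "if", "begin"}    # only the keywords listed in the notes

def classify(lexeme):
    """Return the token class of one lexeme (hypothetical helper)."""
    if not lexeme:
        raise ValueError("every token class here is non-empty")
    chars = set(lexeme)
    if lexeme in KEYWORDS:
        # Checked before IDENTIFIER, because keywords also look like identifiers.
        return "KEYWORD"
    if chars <= DIGITS:
        return "INTEGER"
    if lexeme[0] in LETTERS and chars <= LETTERS | DIGITS:
        return "IDENTIFIER"
    if chars <= BLANKS:
        return "WHITESPACE"
    raise ValueError(f"unrecognised lexeme: {lexeme!r}")
```

For example, `classify("x1")` returns `"IDENTIFIER"`, while `classify("1x")` raises `ValueError` because an identifier must start with a letter.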

###Regular Language

Empty string: ε (epsilon)
Single-character strings: 'a', 'b', 'c', ...
Union: A + B
Concatenation: AB
Iteration: A* (zero or more repetitions of A)
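
To make these operations concrete, a language can be modelled as a Python set of strings. This is only an illustrative sketch: the function names are hypothetical, and because A* is infinite for any non-empty A, `iterate` takes an assumed `max_reps` bound and returns a finite approximation.

```python
EPSILON = {""}   # the language containing only the empty string

def union(a, b):
    """A + B: every string that is in A or in B."""
    return a | b

def concat(a, b):
    """AB: every string formed by a string from A followed by one from B."""
    return {x + y for x in a for y in b}

def iterate(a, max_reps):
    """Approximate A* with at most max_reps repetitions of A."""
    result = set(EPSILON)    # zero repetitions gives the empty string
    current = set(EPSILON)
    for _ in range(max_reps):
        current = concat(current, a)
        result |= current
    return result

A = {"a"}
B = {"b"}
```

With these definitions, `union(A, B)` is `{"a", "b"}`, `concat(A, B)` is `{"ab"}`, and `iterate(A, 3)` is `{"", "a", "aa", "aaa"}`.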

###Regular Expression

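Integer:
digit = '0' + '1' + '2' + ... + '9'
digits = digit.digit*, also written digit+ (postfix +: one or more digits, not union)
Identifier:
letter = [a-zA-Z]
identifier = letter.(digit + letter)*
Whitespace: (' ' + '\n' + '\t').(' ' + '\n' + '\t')* (a non-empty sequence, matching the token class)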
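
These definitions translate fairly directly into Python's `re` syntax, where union is written `|` and a postfix `+` means "one or more". The sketch below is one possible way to drive a lexer from them, under some assumptions: the names `TOKEN_SPEC`, `MASTER`, and `tokenize` are hypothetical, the keyword pattern covers only the three keywords named in the notes, and the `\b` word boundary is an added detail so that a word such as `iffy` is not split into the keyword `if` plus `fy`.

```python
import re

# Order matters: KEYWORD is tried before IDENTIFIER.
TOKEN_SPEC = [
    ("KEYWORD",    r"(?:else|if|begin)\b"),
    ("IDENTIFIER", r"[a-zA-Z][a-zA-Z0-9]*"),   # letter.(digit + letter)*
    ("INTEGER",    r"[0-9]+"),                 # digit.digit*
    ("WHITESPACE", r"[ \n\t]+"),               # non-empty run of blanks
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def tokenize(source):
    """Split source into (token_class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}"
            )
        # lastgroup is the name of the alternative that matched.
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("if x1 42")` yields the keyword `if`, the identifier `x1`, and the integer `42`, with a whitespace token between each pair. A real lexer would usually discard the whitespace tokens before passing the rest to the parser.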

###Additional Tips

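Lookahead is sometimes required: the lexer may need to inspect the characters after a candidate lexeme to decide where the token ends. For example, after reading "if" it cannot tell whether it has the keyword "if" or the start of an identifier such as "iffy" until it sees the next character.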
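
A minimal sketch of one-character lookahead in a hand-written scanner, using the keyword and identifier classes from these notes. The function name `scan_word`, the returned tuple layout, and the ASCII-only character sets are assumptions made for illustration.

```python
import string

LETTERS = set(string.ascii_letters)
ALNUM = LETTERS | set(string.digits)
KEYWORDS = {"else", "if", "begin"}   # only the keywords listed in the notes

def scan_word(source, start):
    """Scan a keyword or identifier starting at source[start].

    Returns (token_class, lexeme, next_position).
    """
    if start >= len(source) or source[start] not in LETTERS:
        raise ValueError("a word must start with a letter")
    end = start + 1
    # Lookahead: inspect source[end] without consuming it, and extend
    # the lexeme only while the next character can still belong to it.
    while end < len(source) and source[end] in ALNUM:
        end += 1
    lexeme = source[start:end]
    # Only now, with the full lexeme known, choose the token class.
    token_class = "KEYWORD" if lexeme in KEYWORDS else "IDENTIFIER"
    return token_class, lexeme, end
```

Here `scan_word("if x", 0)` returns `("KEYWORD", "if", 2)` because the character after `if` is a blank, whereas `scan_word("iffy", 0)` returns `("IDENTIFIER", "iffy", 4)`.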

JSJS © 2016