
lexical analysis

Bahul Jain edited this page Jan 26, 2016 · 7 revisions

# Notes

### Function

  • Classify program substrings (lexemes) according to their role (token class)
  • Communicate the tokens to the parser
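A minimal Python sketch of the lexer-to-parser hand-off. It assumes each token is a (token class, lexeme) pair and that the parser pulls tokens one at a time. The names `Token`, `TokenStream`, `peek` and `advance` are hypothetical and not taken from any particular compiler.

```python
from typing import Iterator, NamedTuple, Optional


class Token(NamedTuple):
    """A token: its token class plus the lexeme it was built from."""
    kind: str    # token class, e.g. "Identifier" (hypothetical labels)
    lexeme: str  # the matched substring of the program text


class TokenStream:
    """Parser-facing view of the lexer output, one token at a time."""

    def __init__(self, tokens: Iterator[Token]) -> None:
        self._tokens = tokens
        self._buffered: Optional[Token] = None  # token seen by peek() but not consumed

    def peek(self) -> Optional[Token]:
        """Return the next token without consuming it (None at end of input)."""
        if self._buffered is None:
            self._buffered = next(self._tokens, None)
        return self._buffered

    def advance(self) -> Optional[Token]:
        """Consume and return the next token (None at end of input)."""
        tok = self.peek()
        self._buffered = None
        return tok


# Example: a parser-side loop over a hard-coded token list.
stream = TokenStream(iter([Token("Keyword", "if"), Token("Identifier", "x1")]))
while stream.peek() is not None:
    print(stream.advance())
```

A pull-based stream like this lets the parser request tokens only when it needs them, and `peek` gives it one token of lookahead without consuming input.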

### Token Classes

  • Identifier: a string of letters or digits, starting with a letter
  • Integer: a non-empty sequence of digits
  • Keyword: "else" or "if" or "begin" or...
  • White space: a non-empty sequence of blanks, newlines, and tabs
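One way to classify a complete lexeme against the four token classes above is shown in the hand-written Python sketch below. `classify` and `KEYWORDS` are hypothetical names, and the keyword set holds only the three keywords the notes list. Keywords such as "if" also fit the identifier pattern, so the sketch checks keywords first.

```python
import string

# Only the keywords named in the notes; a real language defines more.
KEYWORDS = {"if", "else", "begin"}

LETTERS = set(string.ascii_letters)  # a-z, A-Z
DIGITS = set(string.digits)          # 0-9
BLANKS = set(" \n\t")                # blank, newline, tab


def classify(lexeme: str) -> str:
    """Return the token class of a complete lexeme, or raise ValueError."""
    if not lexeme:
        raise ValueError("empty lexeme")
    # Keywords also match the identifier rule, so test them first.
    if lexeme in KEYWORDS:
        return "Keyword"
    # Identifier: starts with a letter, then letters or digits.
    if lexeme[0] in LETTERS and all(c in LETTERS or c in DIGITS for c in lexeme):
        return "Identifier"
    # Integer: non-empty sequence of digits.
    if all(c in DIGITS for c in lexeme):
        return "Integer"
    # White space: non-empty sequence of blanks, newlines and tabs.
    if all(c in BLANKS for c in lexeme):
        return "Whitespace"
    raise ValueError(f"no token class for {lexeme!r}")
```

This only classifies a lexeme that has already been cut out of the input. Deciding where each lexeme starts and ends is the other half of the lexer's job.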

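### Regular Languages

  • Empty string: Epsilon (ε) denotes the language {""}
  • Single character strings: 'a', 'b', 'c', ... where 'c' denotes the language {"c"}
  • Union: A + B = {s | s ∈ A or s ∈ B}
  • Concatenation: AB = {xy | x ∈ A, y ∈ B}
  • Iteration (Kleene star): A* = ε + A + AA + AAA + ... (zero or more strings from A, concatenated)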
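The operations above can be tried out on small finite sets of strings. This Python sketch models a language as a `frozenset` of strings, and the helper names are hypothetical. A* is infinite whenever A contains a non-empty string, so `star` is cut off at a hypothetical `max_len` parameter.

```python
from itertools import product
from typing import FrozenSet

Language = FrozenSet[str]

EPSILON: Language = frozenset({""})  # the language containing only the empty string


def char(c: str) -> Language:
    """Single-character language {c}."""
    return frozenset({c})


def union(a: Language, b: Language) -> Language:
    """A + B: strings that are in A or in B."""
    return a | b


def concat(a: Language, b: Language) -> Language:
    """AB: every string of A followed by every string of B."""
    return frozenset(x + y for x, y in product(a, b))


def star(a: Language, max_len: int) -> Language:
    """A*, keeping only strings of length <= max_len."""
    result = {""}    # A^0 = epsilon
    frontier = {""}  # strings added in the previous round
    while frontier:
        # Extend each new string by one more piece from A; drop ones already seen.
        frontier = {x + y for x in frontier for y in a
                    if len(x + y) <= max_len} - result
        result |= frontier
    return frozenset(result)
```

For example, `concat(union(char("a"), char("b")), char("c"))` gives `{"ac", "bc"}`.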

### Regular Expressions

  • Integer:
    • digit = '0' + '1' + '2' + ... + '9'
    • digits = digit.digit* ('.' marks concatenation), also written digit+ (at least one digit)
  • Identifier:
    • letter = [a-zA-Z], shorthand for 'a' + 'b' + ... + 'z' + 'A' + ... + 'Z'
    • identifier = letter.(digit + letter)*
  • Whitespace: (' ' + '\n' + '\t')+
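The same definitions can be translated into Python's `re` module, as an illustration. Watch the notation shift: in these notes `+` between two expressions means union, while in `re` union is `|` and a postfix `+` means "one or more". Whitespace is written as one or more blanks, newlines or tabs, matching the Token Class definition. The pattern names and the `matches` helper are hypothetical.

```python
import re

DIGIT = r"[0-9]"                                 # '0' + '1' + ... + '9'
DIGITS = rf"{DIGIT}+"                            # digit.digit*, at least one digit
LETTER = r"[a-zA-Z]"                             # ASCII letters only
IDENTIFIER = rf"{LETTER}(?:{DIGIT}|{LETTER})*"   # letter.(digit + letter)*
WHITESPACE = r"[ \n\t]+"                         # (' ' + '\n' + '\t')+


def matches(pattern: str, text: str) -> bool:
    """True if the whole of text is in the language of pattern."""
    return re.fullmatch(pattern, text) is not None
```

`re.fullmatch` is used rather than `re.match` because membership in a language means the entire string must match, not just a prefix.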

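### Additional Tips

  • Lookahead is sometimes required: the lexer may have to read characters past the current lexeme to decide where it ends or which token class it belongs to (e.g. '=' versus '==')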
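A small Python sketch of one-character lookahead. The token classes in these notes contain no operators, so the `Assign` (`=`) and `Equals` (`==`) classes here are hypothetical. They are used only because seeing `=` alone is not enough to decide which token it starts. Identifiers show the same effect: the lexer only knows an identifier has ended when it sees a character that is not a letter or digit. Whitespace is skipped and integers are omitted to keep the sketch short.

```python
from typing import List, Tuple


def scan(src: str) -> List[Tuple[str, str]]:
    """Split src into (token class, lexeme) pairs using one char of lookahead."""
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(src):
        c = src[i]
        if c == "=":
            # Peek at the next char to choose between "==" and "=" (hypothetical classes).
            if i + 1 < len(src) and src[i + 1] == "=":
                tokens.append(("Equals", "=="))
                i += 2
            else:
                tokens.append(("Assign", "="))
                i += 1
        elif c.isascii() and c.isalpha():
            # An identifier ends only when the next char is not an ASCII letter or digit.
            j = i + 1
            while j < len(src) and src[j].isascii() and src[j].isalnum():
                j += 1
            tokens.append(("Identifier", src[i:j]))
            i = j
        elif c in " \n\t":
            i += 1  # this sketch discards whitespace instead of emitting a token
        else:
            raise ValueError(f"unexpected character {c!r} at position {i}")
    return tokens
```

With this rule, `a===b` scans as `a`, `==`, `=`, `b`: at each step the lexer takes the longest token the lookahead allows.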