CPSC 3520 - DAY 12 FEBRUARY 22, 2018 ================================================================================ REGULAR EXPRESSIONS ------------------- Regular expressions are commonly used in bash, grep, and vi. Most importantly, they form the basis for the token recognizer implemented in flex. Patterns to be recognized by flex in the input are denoted using extended regular expressions. FLEX (associate with regular expressions) ----------------------------------------- +-----------------+----------------------------------------------------------+ | character | action (matches) | +-----------------+----------------------------------------------------------+ | x | match the character 'x' | | . | match any character except newlin | | [xyz] | match either an 'x' a 'y' or a 'z' | | [j-o] | match any letter from 'j' through 'o' | | [~A-Z] | match any character except an uppercase letter or newln | | [~A-Z\n] | match any character except an uppercase letter or newln | | R* | match zero or more R, where r is a regular expression | | R+ | match one or more R | | R? | match zero or one R | | R[2,5] | match from two to five R | | "[xyz]" | match the enclosed string literally, i.e match [xyz] | | \X | match the c interpretation of \X if X is a,b,f,n,t,v | | (R) | match R using parenthesis to denote precedence | | R | S | match regular expression R or regular expression S | +-----------------+----------------------------------------------------------+ Most characters, including all letters and digits are considered regular expressions that match themselves. Flex input file structure -The regular expressions to be matched against input are specified by the user in a flex source file (with the extension .in). -This file consists of regular expressions to be matched against the input file together with corresponding flex actions. -This file is translated by flex into a c program which reads an input character stream, and where possible, converts the inputs into tokens that are converted into regular expressions. Flex Output -The output of flex is a file, lex.yy.c, containing c for a scanner. -The scanning function within this file is denoted yylex. -The process is shown below: *.in --> lex.yy.cc -lex.yy.c may be compiled into a freestanding executable using gcc with the -lfl library specification. bison and Tokens for Parsing ---------------------------- -flex creates function yylex(). -the bison produces the parser as the function yyparse. -the parser requires flex-based generation of tokens. -specifically, yyparse exects the integer representation of tokens to be returned from the flex-generated scanner. The Pragmatics of Using bison ----------------------------- -specify the grammar in one or more bison grammar files. -write or generate a lexical analyzer to process source code input and pass tokens to the parser. -flex will generate a lexical analyzer to produce source code input and pass tokens to the parser. -write a function that calls the bison-generated parser. -write error reporting routines. -run bison on the grammar generated from yylex. -By convention, bison input files have the extension 'y'. -definitely use the '-d' flag to avoid being a frustrated mess. Conveying the Grammar to Bison ------------------------------ bison generates the parser from input in the form of a bison grammar file. -A nonterminal symbol in the formal grammar is represented in bison input as an identifier, which, by bison convention, must be in lower case.