mario::konrad
programming / C++ / sailing / nerd stuff
Tutorial: Flex++ and Bison++
© 2005 / Mario Konrad

Introduction

To write a scanner or parser is not a very difficult thing. Until a certain complexity and/or size it’s maybe ok to write it manually. Beyond that size or complexity the utilization of a scanner or parser generator is recommended. There are many tools doing this, lex/yacc to name some of the best known (or their GNU coutnerparts flex and bison).

flex++ and bison++ are more improved ones. They produce C++ code and are more customizable. The bad thing is, there is not much documentation and tutorials concerning those utilities. This is what this document is for. It will show a basic example of a scanner, continues with a parser (scanner and parser working together) and finally shows an example how to write programs having multiple parsers.

Note: all source codes provided by this page are free to copy, modify, distribute, use or to treat in any manner you like. Build and run the programs on your own risk. The tutorial is copyright of Mario Konrad.

Note: this is not a tutorial about fundamentals of scanners, parsers or compiler building. It just covers the tools mentioned. For the other topics, please read CS literature.

Requirements

Simple Flex++ Example

This is a rather basic example of a simple scanner. Don’t expect too much.

calc.l:

%name Scanner
%define IOSTREAM

DIGIT   [0-9]
DIGIT1  [1-9]

%%

"+"               { cout << "operator <" << yytext[0] << ">" << endl; }
"-"               { cout << "operator <" << yytext[0] << ">" << endl; }
"="               { cout << "operator <" << yytext[0] << ">" << endl; }
{DIGIT1}{DIGIT}*  { cout << "  number <" << yytext    << ">" << endl; }
.                 { cout << " UNKNOWN <" << yytext[0] << ">" << endl; }

%%

int main(int argc, char ** argv)
{
    Scanner scanner;
    scanner.yylex();
    return 0;
}

Build it with the following sequence:

$ flex++ calc.l
$ g++ -o calc lex.yy.cc

The result is a program called calc. To test it out simply type:

$ echo "1+2-3=" | ./calc
$ echo "1+@2-3=" | ./calc

If you look at the source code, there isn’t really much to say about it. It’s a plain and simple scanner that knows the tokens +, -, = and positive integer numbers.

The more interesting thing about the source is, it demonstrates the ability of flex++ to generate a scanner class (called Scanner in this example) which is used to scan the input (stdin or cin respectively):

int main(int argc, char ** argv)
{
    Scanner scanner;
    scanner.yylex();
    return 0;
}

Parser with Flex++ and Bison++

This is a quite simple example of a parser using a scanner. There are tree files:

parser.y:

%name CalcParser
%define LSP_NEEDED
%define ERROR_BODY = 0
%define LEX_BODY = 0

%header{
#include <iostream>
#include <fstream>
%}

%union {
    int i_type;
    char c_type;
}

%token UNKNOWN
%token <c_type> PLUS MINUS EQUALS
%token <i_type> NUMBER

%type <i_type> number subexpression addexpression
%type <i_type> expression calculation

%start calculation

%%

calculation
: expression EQUALS { $$ = $1; }
;

expression
: addexpression { $$ = $1; }
;

addexpression
: subexpression { $$ = $1; }
| addexpression PLUS subexpression  { $$ = $1 + $3; }
;

subexpression
: number    { $$ = $1; }
| subexpression MINUS number    { $$ = $1 - $3; }
;

number
: NUMBER    { $$ = $1; }
;

%%

scanner.l:

%name CalcScanner
%define IOSTREAM
%define LEX_PARAM YY_CalcParser_STYPE *val, YY_CalcParser_LTYPE *loc
%define MEMBERS public: int line, column;
%define CONSTRUCTOR_INIT : line(1), column(1)

%header{
#include<sstream>
#include "parser.h"
%}

DIGIT   [0-9]
DIGIT1  [1-9]

%%

" "               {
                    ++column;
                  }
"\t"              {
                    column += 8;
                  }
"+"               {
                    ++column;
                    val->c_type = yytext[0];
                    return CalcParser::PLUS;
                  }
"-"               {
                    ++column;
                    val->c_type = yytext[0];
                    return CalcParser::MINUS;
                  }
"="               {
                    ++column;
                    val->c_type = yytext[0];
                    return CalcParser::EQUALS;
                  }
{DIGIT1}{DIGIT}*  {
                    column += strlen(yytext);
                    std::istringstream(yytext) >> val->i_type;
                    return CalcParser::NUMBER;
                  }
.                 {
                    ++column;
                    val->c_type = yytext[0];
                    return CalcParser::UNKNOWN;
                  }
<<EOF>>           {
                    yyterminate();
                  }

%%

calc.cc:

#include "parser.h"
#include "scanner.h"
#include <iostream>
using namespace std;

class Parser : public CalcParser
{
    private:
        CalcScanner scanner;
    public:
        Parser();
        virtual void yyerror(char * msg);
        virtual int yylex();
};

Parser::Parser()
{
}

void Parser::yyerror(char * msg)
{
    cerr << yylloc.first_line << ":" << yylloc.first_column <<  ": "
        << msg << " : <" << yylloc.text << ">" <<  endl;
}

int Parser::yylex()
{
    yylloc.first_line = scanner.line;
    yylloc.first_column = scanner.column;
    int token = scanner.yylex(&yylval, &yylloc);
    yylloc.last_line = scanner.line;
    yylloc.last_column = scanner.column;
    yylloc.text = (char *)scanner.yytext;
    return token;
}

int main(int argc, char ** argv)
{
    Parser parser;
    parser.yyparse();
    return 0;
}

It is an example of a very simple algebraic calculator, knowing only integer values, additions and subtractions.

Multiple Parsers

The previous example were nothing special, business as usual. To build a multiple-parser-application is a bit more tricky. Luckily flex++ and bison++ are more customizable as the original tools lex and yacc.

Let’s build an application that can utilize a parser for algegraic calculations as well as UPN calculation (UPN in german = IPN in english: inverse polnish notation).

EBNF

For the algebraic calculator:

calculation   ::= expression "=" .
expression    ::= addexpression .
addexpression ::= subexpressoin
                  | (addexpression "+" subexpression) .
subexpression ::= number
                  | (subexpression "-" number) .
number        ::= [1-9][0-9]* .

For the UPN calculator:

calculation ::= expression .
expression  ::= number
                | (expression expression "+")
                | (expression expression "-") .
number      ::= [1-9][0-9]* .

number is not in pure EBNF, but it’s easier to describe it as a regular expression.

Design

The parser generator shall generate the parsers for both calculators. The application however likes to use them dynamically. The following diagram is UML and shows the class design of the classes provided by the generator as well as the other classes.

Files

Build

It is recommended to use the Makefile:

$ make

The build process goes as follows:

$ bison++ -d -hAlgCalcParser.h -oAlgCalcParser.cc AlgCalcParser.y
$ flex++ -hAlgCalcScanner.h -oAlgCalcScanner.cc AlgCalcScanner.l
$ bison++ -d -hUPNCalcParser.h -oUPNCalcParser.cc UNPCalcParser.y
$ flex++ -hUPNCalcScanner.h -oUPNCalcScanner.cc UPNCalcScanner.l
$ g++ -c calc.cc
$ g++ -c AlgCalcParser.cc
$ g++ -c AlgCalcScanner.cc
$ g++ -c AlgParser.cc
$ g++ -c UPNCalcParser.cc
$ g++ -c UPNCalcScanner.cc
$ g++ -c UPNParser.cc
$ g++ -o calc calc.o AlgCalcParser.o AlgCalcScanner.o AlgParser.o UPNCalcParser.o UPNCalcScanner.o UPNParser.o

Test

Now we have a calulator that is able to handle algebraic as well as inverse polnish notation. Just tell the calculator which one you like:

$ echo "1+2-4=" | ./calc a
$ echo "1 2 + 3-" | ./calc u

Resources

flex++ Skeletons:

Download Flex++ and Bison++ packages, or directly from here: