In Linux, bison is a tool used to generate a grammar analyzer program. It can convert the grammar rules provided by the user into a grammar analyzer; bison needs to be used in conjunction with flex (lexical analyzer) to process Complex file parsing work. Starting from the production of a given grammar, bison will go through the algorithm and finally construct an action table, and then use this action table to parse the sentence.
#The operating environment of this tutorial: linux7.3 system, Dell G3 computer.
Lex was completed in 1975 by Mike Lesk and Eric Schmidt, who was still an intern at AT&T (Schmidt did (More), is a lexical analyzer generation program that can be used alone or in conjunction with Johnson's yacc. Lex is very famous, but its efficiency is too low and it has bugs. Around 1987, Vern Paxson of Lawrence Berkeley Laboratory rewrote Lex in C and named it FLex (the Fast Lexical Analyzer Generator), based on the Berkeley license. Flex is now a project of SourceForge, still based on the Berkeley license.
[Flex](https://github.com/westes/flex "Flex") is the free (but non-GNU) implementation of the original unix version of lex. Lexical scan generator for c/c.
(Note: Schmidt was the CEO of Google)
The predecessor of bison is yacc. Yacc was written by S.C. Johnson of Bell Labs from 1975 to 1978 based on the LR syntax analysis theory of Knuth. Around 1985, UC Berkeley graduate student Bob Corbett implemented Berkeley yacc using improved internal algorithms. Richard Stallman from FSF rewrote Berkeley yacc and used it for the GNU project, adding many features to form today's GNU Bison. Bison is now maintained as an FSF project and released under the GNU Public License. [Bison](http://www.gnu.org/software/bison/manual/) is a free grammar generator compatible with yacc.
The early Unix Lex/YACC developed into FLex/Bison. The new version of the program is upward compatible (that is, compatible with the old version). Flex and Bison are now used.
From the perspective of use, Flex and Bison are tools used to generate two programs, the lexical analyzer and the syntax analyzer under Linux. Processing structured input, generally used in combination to handle complex file parsing work.
#bison can convert the grammar rules provided by the user into a grammar analyzer. To put it simply, starting from the production of a given grammar, bison will go through the algorithm and finally construct an action table, and then use this action table to parse the sentence. Specifically, bison reads the production of the grammar provided by the user, generates a LALR(1) action table in C language format, and includes it into a C function named yyparse. The function of this function is to use this action table To parse the token stream, this token stream is obtained by scanning the source program with the lexical analyzer generated by flex.
The flex file defines the pattern (which is the soybean, which is the mung bean...), and divides the output into segments of tokens through flex processing (lexical analysis) (picking out the input beans one by one), Thereby executing different actions (grinding soybeans into soy milk (action), making mung bean cakes from mung beans (action))...
The tokens generated by flex can be fed to Bison for processing (easier and easier to debug), or of course not. bison and just handle it yourself (like the example below). However, using bison can handle complex logic more conveniently, making it simple to write and easy to debug.
Encoding practice: character counter
//本例中仅仅使用flex以及少量手写代码(main中),来完成字符串统计功能。 Yolandas-MacBook-Pro:flex-bison liuyuanyuan$ cat fb1-1.l /* 统计输入字符串*/ %{ int chars = 0; int words = 0; int lines =0; %} %% [a-zA-Z]+ { words++; chars += strlen(yytext); } \n { chars++; lines++;} . { chars++; } %% int main(int args, char **argv) { yylex(); printf("lines=%6d words=%6d chars=%6d\n", lines, words, chars); return 0; } //Linux 系统上用 -lfl 选项编译, Mac 的编译选项是 -ll Yolandas-MacBook-Pro:flex-bison liuyuanyuan$ gcc -ll lex.yy.c -o fb1-1 Yolandas-MacBook-Pro:flex-bison liuyuanyuan$ ./fb1-1 hello this is yolanda bye. lines= 3 words= 5 chars= 28
/* P1: declarations(定义段) */ %{ %} %% /* P2: translation rules(规则段) */ %% /* P3: auxiliary functions(用户辅助程序段,c函数)*/
Definition section includes text blocks, definitions, internal declarations, etc.
C language header files, function and variable declarations are generally placed between %{...%}, and the content of this part will be directly copied to the beginning of the generated .c file.
Contains %option option
%option noyywrap /* 定义段中包含了option选项*/ %{ #include "cal.tab.h" extern int yylval; %}
Rule segment The part between %%...%% is a series of matching patterns (regular expression) and action (C code).
When the flex scanner runs, it matches the input to the pattern of the rule segment, and each time it finds a match (the matched input is called a token), it executes the operations associated with that pattern. C code.
pattern(正则表达式) { action(c代码) } example: [0-9]+ {yylval = atoi(yytest); return NUMBER;}
User auxiliary program section is C code and will be copied to the c file as it is. Generally, some auxiliary functions are defined here.
int terror(chr *s) { printf("%s\n", s); return 0; }
/*P1: declarations 定义段*/ %{ %} %% /*P2: grammar rules 规则段(rule-action)*/ A: a1 { 语义动作1 } | a2 { 语义动作2 } | … | an { 语义动作n } | b //没有{…},则使用缺省的语义动作 ; //产生式结束标记 //语义动作一般是在产生式右部分析完,归约动作进行前执行。 A→ a1 | a2 | … | an | b %% /* P3: supporting C routines 用户辅助程序段(C函数) */
Definition section Can be divided into two parts:
1. The part between %{ and %}, written in C language, including header file include, macro definition, global variable definition, function declaration, etc.;
2. Make some relevant declarations about the terminal symbols and non-terminal symbols of the grammar.
Commonly used Bison tag declarations include: %token %union %start %type %left %right %nonassoc, etc.
%token 定义文法中使用了哪些终结符。定义形式: %token TOKEN1 TOKEN2
终结符一般全大写;(如 TOKEN1 TOKEN2)
一行可定义多个终结符,空格分隔;
%left、%right、%nonassoc 也是定义文法中使用了哪些终结符。定义形式与%token类似。
先定义的优先级低,最后定义的优先级最高,同时定义的优先级想通过。
%left表示左结合,%right表示右结合;
%nonassoc 表示不可结合(即它定义的终结符不能连续出现。例如
%left AA BB
%nonassoc CC
%right DD
表示优先级关系为:AA=BB注意:也可以于%prec来结合使用来改变token的优先级和结合性 例如: :'-' expr %prec NEG { $$ = -$2; };
%union 声明语法符号的语义值类型的集合,
bison中每个符号包括记号和非终结符都有一个不同数据类型的语义值,并使用yylval变量在移进和归约中传递这些值,YYSTYPE(宏定义)为yylval的类型,默认为int;
我们使用%union来重新定义符号的类型
%union
{
int iValue; /* integer value */
char sIndex; /* symbol table index */
nodeType *nPtr; /* node pointer */
};
%type 定义非终结符其属性值的类型。
%type sym1 sym2
%typesym3
%start 指定文法的开始的文法符号(非终结符),是最终需要规约而成的符号。
定义形式为:%start startsym , 其中startsym为文法的开始符号。
如果不使用%start定义文法开始符号,则默认在第二部分规则段中定义的第一条生产式规则的左部非终结符为开始符号。
规则段 由rule(语法规则)和action(包括C代码)组成。
bison规则基本是BNF,做了一点简化以易于输入。
规则中目标或非终端符放在左边,后跟一个冒号:然后是产生式的右边,之后是对应的动作(用{}包含)。
%% program: program expr '\n' { printf("%d\n", $2); } ; expr: INTEGER {$$ = $1; } | expr '+' expr { $$ = $1 + $3; } | expr '-' expr { $$ = $1 - $3} ; %%
注意:$1表示右边的第一个标记的值,$2表示右边的第二个标记的值,依次类推。$$表示归约后的值。
用户辅助程序段 为C代码,会被原样复制到c文件中,这里一般自定义一些函数。
1 macOS下flex/bison安装
brew install flex brew install bison # macos下flex/bison安装简单方便,另需安装gcc用于编译c语言程序。 brew install gcc
2 flex词法文件:calc.l
%option noyywrap %{ /* * 一个简单计算器的Lex词法文件 */ void yyerror(char*); #include "calc.tab.h" %} %% /* a-z为变量 */ [a-z] { yylval = *yytext - 'a'; return VARIABLE; } /* 整数 */ [0-9]+ { yylval = atoi(yytext); return INTEGER; } /* 运算符 */ [-+()=/*\n] {return *yytext;} /* 空白被忽略 */ [ \t] ; /* 其他字符都是非法的 */ . yyerror("invalid input"); %%
3 bison语法文件:calc.y
%token INTEGER VARIABLE %left '+' '-' %left '*' '/' %{ #include <stdio.h> void yyerror(char*); int yylex(void); int sym[26]; %} %% program: program statement '\n' | ; statement: expr {printf("%d\n", $1);} |VARIABLE '=' expr {sym[$1] = $3;} ; expr: INTEGER |VARIABLE{$$ = sym[$1];} |expr '+' expr {$$ = $1 + $3;} |expr '-' expr {$$ = $1 - $3;} |expr '*' expr {$$ = $1 * $3;} |expr '/' expr {$$ = $1 / $3;} |'('expr')' {$$ = $2;} ; %% void yyerror(char* s) { fprintf(stderr, "%s\n", s); } int main(void) { printf("A simple calculator.\n"); yyparse(); return 0; }</stdio.h>
4 Makefile 文件:
all: clean calc calc: calc.l calc.y bison -d calc.y flex calc.l cc -o $@ calc.tab.c lex.yy.c -lm clean: rm -f calc \ lex.yy.c calc.tab.c calc.tab.h
5 执行
(可以直接执行make也就是执行Makefile中定义的内容,也可以单步执行)
Yolandas-MBP:calcSimple liuyuanyuan$ ls -l total 32 -rw-r--r--@ 1 liuyuanyuan staff 157 8 13 09:53 Makefile -rw-r--r--@ 1 liuyuanyuan staff 507 8 13 10:01 calc.l -rw-r--r--@ 1 liuyuanyuan staff 731 8 13 23:04 calc.y Yolandas-MBP:calcSimple liuyuanyuan$ make rm -f calc \ lex.yy.c calc.tab.c calc.tab.h bison -d calc.y flex calc.l cc -o calc calc.tab.c lex.yy.c -lm Yolandas-MBP:calcSimple liuyuanyuan$ ls -l total 272 -rw-r--r--@ 1 liuyuanyuan staff 157 8 13 09:53 Makefile -rwxr-xr-x 1 liuyuanyuan staff 24600 8 14 12:00 calc -rw-r--r--@ 1 liuyuanyuan staff 507 8 13 10:01 calc.l -rw-r--r-- 1 liuyuanyuan staff 42011 8 14 12:00 calc.tab.c -rw-r--r-- 1 liuyuanyuan staff 2143 8 14 12:00 calc.tab.h -rw-r--r--@ 1 liuyuanyuan staff 731 8 13 23:04 calc.y -rw-r--r-- 1 liuyuanyuan staff 44590 8 14 12:00 lex.yy.c Yolandas-MBP:calcSimple liuyuanyuan$ ./calc A simple calculator. 1+2 3
The above is the detailed content of what is linux bison. For more information, please follow other related articles on the PHP Chinese website!