Bison
The Yacc-compatible Parser Generator
10 September 2021, Bison Version 3.8.1
by Charles Donnelly and Richard Stallman
目录1.1 Languages and Context-Free Grammars
Here is a simple C function subdivided into tokens:
int /* keyword 'int' */ square (int x) /* identifier, open-paren, keyword 'int', identifier, close-paren */ { /* open-brace */ return x * x; /* keyword 'return', identifier, asterisk, identifier, semicolon */ } /* close-brace */
The syntactic groupings of C include the expression, the statement, the
declaration, and the function definition. These are represented in the
grammar of C by nonterminal symbols 'expression', 'statement', 'declaration'
and 'function definition'. The full grammar uses dozens of additional
language constructs, each with its own nonterminal symbol, in order to
express the meanings of these four. The example above is a function
definition; it contains one declaration, and one statement. In the statement,
each 'x' is an expression and so is 'x * x'.One nonterminal symbol must be distinguished as the special one which defines
a complete utterance in the language. It is called the start symbol. In a
compiler, this means a complete input program. In the C language, the
nonterminal symbol 'sequence of definitions and declarations' plays this
role.
5 The Bison Parser Algorithm
As Bison reads tokens, it pushes them onto a stack along with their
semantic values. The stack is called the parser stack. Pushing a token is
traditionally called shifting.For example, suppose the infix calculator has read '1 + 5 *', with a '3' to
come. The stack will have four elements, one for each token that was shifted.But the stack does not always have an element for each token read. When the
last n tokens and groupings shifted match the components of a grammar
rule, they can be combined according to that rule. This is called
reduction. Those tokens and groupings are replaced on the stack by a
single grouping whose symbol is the result (left hand side) of that rule.
Running the rule's action is part of the process of reduction, because this
is what computes the semantic value of the resulting grouping.The parser tries, by shifts and reductions, to reduce the entire input down
to a single grouping whose symbol is the grammar's start-symbol (see 1.1
[Languages and Context-Free Grammars]).This kind of parser is known in the literature as a bottom-up parser.
5.1 Lookahead Tokens
The Bison parser does not always reduce immediately as soon as the last n
tokens and groupings match a rule.Here is a simple case where lookahead is needed. These three rules define
expressions which contain binary addition operators and postfix unary
factorial operators ('!'), and allow parentheses for grouping.expr: term '+' expr | term ; term: '(' expr ')' | term '!' | "number" ;
Suppose that the tokens '1 + 2' have been read and shifted; what should be
done?
- If the following token is '!', then it must be shifted immediately so
that '2 !' can be reduced to make a term. If instead the parser were to
reduce before shifting, '1 + 2' would become an expr. It would then be
impossible to shift the '!' because doing so would produce on the stack the
sequence of symbols expr '!'. No rule allows that sequence.- If the following token is ')', then the first three tokens must be reduced
to form an expr. This is the only valid course, because shifting the ')'
would produce a sequence of symbols term ')', and no rule allows this.
阅读理解 5.1
这个例子也是一个关于 优先级(和结合性) 的很好的例子。在规则的定义中,优先级(和结
合性)低的操作数(符号)作为规则的子规则先被计算。换句话说,优先级(和结合性)的实
现,就是将高优先级和低优先级的规则分开,而将高优先级(结合性)的规则(符号)作为低
优先级的规则(符号)的子规则(符号)。
它在这个例子中的体现,包括了:
- 加法(左结合性)中左侧的操作数,在规则递归中先被算出,即term在语法树中低于expr。
- 阶乘(优先级高于加法),在规则递归中先被算出,即由term构成expr,而不是反过来。
同时,look ahead是为了保证当前的Recude不影响后面的Reduce。而后面的Reduction可能正好
就作用于优先级(和结合性)更高(在语法树中更底层)的规则。换句话说,look ahead也就是为
了使优先级(和结合性)更高的规则有机会先于更低的规则被执行;在这种情况下,高优先级的规
则作为一个单独且唯一可行的规则,在look ahead中保证了它的操作数的完整性(即不会被先出现
在文本中但是优先级(和结合性)更低的规则"划走")。
5.2 Shift/Reduce Conflicts
Suppose we are parsing a language which has if-then and if-then-else
statements, with a pair of rules like this:if_stmt: "if" expr "then" stmt | "if" expr "then" stmt "else" stmt ;
When the "else" token is read and becomes the lookahead token, the contents
of the stack (assuming the input is valid) are just right for reduction by
the first rule. But it is also legitimate to shift the "else", because that
would lead to eventual reduction by the second rule.This situation, where either a shift or a reduction would be valid, is
called a shift/reduce conflict. Bison is designed to resolve these
conflicts by choosing to shift, unless otherwise directed by operator
precedence declarations. (It would ideally be cleaner to write an unambiguous
grammar, but that is very hard to do in this case.) This particular ambiguity
was first encountered in the specifications of Algol 60 and is called the
"dangling else" ambiguity.if e1 then if e2 then s1 • else s2
5.3.1 When Precedence is Needed
Consider the following ambiguous grammar fragment:
expr: expr '-' expr | expr '*' expr | expr '<' expr | '(' expr ')' ;...
Suppose the parser has seen the tokens '1', '-' and '2', should it reduce
them via the rule for the subtraction operator?To decide which one Bison should do, we must consider the results. If the
next operator token op is shifted, then it must be reduced first in
order to permit another opportunity to reduce the difference. The result is
(in effect) '1 - (2 op 3)'. On the other hand, if the subtraction is reduced
before shifting op, the result is '(1 - 2) op 3'. Clearly, then, the
choice of shift or reduce should depend on the relative precedence of the
operators '-' and op: '*' should be shifted first, but not '<'.What about input such as '1 - 2 - 5', should this be '(1 - 2) - 5' or should
it be '1 - (2 - 5)'? For most operators we prefer the former, which is called
left association. The latter alternative, right association, is desirable for
assignment operators. The choice of left or right association is a matter of
whether the parser chooses to shift or reduce when the stack contains '1 - 2'
and the lookahead token is '-': shifting makes right-associativity.
阅读理解 5.3.1
我们使用的数据结构是栈,所以,规约是从右往左进行的,也就是从栈顶开始的。这决定了:
- 优先级(和结合性)更高的规则的子规则可以先(于栈内低优先级规则的规约)入栈;
因为在出栈的时候,它会先于处于栈内更底层的低优先级规则被规约。即为shift。 - 而在遇到优先级更低的规则时,则需要先将栈内高优先级的规则先做规约。即为reduce。
或者,反过来,这其实决定了:为了可以这样做,并将Bison的规则简化为仅以一次look ahead
决定shift or reduce,需要将高优先级的规则设计为独立且唯一的子规则。
于是,在look ahead的时候:
- 遇到独立且唯一的(子)规则:做移进
- 遇到在语法树中更高层的(父)规则:做规约,再做移进
5.3.2 Specifying Operator Precedence
%left
left-associative%right
right-associative%nonassoc
, declares that it is an error to find the same operator
twice in a/an row/expression.%precedence
without associativity
The relative precedence of different operators is controlled by the order in
which they are declared. The first precedence/associativity declaration in
the file declares the operators whose precedence is lowest, the next such
declaration declares the operators whose precedence is a little higher, and
so on.
阅读理解 5.3.2
- 结合性标签同时指定了优先级。结合性相当于是在处理同一优先级内的符号之间的优先级。
- Bison规定了将低优先级的符号写在相对的上面,高优先级的符号写在相对下面,与这些符号
在语法树中对应节点的层级是相对应的。
5.3.3 Specifying Precedence Only
Since POSIX Yacc defines only %left, %right, and %nonassoc, which all defines
precedence and associativity, little attention is paid to the fact that
precedence cannot be defined without defining associativity. Yet, sometimes,
when trying to solve a conflict, precedence suffices. In such a case, using
%left, %right, or %nonassoc might hide future (associativity related)
conflicts that would remain hidden.The dangling else ambiguity
if e1 then if e2 then s1 • else s2
can be
solved explicitly.The conflict involves the reduction of the rule 'IF expr THEN stmt', which
precedence is by default that of its last token (THEN), and the shifting
of the token ELSE. The usual disambiguation (attach the else to the closest
if), shifting must be preferred, i.e., the precedence of ELSE must be
higher than that of THEN. But neither is expected to be involved in an
associativity related conflict, which can be specified as follows.%precedence THEN %precedence ELSE
The unary-minus is another typical example where associativity is usually
over-specified.
阅读理解 5.3.3
结合性(和优先级)标签解决移进/规约冲突的token识别机制:
- 规约:token为当前栈内的已经被移进过的最后一个token
- 移进:token为look ahead的那个token,移进意味着,后续出栈将优先被弹出栈并规约
5.3.5 How Precedence Works
The first effect of the precedence declarations is to assign precedence
levels to the terminal symbols declared. The second effect is to assign
precedence levels to certain rules: each rule gets its precedence from the
last terminal symbol mentioned in the components. (You can also specify
explicitly the precedence of a rule. See 5.4 [Context-Dependent Precedence])Finally, the resolution of conflicts works by comparing the precedence of the
rule being considered with that of the lookahead token. If the
token's precedence is higher, the choice is to shift. If the rule's
precedence is higher, the choice is to reduce. If they have equal precedence,
the choice is made based on the associativity of that precedence level. The
verbose output file made by -v (see 9 [Invoking Bison]) says how each
conflict was resolved.Not all rules and not all tokens have precedence. If either the rule or the
lookahead token has no precedence, then the default is to shift.
5.3.6 Using Precedence For Non Operators
标签:优先级,precedence,expr,rule,token,Bison From: https://www.cnblogs.com/qqiwei/p/18244643Alternatively, you may give both tokens the same precedence, in which case
associativity is used to solve the conflict. To preserve the shift action,
use right associativity:%right "then" "else"
Neither solution is perfect however. Since Bison does not provide, so far,
scoped precedence, both force you to declare the precedence of these keywords
with respect to the other operators in your grammar. Therefore, instead of
being warned about new conflicts you would be unaware of (e.g., a shift/
reduce conflict due to 'if test then 1 else 2 + 3' being ambiguous: 'if test
then 1 else (2 + 3)' or '(if test then 1 else 2) + 3'?), the conflict will be
already “fixed”.