close

Se connecter

Se connecter avec OpenID

Chapter 12

IntégréTéléchargement
Chapter 12 - 3
Parsing Techniques
1
Section 12.3 Parsing Techniques
• We know (via a theorem) that the contextfree languages are exactly those
languages that are accepted by PDAs.
• When a context-free language can be
recognized by a deterministic final-state
PDA, it is called a deterministic contextfree language.
2
LL(k) grammar
• An LL(k) grammar has the property that a parser
can be constructed to scan an input string from
left to right and build a leftmost derivation by
examining next k input symbols to determine the
unique production for each derivation step.
• If a language has an LL(k) grammar, it is called
an LL(k) language.
• LL(k) languages are deterministic context-free
languages, but there are deterministic contextfree languages that are not LL(k). (See text for
an example on page 730.)
3
Example
Consider the language {anb | n ∊ N}.
(1)
It has the LL(1) grammar S → aS | b.
A parser can examine one input letter to decide whether
to use S → aS or S → b for the next derivation step.
(2)
It has the LL(2) grammar S → aaS | ab | b.
A parser can examine two input letters to determine
whether to use S → aaS or S → ab for the next
derivation step. Notice that the grammar is not LL(1).
(3)
Quiz. Find an LL(3) grammar that is not LL(2).
Solution. S → aaaS | aab | ab | b.
4
Example/Quiz
• Why is the following grammar
S → AB
A → aAb | Λ
B → bB | Λ.
for {anbn + k | n, k ∊ N} an-LL(1) grammar?
• Answer: Any derivation starts with S ➾ AB. The
next derivation step uses one of the productions
A → aAb or A → Λ, depending on whether the
scanned input letter is a or not. The argument is
the same for B-productions.
5
Example
•
The following grammar for {anbn + k | n, k ∊ N} is not LL(k) for every k.
S → aSb | T
T → bT | Λ.
•
•
•
It is not LL(1): Let the input be ab. The first letter a tells us to start with S ➾ aSb.The
letter a in aSb matches the first input letter, so we look ahead to the second input
letter b.This tells us we must use S → T to get S ➾ aSb ➾ aTb.
The look ahead remains at b, so we can’t determine whether it is the last b of the
input string.
So we don’t know whether to choose T → bT or T → Λ for the next step. Thus the
grammar is not LL(1).
It is not LL(2): Let the input be aabb. The first two input letters aa, tell us S ➾ aSb.
The letter a in aSb matches the first input, so we look ahead to the next two-letter
substring ab.We must use S → aSb to get S ➾ aSb ➾ aaSbb.
Now the look ahead becomes bb, so we must use S → T to get S ➾ aSb ➾ aaSbb ➾
aaTbb.
The lookahead remains at bb, so we can’t determine whether these are the last two
b’s of the input string. So we don’t know whether to choose T → bT or T → Λ for the
next step. Thus the grammar is not LL(2).
Quiz (on your time): Find a general argument to show the grammar is not LL(k).
6
Example
• The language {an + k bn | k, n ∊ N} is not deterministic
context-free. So it has no LL(k) grammar for every k.
• Proof: Any PDA for the language must keep a count of
the a’s with the stack so that when the b’s come along
the stack can be popped with each b. But there might
still be a’s on the stack (e.g., when k > 0), so there must
be a nondeterministic state transition to a final state from
the popping state. i.e., We need two instructions like, (i,
b, a, pop, i) and (i, Λ, a, pop, final).
• Note: The previous two examples show that {ambn | m ≤
n} is LL(1) but {ambn | m ≥ n} is not LL(k) for any k.
7
Grammar Transformations
• Left factoring: Sometimes we can “left-factor” an LL(k)
grammar to obtain an equivalent LL(n) grammar where n
< k.
• Example. The grammar S → aaS | ab | b is LL(2) but not
LL(1). But we can factor out the common prefix a from
productions S → aaS | ab to obtain
S → aT
T → aS | b.
This gives the new grammar:
S → aT | b
T → aS | b.
8
Quiz
• Find an LL(k) grammar where k is as small as
possible that is equivalent to the following
grammar.
S → abS | abcT | ab
T → cT | c.
• Solution: The given grammar is LL(3) and it can
be left-factored to become an LL(1) grammar as
follows:
S → abR
R → S | cT | Λ
T → cU
U → T | Λ.
9
Removing left recursion
• A left-recursive grammar is one which has
a derivation of the form
A ➾+ Ax
for some nonterminal A and sentential
form x.
10
Left-recursive grammars are not
LL(k) for any k
• Example. The language {ban | n ∊ N} has a grammar S → Sa | b,
which is left-recursive. It is not LL(k) for any k. Consider the
following cases:
• LL(1) case: If the input string is ba, the lookahead is b. So we don’t
know whether there are any a’s to the right of b. Therefore we don’t
know which production to start the derivation.
• LL(2) case: If the input string is baa, the lookahead is ba. So the
derivation starts with S ➾ Sa. But the a in Sa denotes the a at the
right end of the derived string. So the input string could be ba or
baa. Therefore we don’t know which production to choose next.
• LL(k) case: If the input string is ba…a with k a’s, the lookahead is
ba…a with k – 1 a’s. So the derivation after k –1 steps is S ➾ Sa ➾
Saa ➾ … ➾ Sa…a. Now, the input string could be ba…a (length k)
or ba…a (length k + 1). Therefore we don’t know which production to
choose next.
11
removing left recursion
• Sometimes we can obtain an LL(k) grammar by
removing left recursion.
• Algorithm Idea for direct left recursion:
Transform: A → Aw | Au | Av | a | b.
To:
A → aB | bB
B → wB | uB | vB | Λ.
• Example/Quiz. Remove left recursion.
S → Sa | b.
• Solution. S → bT
T → aT | Λ.
It is LL(1).
12
Example/Quiz
• Example/Quiz. Remove left recursion.
S → Saa | aab | aac.
• Solution: S → aabT | aacT
T → aaT | Λ.
It is LL(3).
• Quiz: Rewrite the previous solution as LL(1).
• Solution: S → aaU
U → bT | cT
T → aaT | Λ.
13
Example
• Example (Removing indirect left recursion). Consider the
following grammar.
S → Ab | a
A → Sa | b.
The grammar is left-recursive because of the indirect left
recursion S ➾ Ab ➾ Sab.
To remove the indirect left recursion, replace A in S → Ab
by the right side of A → Sa | b to obtain S → Sab | bb.
The grammar becomes S → Sab | bb | a. Remove the
left recursion:
S → bbT | aT
T → abT | Λ.
14
Quiz
• Remove left recursion from the grammar:
S → Ab | a
A → SAa | b.
• Solution: Replace A in S → Ab | a by the right
side of A → SAa | b to obtain
S → SAab | bb | a
A → SAa | b.
Now remove the direct left recursion:
S → bbT | aT
T → AabT | Λ
A → SAa | b.
15
Top-Down Parsing of LL
Languages
• LL(k) grammars have top-down parsing algorithms
because a leftmost derivation can be constructed by
starting at the start symbol and proceeding to the desired
string.
• Example/Quiz. Consider the following LL(1) grammar.
S → aSC | b
C → cC | d.
The string aabcdd has the following leftmost derivation,
where each step is uniquely determined by the current
lookahead symbol.
S ➾ aSC ➾ aaSCC ➾ aabCC ➾ aabcCC ➾ aabcdC ➾
aabcdd.
16
Recursive Descent LL(1) Parsing
• A procedure is associated with each nonterminal. We’ll
use the following procedure for LL(1) grammars to match
a symbol with the lookahead symbol.
match(x): if lookahead = x
then lookahead := next input symbol
else error
fi.
• Example. For the preceding example here are two
possible recursive descent procedures:
S: if lookahead = a then match(a); S; C else match(b) fi.
C: if lookahead = c then matct(c); C else match(d) fi.
17
Quiz
• Write recursive descent procedures for the
following LL(1) grammar.
S → aaM
M → bT | cT
T → aaT | Λ.
• Solution:
S: match(a); match(a); M.
M: if lookahead = b then match(b); T else match(c); T fi
T: if lookahead = a then match(a); match(a); T fi
18
Table-Driven LL(1) Parsing
• We’ll start with an example showing how to use
the table to parse a string. Then we’ll discuss
how to construct the table.
• Example. Consider the following LL(1) grammar
and its corresponding parse table.
Grammar: S → aSb | Λ.
a
b
$
Table: S S → aSb S → Λ
S→Λ
19
Parse of the string aabb
20
LL(1) Parse Table Construction
• To construct the table we need two types of sets
constructed from the grammar.
• First Sets: For a sentential form w the set First(w)
consists of any terminal at the left end of w or at the left
end of any sentential form derived from w, including ٨ if
w is ٨ or w derives ٨.
• Example/Quiz. Given the grammar
S → aS | Tb
T → cT | ٨.
Use the definition to find the First sets for aS, cT, T, Tb
and S.
21
Algorithm to Construct First Sets
• First(٨) = {٨}.
• First(aw) = First(a) = {a} for a terminal a.
• If A → w1 | …| wn, then First(A) =
First(w1) ⋃ … ⋃ First(wn).
• Case for First(Aw) where w ≠ ٨ :
If ٨ ∉ First(A), then First(Aw) = First(A).
If ٨ ∈ First(A), then First(Aw) = (First(A) {٨}) ⋃ First(w).
22
Example/Quiz
• Quiz: What happens if a grammar is left-recursive such
as S → Sa | b?
• Answer: The algorithm defines First(Sa) and First(S) in
terms of each other.
• Example/Quiz. Given the grammar
S → abTU | bUT
T → aT | b
U → cU | ٨.
Find the First sets for T, U, TU, and UT.
23
Follow Sets
• Follow Sets: If A is a nonterminal, then Follow(A) is the set of
terminals to the right of A in sentential forms derived from the start
symbol S, where we include $ in Follow(S).
• Example. Given the grammar S → aSb | c | ٨, we have Follow(S) =
{$, b}.
• Example/Quiz. Given the grammar
S → aSb | Tc | T
T → dT | ٨.
Find the Follow sets for S and T.
• Solution:
The derivation S ➾ aSb implies that b ∈ Follow(S). So we have
Follow(S) = {$, b}.
The derivations S ➾ aSb ➾ aTb and S ➾ Tc imply that b, c ∈
Follow(T). The derivation S ➾ T implies that anything following S
also follows T. So we have Follow(T) = {$, b, c}.
24
Algorithm to Construct Follow Sets
Algorithm to Construct Follow Sets (x and y denote
sentential forms)
• $ ∈ Follow(S).
• If A → xB, then Follow(A) ⊂ Follow(B).
• If A → xBy, then (First(y) - {٨}) ⊂ Follow(B).
• If A → xBy and ٨ ∈ First(y), then Follow(A) ⊂
Follow(B).
Notice: Follow(B) is constructed by examining the
productions with B on the right side.
25
Example/Quiz
• Given the grammar
S → aTU | bUT
T → aT | b
U → cU | ٨.
Find the Follow sets for S, T, and U.
• Solution:
Follow(S) = {$}.
Follow(T) = {$, c}.
Follow(U) = {$, a, b}.
26
Algorithm to Construct LL(1) Parse
Table
Let P[x, y] represent the contents of row x column
y of the table. For each production A → w, do the
following actions.
(1) For each terminal a ∈ First(w) put A → w in P[A,
a].
(2) If ٨ ∈ First(w) then for each symbol t ∈
Follow(A) put A → w in P[A, t].
Remark: If the grammar has no ٨-productions,
then there is no need to construct Follow sets.
27
Example
• Construct the parse table for the following LL(1) grammar.
S → AB | ٨
A → aAb | ٨
B → bB | c.
• Solution: Calculate the First sets for the right-hand sides of the
productions:
First(AB) = (First(A) - {٨}) ⋃ First(B) = {a} ⋃ {b, c} = {a, b, c}.
First(٨) = {٨}, First(aAb) = {a}, First(bB) = {b}, First(c) = {c}.
Calculate the Follow sets for the nonterminals:
Follow(S) = {$}, Follow(A) = {b, c}, Follow(B) = {$}.
Construct the table:
28
LL(k) Facts and Notes
• In 1969 Kurki-Suoni showed that the LL(k)
languages form an infinite hierarchy:
For each k there is an LL(k + 1) language that is
not LL(k).
• Example. The language defined by the following
grammar is LL(k + 1) but has no LL(k) grammar,
where a…a stands for a k-length string of a’s.
S → aSA | ٨
A → a…abS | c.
29
Example/Quiz
• Why is the following grammar LL(2) but not LL(1)?
S → aSA | ٨
A → abS | c.
• Answer: Consider the string aab. A derivation must start with S ➾
aSA. Now the lookahead is at the second a in aab, but there are two
choices to pick from: S → aSA and S → ٨. So the grammar is not
LL(1). But two lookahead letters allow the derivation to see the
substring ab or ac, so that S → aSA can be chosen for the next step.
30
The End of Chapter 12 - 3
31
Auteur
Document
Catégorie
Uncategorized
Affichages
4
Taille du fichier
2 272 KB
Étiquettes
1/--Pages
signaler