# Transforming grammars to \(LL(1)\)

## Table of Contents

## 1 \(LL(1)\) Parsers (short recap)

\(LL(1)\) parsers are **top-down** parsers that can avoid backtracking by looking 1 token ahead of the current token they are trying to derive. This means that LL(1) grammars contain no ambiguity.

Not all languages and grammars are LL(1), but some non-LL(1) grammars can be converted to LL(1) (when they represent languages that have a LL(1) grammar representation). The algorithm for this transformation can end in an infinite loop, but whenever it terminates it always results in an LL(1) grammar for the same language as the original grammar.

## 2 Transformation algorithm

The following algorithm is run iteratively until the grammar converges:

- Eliminate ambiguous epsilon-derivations
- Eliminate left-recursion
- Factor out ambiguous prefixes

### 2.1 Epsilon-derivation elimination

- Foreach rule \(A\) → ε which causes ambiguity
- Remove the rule
- For every Rule \(x\) = \(B\) → α where α contains K instance of the non-terminal \(A\)
- Remove \(x\)
- Add 2
^{k}rules, one for each possible combination of replacing some \(A\) with ε (deleting \(A\))

#### 2.1.1 Example

S → AB

A → a | ε

B → CC | A | b

C → c

In this example, the rule A → ε causes ambiguity - given the input 'a', at least two parse trees can be constructed. For example:

To solve this, we remove the rule A → ε by adding rules that represent the epsilon derivation (we actually preform the epsilon derivation "offline"):

S → AB | **B**

A → a

B → CC | **a** | b | **ε**

C → c

We introduced another kind of conflict (given the input 'a', do we use the rule S → AB or the rule S → B?), which will be dealt with later on in the section on ambiguous prefixes.

### 2.2 Left Recursion elimination

#### 2.2.1 Direct Left Recursion

Consider rules of the form A → Aα | β, where α and β are any mix of terminals and non-terminals in the grammar and β cannot be left recursive (does not start with non-terminal A).

- For every rule A → Aα | β
- Remove the rule
- Add two new rules:
- A → β A'
- A' → α A' | ε

#### 2.2.2 Indirect Left Recursion

Consider rule sets of the form:

A_{1} → A_{2} α_{1} | β_{1
}
A_{2} → A_{3} α_{2} | β_{2
}
…

A_{n} → A_{1} α_{n} | β_{n}

- Foreach derivation rule A
_{i}→ A_{j}α where i > j- Replace A
_{j}on the RHS (right-hand-side) with all possible derivations of A_{j}in the grammar - Eliminate any local left-recursion introduced by the previous step

- Replace A

#### 2.2.3 Example

S → A | a

A → Bc | b

B → Ab | c

Let A_{1}=S, A_{2}=A, A_{3}=B:

A_{1} → A_{2} | a

A_{2} → A_{3} c | b

A_{3} → A_{2} b | c

- Remove indirect recursion in A
_{3 }A_{1}→ A_{2}| a

A_{2}→ A_{3}c | b

A_{3}→ A_{3}cb | bb | c - Removing Direct Left Recursion A
_{3}→ A_{3}cb

A_{3}→ bbA_{3}’ | cA_{3}’

A_{3}’ → cbA_{3}’ | ε

The resulting grammar:

S → A | a

A → Bc | b

B → bbB’ | cB’

B’ → cbB’ | ε

### 2.3 Factorization (a.k.a. Left Factoring)

To solve problems with ambiguity due to common prefixes, one must factor them out when they lead to multiple possible parses.

#### 2.3.1 Example

expr → subexpr | addexpr

addexpr → num + exper | num

subexpr → num - exper | num

Note that since both **addexpr** and **subexpr** derive to an expression beginning with **num**, the grammar contains ambiguous rules. For example, given the input `1 - 2 + 3`

, the parser will be unable to decide whether to apply the **addexpr** rule or the **subexpr** rule (as it will only see the token `1`

).

This can be solved by factoring out the common prefix **num**:

expr → num afterNum

afterNum → ε | + expr | - expr