07 GNFA & State Elimination

Why State Elimination?

We know Regex → NFA (Thompson's construction). But what about the reverse? Given a DFA or NFA, how do you find a regex for its language? You need state elimination — the algorithm that converts any finite automaton to a regular expression. This closes the equivalence triangle.

The approach: simplify the automaton by removing states one at a time, replacing each removed state's transitions with regex labels that capture all the paths that used to go through it. When only start and accept remain, the label between them IS the regex.

GNFA: Generalized NFA

Definition. A Generalized NFA (GNFA) is a 5-tuple $G = (Q, \Sigma, \delta, q_s, q_a)$ where:

$Q$ — finite set of states
$\Sigma$ — alphabet
$\delta: (Q \setminus \{q_a\}) \times (Q \setminus \{q_s\}) \to \mathcal{R}$ — transition function mapping to regular expressions
$q_s$ — start state
$q_a$ — single accept state ($q_a \neq q_s$)

Three Key Differences from Standard NFA

Transition labels are regular expressions, not single symbols. An arrow can be labeled $ab^* \cup c$.
Exactly one accept state (not a set). And $q_s \neq q_a$.
No arrows INTO $q_s$, no arrows OUT OF $q_a$. The start state has only outgoing arrows; the accept state has only incoming arrows.

If a transition doesn't exist, its label is $\emptyset$ (matches nothing — effectively "no arrow"). Self-loops have a regex label too.

Step 1: Convert DFA/NFA to GNFA

Preparation:

Add a new start state $q_s$ with an $\varepsilon$-transition to the old start.
Add a new accept state $q_a$ with $\varepsilon$-transitions from all old accept states.
Old accept states lose their accepting status.
If there are multiple transitions between the same pair of states, merge them with $\cup$.
Label all transitions with their symbols (each symbol label becomes a regex).
Missing transitions get label $\emptyset$.

Now we have a clean GNFA: one start (no incoming arrows), one accept (no outgoing arrows), regex labels on all transitions.

Step 2: The Elimination Formula

The heart of the algorithm. When we remove ("rip out") a state $q_{\text{rip}}$, every pair of remaining states $(q_i, q_j)$ that used to route through $q_{\text{rip}}$ needs its transition updated.

$$\delta'(q_i, q_j) = \delta(q_i, q_j) \cup \delta(q_i, q_{\text{rip}}) \circ \delta(q_{\text{rip}}, q_{\text{rip}})^* \circ \delta(q_{\text{rip}}, q_j)$$

Let's decode this with an analogy.

The "Bulldozing a City" Analogy

Imagine $q_i$, $q_{\text{rip}}$, and $q_j$ are cities connected by roads (labeled with regexes). We're about to bulldoze $q_{\text{rip}}$. Before we do, we need to build a new direct road from $q_i$ to $q_j$ that captures ALL the ways you could have traveled through $q_{\text{rip}}$.

The new label on $q_i \to q_j$ has two parts:

Direct path: $\delta(q_i, q_j)$ — the old direct road (if any)
Detour through $q_{\text{rip}}$: $\delta(q_i, q_{\text{rip}})$ (enter) $\circ\;$ $\delta(q_{\text{rip}}, q_{\text{rip}})^*$ (loop around) $\circ\;$ $\delta(q_{\text{rip}}, q_j)$ (exit)

The new road = old direct road OR (enter the city, loop any number of times, exit).

digraph { rankdir=LR; bgcolor="transparent"; node [fontname="Helvetica" fontsize=10]; edge [fontname="Helvetica" fontsize=9]; subgraph cluster_before { style=dashed; color="#9a9590"; label="Before ripping q_rip"; fontname="Helvetica"; fontsize=9; labeljust=l; qi [shape=circle label="q_i" style=filled fillcolor="#d0e8ff" width=0.45]; qrip [shape=circle label="q_rip" style=filled fillcolor="#e0e0e0" width=0.5]; qj [shape=circle label="q_j" style=filled fillcolor="#d0e8ff" width=0.45]; qi -> qj [label="R₁ (direct)"]; qi -> qrip [label="R₂ (enter)"]; qrip -> qrip [label="R₃ (loop)"]; qrip -> qj [label="R₄ (exit)"]; } subgraph cluster_after { style=dashed; color="#9a9590"; label="After ripping q_rip"; fontname="Helvetica"; fontsize=9; labeljust=l; qi2 [shape=circle label="q_i" style=filled fillcolor="#d0e8ff" width=0.45]; qj2 [shape=circle label="q_j" style=filled fillcolor="#d0e8ff" width=0.45]; qi2 -> qj2 [label="R₁ ∪ R₂·R₃*·R₄"]; } }

Important Details

$\emptyset^* = \varepsilon$. If $q_{\text{rip}}$ has no self-loop, the loop part is $\emptyset^* = \varepsilon$, which disappears from the concatenation. The detour simplifies to just $R_2 \circ R_4$.
Missing transitions = $\emptyset$. If there's no arrow from $q_i$ to $q_{\text{rip}}$, then $R_2 = \emptyset$, and the detour = $\emptyset \circ \ldots = \emptyset$, which disappears from the union. The direct path alone survives.
Parenthesization matters! When substituting complex regex into the formula, always parenthesize. $ab \cup c$ is NOT the same as $a(b \cup c)$. Forgetting parentheses is the #1 error in state elimination.
Update ALL pairs. When ripping $q_{\text{rip}}$, update $\delta'(q_i, q_j)$ for EVERY pair $(q_i, q_j)$ where $q_i \neq q_{\text{rip}}$ and $q_j \neq q_{\text{rip}}$. This includes self-loops ($q_i = q_j$).

Worked Example

Convert this DFA to a regex. The DFA accepts strings over $\{0,1\}$ ending in 1.

State diagram:

digraph { rankdir=LR; bgcolor="transparent"; node [shape=circle fontname="Helvetica" fontsize=11 width=0.4 fixedsize=true]; edge [fontname="Helvetica" fontsize=10]; __start [shape=none label="" width=0]; __start -> q0; q0 [label="q0" style=filled fillcolor="#d0e8ff"]; q1 [label="q1" shape=doublecircle style=filled fillcolor="#c8f7c5"]; q0 -> q0 [label="0"]; q0 -> q1 [label="1"]; q1 -> q0 [label="0"]; q1 -> q1 [label="1"]; }

Transition table:

	0	1
$\to q_0$	$q_0$	$q_1$
$*\; q_1$	$q_0$	$q_1$

Step 1: Convert to GNFA

Add new start $q_s$ and new accept $q_a$:

digraph { rankdir=LR; bgcolor="transparent"; node [shape=circle fontname="Helvetica" fontsize=11 width=0.4 fixedsize=true]; edge [fontname="Helvetica" fontsize=10]; __start [shape=none label="" width=0]; __start -> qs; qs [label="q_s" style=filled fillcolor="#d0e8ff"]; q0 [label="q_0" style=filled fillcolor="#d0e8ff"]; q1 [label="q_1" style=filled fillcolor="#d0e8ff"]; qa [shape=doublecircle label="q_a" style=filled fillcolor="#c8f7c5"]; qs -> q0 [label="ε"]; q0 -> q0 [label="0"]; q0 -> q1 [label="1"]; q1 -> q0 [label="0"]; q1 -> q1 [label="1"]; q1 -> qa [label="ε"]; }

Transition table as regexes (rows = from, cols = to, missing = $\emptyset$):

	$q_0$	$q_1$	$q_a$
$q_s$	$\varepsilon$	$\emptyset$	$\emptyset$
$q_0$	$0$	$1$	$\emptyset$
$q_1$	$0$	$1$	$\varepsilon$

4 states: $q_s, q_0, q_1, q_a$. We need to rip all middle states until only $q_s$ and $q_a$ remain.

Step 2: Rip $q_0$

Update all pairs from $\{q_s, q_1\}$ to $\{q_1, q_a\}$ (excluding $q_0$).

$\delta'(q_s, q_1)$:

$\delta(q_s, q_1) \cup \delta(q_s, q_0) \circ \delta(q_0, q_0)^* \circ \delta(q_0, q_1)$

$= \emptyset \cup \varepsilon \circ 0^* \circ 1 = 0^*1$

$\delta'(q_s, q_a)$:

$\delta(q_s, q_a) \cup \delta(q_s, q_0) \circ \delta(q_0, q_0)^* \circ \delta(q_0, q_a)$

$= \emptyset \cup \varepsilon \circ 0^* \circ \emptyset = \emptyset$

$\delta'(q_1, q_1)$:

$\delta(q_1, q_1) \cup \delta(q_1, q_0) \circ \delta(q_0, q_0)^* \circ \delta(q_0, q_1)$

$= 1 \cup 0 \circ 0^* \circ 1 = 1 \cup 00^*1$

$\delta'(q_1, q_a)$:

$\delta(q_1, q_a) \cup \delta(q_1, q_0) \circ \delta(q_0, q_0)^* \circ \delta(q_0, q_a)$

$= \varepsilon \cup 0 \circ 0^* \circ \emptyset = \varepsilon$

After ripping $q_0$, we have 3 states: $q_s, q_1, q_a$.

	$q_1$	$q_a$
$q_s$	$0^*1$	$\emptyset$
$q_1$	$1 \cup 00^*1$	$\varepsilon$

Step 3: Rip $q_1$

Only one pair left: $\delta'(q_s, q_a)$.

$\delta(q_s, q_a) \cup \delta(q_s, q_1) \circ \delta(q_1, q_1)^* \circ \delta(q_1, q_a)$

$= \emptyset \cup 0^*1 \circ (1 \cup 00^*1)^* \circ \varepsilon$

$= 0^*1(1 \cup 00^*1)^*$

Result: $R = 0^*1(1 \cup 00^*1)^*$

Verify: This says: start with zero or more 0s, then a 1, then repeat (another 1, or some 0s followed by a 1). Every accepted string ends in 1. Every binary string ending in 1 can be decomposed this way.

Simplification: Note that $00^* = 0^+$, and $1 \cup 0^+1$ means "optionally some 0s, then a 1," which is $0^*1$. So the regex simplifies to $0^*1(0^*1)^* = (0^*1)^+$, which directly reads as "one or more blocks of (any number of 0s followed by a 1)." That's exactly "binary strings ending in 1."

State Elimination Walkthrough

Click through to see each elimination step for the DFA above.

Common Mistakes

Forgetting to parenthesize. When you substitute $R_2 = a \cup b$ into $R_2 \circ R_4$, it becomes $(a \cup b) \circ R_4$, NOT $a \cup b \circ R_4$ (which means $a \cup (b \circ R_4)$ by precedence).
Not updating self-loops. When ripping $q_{\text{rip}}$, the formula also applies when $q_i = q_j$. A state can gain or modify its self-loop.
Skipping the GNFA setup. You MUST add the fresh $q_s$ and $q_a$ first. Without them, the final 2-state GNFA doesn't have the right structure.
Thinking order matters. The ripping order affects the FORM of the resulting regex (some orders give simpler expressions) but not the LANGUAGE. Any order works.

Summary

DFA/NFA → Regex via state elimination:

Convert to GNFA: add fresh start $q_s$ and fresh accept $q_a$
Rip states one at a time using: $$\delta'(q_i, q_j) = \underbrace{\delta(q_i, q_j)}_{\text{direct}} \cup \underbrace{\delta(q_i, q_{\text{rip}}) \circ \delta(q_{\text{rip}}, q_{\text{rip}})^* \circ \delta(q_{\text{rip}}, q_j)}_{\text{enter} \cdot \text{loop}^* \cdot \text{exit}}$$
When 2 states remain: the label from $q_s$ to $q_a$ is the regex

Remember: $\emptyset^* = \varepsilon$. Always parenthesize substitutions. Update ALL pairs including self-loops.

← Regex Index Pumping Lemma →

	0	1
\(\to q_0\)	\(q_0\)	\(q_1\)
\(*\; q_1\)	\(q_0\)	\(q_1\)

	\(q_0\)	\(q_1\)	\(q_a\)
\(q_s\)	\(\varepsilon\)	\(\emptyset\)	\(\emptyset\)
\(q_0\)	\(0\)	\(1\)	\(\emptyset\)
\(q_1\)	\(0\)	\(1\)	\(\varepsilon\)

	\(q_1\)	\(q_a\)
\(q_s\)	\(0^*1\)	\(\emptyset\)
\(q_1\)	\(1 \cup 00^*1\)	\(\varepsilon\)

Three Key Differences from Standard NFA

The "Bulldozing a City" Analogy

Important Details

Step 1: Convert to GNFA

Step 2: Rip \(q_0\)

Step 3: Rip \(q_1\)

State Elimination Walkthrough