07 — GNFA & State Elimination
Converting any NFA/DFA to a regular expression by removing states one by one
Why State Elimination?

We know Regex → NFA (Thompson's construction). But what about the reverse? Given a DFA or NFA, how do you find a regex for its language? You need state elimination — the algorithm that converts any finite automaton to a regular expression. This closes the equivalence triangle.

The approach: simplify the automaton by removing states one at a time, replacing each removed state's transitions with regex labels that capture all the paths that used to go through it. When only start and accept remain, the label between them IS the regex.

GNFA: Generalized NFA
Definition. A Generalized NFA (GNFA) is a 5-tuple \(G = (Q, \Sigma, \delta, q_s, q_a)\) where:
  1. \(Q\) — finite set of states
  2. \(\Sigma\) — alphabet
  3. \(\delta: (Q \setminus \{q_a\}) \times (Q \setminus \{q_s\}) \to \mathcal{R}\) — transition function mapping to regular expressions
  4. \(q_s\) — start state
  5. \(q_a\) — single accept state (\(q_a \neq q_s\))

Three Key Differences from Standard NFA

  1. Transition labels are regular expressions, not single symbols. An arrow can be labeled \(ab^* \cup c\).
  2. Exactly one accept state (not a set). And \(q_s \neq q_a\).
  3. No arrows INTO \(q_s\), no arrows OUT OF \(q_a\). The start state has only outgoing arrows; the accept state has only incoming arrows.

If a transition doesn't exist, its label is \(\emptyset\) (matches nothing — effectively "no arrow"). Self-loops have a regex label too.

Step 1: Convert DFA/NFA to GNFA
Preparation:
  1. Add a new start state \(q_s\) with an \(\varepsilon\)-transition to the old start.
  2. Add a new accept state \(q_a\) with \(\varepsilon\)-transitions from all old accept states.
  3. Old accept states lose their accepting status.
  4. If there are multiple transitions between the same pair of states, merge them with \(\cup\).
  5. Label all transitions with their symbols (each symbol label becomes a regex).
  6. Missing transitions get label \(\emptyset\).

Now we have a clean GNFA: one start (no incoming arrows), one accept (no outgoing arrows), regex labels on all transitions.

Step 2: The Elimination Formula

The heart of the algorithm. When we remove ("rip out") a state \(q_{\text{rip}}\), every pair of remaining states \((q_i, q_j)\) that used to route through \(q_{\text{rip}}\) needs its transition updated.

$$\delta'(q_i, q_j) = \delta(q_i, q_j) \cup \delta(q_i, q_{\text{rip}}) \circ \delta(q_{\text{rip}}, q_{\text{rip}})^* \circ \delta(q_{\text{rip}}, q_j)$$

Let's decode this with an analogy.

The "Bulldozing a City" Analogy

Imagine \(q_i\), \(q_{\text{rip}}\), and \(q_j\) are cities connected by roads (labeled with regexes). We're about to bulldoze \(q_{\text{rip}}\). Before we do, we need to build a new direct road from \(q_i\) to \(q_j\) that captures ALL the ways you could have traveled through \(q_{\text{rip}}\).

The new label on \(q_i \to q_j\) has two parts:

The new road = old direct road OR (enter the city, loop any number of times, exit).

digraph { rankdir=LR; bgcolor="transparent"; node [fontname="Helvetica" fontsize=10]; edge [fontname="Helvetica" fontsize=9]; subgraph cluster_before { style=dashed; color="#9a9590"; label="Before ripping q_rip"; fontname="Helvetica"; fontsize=9; labeljust=l; qi [shape=circle label="q_i" style=filled fillcolor="#d0e8ff" width=0.45]; qrip [shape=circle label="q_rip" style=filled fillcolor="#e0e0e0" width=0.5]; qj [shape=circle label="q_j" style=filled fillcolor="#d0e8ff" width=0.45]; qi -> qj [label="R₁ (direct)"]; qi -> qrip [label="R₂ (enter)"]; qrip -> qrip [label="R₃ (loop)"]; qrip -> qj [label="R₄ (exit)"]; } subgraph cluster_after { style=dashed; color="#9a9590"; label="After ripping q_rip"; fontname="Helvetica"; fontsize=9; labeljust=l; qi2 [shape=circle label="q_i" style=filled fillcolor="#d0e8ff" width=0.45]; qj2 [shape=circle label="q_j" style=filled fillcolor="#d0e8ff" width=0.45]; qi2 -> qj2 [label="R₁ ∪ R₂·R₃*·R₄"]; } }

Important Details

Worked Example

Convert this DFA to a regex. The DFA accepts strings over \(\{0,1\}\) ending in 1.

State diagram:

digraph { rankdir=LR; bgcolor="transparent"; node [shape=circle fontname="Helvetica" fontsize=11 width=0.4 fixedsize=true]; edge [fontname="Helvetica" fontsize=10]; __start [shape=none label="" width=0]; __start -> q0; q0 [label="q0" style=filled fillcolor="#d0e8ff"]; q1 [label="q1" shape=doublecircle style=filled fillcolor="#c8f7c5"]; q0 -> q0 [label="0"]; q0 -> q1 [label="1"]; q1 -> q0 [label="0"]; q1 -> q1 [label="1"]; }

Transition table:

01
\(\to q_0\)\(q_0\)\(q_1\)
\(*\; q_1\)\(q_0\)\(q_1\)

Step 1: Convert to GNFA

Add new start \(q_s\) and new accept \(q_a\):

digraph { rankdir=LR; bgcolor="transparent"; node [shape=circle fontname="Helvetica" fontsize=11 width=0.4 fixedsize=true]; edge [fontname="Helvetica" fontsize=10]; __start [shape=none label="" width=0]; __start -> qs; qs [label="q_s" style=filled fillcolor="#d0e8ff"]; q0 [label="q_0" style=filled fillcolor="#d0e8ff"]; q1 [label="q_1" style=filled fillcolor="#d0e8ff"]; qa [shape=doublecircle label="q_a" style=filled fillcolor="#c8f7c5"]; qs -> q0 [label="ε"]; q0 -> q0 [label="0"]; q0 -> q1 [label="1"]; q1 -> q0 [label="0"]; q1 -> q1 [label="1"]; q1 -> qa [label="ε"]; }

Transition table as regexes (rows = from, cols = to, missing = \(\emptyset\)):

\(q_0\)\(q_1\)\(q_a\)
\(q_s\)\(\varepsilon\)\(\emptyset\)\(\emptyset\)
\(q_0\)\(0\)\(1\)\(\emptyset\)
\(q_1\)\(0\)\(1\)\(\varepsilon\)

4 states: \(q_s, q_0, q_1, q_a\). We need to rip all middle states until only \(q_s\) and \(q_a\) remain.

Step 2: Rip \(q_0\)

Update all pairs from \(\{q_s, q_1\}\) to \(\{q_1, q_a\}\) (excluding \(q_0\)).

\(\delta'(q_s, q_1)\):

\(\delta(q_s, q_1) \cup \delta(q_s, q_0) \circ \delta(q_0, q_0)^* \circ \delta(q_0, q_1)\)

\(= \emptyset \cup \varepsilon \circ 0^* \circ 1 = 0^*1\)

\(\delta'(q_s, q_a)\):

\(\delta(q_s, q_a) \cup \delta(q_s, q_0) \circ \delta(q_0, q_0)^* \circ \delta(q_0, q_a)\)

\(= \emptyset \cup \varepsilon \circ 0^* \circ \emptyset = \emptyset\)

\(\delta'(q_1, q_1)\):

\(\delta(q_1, q_1) \cup \delta(q_1, q_0) \circ \delta(q_0, q_0)^* \circ \delta(q_0, q_1)\)

\(= 1 \cup 0 \circ 0^* \circ 1 = 1 \cup 00^*1\)

\(\delta'(q_1, q_a)\):

\(\delta(q_1, q_a) \cup \delta(q_1, q_0) \circ \delta(q_0, q_0)^* \circ \delta(q_0, q_a)\)

\(= \varepsilon \cup 0 \circ 0^* \circ \emptyset = \varepsilon\)

After ripping \(q_0\), we have 3 states: \(q_s, q_1, q_a\).

\(q_1\)\(q_a\)
\(q_s\)\(0^*1\)\(\emptyset\)
\(q_1\)\(1 \cup 00^*1\)\(\varepsilon\)

Step 3: Rip \(q_1\)

Only one pair left: \(\delta'(q_s, q_a)\).

\(\delta(q_s, q_a) \cup \delta(q_s, q_1) \circ \delta(q_1, q_1)^* \circ \delta(q_1, q_a)\)

\(= \emptyset \cup 0^*1 \circ (1 \cup 00^*1)^* \circ \varepsilon\)

\(= 0^*1(1 \cup 00^*1)^*\)

Result: \(R = 0^*1(1 \cup 00^*1)^*\)

Verify: This says: start with zero or more 0s, then a 1, then repeat (another 1, or some 0s followed by a 1). Every accepted string ends in 1. Every binary string ending in 1 can be decomposed this way.

Simplification: Note that \(00^* = 0^+\), and \(1 \cup 0^+1\) means "optionally some 0s, then a 1," which is \(0^*1\). So the regex simplifies to \(0^*1(0^*1)^* = (0^*1)^+\), which directly reads as "one or more blocks of (any number of 0s followed by a 1)." That's exactly "binary strings ending in 1."

State Elimination Walkthrough

Click through to see each elimination step for the DFA above.

Common Mistakes
  1. Forgetting to parenthesize. When you substitute \(R_2 = a \cup b\) into \(R_2 \circ R_4\), it becomes \((a \cup b) \circ R_4\), NOT \(a \cup b \circ R_4\) (which means \(a \cup (b \circ R_4)\) by precedence).
  2. Not updating self-loops. When ripping \(q_{\text{rip}}\), the formula also applies when \(q_i = q_j\). A state can gain or modify its self-loop.
  3. Skipping the GNFA setup. You MUST add the fresh \(q_s\) and \(q_a\) first. Without them, the final 2-state GNFA doesn't have the right structure.
  4. Thinking order matters. The ripping order affects the FORM of the resulting regex (some orders give simpler expressions) but not the LANGUAGE. Any order works.
Summary
DFA/NFA → Regex via state elimination:
  1. Convert to GNFA: add fresh start \(q_s\) and fresh accept \(q_a\)
  2. Rip states one at a time using: $$\delta'(q_i, q_j) = \underbrace{\delta(q_i, q_j)}_{\text{direct}} \cup \underbrace{\delta(q_i, q_{\text{rip}}) \circ \delta(q_{\text{rip}}, q_{\text{rip}})^* \circ \delta(q_{\text{rip}}, q_j)}_{\text{enter} \cdot \text{loop}^* \cdot \text{exit}}$$
  3. When 2 states remain: the label from \(q_s\) to \(q_a\) is the regex
Remember: \(\emptyset^* = \varepsilon\). Always parenthesize substitutions. Update ALL pairs including self-loops.