Converting any NFA/DFA to a regular expression by removing states one by one
Why State Elimination?
We know Regex → NFA (Thompson's construction). But what about the reverse? Given a DFA or NFA, how do you find a regex for its language? You need state elimination — the algorithm that converts any finite automaton to a regular expression. This closes the equivalence triangle.
The approach: simplify the automaton by removing states one at a time, replacing each removed state's transitions with regex labels that capture all the paths that used to go through it. When only start and accept remain, the label between them IS the regex.
GNFA: Generalized NFA
Definition. A Generalized NFA (GNFA) is a 5-tuple \(G = (Q, \Sigma, \delta, q_s, q_a)\) where:
\(Q\) — finite set of states
\(\Sigma\) — alphabet
\(\delta: (Q \setminus \{q_a\}) \times (Q \setminus \{q_s\}) \to \mathcal{R}\) — transition function mapping to regular expressions
\(q_s\) — start state
\(q_a\) — single accept state (\(q_a \neq q_s\))
Three Key Differences from Standard NFA
Transition labels are regular expressions, not single symbols. An arrow can be labeled \(ab^* \cup c\).
Exactly one accept state (not a set). And \(q_s \neq q_a\).
No arrows INTO \(q_s\), no arrows OUT OF \(q_a\). The start state has only outgoing arrows; the accept state has only incoming arrows.
If a transition doesn't exist, its label is \(\emptyset\) (matches nothing — effectively "no arrow"). Self-loops have a regex label too.
Step 1: Convert DFA/NFA to GNFA
Preparation:
Add a new start state \(q_s\) with an \(\varepsilon\)-transition to the old start.
Add a new accept state \(q_a\) with \(\varepsilon\)-transitions from all old accept states.
Old accept states lose their accepting status.
If there are multiple transitions between the same pair of states, merge them with \(\cup\).
Label all transitions with their symbols (each symbol label becomes a regex).
Missing transitions get label \(\emptyset\).
Now we have a clean GNFA: one start (no incoming arrows), one accept (no outgoing arrows), regex labels on all transitions.
Step 2: The Elimination Formula
The heart of the algorithm. When we remove ("rip out") a state \(q_{\text{rip}}\), every pair of remaining states \((q_i, q_j)\) that used to route through \(q_{\text{rip}}\) needs its transition updated.
Imagine \(q_i\), \(q_{\text{rip}}\), and \(q_j\) are cities connected by roads (labeled with regexes). We're about to bulldoze \(q_{\text{rip}}\). Before we do, we need to build a new direct road from \(q_i\) to \(q_j\) that captures ALL the ways you could have traveled through \(q_{\text{rip}}\).
The new label on \(q_i \to q_j\) has two parts:
Direct path: \(\delta(q_i, q_j)\) — the old direct road (if any)
\(\emptyset^* = \varepsilon\). If \(q_{\text{rip}}\) has no self-loop, the loop part is \(\emptyset^* = \varepsilon\), which disappears from the concatenation. The detour simplifies to just \(R_2 \circ R_4\).
Missing transitions = \(\emptyset\). If there's no arrow from \(q_i\) to \(q_{\text{rip}}\), then \(R_2 = \emptyset\), and the detour = \(\emptyset \circ \ldots = \emptyset\), which disappears from the union. The direct path alone survives.
Parenthesization matters! When substituting complex regex into the formula, always parenthesize. \(ab \cup c\) is NOT the same as \(a(b \cup c)\). Forgetting parentheses is the #1 error in state elimination.
Update ALL pairs. When ripping \(q_{\text{rip}}\), update \(\delta'(q_i, q_j)\) for EVERY pair \((q_i, q_j)\) where \(q_i \neq q_{\text{rip}}\) and \(q_j \neq q_{\text{rip}}\). This includes self-loops (\(q_i = q_j\)).
Worked Example
Convert this DFA to a regex. The DFA accepts strings over \(\{0,1\}\) ending in 1.
Verify: This says: start with zero or more 0s, then a 1, then repeat (another 1, or some 0s followed by a 1). Every accepted string ends in 1. Every binary string ending in 1 can be decomposed this way.
Simplification: Note that \(00^* = 0^+\), and \(1 \cup 0^+1\) means "optionally some 0s, then a 1," which is \(0^*1\). So the regex simplifies to \(0^*1(0^*1)^* = (0^*1)^+\), which directly reads as "one or more blocks of (any number of 0s followed by a 1)." That's exactly "binary strings ending in 1."
State Elimination Walkthrough
Click through to see each elimination step for the DFA above.
Common Mistakes
Forgetting to parenthesize. When you substitute \(R_2 = a \cup b\) into \(R_2 \circ R_4\), it becomes \((a \cup b) \circ R_4\), NOT \(a \cup b \circ R_4\) (which means \(a \cup (b \circ R_4)\) by precedence).
Not updating self-loops. When ripping \(q_{\text{rip}}\), the formula also applies when \(q_i = q_j\). A state can gain or modify its self-loop.
Skipping the GNFA setup. You MUST add the fresh \(q_s\) and \(q_a\) first. Without them, the final 2-state GNFA doesn't have the right structure.
Thinking order matters. The ripping order affects the FORM of the resulting regex (some orders give simpler expressions) but not the LANGUAGE. Any order works.
Summary
DFA/NFA → Regex via state elimination:
Convert to GNFA: add fresh start \(q_s\) and fresh accept \(q_a\)
Rip states one at a time using:
$$\delta'(q_i, q_j) = \underbrace{\delta(q_i, q_j)}_{\text{direct}} \cup \underbrace{\delta(q_i, q_{\text{rip}}) \circ \delta(q_{\text{rip}}, q_{\text{rip}})^* \circ \delta(q_{\text{rip}}, q_j)}_{\text{enter} \cdot \text{loop}^* \cdot \text{exit}}$$
When 2 states remain: the label from \(q_s\) to \(q_a\) is the regex
Remember: \(\emptyset^* = \varepsilon\). Always parenthesize substitutions. Update ALL pairs including self-loops.