04 Subset Construction (NFA → DFA)

Why This Matters

The previous page showed that NFAs and DFAs recognize the same languages. The subset construction is how — it's the constructive proof. Given any NFA (with all its nondeterminism and $\varepsilon$-transitions), this algorithm produces a DFA that accepts exactly the same language.

Software analogy: an NFA is like a regex engine that backtracks (exploring multiple paths). A DFA is like a compiled regex that runs in linear time with no backtracking. Subset construction is the compiler that transforms one into the other.

The cost of determinism: An NFA with $n$ states can require a DFA with up to $2^n$ states. In practice, most states are usually unreachable, but the worst case is exponential. This is a real tradeoff — convenience of design (NFA) vs. efficiency of execution (DFA).

The Core Idea

Key insight: At any point during NFA execution, the NFA is in some set of states simultaneously. The DFA simulates this by making each DFA state correspond to a set of NFA states. The DFA deterministically tracks which states the NFA could be in.

If the NFA has states $\{q_0, q_1, q_2\}$, then the DFA has states that are subsets of this set: $\emptyset, \{q_0\}, \{q_1\}, \{q_2\}, \{q_0, q_1\}, \{q_0, q_2\}, \{q_1, q_2\}, \{q_0, q_1, q_2\}$. That's $2^3 = 8$ possible DFA states (though not all will be reachable).

The Algorithm

Recall from page 03: $E(q)$ is the $\varepsilon$-closure of state $q$ — the set of all states reachable from $q$ by following zero or more $\varepsilon$-transitions (including $q$ itself). For a set of states: $E(S) = \bigcup_{q \in S} E(q)$. This function appears throughout the algorithm below.

Subset Construction. Given NFA $N = (Q, \Sigma, \delta_N, q_0, F_N)$, construct DFA $D = (Q', \Sigma, \delta_D, q_0', F')$ where:

1 $Q' = \mathcal{P}(Q)$ — each DFA state is a subset of NFA states

2 $q_0' = E(q_0)$ — DFA start state = $\varepsilon$-closure of NFA start state

3 For each DFA state $S \subseteq Q$ and symbol $a \in \Sigma$: $$\delta_D(S, a) = E\!\left(\bigcup_{q \in S} \delta_N(q, a)\right)$$ "From each NFA state in S, take all a-transitions, union the results, then ε-close."

4 $F' = \{S \in Q' \mid S \cap F_N \neq \emptyset\}$ — accept if the set contains ANY NFA accept state

Let's break down step 3, since that's where the work happens:

For each NFA state $q$ in the current set $S$: look up $\delta_N(q, a)$ — the set of states $q$ can reach on symbol $a$
Union all results: combine all reachable states into one big set
$\varepsilon$-close: add any states reachable by $\varepsilon$-transitions from the union

Don't forget $\varepsilon$-closure! It applies in TWO places:

At the start: the DFA start state is $E(q_0)$, not just $\{q_0\}$
After every transition: close the result set under $\varepsilon$

Forgetting $\varepsilon$-closure is the #1 error in subset construction.

The Dead State $\emptyset$

The DFA state $\emptyset$ (empty set) is the dead state — it means "the NFA has no possible states it could be in." Once you reach $\emptyset$, you're stuck:

$\delta_D(\emptyset, a) = E(\bigcup_{q \in \emptyset} \delta_N(q,a)) = E(\emptyset) = \emptyset$ for all $a$
$\emptyset \cap F_N = \emptyset$, so it's never an accept state

This is the formal version of "all NFA branches died."

Worked Example

Let's convert this 3-state NFA to a DFA. The NFA accepts strings over $\{0,1\}$ that contain "1" followed by at least one more symbol (i.e., "1" is not the last character):

NFA $N$

State diagram:

digraph { rankdir=LR; bgcolor="transparent"; node [shape=circle fontname="Helvetica" fontsize=11 width=0.4 fixedsize=true]; edge [fontname="Helvetica" fontsize=10]; __start [shape=none label="" width=0]; __start -> q0; q0 [label="q0" style=filled fillcolor="#d0e8ff"]; q1 [label="q1" style=filled fillcolor="#d0e8ff"]; q2 [label="q2" shape=doublecircle style=filled fillcolor="#c8f7c5"]; q0 -> q0 [label="0"]; q0 -> q0 [label="1"]; q0 -> q1 [label="1"]; q1 -> q2 [label="0"]; q1 -> q2 [label="1"]; q2 -> q2 [label="0"]; q2 -> q2 [label="1"]; }

Transition table:

	0	1	$\varepsilon$
$\to q_0$	$\{q_0\}$	$\{q_0, q_1\}$	$\emptyset$
$q_1$	$\{q_2\}$	$\{q_2\}$	$\emptyset$
$*\; q_2$	$\{q_2\}$	$\{q_2\}$	$\emptyset$

No $\varepsilon$-transitions in this example, so $E(q) = \{q\}$ for all states. This simplifies things — we just track the sets directly.

Step-by-step construction

Step 1: Start state. $q_0' = E(\{q_0\}) = \{q_0\}$

Step 2: Process $\{q_0\}$.

On 0: $\delta_N(q_0, 0) = \{q_0\}$. $E(\{q_0\}) = \{q_0\}$
On 1: $\delta_N(q_0, 1) = \{q_0, q_1\}$. $E(\{q_0, q_1\}) = \{q_0, q_1\}$ — new state!

Step 3: Process $\{q_0, q_1\}$.

On 0: $\delta_N(q_0, 0) \cup \delta_N(q_1, 0) = \{q_0\} \cup \{q_2\} = \{q_0, q_2\}$ — new state!
On 1: $\delta_N(q_0, 1) \cup \delta_N(q_1, 1) = \{q_0, q_1\} \cup \{q_2\} = \{q_0, q_1, q_2\}$ — new state!

Step 4: Process $\{q_0, q_2\}$.

On 0: $\{q_0\} \cup \{q_2\} = \{q_0, q_2\}$ — already known
On 1: $\{q_0, q_1\} \cup \{q_2\} = \{q_0, q_1, q_2\}$ — already known

Step 5: Process $\{q_0, q_1, q_2\}$.

On 0: $\{q_0\} \cup \{q_2\} \cup \{q_2\} = \{q_0, q_2\}$ — already known
On 1: $\{q_0, q_1\} \cup \{q_2\} \cup \{q_2\} = \{q_0, q_1, q_2\}$ — already known

All reachable states processed. Done.

Resulting DFA

State diagram:

digraph { rankdir=LR; bgcolor="transparent"; node [shape=circle fontname="Helvetica" fontsize=11 width=0.5 fixedsize=true]; edge [fontname="Helvetica" fontsize=10]; __start [shape=none label="" width=0]; __start -> A; A [label="A\n{q0}" style=filled fillcolor="#d0e8ff"]; B [label="B\n{q0,q1}" style=filled fillcolor="#d0e8ff"]; C [label="C\n{q0,q2}" shape=doublecircle style=filled fillcolor="#c8f7c5"]; D [label="D\n{q0,q1,q2}" shape=doublecircle style=filled fillcolor="#c8f7c5"]; A -> A [label="0"]; A -> B [label="1"]; B -> C [label="0"]; B -> D [label="1"]; C -> C [label="0"]; C -> D [label="1"]; D -> C [label="0"]; D -> D [label="1"]; }

Transition table:

DFA State	NFA States	0	1	Accept?
$\to A$	$\{q_0\}$	$A$	$B$	No
$B$	$\{q_0, q_1\}$	$C$	$D$	No
$*\; C$	$\{q_0, q_2\}$	$C$	$D$	Yes ($q_2 \in F$)
$*\; D$	$\{q_0, q_1, q_2\}$	$C$	$D$	Yes ($q_2 \in F$)

Out of the theoretical $2^3 = 8$ possible DFA states, only 4 are reachable. This is typical — most subset construction results are much smaller than the worst case.

Verify: The DFA accepts strings containing "1" followed by at least one more symbol. State A = haven't seen 1. State B = just saw 1. States C, D = have seen "1x" for some x. Both C and D are accepting because they contain $q_2$. Actually, looking more carefully: the NFA accepts "1" followed by any one symbol — the language is "contains 1 followed by at least one more character." The key point: the DFA recognizes the same language as the NFA.

Subset Construction Explorer

NFA: accepts strings over {a,b} ending with "ab" (3 states: q0, q1, q2).

Click to build the DFA state-by-state. Each click processes one unprocessed DFA state.

Common Mistakes

1. Forgetting $\varepsilon$-closure. Must be applied to the start state AND after every transition. If the NFA has no $\varepsilon$-transitions, $E(S) = S$ and you can skip it — but know WHY you're skipping it.

2. Not exploring all reachable states. Use a worklist algorithm: maintain a queue of unprocessed DFA states. Process each one (compute transitions on all symbols), add new states to the queue. Stop when the queue is empty.

3. Forgetting the $\emptyset$ state. If the union of transitions gives the empty set, the DFA state is $\emptyset$ — the dead state. It must be included in the DFA. It loops to itself on all symbols.

4. Wrong accept condition. A DFA state is accepting if it contains ANY NFA accept state. Not all. Not most. Any. One is enough.

Summary

Subset Construction Recipe:

Start: DFA start = $E(\{q_0\})$
For each unprocessed DFA state $S$ and each symbol $a$:
$\delta_D(S, a) = E\!\left(\bigcup_{q \in S} \delta_N(q, a)\right)$
Accept: any DFA state containing an NFA accept state
Repeat until no new states are discovered

This proves constructively that for every NFA there exists an equivalent DFA. Worst case: $2^n$ DFA states for an $n$-state NFA.

← NFA Index Regular Operations →

	0	1	\(\varepsilon\)
\(\to q_0\)	\(\{q_0\}\)	\(\{q_0, q_1\}\)	\(\emptyset\)
\(q_1\)	\(\{q_2\}\)	\(\{q_2\}\)	\(\emptyset\)
\(*\; q_2\)	\(\{q_2\}\)	\(\{q_2\}\)	\(\emptyset\)

DFA State	NFA States	0	1	Accept?
\(\to A\)	\(\{q_0\}\)	\(A\)	\(B\)	No
\(B\)	\(\{q_0, q_1\}\)	\(C\)	\(D\)	No
\(*\; C\)	\(\{q_0, q_2\}\)	\(C\)	\(D\)	Yes (\(q_2 \in F\))
\(*\; D\)	\(\{q_0, q_1, q_2\}\)	\(C\)	\(D\)	Yes (\(q_2 \in F\))

NFA \(N\)

Step-by-step construction

Resulting DFA

Subset Construction Explorer