04 — Subset Construction
Converting any NFA to an equivalent DFA — the proof that nondeterminism doesn't add power
Why This Matters

The previous page showed that NFAs and DFAs recognize the same languages. The subset construction is how — it's the constructive proof. Given any NFA (with all its nondeterminism and \(\varepsilon\)-transitions), this algorithm produces a DFA that accepts exactly the same language.

Software analogy: an NFA is like a regex engine that backtracks (exploring multiple paths). A DFA is like a compiled regex that runs in linear time with no backtracking. Subset construction is the compiler that transforms one into the other.

The cost of determinism: An NFA with \(n\) states can require a DFA with up to \(2^n\) states. In practice, most states are usually unreachable, but the worst case is exponential. This is a real tradeoff — convenience of design (NFA) vs. efficiency of execution (DFA).
The Core Idea
Key insight: At any point during NFA execution, the NFA is in some set of states simultaneously. The DFA simulates this by making each DFA state correspond to a set of NFA states. The DFA deterministically tracks which states the NFA could be in.

If the NFA has states \(\{q_0, q_1, q_2\}\), then the DFA has states that are subsets of this set: \(\emptyset, \{q_0\}, \{q_1\}, \{q_2\}, \{q_0, q_1\}, \{q_0, q_2\}, \{q_1, q_2\}, \{q_0, q_1, q_2\}\). That's \(2^3 = 8\) possible DFA states (though not all will be reachable).

The Algorithm

Recall from page 03: \(E(q)\) is the \(\varepsilon\)-closure of state \(q\) — the set of all states reachable from \(q\) by following zero or more \(\varepsilon\)-transitions (including \(q\) itself). For a set of states: \(E(S) = \bigcup_{q \in S} E(q)\). This function appears throughout the algorithm below.

Subset Construction. Given NFA \(N = (Q, \Sigma, \delta_N, q_0, F_N)\), construct DFA \(D = (Q', \Sigma, \delta_D, q_0', F')\) where:

1 \(Q' = \mathcal{P}(Q)\) — each DFA state is a subset of NFA states
2 \(q_0' = E(q_0)\) — DFA start state = \(\varepsilon\)-closure of NFA start state
3 For each DFA state \(S \subseteq Q\) and symbol \(a \in \Sigma\): $$\delta_D(S, a) = E\!\left(\bigcup_{q \in S} \delta_N(q, a)\right)$$ "From each NFA state in S, take all a-transitions, union the results, then ε-close."
4 \(F' = \{S \in Q' \mid S \cap F_N \neq \emptyset\}\) — accept if the set contains ANY NFA accept state

Let's break down step 3, since that's where the work happens:

  1. For each NFA state \(q\) in the current set \(S\): look up \(\delta_N(q, a)\) — the set of states \(q\) can reach on symbol \(a\)
  2. Union all results: combine all reachable states into one big set
  3. \(\varepsilon\)-close: add any states reachable by \(\varepsilon\)-transitions from the union
Don't forget \(\varepsilon\)-closure! It applies in TWO places: Forgetting \(\varepsilon\)-closure is the #1 error in subset construction.
The Dead State \(\emptyset\)

The DFA state \(\emptyset\) (empty set) is the dead state — it means "the NFA has no possible states it could be in." Once you reach \(\emptyset\), you're stuck:

This is the formal version of "all NFA branches died."

Worked Example

Let's convert this 3-state NFA to a DFA. The NFA accepts strings over \(\{0,1\}\) that contain "1" followed by at least one more symbol (i.e., "1" is not the last character):

NFA \(N\)

State diagram:

digraph { rankdir=LR; bgcolor="transparent"; node [shape=circle fontname="Helvetica" fontsize=11 width=0.4 fixedsize=true]; edge [fontname="Helvetica" fontsize=10]; __start [shape=none label="" width=0]; __start -> q0; q0 [label="q0" style=filled fillcolor="#d0e8ff"]; q1 [label="q1" style=filled fillcolor="#d0e8ff"]; q2 [label="q2" shape=doublecircle style=filled fillcolor="#c8f7c5"]; q0 -> q0 [label="0"]; q0 -> q0 [label="1"]; q0 -> q1 [label="1"]; q1 -> q2 [label="0"]; q1 -> q2 [label="1"]; q2 -> q2 [label="0"]; q2 -> q2 [label="1"]; }

Transition table:

01\(\varepsilon\)
\(\to q_0\)\(\{q_0\}\)\(\{q_0, q_1\}\)\(\emptyset\)
\(q_1\)\(\{q_2\}\)\(\{q_2\}\)\(\emptyset\)
\(*\; q_2\)\(\{q_2\}\)\(\{q_2\}\)\(\emptyset\)

No \(\varepsilon\)-transitions in this example, so \(E(q) = \{q\}\) for all states. This simplifies things — we just track the sets directly.

Step-by-step construction

Step 1: Start state. \(q_0' = E(\{q_0\}) = \{q_0\}\)

Step 2: Process \(\{q_0\}\). Step 3: Process \(\{q_0, q_1\}\). Step 4: Process \(\{q_0, q_2\}\). Step 5: Process \(\{q_0, q_1, q_2\}\). All reachable states processed. Done.

Resulting DFA

State diagram:

digraph { rankdir=LR; bgcolor="transparent"; node [shape=circle fontname="Helvetica" fontsize=11 width=0.5 fixedsize=true]; edge [fontname="Helvetica" fontsize=10]; __start [shape=none label="" width=0]; __start -> A; A [label="A\n{q0}" style=filled fillcolor="#d0e8ff"]; B [label="B\n{q0,q1}" style=filled fillcolor="#d0e8ff"]; C [label="C\n{q0,q2}" shape=doublecircle style=filled fillcolor="#c8f7c5"]; D [label="D\n{q0,q1,q2}" shape=doublecircle style=filled fillcolor="#c8f7c5"]; A -> A [label="0"]; A -> B [label="1"]; B -> C [label="0"]; B -> D [label="1"]; C -> C [label="0"]; C -> D [label="1"]; D -> C [label="0"]; D -> D [label="1"]; }

Transition table:

DFA StateNFA States01Accept?
\(\to A\)\(\{q_0\}\)\(A\)\(B\)No
\(B\)\(\{q_0, q_1\}\)\(C\)\(D\)No
\(*\; C\)\(\{q_0, q_2\}\)\(C\)\(D\)Yes (\(q_2 \in F\))
\(*\; D\)\(\{q_0, q_1, q_2\}\)\(C\)\(D\)Yes (\(q_2 \in F\))

Out of the theoretical \(2^3 = 8\) possible DFA states, only 4 are reachable. This is typical — most subset construction results are much smaller than the worst case.

Verify: The DFA accepts strings containing "1" followed by at least one more symbol. State A = haven't seen 1. State B = just saw 1. States C, D = have seen "1x" for some x. Both C and D are accepting because they contain \(q_2\). Actually, looking more carefully: the NFA accepts "1" followed by any one symbol — the language is "contains 1 followed by at least one more character." The key point: the DFA recognizes the same language as the NFA.

Subset Construction Explorer

NFA: accepts strings over {a,b} ending with "ab" (3 states: q0, q1, q2).

Click to build the DFA state-by-state. Each click processes one unprocessed DFA state.

Common Mistakes
1. Forgetting \(\varepsilon\)-closure. Must be applied to the start state AND after every transition. If the NFA has no \(\varepsilon\)-transitions, \(E(S) = S\) and you can skip it — but know WHY you're skipping it.

2. Not exploring all reachable states. Use a worklist algorithm: maintain a queue of unprocessed DFA states. Process each one (compute transitions on all symbols), add new states to the queue. Stop when the queue is empty.

3. Forgetting the \(\emptyset\) state. If the union of transitions gives the empty set, the DFA state is \(\emptyset\) — the dead state. It must be included in the DFA. It loops to itself on all symbols.

4. Wrong accept condition. A DFA state is accepting if it contains ANY NFA accept state. Not all. Not most. Any. One is enough.
Summary
Subset Construction Recipe:
  1. Start: DFA start = \(E(\{q_0\})\)
  2. For each unprocessed DFA state \(S\) and each symbol \(a\):
    \(\delta_D(S, a) = E\!\left(\bigcup_{q \in S} \delta_N(q, a)\right)\)
  3. Accept: any DFA state containing an NFA accept state
  4. Repeat until no new states are discovered

This proves constructively that for every NFA there exists an equivalent DFA. Worst case: \(2^n\) DFA states for an \(n\)-state NFA.