02 Deterministic Finite Automata (DFA)

Why DFAs?

Imagine you need to write a function that checks if a binary string has an even number of 1s. You'd probably use a boolean flag, flipping it on every 1. That flag is a state. You only need one bit of memory. You're already thinking like a DFA.

A DFA is the formalization of this idea: a machine that reads input one symbol at a time, left to right, transitioning between a finite number of states, and at the end either accepts or rejects. No backtracking, no memory beyond the current state, fully deterministic — for every state and symbol, there's exactly one next state.

DFAs are the foundation of the Chomsky hierarchy (a classification of language families by the type of automaton that recognizes them). They recognize exactly the regular languages — the same class described by regular expressions and NFAs.

Formal Definition

Definition. A deterministic finite automaton (DFA) is a 5-tuple $M = (Q, \Sigma, \delta, q_0, F)$ where:

$Q$ — a finite set of states
$\Sigma$ — a finite alphabet (input symbols)
$\delta: Q \times \Sigma \to Q$ — the transition function (total)
$q_0 \in Q$ — the start state
$F \subseteq Q$ — the set of accept states (also called final states)

Let's unpack each component:

$Q$ — States

The machine's complete memory. A DFA in state $q_3$ has forgotten everything about the input except whatever property $q_3$ encodes. This is both the power (simple) and the limitation (can't count arbitrarily) of DFAs.

$\delta$ — Transition Function

Total function. $\delta$ must be defined for every $(q, a)$ pair — every state, every symbol. No "undefined" transitions. If a DFA diagram doesn't show a transition, there's an implicit transition to a dead state (trap state) that loops on all symbols. This is the key difference from NFAs.

In software terms: $\delta$ is a lookup table with no null entries. It's a Map<[State, Symbol], State> that's exhaustively defined.

$F$ — Accept States

$F$ can be empty ($L(M) = \emptyset$), contain one state, or contain all states ($L(M) = \Sigma^*$). It's just a subset of $Q$.

Extended Transition Function $\hat{\delta}$

Definition. The extended transition function $\hat{\delta}: Q \times \Sigma^* \to Q$ processes an entire string: $$\hat{\delta}(q, \varepsilon) = q$$ $$\hat{\delta}(q, wa) = \delta(\hat{\delta}(q, w), a)$$ where $w \in \Sigma^*$ and $a \in \Sigma$.

Read the recursion: to process string $wa$, first process $w$ (getting to some state), then take one more step on symbol $a$. It's just "run the machine on the whole string."

Acceptance & Language

Definition. A DFA $M$ accepts string $w$ if $\hat{\delta}(q_0, w) \in F$.

The language recognized by $M$ is: $$L(M) = \{w \in \Sigma^* \mid \hat{\delta}(q_0, w) \in F\}$$

Definition. A language $L$ is regular if there exists a DFA $M$ such that $L = L(M)$.

Representations

State Diagram

Visual representation with circles (states), arrows (transitions), double circles (accept states), and an arrow from nowhere pointing to the start state.

Transition Table

For a DFA recognizing "strings over $\{a,b\}$ containing an even number of a's":

State diagram:

digraph { rankdir=LR; bgcolor="transparent"; node [shape=circle fontname="Helvetica" fontsize=11 width=0.4 fixedsize=true]; edge [fontname="Helvetica" fontsize=10]; __start [shape=none label="" width=0]; __start -> q_even; q_even [label="q_even" shape=doublecircle style=filled fillcolor="#c8f7c5"]; q_odd [label="q_odd" style=filled fillcolor="#d0e8ff"]; q_even -> q_odd [label="a"]; q_even -> q_even [label="b"]; q_odd -> q_even [label="a"]; q_odd -> q_odd [label="b"]; }

Transition table:

	$a$	$b$
$\to *\; q_{\text{even}}$	$q_{\text{odd}}$	$q_{\text{even}}$
$q_{\text{odd}}$	$q_{\text{even}}$	$q_{\text{odd}}$

$\to$ marks the start state, $*$ marks accept states. Two states, two symbols, every cell filled — that's what "total function" means.

Dead States (Trap States)

Dead state: A non-accepting state that loops to itself on every symbol. Once you enter it, you can never reach an accept state. In diagrams, dead states are often omitted for clarity — but they're formally required because $\delta$ must be total.

When you see a diagram where some transitions seem "missing," mentally add a dead state with self-loops on all symbols. Every missing arrow goes to the dead state.

Worked Example

Let's build a DFA $M$ that accepts binary strings containing the substring 01.

$M = (Q, \Sigma, \delta, q_0, F)$ where:

$Q = \{q_0, q_1, q_2\}$
$\Sigma = \{0, 1\}$
$q_0$ = haven't seen anything useful yet
$q_1$ = just saw a 0, waiting for a 1
$q_2$ = seen 01, accept forever
$F = \{q_2\}$

State diagram:

digraph { rankdir=LR; bgcolor="transparent"; node [shape=circle fontname="Helvetica" fontsize=11 width=0.4 fixedsize=true]; edge [fontname="Helvetica" fontsize=10]; __start [shape=none label="" width=0]; __start -> q0; q0 [label="q0" style=filled fillcolor="#d0e8ff"]; q1 [label="q1" style=filled fillcolor="#d0e8ff"]; q2 [label="q2" shape=doublecircle style=filled fillcolor="#c8f7c5"]; q0 -> q1 [label="0"]; q0 -> q0 [label="1"]; q1 -> q1 [label="0"]; q1 -> q2 [label="1"]; q2 -> q2 [label="0"]; q2 -> q2 [label="1"]; }

Transition table:

	0	1
$\to q_0$	$q_1$	$q_0$
$q_1$	$q_1$	$q_2$
$*\; q_2$	$q_2$	$q_2$

Now let's trace the string aabb... wait, wrong alphabet. Let's trace 1001:

Step-by-Step DFA Trace

DFA: accepts strings over $\{0,1\}$ containing substring "01"

Input string:

Manual trace of "1001"

Start in $q_0$
Read 1: $\delta(q_0, 1) = q_0$ — still haven't seen a useful prefix
Read 0: $\delta(q_0, 0) = q_1$ — saw a 0, now waiting for a 1
Read 0: $\delta(q_1, 0) = q_1$ — another 0, still waiting (we remember the last 0)
Read 1: $\delta(q_1, 1) = q_2$ — got the 01! Accept state.

Final state $q_2 \in F$ → ACCEPT. The string "1001" contains "01" (starting at position 2).

DFA Design Patterns

When designing DFAs, think about what each state needs to remember:

Counting mod $k$: Use $k$ states. E.g., "even number of 1s" needs 2 states (even/odd).
Pattern matching: States track how much of the pattern you've matched so far. E.g., "contains 01" needs 3 states (no match, seen 0, seen 01).
Last $k$ symbols: Use $2^k$ states (for binary). Each state remembers the last $k$ symbols seen.
Combinations: Use pairs of states (one from each sub-DFA) to track two properties simultaneously. E.g., "even # of 0s AND contains 11" — each state is a tuple tracking both conditions. (This is formalized as the product construction in page 05.)

Key limitation: DFAs can't count unboundedly. They can count mod $k$ (using $k$ states), but they can't remember "how many 0s we've seen" for an arbitrary number. That's why $\{0^n1^n\}$ isn't regular — you'd need infinitely many states to remember the exact count of 0s.

Summary

DFA in a nutshell:

5-tuple: $(Q, \Sigma, \delta, q_0, F)$
Deterministic: exactly one transition per (state, symbol) pair
Total: $\delta$ is defined everywhere — no missing transitions
Accepts: when final state is in $F$ after reading entire input
Recognizes: the class of regular languages
Memory: only the current state — finite, with no auxiliary storage

← Alphabets & Strings Index NFA →

	\(a\)	\(b\)
\(\to *\; q_{\text{even}}\)	\(q_{\text{odd}}\)	\(q_{\text{even}}\)
\(q_{\text{odd}}\)	\(q_{\text{even}}\)	\(q_{\text{odd}}\)

	0	1
\(\to q_0\)	\(q_1\)	\(q_0\)
\(q_1\)	\(q_1\)	\(q_2\)
\(*\; q_2\)	\(q_2\)	\(q_2\)

\(Q\) — States

\(\delta\) — Transition Function

\(F\) — Accept States

State Diagram

Transition Table

Step-by-Step DFA Trace

Manual trace of "1001"