10 — Pushdown Automata (PDA)
The machine model for context-free languages — an NFA with a stack
Why PDAs?

We've been building two parallel tracks through automata theory. For regular languages, we had a generative tool (regular expressions, page 06) and a recognition tool (DFAs/NFAs, pages 02-03). The two turned out to describe the same class of languages.

For context-free languages, we have the generative side: CFGs (page 09) generate strings through recursive rewriting rules. But what's the machine that recognizes these languages? What's the NFA-equivalent for CFLs?

That machine is the pushdown automaton (PDA). The idea is simple: take an NFA and give it a stack — an unbounded LIFO data structure. The NFA's finite states alone can't count (that's why \(\{0^n1^n\}\) isn't regular). But a stack can: push a symbol for every 0 you read, pop one for every 1. If the stack is empty when the input is done, the counts matched.

The pattern across the hierarchy:
| Language class | Generative model | Machine model | Memory |
|---|---|---|---|
| Regular | Regular expressions | DFA / NFA | Finite (current state only) |
| Context-free | CFGs | PDA | Finite state + unbounded stack |

And just like DFA/NFA/regex all describe exactly the regular languages, CFGs and PDAs describe exactly the context-free languages. That's the main theorem of this page.

Why specifically a stack? Context-free languages are characterized by nesting and matching — balanced parentheses, matched counts like \(0^n1^n\), palindromes. A stack is precisely the data structure that handles LIFO matching: the last thing you opened is the first thing you close. A queue (FIFO) would actually be too powerful — a queue automaton is equivalent to a Turing machine, because you can simulate a tape by cycling elements through. Random-access memory is similarly too powerful (also a Turing machine). The stack is the right amount of extra memory for context-free languages: enough to count and match, but not enough for arbitrary computation.
Formal Definition
Definition (Sipser 2.13). A pushdown automaton (PDA) is a 6-tuple \(M = (Q, \Sigma, \Gamma, \delta, q_0, F)\) where:
  1. \(Q\) — a finite set of states
  2. \(\Sigma\) — a finite input alphabet
  3. \(\Gamma\) — a finite stack alphabet
  4. \(\delta: Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon)\) — the transition function
  5. \(q_0 \in Q\) — the start state
  6. \(F \subseteq Q\) — the set of accept states
where \(\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}\) and \(\Gamma_\varepsilon = \Gamma \cup \{\varepsilon\}\).

Compare this to the NFA 5-tuple \((Q, \Sigma, \delta, q_0, F)\). Two things changed:

New: \(\Gamma\) — the stack alphabet

The stack has its own alphabet \(\Gamma\), which can differ from \(\Sigma\). Typically \(\Gamma\) includes symbols from \(\Sigma\) plus special markers — most commonly $ for the bottom-of-stack marker. For the \(\{0^n1^n\}\) PDA, you might have \(\Sigma = \{0, 1\}\) but \(\Gamma = \{0, \$\}\).

Changed: \(\delta\) — the transition function

Unpacking \(\delta\). The transition function is: \[\delta: Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon)\]

Each transition depends on three things and produces two things:

| Input | What it means |
|---|---|
| Current state \(q \in Q\) | Where the machine is |
| Input symbol \(a \in \Sigma_\varepsilon\) | Read \(a\) from input (or \(\varepsilon\) = read nothing) |
| Stack symbol \(b \in \Gamma_\varepsilon\) | Pop \(b\) from stack top (or \(\varepsilon\) = pop nothing) |

| Output | What it means |
|---|---|
| New state \(q' \in Q\) | Where the machine goes |
| Stack symbol \(c \in \Gamma_\varepsilon\) | Push \(c\) onto stack (or \(\varepsilon\) = push nothing) |

Returns a set of (state, stack-symbol) pairs — this is nondeterministic, just like an NFA.

In other words, on each step, the PDA simultaneously: (1) optionally reads an input symbol, (2) optionally pops from the stack, (3) transitions to a new state, and (4) optionally pushes onto the stack. And it can choose among multiple possible transitions nondeterministically.

PDAs are nondeterministic by default. This is a critical difference from the DFA/NFA story. For finite automata, nondeterminism didn't add power — every NFA has an equivalent DFA (page 04). For pushdown automata, nondeterminism genuinely adds power. Deterministic PDAs (DPDAs) recognize a strictly smaller class of languages than nondeterministic PDAs. The palindrome language \(\{ww^R\}\) requires nondeterminism — there's no DPDA for it. When we say "PDA" without qualification, we always mean nondeterministic.
How PDAs Compute

A PDA computation starts in state \(q_0\) with an empty stack. It reads the input left to right, making transitions that depend on the current state, the next input symbol (or \(\varepsilon\)), and the stack top (or \(\varepsilon\)). Because it's nondeterministic, multiple computation branches can exist simultaneously — just like an NFA.

Acceptance. A PDA \(M\) accepts string \(w\) if there exists some sequence of transitions that:
  1. Starts in \(q_0\) with an empty stack
  2. Reads all of \(w\) (consuming every input symbol)
  3. Ends in a state \(q_f \in F\)

Like NFA acceptance, this is existential — if any branch accepts, the PDA accepts.

We describe a PDA's state at any point during computation as a configuration: the triple \((q,\; w,\; s)\) where \(q\) is the current state, \(w\) is the remaining input, and \(s\) is the stack contents (top-to-bottom).

PDA Transition Notation

State diagrams for PDAs look like NFA diagrams, but each transition arrow is labeled with three things instead of one. The standard shorthand is:

Notation: \(a, b \to c\) on an arrow means: read \(a\) from the input, pop \(b\) from the stack, and push \(c\) onto the stack. Any of the three may be \(\varepsilon\): read nothing, pop nothing, or push nothing.

Examples:
  • \(0, \varepsilon \to 0\): read a 0, pop nothing, push a 0
  • \(1, 0 \to \varepsilon\): read a 1, pop a 0, push nothing
  • \(\varepsilon, \$ \to \varepsilon\): read nothing, pop \$, push nothing

The \(\$\) bottom-of-stack marker

Testing for an empty stack. The formal PDA definition has no built-in way to test whether the stack is empty. The standard trick: at the very beginning, push a special symbol $ onto the stack before doing anything else. If you ever see $ on top again, you know the stack is effectively empty. This is so common that nearly every PDA starts with a transition \(\varepsilon, \varepsilon \to \$\).

Similarly, PDAs can't directly test for end-of-input. The workaround: design the PDA so it can only reach an accept state after consuming all input (the accept condition checks final state after all input is read).
Example: PDA for \(\{0^n 1^n \mid n \geq 0\}\)

This is the canonical example — the language that proved finite automata aren't enough (page 08), and the first CFG we wrote (\(S \to 0S1 \mid \varepsilon\), page 09). Now we build the machine.

Strategy

  1. Push $ to mark the bottom of the stack
  2. For every 0 read, push it onto the stack
  3. When we start reading 1s, switch to popping: for every 1, pop a 0
  4. After reading all input, if the stack has only $ left, accept
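The four-step strategy transcribes almost directly into code. This is a sketch in Python; the function name and the `popping` flag (standing in for the pushing/popping states \(q_2\) and \(q_3\)) are inventions of this note, not part of the formal construction.

```python
def accepts_0n1n(s):
    """Check membership in {0^n 1^n : n >= 0} by mimicking the PDA strategy."""
    stack = ["$"]                # step 1: bottom-of-stack marker
    popping = False              # False = pushing mode, True = popping mode
    for c in s:
        if c == "0" and not popping:
            stack.append("0")    # step 2: push each 0
        elif c == "1" and stack[-1] == "0":
            popping = True       # step 3: switch modes on the first 1
            stack.pop()          # ... and pop a 0 for each 1
        else:
            return False         # a 0 after a 1, or a 1 with no 0 to match
    return stack == ["$"]        # step 4: accept iff only $ remains
```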

Formal description (Sipser Example 2.14)

\(M_1 = (Q, \Sigma, \Gamma, \delta, q_1, F)\) where:
  • \(Q = \{q_1, q_2, q_3, q_4\}\)
  • \(\Sigma = \{0, 1\}\)
  • \(\Gamma = \{0, \$\}\)
  • \(F = \{q_1, q_4\}\)
  • \(\delta\) is given by the state diagram below

State diagram

Transitions (start state \(q_1\); accept states \(q_1\) and \(q_4\)):

  • \(q_1 \to q_2\) on \(\varepsilon, \varepsilon \to \$\): push the bottom-of-stack marker
  • \(q_2 \to q_2\) on \(0, \varepsilon \to 0\): push each 0 read
  • \(q_2 \to q_3\) on \(1, 0 \to \varepsilon\): on the first 1, switch to popping mode and pop a 0
  • \(q_3 \to q_3\) on \(1, 0 \to \varepsilon\): each further 1 pops a 0
  • \(q_3 \to q_4\) on \(\varepsilon, \$ \to \varepsilon\): seeing \$ means the counts matched; accept

\(q_1\) is an accept state to handle \(\varepsilon\) (the empty string, with \(n = 0\)). \(q_4\) is the accept state for all other cases.

Trace: \(M_1\) on input 0011

Configuration: (state, remaining input, stack [top first])

(q1, 0011, ε)
  ε,ε→$ — (q2, 0011, $)
  0,ε→0 — (q2, 011, 0$)
  0,ε→0 — (q2, 11, 00$)
  1,0→ε — (q3, 1, 0$)
  1,0→ε — (q3, ε, $)
  ε,$→ε — (q4, ε, ε)

Final state q4 ∈ F, all input consumed → ACCEPT

Each 0 gets pushed. Each 1 pops a 0. Two 0s in, two 1s out — stack empties (except for $), and we accept. If the input were 001 instead, we'd have a 0 still on the stack when input runs out — no transition to \(q_4\) possible, so we reject.
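The existential acceptance rule ("accept if any branch accepts") can be simulated mechanically by a breadth-first search over configurations \((q, \text{position}, \text{stack})\). The dict encoding below, keyed by (state, read, pop) with `""` playing the role of \(\varepsilon\), is a convention invented for this sketch, not standard notation.

```python
from collections import deque

def pda_accepts(delta, start, accepts, w, max_steps=100_000):
    """BFS over PDA configurations; returns True iff some branch accepts.

    delta maps (state, read, pop) -> set of (next_state, push);
    "" stands for epsilon in all three positions. Sketch, not a parser.
    """
    frontier = deque([(start, 0, "")])   # stack as a string, top character first
    seen = set()
    while frontier and max_steps:
        max_steps -= 1
        state, pos, stack = frontier.popleft()
        if (state, pos, stack) in seen:
            continue
        seen.add((state, pos, stack))
        if pos == len(w) and state in accepts:
            return True
        reads = ([w[pos]] if pos < len(w) else []) + [""]
        pops = ([stack[0]] if stack else []) + [""]
        for read in reads:
            for pop in pops:
                for nxt, push in delta.get((state, read, pop), ()):
                    rest = stack[1:] if pop else stack
                    frontier.append((nxt, pos + len(read), push + rest))
    return False

# M1's transition function in this encoding
M1_DELTA = {
    ("q1", "", ""):   {("q2", "$")},   # push bottom marker
    ("q2", "0", ""):  {("q2", "0")},   # push each 0
    ("q2", "1", "0"): {("q3", "")},    # first 1 pops a 0
    ("q3", "1", "0"): {("q3", "")},    # each further 1 pops a 0
    ("q3", "", "$"):  {("q4", "")},    # only $ left: accept
}
```

Running `pda_accepts(M1_DELTA, "q1", {"q1", "q4"}, "0011")` reproduces the accepting trace above branch by branch.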


Example: PDA for \(\{ww^R \mid w \in \{0,1\}^*\}\)

This is where nondeterminism becomes essential. The language of even-length palindromes requires the PDA to "guess" where the middle of the string is — then verify that the second half mirrors the first.

Strategy

  1. Push $ to mark the bottom
  2. Read symbols and push each one onto the stack
  3. Nondeterministically guess "I've reached the middle" and switch to matching mode
  4. In matching mode: for each symbol read, pop from the stack and check they match
  5. When $ is on top, accept (both halves consumed)

State diagram (Sipser Example 2.18)

Transitions (start state \(q_1\); accept state \(q_4\)):

  • \(q_1 \to q_2\) on \(\varepsilon, \varepsilon \to \$\): push the bottom marker
  • \(q_2 \to q_2\) on \(0, \varepsilon \to 0\) and \(1, \varepsilon \to 1\): push each first-half symbol
  • \(q_2 \to q_3\) on \(\varepsilon, \varepsilon \to \varepsilon\): guess that the middle is here
  • \(q_3 \to q_3\) on \(0, 0 \to \varepsilon\) and \(1, 1 \to \varepsilon\): match each second-half symbol against a pop
  • \(q_3 \to q_4\) on \(\varepsilon, \$ \to \varepsilon\): stack emptied cleanly; accept

The key transition is \(q_2 \to q_3\): \(\varepsilon, \varepsilon \to \varepsilon\). This is a free move — no input read, nothing popped, nothing pushed. At every position while in \(q_2\), the PDA nondeterministically spawns a branch that guesses "the middle is here" and switches to matching mode.

Why nondeterminism is necessary here. A deterministic machine would need to know when it's reached the exact middle of the string — but it can't know this without having read the whole string first. The nondeterministic PDA doesn't need to know: it tries every possible midpoint simultaneously, and if one guess turns out correct, that branch accepts.

This is a provable result: there is no deterministic PDA (DPDA) that recognizes \(\{ww^R\}\). The language of even-length palindromes is inherently nondeterministic among CFLs.
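The guess-every-midpoint idea is easy to mimic in code: each candidate midpoint is one computation branch, and the string is accepted iff some branch's pops all match. A sketch, with the function name invented here:

```python
def accepts_wwr(s):
    """Check membership in {w w^R} by trying every midpoint as a branch.

    Branch m pushes s[:m], then matches s[m:] against pops; it succeeds
    iff s[m:] is exactly the reverse of s[:m] (stack emptied cleanly).
    """
    for m in range(len(s) + 1):
        stack = list(s[:m])                  # pushing phase
        matched = all(stack and stack.pop() == c for c in s[m:])
        if matched and not stack:            # this guess was the real middle
            return True
    return False
```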

Trace: \(M_3\) on input 011110 (which is 011 + 110 = \(w \cdot w^R\))

The accepting branch (guesses middle at position 3):

(q1, 011110, ε)
  ε,ε→$ — (q2, 011110, $)
  0,ε→0 — (q2, 11110, 0$)
  1,ε→1 — (q2, 1110, 10$)
  1,ε→1 — (q2, 110, 110$)
  ε,ε→ε — (q3, 110, 110$) ← guess: middle is HERE
  1,1→ε — (q3, 10, 10$)
  1,1→ε — (q3, 0, 0$)
  0,0→ε — (q3, ε, $)
  ε,$→ε — (q4, ε, ε)

q4 ∈ F, all input consumed → ACCEPT

Other branches (guessing the middle at position 0, 1, 2, 4, 5, or 6) would all fail — the stack wouldn't empty cleanly when the input runs out. But the branch that guessed correctly at position 3 succeeds, and that's enough.

Acceptance Modes: Final State vs. Empty Stack

There are two natural ways to define PDA acceptance:

Accept by final state (our definition): The PDA accepts if, after reading all input, it can be in some state \(q \in F\). The stack contents don't matter.

Accept by empty stack (alternative definition): The PDA accepts if, after reading all input, the stack is empty. There are no accept states — acceptance is purely determined by the stack.

These two definitions are equivalent in power — any language recognized by one type can be recognized by the other.
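One direction of that equivalence has a concrete construction: wrap an empty-stack PDA with a new start state that pushes a fresh bottom marker, then jump to a new accept state whenever the marker resurfaces (at that point the original machine's stack is empty). The sketch below uses a dict keyed by (state, read, pop) with `""` for \(\varepsilon\); the encoding, the marker symbol, and the state names `NEW_START`/`NEW_ACCEPT` are all inventions of this note, and it assumes the marker does not occur in the original stack alphabet.

```python
def empty_stack_to_final_state(delta, start, marker="#"):
    """Convert an accept-by-empty-stack PDA to accept-by-final-state.

    delta maps (state, read, pop) -> set of (next_state, push), "" = epsilon.
    Returns (new_delta, new_start, accept_states).
    """
    new_delta = {k: set(v) for k, v in delta.items()}
    # New start: push the fresh marker, then hand control to the old start.
    new_delta.setdefault(("NEW_START", "", ""), set()).add((start, marker))
    # From every original state, seeing the marker means "stack is empty":
    # take an epsilon-move to the single new accept state.
    states = {start} | {k[0] for k in delta}
    states |= {q for moves in delta.values() for q, _ in moves}
    for q in states:
        new_delta.setdefault((q, "", marker), set()).add(("NEW_ACCEPT", ""))
    return new_delta, "NEW_START", {"NEW_ACCEPT"}
```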

Sipser uses accept-by-final-state. Some textbooks (Hopcroft-Ullman) prefer accept-by-empty-stack. Doesn't matter — same language class either way. The examples on this page all use accept-by-final-state.

The Equivalence Theorem: CFG = PDA
Theorem 2.20 (Sipser). A language is context free if and only if some pushdown automaton recognizes it.

This is the context-free analog of the regular language equivalence theorem. It means CFGs and PDAs are just two different views of the same language class.

The proof has two directions:

Direction 1: CFG → PDA (Lemma 2.21)

Given a CFG \(G\), we build a PDA \(P\) that simulates \(G\)'s derivations. The key idea: \(P\) uses its stack to hold the intermediate string of a derivation (the mix of variables and terminals produced at each step).

The construction (informal). PDA \(P\) has three states: \(q_{\text{start}}\), \(q_{\text{loop}}\), \(q_{\text{accept}}\).
  1. Push \(\$\) and the start variable \(S\) onto the stack
  2. Repeat in \(q_{\text{loop}}\):
    • (a) If top of stack is a variable \(A\): nondeterministically pick a rule \(A \to w\) and replace \(A\) with \(w\) on the stack
    • (b) If top of stack is a terminal \(a\): read the next input symbol. If it matches \(a\), pop it. If not, this branch dies.
    • (c) If top of stack is \(\$\): all variables have been expanded, all terminals matched — accept.

The nondeterminism handles the fact that multiple rules might apply to a variable — the PDA tries all possibilities in parallel (like an NFA). If any choice of rules produces the input string, the PDA accepts.
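The loop in steps (a)-(c) can be sketched as a breadth-first search over (input position, stack) pairs, with the stack held as a string (top first) and the \(\$\) marker represented implicitly by an empty stack. All names are illustrative; a symbol counts as a variable iff it has rules, and real CFG recognizers use CYK or Earley parsing instead of this exponential search. The `max_steps` cap guards against grammars (e.g. left-recursive ones) whose branches grow the stack forever.

```python
from collections import deque

def cfg_pda_accepts(rules, start, w, max_steps=100_000):
    """Simulate the CFG -> PDA construction: the stack holds the remainder
    of a leftmost derivation; rule choice becomes BFS branching.

    rules: dict mapping a variable (single character) to a list of
    right-hand-side strings ("" for an epsilon-rule).
    """
    frontier = deque([(0, start)])   # (input position, stack as string)
    seen = set()
    while frontier and max_steps:
        max_steps -= 1
        pos, stack = frontier.popleft()
        if (pos, stack) in seen:
            continue
        seen.add((pos, stack))
        if not stack:                          # (c) $ on top: derivation done
            if pos == len(w):
                return True
            continue
        top, rest = stack[0], stack[1:]
        if top in rules:                       # (a) variable: try every rule
            for rhs in rules[top]:
                frontier.append((pos, rhs + rest))
        elif pos < len(w) and w[pos] == top:   # (b) terminal: match and pop
            frontier.append((pos + 1, rest))
        # otherwise: terminal mismatch, this branch dies
    return False
```

For instance, with the grammar \(S \to 0S1 \mid \varepsilon\) encoded as `{"S": ["0S1", ""]}`, the search rediscovers exactly the derivations of \(\{0^n1^n\}\).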

Direction 2: PDA → CFG (Lemma 2.27)

The reverse direction is more complex. Given a PDA \(P\), we construct a CFG \(G\) where each variable \(A_{pq}\) generates exactly the strings that take \(P\) from state \(p\) with empty stack to state \(q\) with empty stack. The construction produces rules of three forms:
  • \(A_{pq} \to a \, A_{rs} \, b\) for each pair of transitions where the first move (reading \(a\)) pushes some symbol \(u\) and the last move (reading \(b\)) pops that same \(u\)
  • \(A_{pq} \to A_{pr} A_{rq}\) for every triple of states \(p, r, q\)
  • \(A_{pp} \to \varepsilon\) for every state \(p\)

The start variable is \(A_{q_0, q_{\text{accept}}}\). The formal proof involves showing this construction is correct via induction on derivation/computation length (Sipser Claims 2.30 and 2.31). The intuition: every PDA computation that starts and ends with an empty stack either has the first push matching the last pop (captured by \(A_{pq} \to a A_{rs} b\)), or has the stack empty at some intermediate point (captured by \(A_{pq} \to A_{pr} A_{rq}\)).

Exam relevance. You should know that the equivalence holds and understand the CFG → PDA direction well enough to construct a PDA from a grammar. The PDA → CFG direction is conceptually important but the construction details are rarely tested at this level.
Comparison: DFA vs. NFA vs. PDA
| | DFA | NFA | PDA |
|---|---|---|---|
| Tuple | 5-tuple | 5-tuple | 6-tuple (adds \(\Gamma\)) |
| Memory | Current state only | Set of current states | State + unbounded stack |
| Determinism | Deterministic | Nondeterministic | Nondeterministic |
| Nondeterminism adds power? | N/A | No (DFA = NFA) | Yes! (DPDA ⊊ PDA) |
| \(\varepsilon\)-transitions | No | Yes | Yes |
| Languages recognized | Regular | Regular | Context-free |
| Generative equivalent | Regular expressions | Regular expressions | CFGs |
| Closed under union? | Yes | Yes | Yes |
| Closed under intersection? | Yes | Yes | No! |
| Closed under complement? | Yes | Yes | No! |
| Closed under concatenation? | Yes | Yes | Yes |
| Closed under star? | Yes | Yes | Yes |
| Example beyond previous class | N/A (base level) | N/A (base level) | \(\{0^n1^n\}\), \(\{ww^R\}\) |
| Example beyond this class | \(\{0^n1^n\}\) | \(\{0^n1^n\}\) | \(\{a^n b^n c^n\}\) |
The closure property break. This is a big deal. Regular languages are closed under everything (union, intersection, complement, concat, star). Context-free languages lose intersection and complement closure. The classic proof: \(\{a^n b^n c^n\}\) is the intersection of two CFLs — \(\{a^n b^n c^k\}\) and \(\{a^k b^n c^n\}\) — but is not itself context-free. If CFLs were closed under intersection, this would be a contradiction.


Exercises

Exercise 1: Design a PDA

Give a PDA (as a state diagram with the \(a, b \to c\) notation) that recognizes the language:

\(L = \{a^i b^j \mid i > j \geq 0\}\)

Hint: Push each \(a\), pop on each \(b\). How do you ensure there are more \(a\)'s than \(b\)'s?

Exercise 2: Trace the PDA

Consider the PDA \(M_1\) for \(\{0^n1^n\}\) defined above. Show why the string 0110 is rejected by tracing all possible computation paths.

Exercise 3: From CFG to PDA (conceptual)

Consider the CFG \(G\) with rules \(S \to aTb \mid b\) and \(T \to Ta \mid \varepsilon\). Using the CFG → PDA construction from Lemma 2.21, describe informally how the PDA would accept the string aab.

Hint: First figure out what language \(G\) generates. Then trace how the PDA would simulate the derivation on its stack.

Summary
Pushdown Automata — the essentials: