The machine model for context-free languages — an NFA with a stack
Why PDAs?
We've been building two parallel tracks through automata theory. For regular languages, we had a generative tool (regular expressions, page 06) and a recognition tool (DFAs/NFAs, pages 02-03). The two turned out to describe the same class of languages.
For context-free languages, we have the generative side: CFGs (page 09) generate strings through recursive rewriting rules. But what's the machine that recognizes these languages? What's the NFA-equivalent for CFLs?
That machine is the pushdown automaton (PDA). The idea is simple: take an NFA and give it a stack — an unbounded LIFO data structure. The NFA's finite states alone can't count (that's why \(\{0^n1^n\}\) isn't regular). But a stack can: push a symbol for every 0 you read, pop one for every 1. If the stack is empty when the input is done, the counts matched.
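The counting idea can be seen in isolation before we formalize anything — a minimal sketch (plain Python, not yet a PDA) of recognizing \(\{0^n1^n\}\) with a stack; it assumes the input is a binary string:

```python
def matches_0n1n(s: str) -> bool:
    """Recognize 0^n 1^n with the push/pop counting idea
    (assumes s contains only 0s and 1s)."""
    stack = []
    seen_one = False
    for ch in s:
        if ch == '0':
            if seen_one:
                return False      # a 0 after a 1: wrong shape
            stack.append('0')     # push a symbol for every 0
        else:                     # ch == '1'
            seen_one = True
            if not stack:
                return False      # more 1s than 0s
            stack.pop()           # pop a symbol for every 1
    return not stack              # counts matched iff stack is empty
```

The stack here only ever holds copies of one symbol, so a plain counter would do — but the same push/pop skeleton generalizes to matching problems a counter can't handle, like palindromes.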
The pattern across the hierarchy:
| Language class | Generative model | Machine model | Memory |
| --- | --- | --- | --- |
| Regular | Regular expressions | DFA / NFA | Finite (current state only) |
| Context-free | CFGs | PDA | Finite state + unbounded stack |
And just like DFA/NFA/regex all describe exactly the regular languages, CFGs and PDAs describe exactly the context-free languages. That's the main theorem of this page.
Why specifically a stack? Context-free languages are characterized by nesting and matching — balanced parentheses, matched \(0^n1^n\), palindromes. A stack is precisely the data structure that handles LIFO matching: the last thing you opened is the first thing you close. A queue (FIFO) would actually be too powerful — a queue automaton is equivalent to a Turing machine, because you can simulate a tape by cycling elements through. Random-access memory is similarly too powerful (also a Turing machine). The stack is the right amount of extra memory for context-free languages — enough to count and match, not enough to compute anything.
Formal Definition
Definition (Sipser 2.13). A pushdown automaton (PDA) is a 6-tuple \(M = (Q, \Sigma, \Gamma, \delta, q_0, F)\) where:
\(Q\) — a finite set of states
\(\Sigma\) — a finite input alphabet
\(\Gamma\) — a finite stack alphabet
\(\delta: Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon)\) — the transition function
\(q_0 \in Q\) — the start state
\(F \subseteq Q\) — the set of accept states
where \(\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}\) and \(\Gamma_\varepsilon = \Gamma \cup \{\varepsilon\}\).
Compare this to the NFA 5-tuple \((Q, \Sigma, \delta, q_0, F)\). Two things changed:
New: \(\Gamma\) — the stack alphabet
The stack has its own alphabet \(\Gamma\), which can differ from \(\Sigma\). Typically \(\Gamma\) includes symbols from \(\Sigma\) plus special markers — most commonly $ for the bottom-of-stack marker. For the \(\{0^n1^n\}\) PDA, you might have \(\Sigma = \{0, 1\}\) but \(\Gamma = \{0, \$\}\).
Changed: \(\delta\) — the transition function
Unpacking \(\delta\). The transition function is:
\[\delta: Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon)\]
Each transition depends on three things and produces two things:
| Input | What it means |
| --- | --- |
| Current state \(q \in Q\) | Where the machine is |
| Input symbol \(a \in \Sigma_\varepsilon\) | Read \(a\) from input (or \(\varepsilon\) = read nothing) |
| Stack symbol \(b \in \Gamma_\varepsilon\) | Pop \(b\) from stack top (or \(\varepsilon\) = pop nothing) |
Returns a set of (state, stack-symbol) pairs — this is nondeterministic, just like an NFA.
In other words, on each step, the PDA simultaneously: (1) optionally reads an input symbol, (2) optionally pops from the stack, (3) transitions to a new state, and (4) optionally pushes onto the stack. And it can choose among multiple possible transitions nondeterministically.
PDAs are nondeterministic by default. This is a critical difference from the DFA/NFA story. For finite automata, nondeterminism didn't add power — every NFA has an equivalent DFA (page 04). For pushdown automata, nondeterminism genuinely adds power. Deterministic PDAs (DPDAs) recognize a strictly smaller class of languages than nondeterministic PDAs. The palindrome language \(\{ww^R\}\) requires nondeterminism — there's no DPDA for it. When we say "PDA" without qualification, we always mean nondeterministic.
How PDAs Compute
A PDA computation starts in state \(q_0\) with an empty stack. It reads the input left to right, making transitions that depend on the current state, the next input symbol (or \(\varepsilon\)), and the stack top (or \(\varepsilon\)). Because it's nondeterministic, multiple computation branches can exist simultaneously — just like an NFA.
Acceptance. A PDA \(M\) accepts string \(w\) if there exists some sequence of transitions that:
Starts in \(q_0\) with an empty stack
Reads all of \(w\) (consuming every input symbol)
Ends in a state \(q_f \in F\)
Like NFA acceptance, this is existential — if any branch accepts, the PDA accepts.
We describe a PDA's overall situation at any point in a computation as a configuration: the triple \((q,\; w,\; s)\) where \(q\) is the current state, \(w\) is the remaining input, and \(s\) is the stack contents (top-to-bottom).
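As a sketch of a single step: if we encode \(\delta\) as a Python dict from (state, read-or-None, pop-or-None) to a set of (next state, push-or-None) — the encoding is my own convention, with None playing the role of \(\varepsilon\) — the successors of a configuration fall straight out of the definition:

```python
def successors(delta, config):
    """All configurations one move away. A configuration is (state,
    remaining input, stack as a tuple with the top first); delta maps
    (state, read-or-None, pop-or-None) to a set of
    (next state, push-or-None), with None playing the role of ε."""
    q, w, stack = config
    out = set()
    for a in [None] + ([w[0]] if w else []):              # read ε or next symbol
        for b in [None] + ([stack[0]] if stack else []):  # pop ε or the top
            for q2, c in delta.get((q, a, b), ()):
                w2 = w[1:] if a is not None else w
                rest = stack[1:] if b is not None else stack
                s2 = ((c,) + rest) if c is not None else rest
                out.add((q2, w2, s2))
    return out

# toy fragment: just the first two transitions of a {0^n 1^n} machine
toy = {('q1', None, None): {('q2', '$')},
       ('q2', '0', None):  {('q2', '0')}}
```

For the toy fragment, `successors(toy, ('q1', '01', ()))` is `{('q2', '01', ('$',))}`: the machine pushes $ without consuming any input — exactly an \(\varepsilon,\varepsilon \to \$\) move.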
PDA Transition Notation
State diagrams for PDAs look like NFA diagrams, but each transition arrow is labeled with three things instead of one. The standard shorthand is:
Notation: \(a, b \to c\) on an arrow means:
Read \(a\) from the input (or \(\varepsilon\) = don't read)
Pop \(b\) from the stack top (or \(\varepsilon\) = don't pop)
Push \(c\) onto the stack (or \(\varepsilon\) = don't push)
Examples:
\(0, \varepsilon \to 0\) — read a 0, pop nothing, push 0 (net effect: push 0)
\(1, 0 \to \varepsilon\) — read a 1, pop a 0, push nothing (net effect: pop 0)
\(\varepsilon, \varepsilon \to \$\) — read nothing, pop nothing, push $ (free push of marker)
\(\varepsilon, \$ \to \varepsilon\) — read nothing, pop $, push nothing (test for empty stack)
The \(\$\) bottom-of-stack marker
Testing for an empty stack. The formal PDA definition has no built-in way to test whether the stack is empty. The standard trick: at the very beginning, push a special symbol $ onto the stack before doing anything else. If you ever see $ on top again, you know the stack is effectively empty. This is so common that nearly every PDA starts with a transition \(\varepsilon, \varepsilon \to \$\).
Similarly, PDAs can't directly test for end-of-input. The workaround: design the PDA so it can only reach an accept state after consuming all input (the accept condition checks final state after all input is read).
Example: PDA for \(\{0^n 1^n \mid n \geq 0\}\)
This is the canonical example — the language that proved finite automata aren't enough (page 08), and the first CFG we wrote (\(S \to 0S1 \mid \varepsilon\), page 09). Now we build the machine.
Strategy
Push $ to mark the bottom of the stack
For every 0 read, push it onto the stack
When we start reading 1s, switch to popping: for every 1, pop a 0
After reading all input, if the stack has only $ left, accept
Each 0 gets pushed. Each 1 pops a 0. Two 0s in, two 1s out — stack empties (except for $), and we accept. If the input were 001 instead, we'd have a 0 still on the stack when input runs out — no transition to \(q_4\) possible, so we reject.
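This strategy can be checked with a small search over configurations. The state names q1 through q4 follow the trace in Exercise 2 below; the dict encoding of \(\delta\) (None for \(\varepsilon\)) is my own convention, and the breadth-first search relies on this machine having no \(\varepsilon\)-loops that grow the stack forever (true here):

```python
from collections import deque

def accepts(delta, start, finals, w):
    """PDA acceptance by final state: breadth-first search over
    configurations (state, input position, stack with top first)."""
    seen, frontier = set(), deque([(start, 0, ())])
    while frontier:
        q, i, stack = frontier.popleft()
        if (q, i, stack) in seen:
            continue
        seen.add((q, i, stack))
        if i == len(w) and q in finals:
            return True                      # some branch accepts
        for a in [None] + ([w[i]] if i < len(w) else []):
            for b in [None] + ([stack[0]] if stack else []):
                for q2, c in delta.get((q, a, b), ()):
                    rest = stack[1:] if b is not None else stack
                    s2 = ((c,) + rest) if c is not None else rest
                    frontier.append((q2, i + (a is not None), s2))
    return False

# M1 for {0^n 1^n}: key = (state, read, pop), value = {(state, push)}
M1 = {
    ('q1', None, None): {('q2', '$')},   # push the bottom-of-stack marker
    ('q2', '0', None):  {('q2', '0')},   # push each 0
    ('q2', '1', '0'):   {('q3', None)},  # first 1: switch to popping
    ('q3', '1', '0'):   {('q3', None)},  # pop a 0 for each 1
    ('q3', None, '$'):  {('q4', None)},  # only $ left: counts matched
}
```

With `finals = {'q1', 'q4'}` (q1 accepting so that \(\varepsilon\), the \(n = 0\) case, is accepted), 0011 and 01 come out accepted while 001, 0110, and 10 are rejected.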
PDA Simulator: \(\{0^n1^n\}\)
Trace the PDA step by step on any binary string.
Example: PDA for \(\{ww^R \mid w \in \{0,1\}^*\}\)
This is where nondeterminism becomes essential. The language of even-length palindromes requires the PDA to "guess" where the middle of the string is — then verify that the second half mirrors the first.
Strategy
Push $ to mark the bottom
Read symbols and push each one onto the stack
Nondeterministically guess "I've reached the middle" and switch to matching mode
In matching mode: for each symbol read, pop from the stack and check they match
The key transition is \(q_2 \to q_3\): \(\varepsilon, \varepsilon \to \varepsilon\). This is a free move — no input read, nothing popped, nothing pushed. At every position while in \(q_2\), the PDA nondeterministically spawns a branch that guesses "the middle is here" and switches to matching mode.
Why nondeterminism is necessary here. A deterministic machine would need to know when it's reached the exact middle of the string — but it can't know this without having read the whole string first. The nondeterministic PDA doesn't need to know: it tries every possible midpoint simultaneously, and if one guess turns out correct, that branch accepts.
This is a provable result: there is no deterministic PDA (DPDA) that recognizes \(\{ww^R\}\). The language of even-length palindromes is inherently nondeterministic among CFLs.
Trace: \(M_3\) on input 011110 (which is 011 + 110 = \(w \cdot w^R\))
The accepting branch (guesses middle at position 3):
Other branches (guessing the middle at position 0, 1, 2, 4, 5, or 6) would all fail — the stack wouldn't empty cleanly when the input runs out. But the branch that guessed correctly at position 3 succeeds, and that's enough.
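The guessing is just more branching in the same configuration search. A sketch of \(M_3\) with the transitions described above — the dict encoding with None for \(\varepsilon\) is my own convention, and the state names q1 through q4 are an assumption about the diagram:

```python
def accepts(delta, start, finals, w):
    """PDA acceptance by final state, as a depth-first search over
    configurations; delta maps (state, read-or-None, pop-or-None) to
    a set of (next state, push-or-None), with None playing ε."""
    seen = set()
    def go(q, i, stack):                 # stack is a tuple, top first
        if (q, i, stack) in seen:
            return False                 # this branch already explored
        seen.add((q, i, stack))
        if i == len(w) and q in finals:
            return True
        for a in [None] + ([w[i]] if i < len(w) else []):
            for b in [None] + ([stack[0]] if stack else []):
                for q2, c in delta.get((q, a, b), ()):
                    rest = stack[1:] if b is not None else stack
                    s2 = ((c,) + rest) if c is not None else rest
                    if go(q2, i + (a is not None), s2):
                        return True
        return False
    return go(start, 0, ())

# M3 for {w w^R}: push the first half, guess the middle, match by popping
M3 = {
    ('q1', None, None): {('q2', '$')},   # bottom-of-stack marker
    ('q2', '0', None):  {('q2', '0')},   # push each symbol read
    ('q2', '1', None):  {('q2', '1')},
    ('q2', None, None): {('q3', None)},  # free move: guess the middle here
    ('q3', '0', '0'):   {('q3', None)},  # match input against popped symbol
    ('q3', '1', '1'):   {('q3', None)},
    ('q3', None, '$'):  {('q4', None)},  # everything matched
}
```

The lone \(\varepsilon, \varepsilon \to \varepsilon\) entry for q2 \(\to\) q3 is the middle guess; the search tries it at every input position, which is exactly the branch-per-midpoint picture above.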
Acceptance Modes: Final State vs. Empty Stack
There are two natural ways to define PDA acceptance:
Accept by final state (our definition): The PDA accepts if, after reading all input, it can be in some state \(q \in F\). The stack contents don't matter.
Accept by empty stack (alternative definition): The PDA accepts if, after reading all input, the stack is empty. There are no accept states — acceptance is purely determined by the stack.
These two definitions are equivalent in power — any language recognized by one type can be recognized by the other.
Final state → empty stack: Add transitions from every accept state that pop all remaining stack symbols, then go to a new accept-by-empty-stack state.
Empty stack → final state: Push a special marker at the start. When you see that marker (meaning the original PDA's stack is empty), transition to a new final state.
Sipser uses accept-by-final-state. Some textbooks (Hopcroft-Ullman) prefer accept-by-empty-stack. Doesn't matter — same language class either way. The examples on this page all use accept-by-final-state.
The Equivalence Theorem: CFG = PDA
Theorem 2.20 (Sipser). A language is context free if and only if some pushdown automaton recognizes it.
This is the context-free analog of the regular language equivalence theorem. It means CFGs and PDAs are just two different views of the same language class.
The proof has two directions:
Direction 1: CFG → PDA (Lemma 2.21)
Given a CFG \(G\), we build a PDA \(P\) that simulates \(G\)'s derivations. The key idea: \(P\) uses its stack to hold the intermediate string of a derivation (the mix of variables and terminals produced at each step).
The construction (informal). PDA \(P\) has three states: \(q_{\text{start}}\), \(q_{\text{loop}}\), \(q_{\text{accept}}\).
Push \(\$\) and the start variable \(S\) onto the stack
Repeat in \(q_{\text{loop}}\):
(a) If top of stack is a variable \(A\): nondeterministically pick a rule \(A \to w\) and replace \(A\) with \(w\) on the stack
(b) If top of stack is a terminal \(a\): read the next input symbol. If it matches \(a\), pop it. If not, this branch dies.
(c) If top of stack is \(\$\): all variables have been expanded, all terminals matched — accept.
The nondeterminism handles the fact that multiple rules might apply to a variable — the PDA tries all possibilities in parallel (like an NFA). If any choice of rules produces the input string, the PDA accepts.
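A sketch of that loop in code, with the stack holding the rest of the sentential form. The dict-of-rules encoding is mine, and the terminal-count pruning is a guard I added so that left-recursive rules don't expand forever — it works whenever every expansion cycle adds a terminal; a grammar with purely variable-growing cycles like \(A \to AA\) would need a real parser such as CYK:

```python
def grammar_pda_accepts(rules, start, w):
    """Simulate the Lemma 2.21 PDA: the stack holds the rest of the
    sentential form, top first. Variables are expanded
    nondeterministically; terminals must match the next input symbol.
    rules: dict from variable to a list of right-hand-side strings."""
    seen = set()
    def go(i, stack):
        if (i, stack) in seen:
            return False
        seen.add((i, stack))
        # guard: every terminal left on the stack must still be matched
        if sum(sym not in rules for sym in stack) > len(w) - i:
            return False
        if not stack:
            return i == len(w)           # step (c): $ reached, input done
        top, rest = stack[0], stack[1:]
        if top in rules:                 # step (a): expand a variable
            return any(go(i, tuple(rhs) + rest) for rhs in rules[top])
        # step (b): match a terminal against the input
        return i < len(w) and w[i] == top and go(i + 1, rest)
    return go(0, (start,))

# the first CFG from page 09: S -> 0S1 | ε
rules = {'S': ['0S1', '']}
```

Note how expanding a rule pushes its right-hand side with the leftmost symbol on top — which is why this simulates a leftmost derivation.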
Direction 2: PDA → CFG (Lemma 2.27)
The reverse direction is more complex. Given a PDA \(P\), we construct a CFG \(G\) where each variable \(A_{pq}\) generates exactly the strings that take \(P\) from state \(p\) with empty stack to state \(q\) with empty stack. The construction produces rules of the form:
\(A_{pq} \to a A_{rs} b\) — input \(a\) pushes, the middle part runs with that symbol on the stack, input \(b\) pops it
\(A_{pq} \to A_{pr} A_{rq}\) — split the computation at an intermediate point \(r\) where the stack is empty
\(A_{pp} \to \varepsilon\) — going from \(p\) to \(p\) with empty stack: read nothing
The start variable is \(A_{q_0, q_{\text{accept}}}\). The formal proof involves showing this construction is correct via induction on derivation/computation length (Sipser Claims 2.30 and 2.31). The intuition: every PDA computation that starts and ends with an empty stack either has the first push matching the last pop (captured by \(A_{pq} \to a A_{rs} b\)), or has the stack empty at some intermediate point (captured by \(A_{pq} \to A_{pr} A_{rq}\)).
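A structural sketch of the rule generation, assuming the PDA has already been massaged into the form Sipser's proof uses (single accept state, empties its stack before accepting, and every move either pushes exactly one symbol or pops exactly one). The tuple encodings of the push and pop moves are my own:

```python
from itertools import product

def pda_to_cfg_rules(states, pushes, pops):
    """Enumerate the Lemma 2.27 rules. Assumes the normalized form
    where every move either pushes exactly one stack symbol or pops
    exactly one.
      pushes: set of (p, a, u, r) -- in p, read a, push u, go to r
      pops:   set of (s, b, u, q) -- in s, read b, pop u, go to q
    A variable ('A', p, q) stands for A_pq."""
    rules = set()
    for p in states:
        rules.add((('A', p, p), ()))                          # A_pp -> ε
    for p, q, r in product(states, repeat=3):                 # split at r
        rules.add((('A', p, q), (('A', p, r), ('A', r, q))))
    for (p, a, u, r), (s, b, u2, q) in product(pushes, pops):
        if u == u2:                    # first push matches last pop
            rules.add((('A', p, q), (a, ('A', r, s), b)))     # A_pq -> a A_rs b
    return rules

# toy: one state p, push X on reading 0, pop X on reading 1
toy_rules = pda_to_cfg_rules({'p'}, {('p', '0', 'X', 'p')},
                             {('p', '1', 'X', 'p')})
```

For the toy machine, the generated rules include \(A_{pp} \to 0\,A_{pp}\,1\) and \(A_{pp} \to \varepsilon\) — a grammar for \(\{0^n1^n\}\), plus the (here redundant) split rules.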
Exam relevance. You should know that the equivalence holds and understand the CFG → PDA direction well enough to construct a PDA from a grammar. The PDA → CFG direction is conceptually important but the construction details are rarely tested at this level.
Comparison: DFA vs. NFA vs. PDA
| | DFA | NFA | PDA |
| --- | --- | --- | --- |
| Tuple | 5-tuple | 5-tuple | 6-tuple (+\(\Gamma\)) |
| Memory | Current state only | Set of current states | State + unbounded stack |
| Determinism | Deterministic | Nondeterministic | Nondeterministic |
| Nondeterminism adds power? | — | No (DFA = NFA) | Yes! (DPDA ⊊ PDA) |
| \(\varepsilon\)-transitions | No | Yes | Yes |
| Languages recognized | Regular | Regular | Context-free |
| Generative equivalent | Regular expressions | Regular expressions | CFGs |
| Closed under union? | Yes | Yes | Yes |
| Closed under intersection? | Yes | Yes | No! |
| Closed under complement? | Yes | Yes | No! |
| Closed under concatenation? | Yes | Yes | Yes |
| Closed under star? | Yes | Yes | Yes |
| Example beyond previous class | N/A (base level) | N/A (base level) | \(\{0^n1^n\}\), \(\{ww^R\}\) |
| Example beyond this class | \(\{0^n1^n\}\) | \(\{0^n1^n\}\) | \(\{a^n b^n c^n\}\) |
The closure property break. This is a big deal. Regular languages are closed under everything (union, intersection, complement, concat, star). Context-free languages lose intersection and complement closure. The classic proof: \(\{a^n b^n c^n\}\) is the intersection of two CFLs — \(\{a^n b^n c^k\}\) and \(\{a^k b^n c^n\}\) — but is not itself context-free. If CFLs were closed under intersection, this would be a contradiction.
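The three languages in that argument are easy to check directly — a sketch (the helper names are mine) that makes the intersection claim concrete on small strings:

```python
import re

def in_anbnck(s):   # {a^n b^n c^k}: as many a's as b's, any number of c's
    m = re.fullmatch(r'(a*)(b*)(c*)', s)
    return bool(m) and len(m.group(1)) == len(m.group(2))

def in_akbncn(s):   # {a^k b^n c^n}: as many b's as c's, any number of a's
    m = re.fullmatch(r'(a*)(b*)(c*)', s)
    return bool(m) and len(m.group(2)) == len(m.group(3))

def in_anbncn(s):   # the intersection -- not context-free
    m = re.fullmatch(r'(a*)(b*)(c*)', s)
    return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))
```

Each of the first two languages needs only one matched pair, so each is a CFL; brute-force checking short strings over \(\{a,b,c\}\) confirms that membership in both coincides with membership in \(\{a^n b^n c^n\}\). That the intersection is not context-free is what the pumping lemma for CFLs shows.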
PDA Trace: \(\{ww^R \mid w \in \{0,1\}^*\}\)
Enter an even-length binary string. The simulator will test if it's a palindrome and show the accepting trace.
Exercises
Exercise 1: Design a PDA
Give a PDA (as a state diagram with the \(a, b \to c\) notation) that recognizes the language:
\(L = \{a^i b^j \mid i > j \geq 0\}\)
Hint: Push each \(a\), pop on each \(b\). How do you ensure there are more \(a\)'s than \(b\)'s?
Strategy: Push $ as bottom marker. Push each a. When we start reading b's, pop an a for each b. To ensure more a's than b's: at the end, the stack must still have at least one a on it (not just $). We check this by requiring we can pop an a before we see $.
States: q1 (start), q2 (pushing a's), q3 (popping on b's), q4 (verify extra a), q5 (accept)
Transitions:
q1 → q2: ε, ε → $
q2 → q2: a, ε → a (push each a)
q2 → q3: b, a → ε (start reading b's, pop a)
q3 → q3: b, a → ε (continue popping)
q3 → q4: ε, a → ε (pop one more a — proving i > j)
q4 → q4: ε, a → ε (pop remaining a's)
q4 → q5: ε, $ → ε (accept — stack empty)
Also: q2 → q4: ε, a → ε (handles j = 0, no b's at all — popping an a still proves i > j)
Check: On aab: push $, push a, push a. Read b, pop a. Now pop a (verifying extra). Pop $. Accept. Correct!
On ab: push $, push a. Read b, pop a. Try to pop another a — fails (only $ left). Reject. Correct, since i = j = 1.
Exercise 2: Trace the PDA
Consider the PDA \(M_1\) for \(\{0^n1^n\}\) defined above. Show why the string 0110 is rejected by tracing all possible computation paths.
Trace of M1 on "0110":
(q1, 0110, ε)
ε,ε→$ — (q2, 0110, $)
0,ε→0 — (q2, 110, 0$)
1,0→ε — (q3, 10, $)
Now in q3 with input "10" and stack "$":
- Try 1,0→ε: need 0 on stack top, but top is $. STUCK.
- Try ε,$→ε to q4: succeeds, reaching (q4, 10, ε).
But q4 has no transitions and input "10" remains unread. STUCK.
Both branches fail. No path reads all input and reaches an accept state.
Why rejected: The string has format 01|10 — it alternates, but isn't 0*1*. After reading 01, the stack is empty (one push, one pop). But two more symbols remain (10), and there's no way to process them. The PDA is designed for the pattern "all 0s then all 1s" — any deviation kills the computation.
Exercise 3: From CFG to PDA (conceptual)
Consider the CFG \(G\) with rules \(S \to aTb \mid b\) and \(T \to Ta \mid \varepsilon\). Using the CFG → PDA construction from Lemma 2.21, describe informally how the PDA would accept the string aab.
Hint: First figure out what language \(G\) generates. Then trace how the PDA would simulate the derivation on its stack.
Step 1: What does G generate?
T → Ta | ε generates a* (zero or more a's).
S → aTb generates a(a*)b = a¹b, a²b, a³b, ... (i.e., a^n b for n ≥ 1)
S → b handles the string "b" alone.
So L(G) = {a^n b | n ≥ 0} = a*b.
Step 2: Derivation of "aab"
S ⇒ aTb ⇒ aTab ⇒ aab
(Using S → aTb, then T → Ta, then T → ε)
Step 3: PDA simulation
The PDA P starts with $ and S on the stack.
Stack: S$ Input: aab
Pop S, choose rule S → aTb.
Push b, T, a onto stack (right-to-left, so a is on top).
Stack: aTb$ Input: aab
Top is terminal 'a'. Read input 'a'. They match! Pop.
Stack: Tb$ Input: ab
Top is variable T. Choose rule T → Ta.
Push a, T (so T on top).
Stack: Tab$ Input: ab
Top is variable T. Choose rule T → ε.
Pop T, push nothing.
Stack: ab$ Input: ab
Top is terminal 'a'. Read input 'a'. Match! Pop.
Stack: b$ Input: b
Top is terminal 'b'. Read input 'b'. Match! Pop.
Stack: $ Input: ε
Top is $. All input consumed. → ACCEPT.
The PDA accepted "aab" by simulating the leftmost derivation of G.
Summary
Pushdown Automata — the essentials:
6-tuple: \((Q, \Sigma, \Gamma, \delta, q_0, F)\) — adds stack alphabet \(\Gamma\) to the NFA definition