The machine model for context-free languages — an NFA with a stack
Why PDAs?
We've been building two parallel tracks through automata theory. For regular languages, we had a generative tool (regular expressions, page 06) and a recognition tool (DFAs/NFAs, pages 02-03). The two turned out to describe the same class of languages.
For context-free languages, we have the generative side: CFGs (page 09) generate strings through recursive rewriting rules. But what's the machine that recognizes these languages? What's the NFA-equivalent for CFLs?
That machine is the pushdown automaton (PDA). The idea is simple: take an NFA and give it a stack — an unbounded LIFO data structure. The NFA's finite states alone can't count (that's why \(\{0^n1^n\}\) isn't regular). But a stack can: push a symbol for every 0 you read, pop one for every 1. If the stack is empty when the input is done, the counts matched.
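The counting idea can be seen in isolation before we formalize anything — a minimal sketch (plain Python, not yet a PDA) of recognizing \(\{0^n1^n\}\) with a stack; it assumes the input is a binary string:

```python
def matches_0n1n(s: str) -> bool:
    """Recognize 0^n 1^n with the push/pop counting idea
    (assumes s contains only 0s and 1s)."""
    stack = []
    seen_one = False
    for ch in s:
        if ch == '0':
            if seen_one:
                return False      # a 0 after a 1: wrong shape
            stack.append('0')     # push a symbol for every 0
        else:                     # ch == '1'
            seen_one = True
            if not stack:
                return False      # more 1s than 0s
            stack.pop()           # pop a symbol for every 1
    return not stack              # counts matched iff stack is empty
```

The stack here only ever holds copies of one symbol, so a plain counter would do — but the same push/pop skeleton generalizes to matching problems a counter can't handle, like palindromes.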
The pattern across the hierarchy:
| Language class | Generative model | Machine model | Memory |
| --- | --- | --- | --- |
| Regular | Regular expressions | DFA / NFA | Finite (current state only) |
| Context-free | CFGs | PDA | Finite state + unbounded stack |
And just like DFA/NFA/regex all describe exactly the regular languages, CFGs and PDAs describe exactly the context-free languages. That's the main theorem of this page.
Why specifically a stack? Context-free languages are characterized by nesting and matching — balanced parentheses, matched \(0^n1^n\), palindromes. A stack is precisely the data structure that handles LIFO matching: the last thing you opened is the first thing you close. A queue (FIFO) would actually be too powerful — a queue automaton is equivalent to a Turing machine, because you can simulate a tape by cycling elements through. Random-access memory is similarly too powerful (also a Turing machine). The stack is the right amount of extra memory for context-free languages — enough to count and match, not enough to compute anything.
Formal Definition
Definition (Sipser 2.13). A pushdown automaton (PDA) is a 6-tuple \(M = (Q, \Sigma, \Gamma, \delta, q_0, F)\) where:
\(Q\) — a finite set of states
\(\Sigma\) — a finite input alphabet
\(\Gamma\) — a finite stack alphabet
\(\delta: Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon)\) — the transition function
\(q_0 \in Q\) — the start state
\(F \subseteq Q\) — the set of accept states
where \(\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}\) and \(\Gamma_\varepsilon = \Gamma \cup \{\varepsilon\}\).
Compare this to the NFA 5-tuple \((Q, \Sigma, \delta, q_0, F)\). Two things changed:
New: \(\Gamma\) — the stack alphabet
The stack has its own alphabet \(\Gamma\), which can differ from \(\Sigma\). Typically \(\Gamma\) includes symbols from \(\Sigma\) plus special markers — most commonly $ for the bottom-of-stack marker. For the \(\{0^n1^n\}\) PDA, you might have \(\Sigma = \{0, 1\}\) but \(\Gamma = \{0, \$\}\).
Changed: \(\delta\) — the transition function
Unpacking \(\delta\). The transition function is:
\[\delta: Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon)\]
Each transition depends on three things and produces two things:
| Input | What it means |
| --- | --- |
| Current state \(q \in Q\) | Where the machine is |
| Input symbol \(a \in \Sigma_\varepsilon\) | Read \(a\) from input (or \(\varepsilon\) = read nothing) |
| Stack symbol \(b \in \Gamma_\varepsilon\) | Pop \(b\) from stack top (or \(\varepsilon\) = pop nothing) |
Returns a set of (state, stack-symbol) pairs — this is nondeterministic, just like an NFA.
In other words, on each step, the PDA simultaneously: (1) optionally reads an input symbol, (2) optionally pops from the stack, (3) transitions to a new state, and (4) optionally pushes onto the stack. And it can choose among multiple possible transitions nondeterministically.
PDAs are nondeterministic by default. This is a critical difference from the DFA/NFA story. For finite automata, nondeterminism didn't add power — every NFA has an equivalent DFA (page 04). For pushdown automata, nondeterminism genuinely adds power. Deterministic PDAs (DPDAs) recognize a strictly smaller class of languages than nondeterministic PDAs. The palindrome language \(\{ww^R\}\) requires nondeterminism — there's no DPDA for it. When we say "PDA" without qualification, we always mean nondeterministic.
How PDAs Compute
A PDA computation starts in state \(q_0\) with an empty stack. It reads the input left to right, making transitions that depend on the current state, the next input symbol (or \(\varepsilon\)), and the stack top (or \(\varepsilon\)). Because it's nondeterministic, multiple computation branches can exist simultaneously — just like an NFA.
Acceptance. A PDA \(M\) accepts string \(w\) if there exists some sequence of transitions that:
Starts in \(q_0\) with an empty stack
Reads all of \(w\) (consuming every input symbol)
Ends in a state \(q_f \in F\)
Like NFA acceptance, this is existential — if any branch accepts, the PDA accepts.
We describe a PDA's overall situation at any point in a computation as a configuration: the triple \((q,\; w,\; s)\) where \(q\) is the current state, \(w\) is the remaining input, and \(s\) is the stack contents (top-to-bottom).
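As a sketch of a single step: if we encode \(\delta\) as a Python dict from (state, read-or-None, pop-or-None) to a set of (next state, push-or-None) — the encoding is my own convention, with None playing the role of \(\varepsilon\) — the successors of a configuration fall straight out of the definition:

```python
def successors(delta, config):
    """All configurations one move away. A configuration is (state,
    remaining input, stack as a tuple with the top first); delta maps
    (state, read-or-None, pop-or-None) to a set of
    (next state, push-or-None), with None playing the role of ε."""
    q, w, stack = config
    out = set()
    for a in [None] + ([w[0]] if w else []):              # read ε or next symbol
        for b in [None] + ([stack[0]] if stack else []):  # pop ε or the top
            for q2, c in delta.get((q, a, b), ()):
                w2 = w[1:] if a is not None else w
                rest = stack[1:] if b is not None else stack
                s2 = ((c,) + rest) if c is not None else rest
                out.add((q2, w2, s2))
    return out

# toy fragment: just the first two transitions of a {0^n 1^n} machine
toy = {('q1', None, None): {('q2', '$')},
       ('q2', '0', None):  {('q2', '0')}}
```

For the toy fragment, `successors(toy, ('q1', '01', ()))` is `{('q2', '01', ('$',))}`: the machine pushes $ without consuming any input — exactly an \(\varepsilon,\varepsilon \to \$\) move.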
PDA Transition Notation
State diagrams for PDAs look like NFA diagrams, but each transition arrow is labeled with three things instead of one. The standard shorthand is:
Notation: \(a, b \to c\) on an arrow means:
Read \(a\) from the input (or \(\varepsilon\) = don't read)
Pop \(b\) from the stack top (or \(\varepsilon\) = don't pop)
Push \(c\) onto the stack (or \(\varepsilon\) = don't push)
Examples:
\(0, \varepsilon \to 0\) — read a 0, pop nothing, push 0 (net effect: push 0)
\(1, 0 \to \varepsilon\) — read a 1, pop a 0, push nothing (net effect: pop 0)
\(\varepsilon, \varepsilon \to \$\) — read nothing, pop nothing, push $ (free push of marker)
\(\varepsilon, \$ \to \varepsilon\) — read nothing, pop $, push nothing (test for empty stack)
The \(\$\) bottom-of-stack marker
Testing for an empty stack. The formal PDA definition has no built-in way to test whether the stack is empty. The standard trick: at the very beginning, push a special symbol $ onto the stack before doing anything else. If you ever see $ on top again, you know the stack is effectively empty. This is so common that nearly every PDA starts with a transition \(\varepsilon, \varepsilon \to \$\).
Similarly, PDAs can't directly test for end-of-input. The workaround: design the PDA so it can only reach an accept state after consuming all input (the accept condition checks final state after all input is read).
Example: PDA for \(\{0^n 1^n \mid n \geq 0\}\)
This is the canonical example — the language that proved finite automata aren't enough (page 08), and the first CFG we wrote (\(S \to 0S1 \mid \varepsilon\), page 09). Now we build the machine.
Strategy
Push $ to mark the bottom of the stack
For every 0 read, push it onto the stack
When we start reading 1s, switch to popping: for every 1, pop a 0
After reading all input, if the stack has only $ left, accept
Each 0 gets pushed. Each 1 pops a 0. Two 0s in, two 1s out — stack empties (except for $), and we accept. If the input were 001 instead, we'd have a 0 still on the stack when input runs out — no transition to \(q_4\) possible, so we reject.
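This strategy can be checked with a small search over configurations. The state names q1 through q4 follow the trace in Exercise 2 below; the dict encoding of \(\delta\) (None for \(\varepsilon\)) is my own convention, and the breadth-first search relies on this machine having no \(\varepsilon\)-loops that grow the stack forever (true here):

```python
from collections import deque

def accepts(delta, start, finals, w):
    """PDA acceptance by final state: breadth-first search over
    configurations (state, input position, stack with top first)."""
    seen, frontier = set(), deque([(start, 0, ())])
    while frontier:
        q, i, stack = frontier.popleft()
        if (q, i, stack) in seen:
            continue
        seen.add((q, i, stack))
        if i == len(w) and q in finals:
            return True                      # some branch accepts
        for a in [None] + ([w[i]] if i < len(w) else []):
            for b in [None] + ([stack[0]] if stack else []):
                for q2, c in delta.get((q, a, b), ()):
                    rest = stack[1:] if b is not None else stack
                    s2 = ((c,) + rest) if c is not None else rest
                    frontier.append((q2, i + (a is not None), s2))
    return False

# M1 for {0^n 1^n}: key = (state, read, pop), value = {(state, push)}
M1 = {
    ('q1', None, None): {('q2', '$')},   # push the bottom-of-stack marker
    ('q2', '0', None):  {('q2', '0')},   # push each 0
    ('q2', '1', '0'):   {('q3', None)},  # first 1: switch to popping
    ('q3', '1', '0'):   {('q3', None)},  # pop a 0 for each 1
    ('q3', None, '$'):  {('q4', None)},  # only $ left: counts matched
}
```

With `finals = {'q1', 'q4'}` (q1 accepting so that \(\varepsilon\), the \(n = 0\) case, is accepted), 0011 and 01 come out accepted while 001, 0110, and 10 are rejected.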
PDA Simulator: \(\{0^n1^n\}\)
Trace the PDA step by step on any binary string.
Example: PDA for \(\{ww^R \mid w \in \{0,1\}^*\}\)
This is where nondeterminism becomes essential. The language of even-length palindromes requires the PDA to "guess" where the middle of the string is — then verify that the second half mirrors the first.
Strategy
Push $ to mark the bottom
Read symbols and push each one onto the stack
Nondeterministically guess "I've reached the middle" and switch to matching mode
In matching mode: for each symbol read, pop from the stack and check they match
The key transition is \(q_2 \to q_3\): \(\varepsilon, \varepsilon \to \varepsilon\). This is a free move — no input read, nothing popped, nothing pushed. At every position while in \(q_2\), the PDA nondeterministically spawns a branch that guesses "the middle is here" and switches to matching mode.
Why nondeterminism is necessary here. A deterministic machine would need to know when it's reached the exact middle of the string — but it can't know this without having read the whole string first. The nondeterministic PDA doesn't need to know: it tries every possible midpoint simultaneously, and if one guess turns out correct, that branch accepts.
This is a provable result: there is no deterministic PDA (DPDA) that recognizes \(\{ww^R\}\). The language of even-length palindromes is inherently nondeterministic among CFLs.
Trace: \(M_3\) on input 011110 (which is 011 + 110 = \(w \cdot w^R\))
The accepting branch (guesses middle at position 3):
Other branches (guessing the middle at position 0, 1, 2, 4, 5, or 6) would all fail — the stack wouldn't empty cleanly when the input runs out. But the branch that guessed correctly at position 3 succeeds, and that's enough.
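The guessing is just more branching in the same configuration search. A sketch of \(M_3\) with the transitions described above — the dict encoding with None for \(\varepsilon\) is my own convention, and the state names q1 through q4 are an assumption about the diagram:

```python
def accepts(delta, start, finals, w):
    """PDA acceptance by final state, as a depth-first search over
    configurations; delta maps (state, read-or-None, pop-or-None) to
    a set of (next state, push-or-None), with None playing ε."""
    seen = set()
    def go(q, i, stack):                 # stack is a tuple, top first
        if (q, i, stack) in seen:
            return False                 # this branch already explored
        seen.add((q, i, stack))
        if i == len(w) and q in finals:
            return True
        for a in [None] + ([w[i]] if i < len(w) else []):
            for b in [None] + ([stack[0]] if stack else []):
                for q2, c in delta.get((q, a, b), ()):
                    rest = stack[1:] if b is not None else stack
                    s2 = ((c,) + rest) if c is not None else rest
                    if go(q2, i + (a is not None), s2):
                        return True
        return False
    return go(start, 0, ())

# M3 for {w w^R}: push the first half, guess the middle, match by popping
M3 = {
    ('q1', None, None): {('q2', '$')},   # bottom-of-stack marker
    ('q2', '0', None):  {('q2', '0')},   # push each symbol read
    ('q2', '1', None):  {('q2', '1')},
    ('q2', None, None): {('q3', None)},  # free move: guess the middle here
    ('q3', '0', '0'):   {('q3', None)},  # match input against popped symbol
    ('q3', '1', '1'):   {('q3', None)},
    ('q3', None, '$'):  {('q4', None)},  # everything matched
}
```

The lone \(\varepsilon, \varepsilon \to \varepsilon\) entry for q2 \(\to\) q3 is the middle guess; the search tries it at every input position, which is exactly the branch-per-midpoint picture above.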
Acceptance Modes: Final State vs. Empty Stack
There are two natural ways to define PDA acceptance:
Accept by final state (our definition): The PDA accepts if, after reading all input, it can be in some state \(q \in F\). The stack contents don't matter.
Accept by empty stack (alternative definition): The PDA accepts if, after reading all input, the stack is empty. There are no accept states — acceptance is purely determined by the stack.
These two definitions are equivalent in power — any language recognized by one type can be recognized by the other.
Final state → empty stack: Add transitions from every accept state that pop all remaining stack symbols, then go to a new accept-by-empty-stack state.
Empty stack → final state: Push a special marker at the start. When you see that marker (meaning the original PDA's stack is empty), transition to a new final state.
Sipser uses accept-by-final-state. Some textbooks (Hopcroft-Ullman) prefer accept-by-empty-stack. Doesn't matter — same language class either way. The examples on this page all use accept-by-final-state.
The Equivalence Theorem: CFG = PDA
Theorem 2.20 (Sipser). A language is context free if and only if some pushdown automaton recognizes it.
This is the context-free analog of the regular language equivalence theorem. It means CFGs and PDAs are just two different views of the same language class.
The proof has two directions:
Direction 1: CFG → PDA (Lemma 2.21)
Given a CFG \(G\), we build a PDA \(P\) that simulates \(G\)'s derivations. The key idea: \(P\) uses its stack to hold the intermediate string of a derivation (the mix of variables and terminals produced at each step).
The construction (informal). PDA \(P\) has three states: \(q_{\text{start}}\), \(q_{\text{loop}}\), \(q_{\text{accept}}\).
Push \(\$\) and the start variable \(S\) onto the stack
Repeat in \(q_{\text{loop}}\):
(a) If top of stack is a variable \(A\): nondeterministically pick a rule \(A \to w\) and replace \(A\) with \(w\) on the stack
(b) If top of stack is a terminal \(a\): read the next input symbol. If it matches \(a\), pop it. If not, this branch dies.
(c) If top of stack is \(\$\): all variables have been expanded, all terminals matched — accept.
The nondeterminism handles the fact that multiple rules might apply to a variable — the PDA tries all possibilities in parallel (like an NFA). If any choice of rules produces the input string, the PDA accepts.
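A sketch of that loop in code, with the stack holding the rest of the sentential form. The dict-of-rules encoding is mine, and the terminal-count pruning is a guard I added so that left-recursive rules don't expand forever — it works whenever every expansion cycle adds a terminal; a grammar with purely variable-growing cycles like \(A \to AA\) would need a real parser such as CYK:

```python
def grammar_pda_accepts(rules, start, w):
    """Simulate the Lemma 2.21 PDA: the stack holds the rest of the
    sentential form, top first. Variables are expanded
    nondeterministically; terminals must match the next input symbol.
    rules: dict from variable to a list of right-hand-side strings."""
    seen = set()
    def go(i, stack):
        if (i, stack) in seen:
            return False
        seen.add((i, stack))
        # guard: every terminal left on the stack must still be matched
        if sum(sym not in rules for sym in stack) > len(w) - i:
            return False
        if not stack:
            return i == len(w)           # step (c): $ reached, input done
        top, rest = stack[0], stack[1:]
        if top in rules:                 # step (a): expand a variable
            return any(go(i, tuple(rhs) + rest) for rhs in rules[top])
        # step (b): match a terminal against the input
        return i < len(w) and w[i] == top and go(i + 1, rest)
    return go(0, (start,))

# the first CFG from page 09: S -> 0S1 | ε
rules = {'S': ['0S1', '']}
```

Note how expanding a rule pushes its right-hand side with the leftmost symbol on top — which is why this simulates a leftmost derivation.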
Direction 2: PDA → CFG (Lemma 2.27)
The reverse direction is more complex. Given a PDA \(P\), we construct a CFG \(G\) where each variable \(A_{pq}\) generates exactly the strings that take \(P\) from state \(p\) with empty stack to state \(q\) with empty stack. The construction produces rules of the form:
\(A_{pq} \to a A_{rs} b\) — input \(a\) pushes, the middle part runs with that symbol on the stack, input \(b\) pops it
\(A_{pq} \to A_{pr} A_{rq}\) — split the computation at an intermediate point \(r\) where the stack is empty
\(A_{pp} \to \varepsilon\) — going from \(p\) to \(p\) with empty stack: read nothing
The start variable is \(A_{q_0, q_{\text{accept}}}\). The formal proof involves showing this construction is correct via induction on derivation/computation length (Sipser Claims 2.30 and 2.31). The intuition: every PDA computation that starts and ends with an empty stack either has the first push matching the last pop (captured by \(A_{pq} \to a A_{rs} b\)), or has the stack empty at some intermediate point (captured by \(A_{pq} \to A_{pr} A_{rq}\)).
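A structural sketch of the rule generation, assuming the PDA has already been massaged into the form Sipser's proof uses (single accept state, empties its stack before accepting, and every move either pushes exactly one symbol or pops exactly one). The tuple encodings of the push and pop moves are my own:

```python
from itertools import product

def pda_to_cfg_rules(states, pushes, pops):
    """Enumerate the Lemma 2.27 rules. Assumes the normalized form
    where every move either pushes exactly one stack symbol or pops
    exactly one.
      pushes: set of (p, a, u, r) -- in p, read a, push u, go to r
      pops:   set of (s, b, u, q) -- in s, read b, pop u, go to q
    A variable ('A', p, q) stands for A_pq."""
    rules = set()
    for p in states:
        rules.add((('A', p, p), ()))                          # A_pp -> ε
    for p, q, r in product(states, repeat=3):                 # split at r
        rules.add((('A', p, q), (('A', p, r), ('A', r, q))))
    for (p, a, u, r), (s, b, u2, q) in product(pushes, pops):
        if u == u2:                    # first push matches last pop
            rules.add((('A', p, q), (a, ('A', r, s), b)))     # A_pq -> a A_rs b
    return rules

# toy: one state p, push X on reading 0, pop X on reading 1
toy_rules = pda_to_cfg_rules({'p'}, {('p', '0', 'X', 'p')},
                             {('p', '1', 'X', 'p')})
```

For the toy machine, the generated rules include \(A_{pp} \to 0\,A_{pp}\,1\) and \(A_{pp} \to \varepsilon\) — a grammar for \(\{0^n1^n\}\), plus the (here redundant) split rules.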
Exam relevance. You should know that the equivalence holds and understand the CFG → PDA direction well enough to construct a PDA from a grammar. The PDA → CFG direction is conceptually important but the construction details are rarely tested at this level.
Comparison: DFA vs. NFA vs. PDA
| | DFA | NFA | PDA |
| --- | --- | --- | --- |
| Tuple | 5-tuple | 5-tuple | 6-tuple (+\(\Gamma\)) |
| Memory | Current state only | Set of current states | State + unbounded stack |
| Determinism | Deterministic | Nondeterministic | Nondeterministic |
| Nondeterminism adds power? | — | No (DFA = NFA) | Yes! (DPDA ⊊ PDA) |
| \(\varepsilon\)-transitions | No | Yes | Yes |
| Languages recognized | Regular | Regular | Context-free |
| Generative equivalent | Regular expressions | Regular expressions | CFGs |
| Closed under union? | Yes | Yes | Yes |
| Closed under intersection? | Yes | Yes | No! |
| Closed under complement? | Yes | Yes | No! |
| Closed under concatenation? | Yes | Yes | Yes |
| Closed under star? | Yes | Yes | Yes |
| Example beyond previous class | N/A (base level) | N/A (base level) | \(\{0^n1^n\}\), \(\{ww^R\}\) |
| Example beyond this class | \(\{0^n1^n\}\) | \(\{0^n1^n\}\) | \(\{a^n b^n c^n\}\) |
The closure property break. This is a big deal. Regular languages are closed under everything (union, intersection, complement, concat, star). Context-free languages lose intersection and complement closure. The classic proof: \(\{a^n b^n c^n\}\) is the intersection of two CFLs — \(\{a^n b^n c^k\}\) and \(\{a^k b^n c^n\}\) — but is not itself context-free. If CFLs were closed under intersection, this would be a contradiction.
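The three languages in that argument are easy to check directly — a sketch (the helper names are mine) that makes the intersection claim concrete on small strings:

```python
import re

def in_anbnck(s):   # {a^n b^n c^k}: as many a's as b's, any number of c's
    m = re.fullmatch(r'(a*)(b*)(c*)', s)
    return bool(m) and len(m.group(1)) == len(m.group(2))

def in_akbncn(s):   # {a^k b^n c^n}: as many b's as c's, any number of a's
    m = re.fullmatch(r'(a*)(b*)(c*)', s)
    return bool(m) and len(m.group(2)) == len(m.group(3))

def in_anbncn(s):   # the intersection -- not context-free
    m = re.fullmatch(r'(a*)(b*)(c*)', s)
    return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))
```

Each of the first two languages needs only one matched pair, so each is a CFL; brute-force checking short strings over \(\{a,b,c\}\) confirms that membership in both coincides with membership in \(\{a^n b^n c^n\}\). That the intersection is not context-free is what the pumping lemma for CFLs shows.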
PDA Trace: \(\{ww^R \mid w \in \{0,1\}^*\}\)
Enter an even-length binary string. The simulator will test if it's a palindrome and show the accepting trace.
Exercises
Exercise 1: Design a PDA
Give a PDA (as a state diagram with the \(a, b \to c\) notation) that recognizes the language:
\(L = \{a^i b^j \mid i > j \geq 0\}\)
Hint: Push each \(a\), pop on each \(b\). How do you ensure there are more \(a\)'s than \(b\)'s?
Strategy: Push $ as bottom marker. Push each a. When we start reading b's, pop an a for each b. To ensure more a's than b's: at the end, the stack must still have at least one a on it (not just $). We check this by requiring we can pop an a before we see $.
States: q1 (start), q2 (pushing a's), q3 (popping on b's), q4 (verify extra a), q5 (accept)
Transitions:
q1 → q2: ε, ε → $
q2 → q2: a, ε → a (push each a)
q2 → q3: b, a → ε (start reading b's, pop a)
q3 → q3: b, a → ε (continue popping)
q3 → q4: ε, a → ε (pop one more a — proving i > j)
q4 → q4: ε, a → ε (pop remaining a's)
q4 → q5: ε, $ → ε (accept — stack empty)
Also: q2 → q4: ε, a → ε (handles j = 0, no b's at all — popping an a still proves i > j)
Check: On aab: push $, push a, push a. Read b, pop a. Now pop a (verifying extra). Pop $. Accept. Correct!
On ab: push $, push a. Read b, pop a. Try to pop another a — fails (only $ left). Reject. Correct, since i = j = 1.
Exercise 2: Trace the PDA
Consider the PDA \(M_1\) for \(\{0^n1^n\}\) defined above. Show why the string 0110 is rejected by tracing all possible computation paths.
Trace of M1 on "0110":
(q1, 0110, ε)
ε,ε→$ — (q2, 0110, $)
0,ε→0 — (q2, 110, 0$)
1,0→ε — (q3, 10, $)
Now in q3 with input "10" and stack "$":
- Try 1,0→ε: need 0 on stack top, but top is $. STUCK.
- Try ε,$→ε to q4: succeeds, reaching (q4, 10, ε).
But q4 has no transitions and input "10" remains unread. STUCK.
Both branches fail. No path reads all input and reaches an accept state.
Why rejected: The string has format 01|10 — it alternates, but isn't 0*1*. After reading 01, the stack is empty (one push, one pop). But two more symbols remain (10), and there's no way to process them. The PDA is designed for the pattern "all 0s then all 1s" — any deviation kills the computation.
Exercise 3: From CFG to PDA (conceptual)
Consider the CFG \(G\) with rules \(S \to aTb \mid b\) and \(T \to Ta \mid \varepsilon\). Using the CFG → PDA construction from Lemma 2.21, describe informally how the PDA would accept the string aab.
Hint: First figure out what language \(G\) generates. Then trace how the PDA would simulate the derivation on its stack.
Step 1: What does G generate?
T → Ta | ε generates a* (zero or more a's).
S → aTb generates a(a*)b = a¹b, a²b, a³b, ... (i.e., a^n b for n ≥ 1)
S → b handles the string "b" alone.
So L(G) = {a^n b | n ≥ 0} = a*b.
Step 2: Derivation of "aab"
S ⇒ aTb ⇒ aTab ⇒ aab
(Using S → aTb, then T → Ta, then T → ε)
Step 3: PDA simulation
The PDA P starts with $ and S on the stack.
Stack: S$ Input: aab
Pop S, choose rule S → aTb.
Push b, T, a onto stack (right-to-left, so a is on top).
Stack: aTb$ Input: aab
Top is terminal 'a'. Read input 'a'. They match! Pop.
Stack: Tb$ Input: ab
Top is variable T. Choose rule T → Ta.
Push a, T (so T on top).
Stack: Tab$ Input: ab
Top is variable T. Choose rule T → ε.
Pop T, push nothing.
Stack: ab$ Input: ab
Top is terminal 'a'. Read input 'a'. Match! Pop.
Stack: b$ Input: b
Top is terminal 'b'. Read input 'b'. Match! Pop.
Stack: $ Input: ε
Top is $. All input consumed. → ACCEPT.
The PDA accepted "aab" by simulating the leftmost derivation of G.
Summary
Pushdown Automata — the essentials:
6-tuple: \((Q, \Sigma, \Gamma, \delta, q_0, F)\) — adds stack alphabet \(\Gamma\) to the NFA definition