Everything up to now has been about showing languages ARE regular — by building DFAs, NFAs, or regexes. But what if you suspect a language isn't regular? You can't just say "I tried to build a DFA and couldn't" — maybe you're not clever enough.
The pumping lemma gives you a proof technique for showing a language is NOT regular. It exploits a structural limitation of all finite automata: they have finitely many states, so any long enough string must revisit a state (pigeonhole principle), creating a loop that can be "pumped" (repeated).
The Intuition: Pigeonhole Loops
The pigeonhole argument: A DFA has \(p\) states. If it reads a string of length \(\geq p\), it visits \(\geq p+1\) states (including the start). By the pigeonhole principle, some state must be visited twice — that's a loop.
A loop means: whatever string takes you around the loop can be repeated any number of times (0, 1, 2, ...) and the DFA still lands in the same state afterward. So if the original string is accepted, "pumped" versions must be too.
If a language has strings where pumping breaks membership, no DFA can recognize it — the language isn't regular.
Visualize it: String \(w = xyz\) with \(|w| \geq p\).
The loop on \(q_j\) (labeled \(y\)) can be traversed 0, 1, 2, ... times:
\(xy^0z = xz\) must also be accepted
\(xy^1z = xyz\) accepted (original)
\(xy^2z = xyyz\) must also be accepted
\(xy^iz\) for any \(i \geq 0\)
Formal Statement
Pumping Lemma for Regular Languages.
If \(L\) is regular, then there exists a number \(p \geq 1\) (the pumping length) such that for every string \(w \in L\) with \(|w| \geq p\), there exists a split \(w = xyz\) satisfying:
\(|y| > 0\) — the pumped part is non-empty
\(|xy| \leq p\) — the loop occurs within the first \(p\) characters
For all \(i \geq 0\): \(xy^iz \in L\) — pumping preserves membership
The Quantifier Structure
The quantifier nesting is crucial — it determines what YOU control and what the ADVERSARY controls:
\(\exists\, p\)\(\forall\, w \in L,\; |w| \geq p\)\(\exists\, xyz = w\)\(\forall\, i \geq 0\)
: \(xy^i z \in L\)
Reading left to right: "There exists \(p\)... for all long enough \(w\)... there exists a split... for all pump values... membership holds."
The Adversarial Game
To disprove regularity, you negate the lemma. The proof becomes a game between you (the prover) and an adversary (who defends regularity).
The Pumping Lemma Game (to prove \(L\) is NOT regular):
You DON'T get to pick \(p\). Your argument must work for ANY \(p\). Write your \(w\) in terms of \(p\).
You DON'T get to pick the split. You must show that ALL valid splits fail. This means reasoning about arbitrary \(x, y, z\) subject to \(|y| > 0\) and \(|xy| \leq p\).
The constraint \(|xy| \leq p\) is your most powerful weapon — it forces \(y\) to be in the first \(p\) characters.
The Proof Scheme
Template for pumping lemma proofs:
Assume for contradiction that \(L\) is regular.
Then the pumping lemma gives us a pumping length \(p\).
Choose \(w \in L\) with \(|w| \geq p\). (Strategic choice — must be in the language!)
Consider any split \(w = xyz\) with \(|y| > 0\) and \(|xy| \leq p\).
Show that some \(xy^iz \notin L\) — find a specific \(i\) that breaks it.
Contradiction. The pumping lemma is violated, so \(L\) is not regular.
Example 1: \(L = \{0^n 1^n \mid n \geq 0\}\)
The classic example. Equal numbers of 0s followed by equal numbers of 1s.
Step 1: Assume \(L\) is regular
Then by the pumping lemma, there exists a pumping length \(p\).
Step 2: Choose \(w = 0^p 1^p\)
\(w \in L\) because it has \(p\) zeros followed by \(p\) ones. \(|w| = 2p \geq p\).
Step 3: Consider any valid split \(w = xyz\)
Since \(|xy| \leq p\) and the first \(p\) characters are all 0s, both \(x\) and \(y\) consist entirely of 0s. Say \(y = 0^k\) where \(1 \leq k \leq p\).
Step 4: Pump with \(i = 2\)
\(xy^2z = 0^{p+k}1^p\). This has \(p+k\) zeros and \(p\) ones. Since \(k \geq 1\), we have \(p+k > p\), so the number of zeros exceeds the number of ones.
Therefore \(xy^2z \notin L\). (We could also use \(i=0\): \(xz = 0^{p-k}1^p\), giving fewer zeros than ones.)
Step 5: Contradiction
The pumping lemma requires \(xy^iz \in L\) for all \(i\), but we found \(i=2\) where it fails. Contradiction. Therefore \(L\) is not regular.
Why \(|xy| \leq p\) was essential: It guaranteed that \(y\) was entirely within the 0s. Without this constraint, someone could argue "what if \(y\) straddles the boundary and contains both 0s and 1s?" — then pumping might not break the count. The \(|xy| \leq p\) constraint eliminates this objection.
Example 2: \(L = \{ww \mid w \in \{0,1\}^*\}\)
Strings that are the concatenation of some string with itself. E.g., \(0101 \in L\), \(00110011 \in L\), \(01 \notin L\).
The "landmark" trick
Choose \(w = 0^p 1 0^p 1\). This is in \(L\) because it's \((0^p 1)(0^p 1)\) — the first half concatenated with itself.
\(|w| = 2p+2 \geq p\).
The split
Since \(|xy| \leq p\), the string \(y\) lies entirely within the first block of \(p\) zeros. So \(y = 0^k\) for some \(k \geq 1\).
Pump with \(i = 2\)
\(xy^2z = 0^{p+k} 1 0^p 1\). For this to be in \(L\) (form \(vv\)), the two halves must be identical. The string has length \(2p+k+2\). For even length, \(k\) must be even. If \(k\) is even, each half has length \(p + k/2 + 1\). The first half is \(0^{p+k} 1 0^{k/2-1}\ldots\) — but this places the 1 at a different position than in the second half, so the halves can't match. Either way, \(xy^2z \notin L\).
Alternatively, pump with \(i = 0\): \(xz = 0^{p-k}10^p1\). The first block of 0s is shorter than the second, so the two halves can't be equal.
The landmark trick: Place a "landmark" symbol (the 1 in this case) at a position where pumping within the first \(p\) characters shifts it relative to the second copy. The mismatch proves \(xy^iz \notin L\). This technique works for many "repetition" languages.
Example 3: Closure-Based Proof
Sometimes you don't use the pumping lemma directly — you use closure properties instead.
Claim: \(L = \{w \in \{0,1\}^* \mid w \text{ has equal number of 0s and 1s}\}\) is not regular.
Proof by closure:
Assume \(L\) is regular.
\(0^*1^*\) is regular (it's a regex!).
Regular languages are closed under intersection, so \(L \cap 0^*1^*\) would be regular.
\(L \cap 0^*1^* = \{0^n1^n \mid n \geq 0\}\) — strings of 0s-then-1s with equal counts.
But we proved \(\{0^n1^n\}\) is NOT regular (Example 1).
Contradiction. Therefore \(L\) is not regular.
When to use closure proofs: When the pumping lemma would be awkward (hard to constrain where \(y\) falls), but the language can be intersected/complemented into a known non-regular language. Closure proofs are often cleaner and shorter.
The Pumping Game
Play against the pumping lemma for \(L = \{0^n1^n \mid n \geq 0\}\).
The adversary picks \(p\). You pick \(i\) to break the split.
Common Mistakes
1. Picking a specific \(p\).
Wrong: "Let \(p = 5\)." The adversary picks \(p\), not you. Your string must work for ALL \(p\).
2. Picking a specific split.
Wrong: "Let \(y = 00\)." The adversary picks the split. You must show ALL valid splits fail.
3. Choosing \(w \notin L\).
Your string must be IN the language. Then pumping pushes it OUT. If \(w \notin L\) to begin with, you've proven nothing.
4. Bad string choice.
Some strings are pumpable even in non-regular languages. If your chosen \(w\) doesn't lead to a contradiction for all splits, choose a different \(w\). The lemma says there exists a good split — you need to show NO split works.
5. Using pumping lemma to prove regularity.
The pumping lemma is one-directional: regular \(\Rightarrow\) pumpable. NOT: pumpable \(\Rightarrow\) regular. Some non-regular languages satisfy the pumping conditions! The lemma can only prove NON-regularity.
Summary
The Pumping Lemma:
Purpose: Prove languages are NOT regular (never proves regularity)
Intuition: Long strings force state revisits (loops), loops can be pumped
Key constraint: \(|xy| \leq p\) forces \(y\) into the first \(p\) characters
Proof pattern: Assume regular → choose \(w\) → show all splits fail → contradiction
Alternative: Closure-based proofs (intersect/complement into known non-regular)
The quantifier game:
They pick \(p\) →
You pick \(w\) →
They split \(xyz\) →
You pick \(i\) →
show \(xy^iz \notin L\)