08 The Pumping Lemma

Why the Pumping Lemma?

Everything up to now has been about showing languages ARE regular — by building DFAs, NFAs, or regexes. But what if you suspect a language isn't regular? You can't just say "I tried to build a DFA and couldn't" — maybe you're not clever enough.

The pumping lemma gives you a proof technique for showing a language is NOT regular. It exploits a structural limitation of all finite automata: they have finitely many states, so any long enough string must revisit a state (pigeonhole principle), creating a loop that can be "pumped" (repeated).

The Intuition: Pigeonhole Loops

The pigeonhole argument: A DFA has \(p\) states. If it reads a string of length \(\geq p\), it visits \(\geq p+1\) states (including the start). By the pigeonhole principle, some state must be visited twice — that's a loop.

A loop means: whatever string takes you around the loop can be repeated any number of times (0, 1, 2, ...) and the DFA still lands in the same state afterward. So if the original string is accepted, "pumped" versions must be too.

If a language has strings where pumping breaks membership, no DFA can recognize it — the language isn't regular.

Visualize it: String \(w = xyz\) with \(|w| \geq p\).

The loop on \(q_j\) (labeled \(y\)) can be traversed 0, 1, 2, ... times:

\(xy^0z = xz\) must also be accepted
\(xy^1z = xyz\) accepted (original)
\(xy^2z = xyyz\) must also be accepted
\(xy^iz\) for any \(i \geq 0\)

Formal Statement

Pumping Lemma for Regular Languages.
If \(L\) is regular, then there exists a number \(p \geq 1\) (the pumping length) such that for every string \(w \in L\) with \(|w| \geq p\), there exists a split \(w = xyz\) satisfying:

\(|y| > 0\) — the pumped part is non-empty
\(|xy| \leq p\) — the loop occurs within the first \(p\) characters
For all \(i \geq 0\): \(xy^iz \in L\) — pumping preserves membership

The Quantifier Structure

The quantifier nesting is crucial — it determines what YOU control and what the ADVERSARY controls:

\(\exists\, p\) \(\forall\, w \in L,\; |w| \geq p\) \(\exists\, xyz = w\) \(\forall\, i \geq 0\) : \(xy^i z \in L\)

Reading left to right: "There exists \(p\)... for all long enough \(w\)... there exists a split... for all pump values... membership holds."

The Adversarial Game

To disprove regularity, you negate the lemma. The proof becomes a game between you (the prover) and an adversary (who defends regularity).

The Pumping Lemma Game (to prove \(L\) is NOT regular):

Adversary (defends regular)	You (attack)
1. Picks \(p\) — any positive integer
	2. Pick \(w \in L\) with \(\|w\| \geq p\)
3. Picks split \(w = xyz\) satisfying \(\|y\|>0\) and \(\|xy\| \leq p\)
	4. Pick \(i\) such that \(xy^iz \notin L\)

Critical rules of engagement:

You DON'T get to pick \(p\). Your argument must work for ANY \(p\). Write your \(w\) in terms of \(p\).
You DON'T get to pick the split. You must show that ALL valid splits fail. This means reasoning about arbitrary \(x, y, z\) subject to \(|y| > 0\) and \(|xy| \leq p\).
The constraint \(|xy| \leq p\) is your most powerful weapon — it forces \(y\) to be in the first \(p\) characters.

The Proof Scheme

Template for pumping lemma proofs:

Assume for contradiction that \(L\) is regular.
Then the pumping lemma gives us a pumping length \(p\).
Choose \(w \in L\) with \(|w| \geq p\). (Strategic choice — must be in the language!)
Consider any split \(w = xyz\) with \(|y| > 0\) and \(|xy| \leq p\).
Show that some \(xy^iz \notin L\) — find a specific \(i\) that breaks it.
Contradiction. The pumping lemma is violated, so \(L\) is not regular.

Example 1: \(L = \{0^n 1^n \mid n \geq 0\}\)

The classic example. Equal numbers of 0s followed by equal numbers of 1s.

Step 1: Assume \(L\) is regular

Then by the pumping lemma, there exists a pumping length \(p\).

Step 2: Choose \(w = 0^p 1^p\)

\(w \in L\) because it has \(p\) zeros followed by \(p\) ones. \(|w| = 2p \geq p\).

Step 3: Consider any valid split \(w = xyz\)

Since \(|xy| \leq p\) and the first \(p\) characters are all 0s, both \(x\) and \(y\) consist entirely of 0s. Say \(y = 0^k\) where \(1 \leq k \leq p\).

Step 4: Pump with \(i = 2\)

\(xy^2z = 0^{p+k}1^p\). This has \(p+k\) zeros and \(p\) ones. Since \(k \geq 1\), we have \(p+k > p\), so the number of zeros exceeds the number of ones.

Therefore \(xy^2z \notin L\). (We could also use \(i=0\): \(xz = 0^{p-k}1^p\), giving fewer zeros than ones.)

Step 5: Contradiction

The pumping lemma requires \(xy^iz \in L\) for all \(i\), but we found \(i=2\) where it fails. Contradiction. Therefore \(L\) is not regular.

Why \(|xy| \leq p\) was essential: It guaranteed that \(y\) was entirely within the 0s. Without this constraint, someone could argue "what if \(y\) straddles the boundary and contains both 0s and 1s?" — then pumping might not break the count. The \(|xy| \leq p\) constraint eliminates this objection.

Example 2: \(L = \{ww \mid w \in \{0,1\}^*\}\)

Strings that are the concatenation of some string with itself. E.g., \(0101 \in L\), \(00110011 \in L\), \(01 \notin L\).

The "landmark" trick

Choose \(w = 0^p 1 0^p 1\). This is in \(L\) because it's \((0^p 1)(0^p 1)\) — the first half concatenated with itself.

\(|w| = 2p+2 \geq p\).

The split

Since \(|xy| \leq p\), the string \(y\) lies entirely within the first block of \(p\) zeros. So \(y = 0^k\) for some \(k \geq 1\).

Pump with \(i = 2\)

\(xy^2z = 0^{p+k} 1 0^p 1\). For this to be in \(L\) (form \(vv\)), the two halves must be identical. The string has length \(2p+k+2\). For even length, \(k\) must be even. If \(k\) is even, each half has length \(p + k/2 + 1\). The first half is \(0^{p+k} 1 0^{k/2-1}\ldots\) — but this places the 1 at a different position than in the second half, so the halves can't match. Either way, \(xy^2z \notin L\).

Alternatively, pump with \(i = 0\): \(xz = 0^{p-k}10^p1\). The first block of 0s is shorter than the second, so the two halves can't be equal.

The landmark trick: Place a "landmark" symbol (the 1 in this case) at a position where pumping within the first \(p\) characters shifts it relative to the second copy. The mismatch proves \(xy^iz \notin L\). This technique works for many "repetition" languages.

Example 3: Closure-Based Proof

Sometimes you don't use the pumping lemma directly — you use closure properties instead.

Claim: \(L = \{w \in \{0,1\}^* \mid w \text{ has equal number of 0s and 1s}\}\) is not regular.

Proof by closure:

Assume \(L\) is regular.
\(0^*1^*\) is regular (it's a regex!).
Regular languages are closed under intersection, so \(L \cap 0^*1^*\) would be regular.
\(L \cap 0^*1^* = \{0^n1^n \mid n \geq 0\}\) — strings of 0s-then-1s with equal counts.
But we proved \(\{0^n1^n\}\) is NOT regular (Example 1).
Contradiction. Therefore \(L\) is not regular.

When to use closure proofs: When the pumping lemma would be awkward (hard to constrain where \(y\) falls), but the language can be intersected/complemented into a known non-regular language. Closure proofs are often cleaner and shorter.

The Pumping Game

Play against the pumping lemma for \(L = \{0^n1^n \mid n \geq 0\}\).

The adversary picks \(p\). You pick \(i\) to break the split.

Adversary's \(p\):

Common Mistakes

1. Picking a specific \(p\).
Wrong: "Let \(p = 5\)." The adversary picks \(p\), not you. Your string must work for ALL \(p\).

2. Picking a specific split.
Wrong: "Let \(y = 00\)." The adversary picks the split. You must show ALL valid splits fail.

3. Choosing \(w \notin L\).
Your string must be IN the language. Then pumping pushes it OUT. If \(w \notin L\) to begin with, you've proven nothing.

4. Bad string choice.
Some strings are pumpable even in non-regular languages. If your chosen \(w\) doesn't lead to a contradiction for all splits, choose a different \(w\). The lemma says there exists a good split — you need to show NO split works.

5. Using pumping lemma to prove regularity.
The pumping lemma is one-directional: regular \(\Rightarrow\) pumpable. NOT: pumpable \(\Rightarrow\) regular. Some non-regular languages satisfy the pumping conditions! The lemma can only prove NON-regularity.

Summary

The Pumping Lemma:

Purpose: Prove languages are NOT regular (never proves regularity)
Intuition: Long strings force state revisits (loops), loops can be pumped
Key constraint: \(|xy| \leq p\) forces \(y\) into the first \(p\) characters
Proof pattern: Assume regular → choose \(w\) → show all splits fail → contradiction
Alternative: Closure-based proofs (intersect/complement into known non-regular)

The quantifier game:

They pick \(p\) → You pick \(w\) → They split \(xyz\) → You pick \(i\) → show \(xy^iz \notin L\)

← GNFA & State Elimination Index Context-Free Grammars →