Before there were computers, mathematicians asked a question that would shape the 20th century: what does it mean to compute something? Not "how fast" or "how much memory" — but what is fundamentally possible? Are there problems that no procedure, no matter how clever, could ever solve?
The answer turned out to be yes. And the framework built to prove it — automata theory, formal languages, and computability — became the bedrock of computer science. This course is about that framework.
The story starts in 1900 with David Hilbert, who posed 23 problems he believed would guide mathematics into the new century. His program was ambitious: formalize all of mathematics, prove it consistent, and show that every true statement has a proof. In particular, Hilbert's Entscheidungsproblem (decision problem) asked whether there exists a mechanical procedure that can determine the truth or falsity of any mathematical statement.
In 1931, Kurt Gödel dealt the first blow: his incompleteness theorems showed that any consistent formal system powerful enough to express arithmetic contains true statements that cannot be proved within that system. Mathematics could never be fully captured by axioms and rules.
Then, in 1936, two people independently proved that Hilbert's Entscheidungsproblem has no solution:

- Alonzo Church, via the lambda calculus, his formal system of function definition and application.
- Alan Turing, via the abstract machines now named after him, which became the standard model of computation.
The remarkable fact: Turing machines, lambda calculus, recursive functions, and every other reasonable formalization of "computation" all turned out to recognize the same class of problems. This convergence is the Church-Turing thesis: anything we'd intuitively call "computable" can be computed by a Turing machine. It's not a theorem (you can't prove a claim about intuition), but no counterexample has ever been found.
Automata theory builds a tower of increasingly powerful machines. At each level, you gain something — more memory, more flexibility — and with it, the ability to recognize strictly more languages. The hierarchy looks like this:
At the bottom sit the simplest machines: deterministic finite automata (DFAs). A DFA reads input one symbol at a time, left to right, transitioning between a finite number of states. No scratch space, no stack, no tape — just the current state. Despite this simplicity, DFAs handle pattern matching (your IDE's syntax highlighter), protocol verification (does this sequence of packets follow the TCP handshake?), and lexical analysis (the first phase of every compiler). They recognize the regular languages.
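A DFA is small enough to write down directly. Here is a minimal sketch (my own example, not from the course): a three-state machine that accepts binary strings ending in "01".

```python
# A DFA as plain data: a start state, a set of accepting states, and a
# transition table delta[state][symbol] -> next state.
# This example machine accepts binary strings ending in "01".
DFA = {
    "start": "q0",
    "accept": {"q2"},
    "delta": {
        "q0": {"0": "q1", "1": "q0"},  # q0: haven't seen a useful suffix
        "q1": {"0": "q1", "1": "q2"},  # q1: last symbol was 0
        "q2": {"0": "q1", "1": "q0"},  # q2: last two symbols were 01
    },
}

def dfa_accepts(dfa, s):
    """Run the DFA: one transition per input symbol, no other memory."""
    state = dfa["start"]
    for ch in s:
        state = dfa["delta"][state][ch]
    return state in dfa["accept"]
```

The whole machine is the table: no loops over the input twice, no backtracking, constant memory regardless of input length.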
NFAs (nondeterministic finite automata) look more powerful — they can "guess" the right path — but they're not. Every NFA can be converted to an equivalent DFA (subset construction). The machine is different; the class of languages is identical.
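The subset construction is short enough to sketch: each DFA state is the *set* of NFA states the NFA could currently be in. The example NFA below (my own choice) recognizes "second-to-last symbol is 1", a classic case where the NFA is smaller than any equivalent DFA.

```python
def nfa_to_dfa(nfa_delta, start, accept, alphabet):
    """Subset construction: build a DFA whose states are frozensets of
    NFA states. The two machines recognize the same language."""
    start_set = frozenset([start])
    delta = {}
    worklist, seen = [start_set], {start_set}
    while worklist:
        S = worklist.pop()
        for a in alphabet:
            # Union of everywhere any state in S could go on symbol a.
            T = frozenset(t for s in S for t in nfa_delta.get((s, a), ()))
            delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                worklist.append(T)
    return delta, start_set, {S for S in seen if S & accept}

def dfa_run(delta, state, accepting, s):
    for ch in s:
        state = delta[(state, ch)]
    return state in accepting

# Example NFA: "second-to-last symbol is 1". The NFA has 3 states; the
# smallest equivalent DFA needs 4 (it must remember the last two symbols).
NFA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},   # the nondeterministic "guess": match starts here
    ("q1", "0"): {"q2"},
    ("q1", "1"): {"q2"},
}
DELTA, START, ACCEPT = nfa_to_dfa(NFA, "q0", {"q2"}, "01")
```

The construction can blow up exponentially in the worst case (every subset may be reachable), which is exactly why NFAs can be more *succinct* without being more *powerful*.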
Take a finite automaton and give it a stack: unbounded LIFO memory. Now it can count, match, and nest. The language \(\{0^n 1^n \mid n \geq 0\}\) — impossible for a DFA, trivial for a PDA: push a symbol for every \(0\), pop for every \(1\), accept if the stack is empty. PDAs recognize the context-free languages, which are exactly the languages generated by context-free grammars (CFGs). This is why programming languages define their syntax with CFGs.
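The push-pop strategy described above can be written out directly. This is a hand-rolled sketch of the (deterministic) PDA for \(\{0^n 1^n\}\), with the stack as an explicit Python list:

```python
def accepts_0n1n(s):
    """PDA sketch for {0^n 1^n : n >= 0}: push one stack symbol per '0',
    pop one per '1', accept iff the stack is empty at end of input."""
    stack = []
    seen_one = False
    for ch in s:
        if ch == "0":
            if seen_one:          # a 0 after a 1: wrong shape, reject
                return False
            stack.append("A")     # push for every 0
        elif ch == "1":
            seen_one = True
            if not stack:         # more 1s than 0s: reject
                return False
            stack.pop()           # pop for every 1
        else:
            return False          # symbol outside the alphabet
    return not stack              # counts matched exactly
```

The finite control (here: the `seen_one` flag) is still finite; all the unbounded counting lives on the stack, which is precisely what a DFA lacks.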
Unlike at level 1, nondeterminism genuinely adds power here: there are context-free languages that no deterministic PDA can recognize. The stack gives you memory, but only LIFO access — you can't look at the bottom without popping everything on top.
Replace the stack with an infinite tape that the machine can read, write, and move along in both directions. This is the full model of computation. Turing machines can simulate any algorithm — sorting, graph search, compilation, machine learning. They recognize the recursively enumerable languages and decide the decidable (recursive) languages.
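A Turing machine is also just a transition table, now with writes and head moves. Below is a tiny simulator of my own (not the course's formal definition) plus an example machine that increments a binary number:

```python
def run_tm(delta, tape, state="q0", blank="_", max_steps=10_000):
    """Minimal single-tape TM simulator. delta maps (state, symbol) to
    (new_state, write_symbol, move) with move in {-1, 0, +1}. The machine
    halts when no transition is defined for the current (state, symbol)."""
    cells = dict(enumerate(tape))      # sparse tape: index -> symbol
    head = 0
    for _ in range(max_steps):         # step bound, since TMs may not halt
        sym = cells.get(head, blank)
        if (state, sym) not in delta:
            break
        state, write, move = delta[(state, sym)]
        cells[head] = write
        head += move
    out = "".join(cells[i] for i in sorted(cells)).strip(blank)
    return state, out

# Example machine: binary increment, head starting on the leftmost bit.
# q0 scans right to the end of the number; q1 walks back left, adding 1
# with carry propagation.
INCREMENT = {
    ("q0", "0"): ("q0", "0", +1),
    ("q0", "1"): ("q0", "1", +1),
    ("q0", "_"): ("q1", "_", -1),      # hit the blank: turn around
    ("q1", "1"): ("q1", "0", -1),      # 1 + carry = 0, carry continues
    ("q1", "0"): ("done", "1", 0),     # 0 + carry = 1, halt
    ("q1", "_"): ("done", "1", 0),     # carry past the left edge: new digit
}
```

Note the `max_steps` guard in the simulator: it is there precisely because, unlike DFAs and PDAs on finite input, a TM is not guaranteed to halt — which is the subject of the next paragraph.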
But even Turing machines have limits. The halting problem is undecidable: no TM can look at another TM's description and its input, and always correctly determine whether it halts. This isn't a matter of insufficient cleverness — it's a mathematical impossibility, proved by diagonalization (the same technique Cantor used to show the reals are uncountable).
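The diagonalization argument fits in a few lines of (runnable) Python. Here `claimed_halts` stands in for a hypothetical halting decider, and `contrarian` is the program built to do the opposite of whatever the decider predicts about it:

```python
# A sketch of the halting-problem diagonalization, not a proof engine.
def make_contrarian(claimed_halts):
    def contrarian():
        if claimed_halts(contrarian):
            while True:          # decider says "halts" -> loop forever
                pass
        # decider says "loops" -> halt immediately
    return contrarian

# Feed any concrete decider its own contrarian and it answers wrongly.
# Here: a decider that always answers "loops".
c = make_contrarian(lambda program: False)
c()   # halts at once, refuting the decider's "loops" verdict
# (A decider that answered "halts" would make its contrarian loop
# forever; either way, the decider is wrong on this one input.)
```

The code only demonstrates the "decider says loops" branch, since the other branch really does loop forever — which is, in miniature, the whole point.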
In 1956, linguist Noam Chomsky classified formal grammars into four types, creating a hierarchy that aligns perfectly with the machine hierarchy. Each level of grammar generates a corresponding class of languages:
| Type | Grammar | Languages | Machine | Memory Model |
|---|---|---|---|---|
| 3 (simplest) | Regular grammar | Regular | DFA / NFA | Finite states, no memory |
| 2 | Context-free grammar | Context-Free | PDA | Finite states + stack |
| 1 | Context-sensitive grammar | Context-Sensitive | Linear-bounded TM | Bounded tape |
| 0 (most powerful) | Unrestricted grammar | Recursively Enumerable | Turing Machine | Infinite tape |
The containment is strict: every regular language is context-free, every context-free language is context-sensitive, every context-sensitive language is recursively enumerable. At each level, there exist languages that belong to the new class but not the one below it. The pumping lemmas and diagonalization are the tools for proving these separations.
This hierarchy is not just an intellectual curiosity. It tells you which tools to reach for: regular expressions for flat, token-level patterns; context-free parsers for nested structure; and general-purpose programs (with undecidability caveats) for everything beyond.
Automata theory isn't a historical curiosity — it's the foundation that working systems are built on.
Every compiler has two phases that map directly to the Chomsky hierarchy. Lexical analysis (tokenization) uses regular expressions — DFAs in disguise — to break source code into tokens (keywords, identifiers, literals). Parsing uses context-free grammars to build syntax trees. The BNF notation you see in language specifications is a CFG. Parser generators like ANTLR, yacc, and Bison are direct applications of this theory.
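Both phases fit in a toy example. The token definitions and grammar below are illustrative (my own, not from any real language spec): a regex lexer feeding a recursive-descent evaluator for the CFG `expr -> term ('+' term)*`, `term -> atom ('*' atom)*`, `atom -> NUMBER | '(' expr ')'`:

```python
import re

TOKEN_RE = re.compile(r"\d+|[+*()]")

def tokenize(src):
    # Phase 1, lexical analysis: a regular expression (a DFA in disguise)
    # slices the source into tokens.
    return TOKEN_RE.findall(src)

def evaluate(tokens):
    # Phase 2, parsing: recursive descent, one function per nonterminal.
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def expr():                      # expr -> term ('+' term)*
        nonlocal pos
        v = term()
        while peek() == "+":
            pos += 1
            v += term()
        return v
    def term():                      # term -> atom ('*' atom)*
        nonlocal pos
        v = atom()
        while peek() == "*":
            pos += 1
            v *= atom()
        return v
    def atom():                      # atom -> NUMBER | '(' expr ')'
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            v = expr()
            pos += 1                 # consume ')'
            return v
        return int(tok)
    return expr()
```

Notice how the hierarchy shows up in the division of labor: the lexer never recurses (regular), while the parser's mutual recursion is exactly what handles nesting (context-free).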
Every grep invocation, every regex in your code, every find-and-replace in your editor runs a finite automaton. The theory explains why well-behaved patterns run in time linear in the input (a DFA reads each symbol exactly once) and why some engine-and-pattern combinations suffer catastrophic backtracking (the engine explores exponentially many alternatives instead of simulating the NFA or converting it to a DFA).
When you verify that a protocol (TCP, TLS, a smart contract) behaves correctly, you're often checking properties of a finite-state model. The closure properties of regular languages — union, intersection, complement — are the operations that make automated verification possible.
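The closure property for intersection has a constructive proof, the product construction, which is short enough to sketch (the two example "properties" below are my own toys, not a real protocol):

```python
def run_dfa(dfa, s):
    state = dfa["start"]
    for ch in s:
        state = dfa["delta"][state][ch]
    return state in dfa["accept"]

def intersect(d1, d2, alphabet):
    """Product construction: simulate both DFAs in lockstep. A pair state
    accepts iff both components accept, so the product DFA recognizes the
    intersection of the two regular languages."""
    start = (d1["start"], d2["start"])
    delta, accept = {}, set()
    stack, seen = [start], {start}
    while stack:
        s1, s2 = pair = stack.pop()
        if s1 in d1["accept"] and s2 in d2["accept"]:
            accept.add(pair)
        row = delta.setdefault(pair, {})
        for a in alphabet:
            t = (d1["delta"][s1][a], d2["delta"][s2][a])
            row[a] = t
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return {"start": start, "accept": accept, "delta": delta}

# Two toy properties over {0, 1}:
EVEN_ZEROS = {"start": "e", "accept": {"e"},
              "delta": {"e": {"0": "o", "1": "e"},
                        "o": {"0": "e", "1": "o"}}}
ENDS_IN_1 = {"start": "n", "accept": {"y"},
             "delta": {"n": {"0": "n", "1": "y"},
                       "y": {"0": "n", "1": "y"}}}
BOTH = intersect(EVEN_ZEROS, ENDS_IN_1, "01")
```

This is the shape of a verification check: encode the system as one automaton, the (negated) property as another, intersect, and ask whether the result accepts anything.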
The P vs NP question — whether problems whose solutions can be quickly verified can also be quickly solved — is the direct descendant of the decidability questions in this course. Modern cryptography depends on the assumption that P ≠ NP: if factoring were in P, RSA would break. The theory of NP-completeness, the Cook-Levin theorem, and polynomial-time reductions are all built on the foundation of Turing machines and decidability.
Context-free grammars power natural language processing pipelines. Decidability results tell you which program analysis questions are answerable and which are provably impossible (Rice's theorem: every non-trivial semantic property of programs is undecidable). The boundaries of computation constrain what any AI system — however sophisticated — can guarantee.
There's a specific intellectual pleasure in automata theory that's different from most of CS. It's not about building things — it's about understanding what's possible and what isn't.
The course takes you from the simplest possible machine (a DFA: just states and arrows) all the way to the deepest questions in mathematics (is there a procedure that can solve any mathematical problem?). Along the way, you discover that: