The ultimate model of computation — and the edge of what's computable
Why Turing Machines?
We've been climbing a hierarchy of increasingly powerful machines. DFAs had no memory beyond their current state — good enough for regular languages, but they couldn't count. PDAs added a stack — good enough for context-free languages like \(\{0^n1^n\}\) and palindromes, but limited to LIFO matching. The CF pumping lemma showed that context-free languages have their own walls: \(\{a^nb^nc^n\}\), \(\{ww\}\), and anything requiring cross-serial dependencies.
So what if we remove all restrictions? Give a machine an infinite tape it can read, write, and move on in both directions. No stack discipline — full random access to everything it's written. This is the Turing machine, and it represents the biggest conceptual leap in this entire course.
The hierarchy so far:
| Machine | Memory | Language class | Limitation |
|---|---|---|---|
| DFA / NFA | None (finite states only) | Regular | Can't count |
| PDA | Stack (LIFO) | Context-free | Can't compare across sections |
| Turing machine | Infinite read/write tape | ??? | ??? |
The Turing machine fills in both question marks — and the answers are profound. It captures everything we intuitively call "computable." And its limitations are not about memory but about logic itself.
Why this matters. Up to now, every model had an obvious weakness you could point to: "it can't count" or "it can only do LIFO matching." The Turing machine has no such obvious weakness. It has unlimited memory, it can go back and re-read, it can overwrite. And yet — as we'll see in later pages — there are still things it cannot do. That's the deep surprise: the limits of computation are not about engineering (more memory, more time) but about mathematics.
The Physical Model
Picture a machine with three components:
An infinite tape divided into cells, stretching infinitely to the right. Each cell holds one symbol. The input string is written on the leftmost cells; everything else is blank.
A head that sits over one cell at a time. It can read the symbol, write a new symbol (possibly overwriting the old one), and move one cell left or right.
A finite control — a finite set of states, including a start state, an accept state, and a reject state.
↓ head (state q0), over the leftmost cell
| 0 | 1 | 1 | 0 | ␣ | ␣ | ␣ | ⋯
input: 0110 (blanks extend infinitely →)
Now compare this to a DFA. A DFA also reads a tape left-to-right using a finite control. But a DFA can only read and only move right. Those seem like minor restrictions — but removing them changes everything:
Writing lets the machine leave itself notes — mark symbols as "seen," convert them, use the tape as scratch paper
Moving left lets the machine re-examine previous input, shuttle back and forth, compare distant parts of the string
The key upgrade from PDA to TM: A PDA also has unlimited auxiliary storage (the stack). But the stack is LIFO — you can only see and remove the top element. A TM's tape is random access — the head can move to any position and read or write. This is why a TM can handle \(\{a^nb^nc^n\}\) (shuttle between three sections, matching counts) while a PDA cannot.
Input handling is fundamentally different. A DFA or PDA reads input one symbol at a time, left to right, no going back — the input is fed to the machine. A TM has its entire input pre-loaded on the tape before computation starts. The head begins at the leftmost symbol and can then move freely in both directions, re-reading and overwriting input as it goes. The input tape IS the work tape — one tape, dual purpose. There is no separate "input stream."
Formal Definition
Definition 3.3 (Sipser). A Turing machine is a 7-tuple \(M = (Q, \Sigma, \Gamma, \delta, q_0, q_{\text{accept}}, q_{\text{reject}})\) where:
\(Q\) — a finite set of states
\(\Sigma\) — the input alphabet, not containing the blank symbol \(\sqcup\)
\(\Gamma\) — the tape alphabet, where \(\sqcup \in \Gamma\) and \(\Sigma \subseteq \Gamma\)
\(\delta: Q \times \Gamma \to Q \times \Gamma \times \{L, R\}\) — the transition function
\(q_0 \in Q\) — the start state
\(q_{\text{accept}} \in Q\) — the accept state
\(q_{\text{reject}} \in Q\) — the reject state, where \(q_{\text{reject}} \neq q_{\text{accept}}\)
Let's compare this to what we know and unpack what changed:
\(\Sigma\) vs. \(\Gamma\) — two alphabets
A DFA had one alphabet \(\Sigma\). A PDA added a stack alphabet \(\Gamma\). A TM also has two alphabets, but the relationship is different:
\(\Sigma\) = input alphabet — the symbols that can appear in the input string. Crucially, the blank symbol \(\sqcup\) is not in \(\Sigma\). This is what lets the machine detect the end of the input: it's where the blanks start.
\(\Gamma\) = tape alphabet — everything that can appear on the tape, including \(\Sigma\) and the blank \(\sqcup\). The tape alphabet often includes extra "marker" symbols like \(\texttt{x}\), \(\dot{0}\), etc., that the machine writes as scratch work.
Lecture clicker: which sets work? Given \(A = \{0,1\}\), \(B = \{0,1,2\}\), \(C = \{0,1,\sqcup\}\), the valid assignment is \(\Sigma = A\) and \(\Gamma = C\). Why? Because \(\Sigma \subseteq \Gamma\) (check: \(\{0,1\} \subseteq \{0,1,\sqcup\}\)) and \(\sqcup \in \Gamma\) (check) and \(\sqcup \notin \Sigma\) (check). \(B\) can't be \(\Gamma\) because it doesn't contain \(\sqcup\). And \(\Gamma = A\) fails because it doesn't contain \(\sqcup\) either.
Also: \(Q\), \(\Sigma\), and \(\Gamma\) must all be finite sets. \(\mathbb{N}\) and \(\mathbb{R} \cup \{\sqcup\}\) are both infinite, so neither can serve as a tape alphabet.
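The clicker's three conditions are mechanical enough to script. A minimal sketch (the function name and the encoding of the blank as `"_"` are my own choices):

```python
# Check the alphabet conditions from Definition 3.3 on the clicker's sets.
BLANK = "_"  # stand-in for the blank symbol ⊔

def valid_alphabets(sigma, gamma):
    """True iff sigma can serve as Σ and gamma as Γ for some TM."""
    return BLANK not in sigma and BLANK in gamma and sigma <= gamma

A = {"0", "1"}
B = {"0", "1", "2"}
C = {"0", "1", BLANK}

print(valid_alphabets(A, C))  # True:  Σ = A, Γ = C satisfies all three conditions
print(valid_alphabets(A, B))  # False: B lacks the blank, so it can't be Γ
print(valid_alphabets(A, A))  # False: A lacks the blank too
```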
\(\delta\) — the transition function
Unpacking \(\delta\). The transition function is:
\[\delta: Q \times \Gamma \to Q \times \Gamma \times \{L, R\}\]
On each step, the machine reads the current state \(q \in Q\) and the symbol under the head \(a \in \Gamma\); based on \((q, a)\), it produces \((q', b, D)\), where \(q'\) is the new state, \(b\) is the symbol to write, and \(D\) is the direction to move (L or R).
Compare to our previous machines:
DFA: \(\delta: Q \times \Sigma \to Q\) — read a symbol, change state. That's it.
TM: \(\delta: Q \times \Gamma \to Q \times \Gamma \times \{L, R\}\) — read tape, write tape, move head. Deterministic (single output, not a set) and total (defined for every state-symbol pair, except at \(q_{\text{accept}}\) and \(q_{\text{reject}}\)).
No \(\varepsilon\)-transitions. Unlike NFAs and PDAs, a standard TM always reads the symbol under the head and always moves. There's no "free move" option. (The machine can effectively stay put by moving right then left, but each is a real step.)
Boundary behavior: If the head is at the left end of the tape and the transition says "move left," the head simply stays in place. The machine does not crash.
\(q_{\text{accept}}\) and \(q_{\text{reject}}\) — not a set \(F\)
This is a subtle but important change from DFAs and PDAs. Those machines had a set of accept states \(F \subseteq Q\), and acceptance was determined by which state you ended up in after reading all input. A Turing machine instead has:
A single accept state \(q_{\text{accept}}\) — entering this state immediately halts the machine and accepts
A single reject state \(q_{\text{reject}}\) — entering this state immediately halts the machine and rejects
These must be different: \(q_{\text{accept}} \neq q_{\text{reject}}\)
The machine halts IMMEDIATELY on entering \(q_{\text{accept}}\) or \(q_{\text{reject}}\). It doesn't finish reading the input first. It doesn't process the remaining tape. It just stops. This is fundamentally different from a DFA, which always reads the entire input before deciding. A TM can accept or reject at any point during computation — even partway through the input.
Configurations
A configuration captures the complete state of a TM at one moment: the current state, the tape contents, and the head position. We need a compact notation for this.
Notation (Sipser). For a state \(q\) and two strings \(u, v \in \Gamma^*\), we write:
\[u\, q\, v\]
to mean: the tape contains \(uv\) (followed by blanks), the machine is in state \(q\), and the head is on the first symbol of \(v\).
Example: \(1011\, q_7\, 01111\) means the tape is 101101111, the machine is in state \(q_7\), and the head is on the 0 at position 5 (the first symbol of the \(v\) part).
Configuration yields
We say configuration \(C_1\) yields \(C_2\) (written \(C_1 \vdash C_2\)) if the TM can go from \(C_1\) to \(C_2\) in a single step. Formally:
Leftward move: If \(\delta(q_i, b) = (q_j, c, L)\), then \(ua\, q_i\, bv \;\vdash\; u\, q_j\, acv\). The head was on \(b\), wrote \(c\), moved left onto \(a\).
Rightward move: If \(\delta(q_i, b) = (q_j, c, R)\), then \(ua\, q_i\, bv \;\vdash\; uac\, q_j\, v\). The head was on \(b\), wrote \(c\), moved right onto the first symbol of \(v\).
Special configurations:
Start configuration: \(q_0\, w\) — state \(q_0\), head on the leftmost symbol of input \(w\)
Accepting configuration: any configuration where the state is \(q_{\text{accept}}\)
Rejecting configuration: any configuration where the state is \(q_{\text{reject}}\)
Accepting and rejecting configurations are halting configurations — no further steps are taken
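The two yields rules translate directly into a one-step function on configurations. A sketch, representing \(u\,q\,v\) as the triple `(u, q, v)`; the runner and the example transition are illustrative choices of mine, not Sipser's:

```python
# One step of the yields relation ⊢ on configurations (u, q, v):
# the tape holds u + v (then blanks), the head is on the first symbol of v.
BLANK = "_"

def step(config, delta):
    """Apply one transition; delta maps (state, symbol) -> (state, write, 'L' or 'R')."""
    u, q, v = config
    b = v[0] if v else BLANK                # symbol under the head
    q2, c, d = delta[(q, b)]
    rest = v[1:]                            # v minus the symbol just read
    if d == "R":                            # ua q bv  ⊢  uac q' v
        return (u + c, q2, rest)
    if u == "":                             # left end of tape: head stays put
        return ("", q2, c + rest)
    return (u[:-1], q2, u[-1] + c + rest)   # ua q bv  ⊢  u q' acv

# Illustrative transition: δ(q7, 0) = (q5, 1, R)
delta = {("q7", "0"): ("q5", "1", "R")}
print(step(("1011", "q7", "01111"), delta))  # → ('10111', 'q5', '1111')
```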
Three Outcomes: Accept, Reject, Loop
With DFAs and PDAs, there were two possible outcomes: accept or reject. A Turing machine introduces a third possibility that changes everything:
The three outcomes of a TM on input \(w\):
Accept — the machine enters \(q_{\text{accept}}\) and halts
Reject — the machine enters \(q_{\text{reject}}\) and halts
Loop — the machine runs forever, never entering either halting state
A DFA always halts (it reads a finite string left-to-right and stops). A TM can go back and forth on its tape indefinitely, potentially never halting. This is not a bug — it's a fundamental feature of the model, and it has deep consequences.
"Looping" doesn't necessarily mean the machine cycles through the same states — it can exhibit complex, non-repeating behavior while still never reaching a halting state. And here's the devastating fact: given an arbitrary TM and input, there is in general no way to tell whether the machine will eventually halt or loop forever. (That's the halting problem — coming in a later page.)
Recognizable vs. Decidable
The three-outcome possibility forces us to make a distinction that didn't exist before:
Definition 3.5 (Sipser). A language \(L\) is Turing-recognizable (also called recursively enumerable, r.e.) if some Turing machine \(M\) recognizes it — meaning:
If \(w \in L\): \(M\) halts and accepts
If \(w \notin L\): \(M\) either rejects or loops forever
Definition 3.6 (Sipser). A language \(L\) is decidable (also called recursive) if some Turing machine \(M\) decides it — meaning:
If \(w \in L\): \(M\) halts and accepts
If \(w \notin L\): \(M\) halts and rejects
\(M\) halts on every input — it never loops
A TM that halts on all inputs is called a decider.
The containment:
\[\text{Regular} \;\subset\; \text{Context-free} \;\subset\; \text{Decidable} \;\subset\; \text{Turing-recognizable}\]
Every regular language is decidable (just simulate the DFA — it always halts). Every context-free language is decidable (run a CYK parser — always halts). But some Turing-recognizable languages are NOT decidable — recognizers for those languages loop on some inputs. And some languages are not even Turing-recognizable.
Lecture clicker insights:
If \(L^c\) is context-free, is \(L\) Turing-recognizable? Yes. Context-free languages are decidable, so \(L^c\) is decidable. If \(L^c\) is decidable, we can build a decider (hence a recognizer) for \(L\) by flipping accept/reject.
If \(L\) is Turing-recognizable, is \(L^c\) decidable? No. \(L\) being recognizable tells us almost nothing about \(L^c\). If both \(L\) and \(L^c\) are recognizable, then \(L\) is decidable — but recognizability of \(L\) alone doesn't give us recognizability of \(L^c\).
Can every NDTM recognizer \(N\) be converted to an equivalent decider? No. If \(L(N)\) is recognizable but not decidable, no decider for it exists — regardless of whether the original recognizer was deterministic or nondeterministic.
Describing Turing Machines — Three Levels
Sipser identifies three levels of description for Turing machines, from most to least formal. This is important because you'll use different levels in different contexts:
Level 1: Formal description
The complete 7-tuple plus the full transition function \(\delta\) (as a table or state diagram). This is what you'd need to actually build the machine. We give formal descriptions for the first few examples, but they're unwieldy even for simple languages.
Level 2: Implementation description
English prose describing the head movements, what gets written, and state transitions — but at the level of tape operations. "Scan right until you find a blank" or "mark the current symbol with an x and move left to the beginning." This is the most common level for exam answers.
Level 3: High-level description
Plain English algorithm with no mention of the tape, head, or states. "Sort the input and check if adjacent elements are equal." Sipser uses this for most examples, writing \(M = \)"On input \(w\): ..." followed by numbered steps.
Every high-level description corresponds to an actual TM. This is the whole point of the Church-Turing thesis (below): if you can describe an algorithm precisely in English, there exists a Turing machine that implements it. So high-level descriptions aren't hand-waving — they're shorthand for a real machine you could spell out in full if needed.
TM State Diagram Conventions
TM state diagrams look like DFA diagrams with one key difference in transition labels:
Transition label format: Each arrow is labeled \(a \to b, D\) meaning:
Read \(a\) from the tape (the symbol currently under the head)
Write \(b\) onto the tape (replacing \(a\))
Move in direction \(D\) (L = left, R = right)
Shorthand: \(a \to R\) means "read \(a\), write \(a\) (don't change it), move right." This is equivalent to \(a \to a, R\).
Other conventions:
\(q_{\text{accept}}\) and \(q_{\text{reject}}\) are drawn as labeled nodes (sometimes with double circles for accept). Often \(q_{\text{reject}}\) and its transitions are omitted for clarity — any missing transition implicitly goes to \(q_{\text{reject}}\).
Multiple transitions from the same state can be combined: 0,1 → R means "on reading 0 or 1, don't change it, move right."
Example 1: TM for \(\{0^n 1^n \mid n \geq 0\}\)
We already built a PDA for this language (page 10). The PDA used a stack to count: push 0s, pop on 1s. A TM takes a completely different approach — it has no stack, but it has a writable tape. The strategy is to match pairs by marking them off.
High-level description
\(M_1 = \) "On input \(w\):
1. Scan across the tape. If the tape contains symbols after the first \(\sqcup\), or if any \(0\) appears to the right of any \(1\), reject.
2. If the tape is empty (the first symbol is \(\sqcup\)), accept.
3. Cross off the leftmost remaining \(0\) (replace it with \(\texttt{x}\)) and scan right to cross off the leftmost remaining \(1\) (replace it with \(\texttt{x}\)). If no uncrossed \(1\) can be found, reject.
4. Move the head back to the leftmost remaining \(0\). If one exists, go to step 3.
5. All \(0\)s have been crossed off: scan right — if any \(1\)s remain, reject; otherwise, accept."
Implementation-level description
Simplified version (assumes the input is in \(0^*1^*\) form):
1. If the head reads \(\sqcup\), accept (empty string).
2. If the head reads \(0\): write \(\texttt{x}\), move right.
3. Scan right past \(0\)s and \(\texttt{x}\)s until finding a \(1\). If you hit \(\sqcup\) first, reject (more 0s than 1s).
4. Replace the \(1\) with \(\texttt{x}\), then scan left back to the first \(\texttt{x}\) followed by a \(0\) (or the leftmost position).
5. If the head reads \(0\), go to step 2. If the head reads \(\texttt{x}\), scan right: if you see a \(1\), reject (more 1s than 0s). If you reach \(\sqcup\), accept.
Trace on input 0011
Configuration sequence (head position marked by state):
Pass 1: cross off first 0-1 pair
q0 0011 — read 0, write x, move R
x q1 011 — scan right past 0s
x0 q1 11 — found 1, write x, move L
x q2 0x1 — scan left back
q2 x0x1 — at the left end, move R
x q0 0x1 — pass 1 done; head on the leftmost remaining 0
Pass 2: cross off second 0-1 pair
x q0 0x1 — read 0, write x, move R
xx q1 x1 — skip x
xxx q1 1 — found 1, write x, move L
xx q2 xx — scan left
x q2 xxx — scan left
q2 xxxx — at the left end, move R
x q0 xxx — no more 0s, scan right to check for 1s
Verification scan: no remaining 1s
xxxx q3 ␣ — hit blank, all matched: ACCEPT
The key insight: the machine makes \(n\) passes across the tape, crossing off one \(0\)-\(1\) pair each time. Each pass costs \(O(n)\) steps, so the total is \(O(n^2)\). Not efficient — but correct! TMs are about computability, not efficiency.
Example 2: TM for \(\{w\#w \mid w \in \{0,1\}^*\}\)
This language is not context-free — a PDA has no way to compare two arbitrary strings separated by a \(\#\). (The stack would need to be read forward and backward simultaneously.) A TM handles it by zig-zagging: mark a symbol on the left, scan right past \(\#\), check that the corresponding symbol on the right matches, scan back, repeat.
High-level description (Sipser Example 3.9)
\(M_1 = \) "On input \(w\):
Zig-zag across the tape to corresponding positions on either side of \(\#\), checking that these positions match. Cross off symbols as they're checked.
When all symbols to the left of \(\#\) have been crossed off, check for any remaining symbols to the right of \(\#\). If any remain, reject; otherwise, accept."
\(q_1\): Start state. Read current symbol: if \(0\), replace with \(\texttt{x}\) and enter \(q_2\) (remembering we need to match a 0). If \(1\), replace with \(\texttt{x}\) and enter \(q_3\) (remembering we need to match a 1). If \(\#\), go to \(q_8\) to verify right side is empty too.
\(q_2\)/\(q_3\): Scan right past \(0\)s, \(1\)s, and past \(\#\), then past \(\texttt{x}\)s, looking for the first uncrossed symbol on the right side.
\(q_4\)/\(q_5\): Verify the match (must be 0 or 1 respectively). Cross it off with \(\texttt{x}\).
\(q_6\)/\(q_7\): Scan left all the way back to the first \(\texttt{x}\) on the left side, then move right to the next uncrossed symbol.
\(q_8\): All left symbols matched — scan right past \(\#\) and \(\texttt{x}\)s to verify nothing remains. If \(\sqcup\), accept.
Trace on input 01#01
Pass 1: match first symbol (0)
q1 01#01 — read 0, write x, enter q2
x q2 1#01 — scan right past 1
x1 q2 #01 — scan past #
x1# q2 01 — found 0: match! write x
x1 q6 #x1 — scan left back to start
x q1 1#x1 — next uncrossed symbol
Pass 2: match second symbol (1)
xx q3 #x1 — read 1, write x, enter q3
xx# q3 x1 — scan right past x
xx#x q3 1 — found 1: match! write x
x q1 x#xx — scan back
Verification: left side all crossed off
xx q8 #xx — see #, scan right past x's
xx#xx q8 ␣ — blank: nothing remains ACCEPT
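The zig-zag machine is small enough to write out as an explicit transition table and run. The state names follow the description above, but the exact table entries and the runner are my own reconstruction, not Sipser's formal description:

```python
# A reconstruction of the w#w zig-zag machine as an explicit transition table.
# delta[(state, read)] = (next_state, write, move); a missing entry rejects.
BLANK = "_"
R, L = +1, -1

DELTA = {
    ("q1", "0"): ("q2", "x", R),   # cross off a left 0; must match a 0 on the right
    ("q1", "1"): ("q3", "x", R),   # cross off a left 1; must match a 1 on the right
    ("q1", "#"): ("q8", "#", R),   # left side done: verify right side is done too
    ("q2", "0"): ("q2", "0", R), ("q2", "1"): ("q2", "1", R),  # scan right to #
    ("q2", "#"): ("q4", "#", R),
    ("q3", "0"): ("q3", "0", R), ("q3", "1"): ("q3", "1", R),
    ("q3", "#"): ("q5", "#", R),
    ("q4", "x"): ("q4", "x", R),   # skip already-matched right-side symbols
    ("q4", "0"): ("q6", "x", L),   # match found: cross it off, turn back
    ("q5", "x"): ("q5", "x", R),
    ("q5", "1"): ("q6", "x", L),
    ("q6", "x"): ("q6", "x", L),   # scan left, back over the # ...
    ("q6", "#"): ("q7", "#", L),
    ("q7", "0"): ("q7", "0", L), ("q7", "1"): ("q7", "1", L),
    ("q7", "x"): ("q1", "x", R),   # ... to the rightmost left-side x; step right
    ("q8", "x"): ("q8", "x", R),
    ("q8", BLANK): ("accept", BLANK, R),
}

def accepts(w, max_steps=100_000):
    tape, state, head = dict(enumerate(w)), "q1", 0
    for _ in range(max_steps):
        if state == "accept":
            return True
        key = (state, tape.get(head, BLANK))
        if key not in DELTA:
            return False                 # implicit reject
        state, tape[head], move = DELTA[key]
        head = max(0, head + move)       # left-end rule: the head stays put
    raise RuntimeError("step limit exceeded")

print([w for w in ["#", "01#01", "01#10", "0#0", "00#0", "01#011"] if accepts(w)])
# → ['#', '01#01', '0#0']
```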
Example 3: TM for \(\{0^{2^n} \mid n \geq 0\}\)
This is the language of strings of 0s whose length is a power of 2: \(\{0, 00, 0000, 00000000, \ldots\}\). Neither DFAs nor PDAs can handle this — a DFA can't count at all, and a PDA's stack can check linear relationships but not exponential ones. This example showcases the true power of a writable tape.
High-level description (Sipser Example 3.7)
\(M_2 = \) "On input \(w\):
Sweep left to right across the tape, crossing off every other \(0\).
If in stage 1 the tape contained a single \(0\), accept.
If in stage 1 the tape contained more than one \(0\) and the number of \(0\)s was odd, reject.
Return the head to the left-hand end of the tape.
Go to stage 1."
Each sweep halves the number of \(0\)s. If the original count was a power of 2, repeated halving will reach exactly 1. If not, some sweep will encounter an odd count greater than 1, and the machine rejects.
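The halving argument is just arithmetic, and can be sketched independently of any tape (this is the invariant behind \(M_2\), not a simulation of it):

```python
# The invariant behind M2: repeatedly halve the count of 0s, rejecting the
# moment an odd count greater than 1 shows up. Plain arithmetic, not a tape.
def is_power_of_two(n):
    if n < 1:
        return False
    while n > 1:
        if n % 2 == 1:      # a sweep found an odd number of 0s > 1: reject
            return False
        n //= 2             # one sweep crosses off every other 0
    return True             # halving reached exactly one 0: accept

print([n for n in range(1, 20) if is_power_of_two(n)])  # → [1, 2, 4, 8, 16]
```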
\(q_1\): Initial entry point. Write \(\sqcup\) over the leftmost 0 (marking the left end of tape), move right to \(q_2\). This blank serves as a left-end marker so the machine can find its way back.
\(q_2\): The "crossing off" state for even-positioned 0s. Skip this 0 (leave it), move right to \(q_3\).
\(q_3\): Cross off this 0 (write \(\texttt{x}\)), move right. If you see \(\sqcup\) (end of input), check whether you just crossed off the only one — if so, the count was odd (reject unless count was 1). Go back to \(q_5\) and return to start.
\(q_5\): Sweep left until finding the left-end blank. Move right to \(q_2\) and start the next halving pass.
Sweep 1: cross off every other 0 (from 4 to 2)
q1 0000 — mark leftmost 0 as blank
␣ q2 000 — keep first 0
␣0 q3 00 — cross off: write x
␣0x q2 0 — keep this 0
␣0x0 q3 ␣ — hit end, even count, go back
␣0x q5 0␣ — sweep left
␣ q5 0x0␣ — sweep left
q5 ␣0x0␣ — found left marker, move right
␣ q2 0x0␣ — start sweep 2
Sweep 2: cross off every other remaining 0 (from 2 to 1)
␣0 q3 x0␣ — skip x
␣0x q3 0␣ — cross off: write x
␣0xx q3 ␣ — hit end, only 1 zero left
The machine detects a single 0 remains → ACCEPT
If the input were \(000\) (length 3, not a power of 2): 3 is odd, so the first sweep detects an odd count > 1 and rejects immediately.
TM Simulator: \(\{0^n1^n \mid n \geq 0\}\)
The zig-zag marking strategy can be traced step by step on any string of 0s and 1s.
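A minimal simulator in Python makes the strategy concrete. The state names q0 through q3 and the exact table are my own encoding of the simplified implementation-level description from Example 1 (so input is assumed to be in \(0^*1^*\) form); one deviation: crossed-off 1s get a second marker `y` instead of reusing `x`, which makes the leftward scan unambiguous:

```python
# The zig-zag machine for {0^n 1^n}, encoded from the simplified
# implementation-level description (input assumed to be in 0*1* form).
BLANK = "_"
R, L = +1, -1

DELTA = {
    ("q0", BLANK): ("accept", BLANK, R),  # empty tape: accept
    ("q0", "0"): ("q1", "x", R),          # cross off the leftmost 0
    ("q0", "y"): ("q3", "y", R),          # all 0s crossed: verify no 1s remain
    ("q1", "0"): ("q1", "0", R),          # scan right past 0s ...
    ("q1", "y"): ("q1", "y", R),          # ... and crossed-off 1s
    ("q1", "1"): ("q2", "y", L),          # cross off the leftmost 1, turn back
    ("q2", "y"): ("q2", "y", L),          # scan left past crossed-off 1s
    ("q2", "0"): ("q2", "0", L),          # ... and remaining 0s
    ("q2", "x"): ("q0", "x", R),          # hit a crossed 0: next 0 is one cell right
    ("q3", "y"): ("q3", "y", R),          # verification sweep
    ("q3", BLANK): ("accept", BLANK, R),  # no stray 1s: accept
}

def run(w, max_steps=10_000):
    """Return 'accept' or 'reject'; a missing transition implicitly rejects."""
    tape, state, head = dict(enumerate(w)), "q0", 0
    for _ in range(max_steps):
        if state == "accept":
            return "accept"
        key = (state, tape.get(head, BLANK))
        if key not in DELTA:
            return "reject"
        state, tape[head], move = DELTA[key]
        head = max(0, head + move)        # left-end rule: the head stays put
    raise RuntimeError("step limit exceeded (possible loop)")

print([w for w in ["", "01", "0011", "000111", "001", "011", "10"] if run(w) == "accept"])
# → ['', '01', '0011', '000111']
```

The quadratic cost is visible here too: each accepted string of length \(2n\) takes on the order of \(n\) passes of length \(O(n)\).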
Variants: Nondeterministic and Multitape TMs
Just as we studied NFA variants of DFAs, there are variants of Turing machines. The remarkable result is that none of them add computational power — they all recognize the same class of languages.
Nondeterministic Turing Machines (NDTM)
A nondeterministic TM has a transition function \(\delta: Q \times \Gamma \to \mathcal{P}(Q \times \Gamma \times \{L, R\})\) — at each step, multiple transitions may be possible. The computation branches into a tree. The NDTM accepts if any branch reaches \(q_{\text{accept}}\).
Theorem 3.16 (Sipser). Every nondeterministic Turing machine has an equivalent deterministic Turing machine.
Proof idea: A deterministic TM \(D\) simulates the NDTM \(N\) by exploring its computation tree using breadth-first search (not depth-first! — DFS could go down an infinite branch and miss an accepting branch). \(D\) uses three tapes: (1) input (never altered), (2) simulation tape (copy of \(N\)'s tape on current branch), (3) address tape (tracks position in the computation tree). By Theorem 3.13, this 3-tape DTM can be converted to a single-tape DTM.
Nondeterminism in TMs vs. PDAs — a critical difference. For PDAs, nondeterminism genuinely added power (DPDAs are strictly weaker than NPDAs). For Turing machines, nondeterminism does NOT add power — DTMs and NDTMs recognize exactly the same languages. This is analogous to DFAs vs. NFAs (equivalent), not DPDAs vs. NPDAs (not equivalent).
Subtle point from the lectures: The statement "for every NDTM \(N\), we can construct a decider \(M\) that decides \(L(N)\)" is false. We can always construct a deterministic TM that recognizes the same language. But if \(N\) is merely a recognizer (not a decider), the equivalent DTM will also merely be a recognizer — it may loop on some inputs. Converting NDTM to DTM preserves the language, not the halting property. However, if the NDTM is a decider (halts on all branches for all inputs), the DTM simulation will also halt on all inputs.
Specifically: if \(N\) accepts \(w\), the BFS finds the accepting branch in finite time. If \(N\) rejects \(w\) (all branches reject/halt), BFS explores them all and rejects. But if \(N\) loops on \(w\) (some branches loop, none accept), the BFS may never terminate — it keeps exploring deeper levels of an infinite tree.
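The BFS idea can be sketched directly on configurations: queue the start configuration, pop one per step, push all of its successors. The toy transition relation below, one branch of which runs forever, is my own illustration; Sipser's actual proof uses the three-tape construction rather than an explicit queue:

```python
# BFS over an NDTM's computation tree. Each queue entry is a configuration
# (u, q, v): tape u + v, head on the first symbol of v. Because we pop in
# FIFO order, an accepting branch at depth d is reached after exploring only
# configurations of depth <= d, even if some sibling branch never halts.
from collections import deque

BLANK = "_"

def nd_accepts(w, delta, start="q0", accept="qa", max_pops=10_000):
    frontier = deque([("", start, w)])
    for _ in range(max_pops):
        if not frontier:
            return False                 # every branch halted without accepting
        u, q, v = frontier.popleft()
        if q == accept:
            return True
        b = v[0] if v else BLANK
        for q2, c, d in delta.get((q, b), []):
            rest = v[1:]
            if d == "R":
                frontier.append((u + c, q2, rest))
            elif u:
                frontier.append((u[:-1], q2, u[-1] + c + rest))
            else:
                frontier.append(("", q2, c + rest))
    return None                          # undetermined within the budget

# Toy NDTM: on reading "a" it branches; one branch moves right forever, the
# other accepts two steps later. DFS down the looping branch would never
# return; BFS still finds the accepting branch.
delta = {
    ("q0", "a"): [("loop", "a", "R"), ("q1", "a", "R")],
    ("loop", BLANK): [("loop", BLANK, "R")],
    ("q1", BLANK): [("qa", BLANK, "R")],
}
print(nd_accepts("a", delta))  # → True
```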
Multitape Turing Machines
A \(k\)-tape TM has \(k\) separate infinite tapes, each with its own head. The transition function reads all \(k\) heads simultaneously and can write to all tapes and move all heads independently: \(\delta: Q \times \Gamma^k \to Q \times \Gamma^k \times \{L, R, S\}^k\) (where \(S\) = stay).
Theorem 3.13 (Sipser). Every multitape Turing machine has an equivalent single-tape Turing machine.
Proof idea: Simulate \(k\) tapes on one tape by storing them separated by \(\#\) delimiters: \(\#\dot{w}_1\#\dot{w}_2\#\cdots\#\dot{w}_k\#\). Mark each virtual head position with a dotted version of the symbol. To simulate one step: scan the entire tape to find all \(k\) head positions, compute the transition, scan again to update.
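The encoding step of this proof can be sketched in a few lines. Here the "dotted" head-position symbols are rendered as bracketed symbols, an encoding choice of mine:

```python
# Encoding k tapes on one tape, as in the proof idea: tapes separated by "#",
# the symbol under each virtual head "dotted" (here: wrapped in brackets).
def encode(tapes, heads):
    """tapes: list of strings; heads: list of head indices (one per tape)."""
    parts = []
    for tape, h in zip(tapes, heads):
        cells = [f"[{s}]" if i == h else s for i, s in enumerate(tape)]
        parts.append("".join(cells))
    return "#" + "#".join(parts) + "#"

print(encode(["0101", "abba"], [1, 3]))  # → #0[1]01#abb[a]#
```

To simulate one multitape step, the single-tape machine scans this whole string to find the k bracketed symbols, computes the transition, then scans again to rewrite them, which is exactly why the simulation is slower but no less powerful.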
The robustness of Turing machines. This is a key theme: reasonable modifications to the TM model (multiple tapes, nondeterminism, two-way infinite tape, stay-put moves, etc.) never change what's computable. They may change efficiency (multitape can be polynomially faster), but not computability. This "robustness" is unique to TMs — finite automata are robust too (DFA = NFA), but PDAs are not (DPDA ⊊ NPDA). It's one of the reasons we believe TMs capture the right notion of computation.
The Church-Turing Thesis
We've now seen that multiple TM variants — multitape, nondeterministic, two-way infinite tapes, enumerators (covered in Decidability) — all recognize exactly the same class of languages. No reasonable enhancement to the model adds computational power. This is remarkable: unlike the DFA-to-PDA-to-TM jumps, where each new model was strictly more powerful, once we reach Turing machines, we're stuck. There's no "level above" that computes more.
This observation, combined with equivalent formalisms developed independently (Alonzo Church's \(\lambda\)-calculus, Emil Post's production systems, recursive functions), leads to one of the most important ideas in computer science:
The Church-Turing Thesis. The intuitive notion of "algorithm" — any procedure that a human could carry out mechanically, given unlimited time and paper — is captured exactly by the Turing machine model.
Equivalently: if a function is computable by any reasonable mechanical process, it is computable by a Turing machine.
This is a thesis, not a theorem. It cannot be proved because "algorithm" is an informal concept — you can't formally prove that a formal definition captures an informal idea. But the evidence is overwhelming:
Every alternative formalism ever proposed (lambda calculus, Post systems, RAM machines, cellular automata, quantum computers for decision problems) computes exactly the same class of functions
No one has ever found a "natural" problem that is intuitively computable but not TM-computable
The TM model is robust under all reasonable modifications (as we just saw)
The thesis has been universally accepted since the 1930s. If a counterexample existed, it would overturn the foundations of computer science.
What this means in practice: When we prove that something is undecidable (no TM can decide it), the Church-Turing thesis tells us that no algorithm of any kind can solve it — not just that no TM can. This is why undecidability results (coming in later pages) are so powerful: they're not limitations of a specific model, but limitations of computation itself.
It also justifies Sipser's "high-level descriptions" — when we write \(M =\) "On input \(w\): [English algorithm]", the thesis guarantees that a real TM implementing this algorithm exists.
The Complete Hierarchy
| | DFA | NFA | PDA | TM |
|---|---|---|---|---|
| Tuple | 5-tuple | 5-tuple | 6-tuple | 7-tuple |
| Memory | Current state | Set of states | State + stack | State + infinite r/w tape |
| Input processing | Read only, L→R | Read only, L→R | Read only, L→R | Read/write, L↔R |
| Deterministic? | Yes | No | No | Yes (standard) |
| Nondeterminism adds power? | No (DFA = NFA) | No (DFA = NFA) | Yes! (DPDA ⊊ PDA) | No (DTM = NDTM) |
| Halting behavior | Always halts | Always halts | Always halts | May loop forever |
| Outcomes | Accept / Reject | Accept / Reject | Accept / Reject | Accept / Reject / Loop |
| Accept states | Set \(F \subseteq Q\) | Set \(F \subseteq Q\) | Set \(F \subseteq Q\) | Single \(q_{\text{accept}}\) |
| Language class | Regular | Regular | Context-free | Turing-recognizable |
| Generative equivalent | Regular expressions | Regular expressions | CFGs | Unrestricted grammars |
| Example beyond prev. class | — | — | \(\{0^n1^n\}\), \(\{ww^R\}\) | \(\{a^nb^nc^n\}\), \(\{ww\}\), \(\{0^{2^n}\}\) |
Key takeaways from the table:
TMs are the first machine where looping is possible — this creates the recognizable/decidable distinction
TMs return to the DFA-like property that nondeterminism doesn't add power (unlike PDAs)
The jump from "read-only, left-to-right" to "read/write, bidirectional" is what unlocks full computational power
The 7-tuple adds \(\Gamma\) (tape alphabet), replaces \(F\) with individual \(q_{\text{accept}}\) and \(q_{\text{reject}}\)
Exercises
Exercise 1: Design a TM (implementation level)
Give an implementation-level description of a TM that decides the language:
\(L = \{w \in \{a,b\}^* \mid w \text{ contains an equal number of a's and b's}\}\)
Hint: The strategy is similar to the \(\{0^n1^n\}\) TM but you can't assume all a's come before all b's. Think about repeatedly finding and crossing off one a-b pair.
M = "On input w:
1. If the tape is blank, accept. (Empty string has 0 a's and 0 b's.)
2. Scan right from the current position to find the first uncrossed 'a'. If none found:
- Scan for any uncrossed 'b'. If found, reject (more b's than a's). If not found, accept.
3. Cross off the 'a' (write x). Return the head to the leftmost cell, then scan right for the first uncrossed 'b' (it may sit to the left of the crossed-off 'a', so the scan must cover the whole tape).
- If found, cross it off (write x). Return head to leftmost position. Go to step 2.
- If not found, reject (more a's than b's).
Why this works: Each pass through step 2-3 eliminates one a-b pair. After all pairs are eliminated, we check whether any uncrossed symbols remain. If not, the counts were equal.
Why this is beyond a PDA: Wait — this IS context-free! {equal a's and b's} can be generated by a CFG. The TM approach works but is overkill. A PDA could handle this using a stack. The TM approach generalizes to cases where a PDA cannot, like {equal a's, b's, AND c's}.
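The pass structure of this answer is easy to check in miniature. This is ordinary list code, not a tape simulation; each loop iteration stands in for one full pass of the machine:

```python
# The pass invariant of the equal-a's-and-b's strategy: each iteration
# crosses off one a and one b; accept iff the tape empties out evenly.
def equal_ab(w):
    tape = list(w)
    while True:
        if "a" not in tape and "b" not in tape:
            return "accept"          # every symbol crossed off
        if "a" not in tape or "b" not in tape:
            return "reject"          # leftovers of only one kind
        tape[tape.index("a")] = "x"  # cross off one a ...
        tape[tape.index("b")] = "x"  # ... and one b, then repeat

print([w for w in ["", "ab", "ba", "abba", "aab", "bab"] if equal_ab(w) == "accept"])
# → ['', 'ab', 'ba', 'abba']
```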
Exercise 2: Recognizer vs. Decider
Consider these two TMs:
\(M_1\): On input \(w\): if \(w\) has even length, accept. Otherwise, move right forever.
\(M_2\): On input \(w\): if \(w\) has even length, accept. Otherwise, reject.
(a) Is \(M_1\) a recognizer for the language of even-length strings? (b) Is \(M_1\) a decider? (c) Is \(M_2\) a decider? (d) Do \(M_1\) and \(M_2\) recognize the same language?
(a) Yes. M1 accepts all even-length strings and never accepts odd-length strings (it loops on them). So L(M1) = {w : |w| is even}. It recognizes this language.
(b) No. M1 loops on odd-length strings — it doesn't halt on all inputs. A decider must halt on every input.
(c) Yes. M2 halts on every input: it accepts even-length strings and rejects odd-length strings. It's a decider, so the language is decidable. (And of course it is — even-length strings form a regular language!)
(d) Yes. Both recognize {w : |w| is even}. They have the same language. They differ only in behavior on non-members: M1 loops, M2 rejects. But the language (set of accepted strings) is the same.
Exercise 3: Configuration Traces
Consider the TM with states \(\{q_0, q_1, q_2, q_3, q_{\text{acc}}\}\), \(\Sigma = \{0,1\}\), \(\Gamma = \{0,1,\sqcup\}\), and transitions:
| State | Read | Write | Move | Next state |
|---|---|---|---|---|
| \(q_0\) | 0 | 1 | R | \(q_0\) |
| \(q_0\) | 1 | 1 | R | \(q_1\) |
| \(q_1\) | 0 | 1 | R | \(q_1\) |
| \(q_1\) | 1 | 1 | L | \(q_2\) |
| \(q_2\) | 1 | 0 | R | \(q_{\text{acc}}\) |
| \(q_0\) | \(\sqcup\) | \(\sqcup\) | R | \(q_{\text{acc}}\) |
Write the sequence of configurations for input 010. What does this TM do?
Trace on input 010:
q0 010 — read 0, write 1, move R
1 q0 10 — read 1, write 1, move R, enter q1
11 q1 0 — read 0, write 1, move R
111 q1 ⊔ — read ⊔ ... no transition for (q1, ⊔)!
The machine has no transition for (q1, ⊔), so it implicitly goes to q_reject.
REJECT.
Trace on input 011:
q0 011 — read 0, write 1, move R
1 q0 11 — read 1, write 1, move R, enter q1
11 q1 1 — read 1, write 1, move L, enter q2
1 q2 11 — read 1, write 0, move R, enter q_acc
10 q_acc 1
ACCEPT.
What it does: In q0 the machine rewrites 0s as 1s until it reads a 1 (entering q1); in q1 it again rewrites 0s as 1s until it reads a second 1, then backs up one cell, overwrites the 1 there with 0, and accepts. Reading a blank in q0 also accepts. So the machine accepts exactly the strings that contain either no 1s at all or at least two 1s — equivalently, it rejects exactly the strings containing a single 1.
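As a cross-check on what this machine accepts, the table can be simulated directly. The runner code and step budget are mine, and the \((q_1, 1)\) transition is taken to write 1 and move left:

```python
# Simulate Exercise 3's machine, reading the (q1, 1) entry as "write 1, move L".
BLANK = "_"
DELTA = {
    ("q0", "0"): ("q0", "1", "R"),
    ("q0", "1"): ("q1", "1", "R"),
    ("q0", BLANK): ("qacc", BLANK, "R"),
    ("q1", "0"): ("q1", "1", "R"),
    ("q1", "1"): ("q2", "1", "L"),
    ("q2", "1"): ("qacc", "0", "R"),
}

def run(w):
    tape, state, head = dict(enumerate(w)), "q0", 0
    for _ in range(2 * len(w) + 5):       # generous budget; this machine halts fast
        if state == "qacc":
            return "accept"
        key = (state, tape.get(head, BLANK))
        if key not in DELTA:
            return "reject"               # missing transition: implicit q_reject
        state, tape[head], move = DELTA[key]
        head = max(0, head + (1 if move == "R" else -1))
    return "reject"

print([w for w in ["", "0", "1", "010", "011", "0101", "11"] if run(w) == "accept"])
# → ['', '0', '011', '0101', '11']
```

Note that it accepts "0101" even though the two 1s are not adjacent: by the time the second 1 is found, q1 has already rewritten the 0 between them as a 1.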
Summary
Turing Machines — the essentials:
7-tuple: \((Q, \Sigma, \Gamma, \delta, q_0, q_{\text{accept}}, q_{\text{reject}})\) — adds tape alphabet \(\Gamma\), replaces accept set \(F\) with individual accept/reject states
The upgrade from DFA: read/write tape + bidirectional movement. That's it. And it changes everything.
Transition function: \(\delta: Q \times \Gamma \to Q \times \Gamma \times \{L, R\}\) — deterministic, total (except at halting states)
Variants: multitape TMs, nondeterministic TMs — all equivalent in power to single-tape deterministic TMs
Robustness: unlike PDAs, TMs are invariant under nondeterminism. This is evidence that TMs capture the "right" notion of computation.
Design patterns: zig-zag marking (cross off matched pairs), sweeping (repeated passes that halve the problem), shuttling (compare distant sections by bouncing back and forth)