Before any math, the core idea: a Fourier Transform decomposes a signal into the frequencies that make it up.
Imagine recording a piano chord — three keys struck simultaneously. The microphone captures a single complex waveform: pressure varying over time. The Fourier Transform takes that single waveform and tells you which notes (frequencies) are present and how loud each one is.
The FT converts one representation to the other. A combined messy wave in → clean spikes at individual frequencies out.
Mathematically, the (continuous) Fourier Transform takes a function \(f(t)\) and produces \(\hat{f}(\nu)\):
$$\hat{f}(\nu) = \int_{-\infty}^{\infty} f(t)\, e^{-2\pi i \nu t}\, dt$$
Each frequency component \(\nu\) gets a coefficient. A spike at frequency \(\nu_0\) means the signal has a strong component oscillating at that rate. The exponential \(e^{-2\pi i \nu t}\) is a "probe" that tests how much the signal matches a particular frequency.
That's the full continuous version. In practice (and in computing), we work with finite lists of numbers, not continuous functions. That's where the DFT comes in.
The Discrete Fourier Transform does the same thing as the continuous FT, but for a finite list of \(N\) numbers instead of a continuous signal. Instead of integrating over all time, we sum over \(N\) data points.
\(N\) = the size of your input list (and output list). In classical signal processing, \(N\) is just the number of data points. In quantum computing, \(N = 2^n\) where \(n\) is the number of qubits — because \(n\) qubits have \(2^n\) basis states.
| Qubits (\(n\)) | Basis states (\(N = 2^n\)) | DFT matrix size | Example |
|---|---|---|---|
| 1 | 2 (\(\ket{0}, \ket{1}\)) | \(F_2\) = 2×2 | \(F_2 = H\) (Hadamard) |
| 2 | 4 (\(\ket{00}\) to \(\ket{11}\)) | \(F_4\) = 4×4 | \(\omega_4 = i\) |
| 3 | 8 (\(\ket{000}\) to \(\ket{111}\)) | \(F_8\) = 8×8 | \(\omega_8 = e^{i\pi/4}\) |
So \(F_2\) is a 1-qubit gate, \(F_4\) is a 2-qubit gate, \(F_8\) is a 3-qubit gate. The subscript is the number of basis states, not qubits.
The DFT maps a vector of \(N\) complex numbers to another vector of \(N\) complex numbers:
$$y_k = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} x_j \, \omega_N^{jk}$$
where \(\omega_N = e^{2\pi i/N}\) is the primitive \(N\)-th root of unity — a point on the unit circle at angle \(2\pi/N\) (i.e., \(1/N\) of a full turn). Key property: \(\omega_N^N = 1\), so the roots of unity cycle back to 1 after \(N\) steps.
The formula computes one output at a time. \(y_k\) is the \(k\)-th entry of the output vector. To get all \(N\) outputs, you run the formula \(N\) times (once per \(k\)).
Take \(N = 4\). Then \(\omega_4 = e^{2\pi i/4} = i\) (a quarter turn on the unit circle). Say our input is \((x_0, x_1, x_2, x_3)\). The DFT produces 4 outputs:
\(k = 0\): \(\;y_0 = \tfrac{1}{2}(x_0 \cdot \omega^0 + x_1 \cdot \omega^0 + x_2 \cdot \omega^0 + x_3 \cdot \omega^0) = \tfrac{1}{2}(x_0 + x_1 + x_2 + x_3)\)
All weights are 1. Just the average (scaled).
\(k = 1\): \(\;y_1 = \tfrac{1}{2}(x_0 \cdot \omega^0 + x_1 \cdot \omega^1 + x_2 \cdot \omega^2 + x_3 \cdot \omega^3) = \tfrac{1}{2}(x_0 + x_1 \cdot i + x_2 \cdot (-1) + x_3 \cdot (-i))\)
Quarter-turn steps: 0°, 90°, 180°, 270°.
\(k = 2\): \(\;y_2 = \tfrac{1}{2}(x_0 \cdot \omega^0 + x_1 \cdot \omega^2 + x_2 \cdot \omega^4 + x_3 \cdot \omega^6) = \tfrac{1}{2}(x_0 - x_1 + x_2 - x_3)\)
Half-turn steps: 0°, 180°, 360°=0°, 540°=180°. Note: \(\omega^4 = 1\) (cycled!).
\(k = 3\): \(\;y_3 = \tfrac{1}{2}(x_0 \cdot \omega^0 + x_1 \cdot \omega^3 + x_2 \cdot \omega^6 + x_3 \cdot \omega^9) = \tfrac{1}{2}(x_0 - ix_1 - x_2 + ix_3)\)
Three-quarter-turn steps.
Each input \(x_j\) gets multiplied by a point on the unit circle. Which point? The one at \(j \times k\) steps around the circle (that's the \(jk\) in \(\omega^{jk}\)).
\(k\) controls how fast you spin — it's the "candidate frequency." \(k=0\) is no spin (DC/average), higher \(k\) = faster spin. If the input data has a pattern matching frequency \(k\), the terms add up (constructive interference) and \(y_k\) is big. If not, they cancel (destructive interference) and \(y_k\) is small.
Total work: \(N\) outputs × \(N\) terms each = \(N^2\) multiplications. That's why the naive DFT is \(O(N^2)\).
The DFT can be written as matrix multiplication \(\vec{y} = F_N \vec{x}\), where the DFT matrix \(F_N\) has entries:
$$(F_N)_{jk} = \frac{1}{\sqrt{N}} \omega_N^{jk}$$
\(F_N F_N^\dagger = I\). This is what makes the DFT a valid quantum gate — unitarity is the only requirement for a quantum operation. The inverse DFT is simply \(F_N^\dagger\), which has entries \(\frac{1}{\sqrt{N}} \omega_N^{-jk}\).
The matrix entry at row \(j\), column \(k\) is \(\frac{1}{\sqrt{N}} \omega_N^{jk}\). That's it — compute \(j \times k\), use it as the exponent of \(\omega_N\). Let's build both \(F_2\) and \(F_4\) from scratch.
For \(N = 2\): \(\omega_2 = e^{2\pi i/2} = e^{i\pi} = -1\). The matrix is 2×2, so \(j \in \{0, 1\}\) and \(k \in \{0, 1\}\).
Compute every entry by plugging \(j\) and \(k\) into \(\omega_2^{jk} = (-1)^{jk}\):
| Row \(j\) | Col \(k\) | \(j \times k\) | \(\omega_2^{jk} = (-1)^{jk}\) | Value |
|---|---|---|---|---|
| 0 | 0 | 0 | \((-1)^0\) | 1 |
| 0 | 1 | 0 | \((-1)^0\) | 1 |
| 1 | 0 | 0 | \((-1)^0\) | 1 |
| 1 | 1 | 1 | \((-1)^1\) | −1 |
Put them into the matrix (don't forget the \(1/\sqrt{N} = 1/\sqrt{2}\) prefactor):
$$F_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = H$$
That's the Hadamard gate. The single-qubit QFT is \(H\) — it was always "secretly" a Fourier transform over 2 elements.
For \(N = 4\): \(\omega_4 = e^{2\pi i/4} = e^{i\pi/2} = i\). The matrix is 4×4, so \(j, k \in \{0, 1, 2, 3\}\). We need \(\omega_4^{jk} = i^{jk}\).
First, the powers of \(i\) cycle every 4: \(i^0 = 1,\; i^1 = i,\; i^2 = -1,\; i^3 = -i,\; i^4 = 1,\; \ldots\)
| \(j \times k\) | \(k=0\) | \(k=1\) | \(k=2\) | \(k=3\) |
|---|---|---|---|---|
| \(j=0\) | \(0 \times 0 = 0\) | \(0 \times 1 = 0\) | \(0 \times 2 = 0\) | \(0 \times 3 = 0\) |
| \(j=1\) | \(1 \times 0 = 0\) | \(1 \times 1 = 1\) | \(1 \times 2 = 2\) | \(1 \times 3 = 3\) |
| \(j=2\) | \(2 \times 0 = 0\) | \(2 \times 1 = 2\) | \(2 \times 2 = 4\) | \(2 \times 3 = 6\) |
| \(j=3\) | \(3 \times 0 = 0\) | \(3 \times 1 = 3\) | \(3 \times 2 = 6\) | \(3 \times 3 = 9\) |
Now convert each \(jk\) to \(i^{jk}\) (reduce mod 4, since \(i^4 = 1\)):
| \(i^{jk}\) | \(k=0\) | \(k=1\) | \(k=2\) | \(k=3\) |
|---|---|---|---|---|
| \(j=0\) | \(i^0 = 1\) | \(i^0 = 1\) | \(i^0 = 1\) | \(i^0 = 1\) |
| \(j=1\) | \(i^0 = 1\) | \(i^1 = i\) | \(i^2 = -1\) | \(i^3 = -i\) |
| \(j=2\) | \(i^0 = 1\) | \(i^2 = -1\) | \(i^4 = 1\) | \(i^6 = -1\) |
| \(j=3\) | \(i^0 = 1\) | \(i^3 = -i\) | \(i^6 = -1\) | \(i^9 = i\) |
Add the \(1/\sqrt{4} = 1/2\) prefactor:
$$F_4 = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix}$$
1. Compute \(\omega_N = e^{2\pi i/N}\). 2. Fill in the grid: entry at row \(j\), column \(k\) is \(\omega_N^{j \times k}\) (reduce the exponent mod \(N\) since \(\omega_N^N = 1\)). 3. Multiply everything by \(1/\sqrt{N}\). That's the entire DFT matrix.
The first row is always all 1s (because \(j = 0\)). The first column is always all 1s (because \(k = 0\)). The interesting structure is in the bottom-right.
The Quantum Fourier Transform applies the DFT matrix to the amplitudes of a quantum state. If a state is written in the computational basis as \(\ket{\psi} = \sum_x \alpha_x \ket{x}\), then:
$$\text{QFT}\ket{\psi} = \sum_y \beta_y \ket{y} \quad \text{where} \quad \beta_y = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} \alpha_x \, \omega_N^{xy}$$
On a single computational basis state:
$$\text{QFT}\ket{x} = \frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} \omega_N^{xy} \ket{y} = \frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} e^{2\pi i xy/N} \ket{y}$$
Let's compute QFT\(\ket{5}\) on 3 qubits (\(N = 8\), \(\omega_8 = e^{2\pi i/8} = e^{i\pi/4}\)). Plug \(x = 5\) into the formula:
$$\text{QFT}\ket{5} = \frac{1}{\sqrt{8}} \sum_{y=0}^{7} \omega_8^{5y} \ket{y}$$
Computing each term's phase \(\omega_8^{5y}\) (multiply the exponent by 5, then reduce mod 8 since \(\omega_8^8 = 1\)):
| \(y\) | \(5y\) | \(5y \bmod 8\) | \(\omega_8^{5y}\) | Value |
|---|---|---|---|---|
| 0 | 0 | 0 | \(\omega_8^0\) | 1 |
| 1 | 5 | 5 | \(\omega_8^5\) | \(e^{i5\pi/4}\) |
| 2 | 10 | 2 | \(\omega_8^2\) | \(i\) |
| 3 | 15 | 7 | \(\omega_8^7\) | \(e^{i7\pi/4}\) |
| 4 | 20 | 4 | \(\omega_8^4\) | \(-1\) |
| 5 | 25 | 1 | \(\omega_8^1\) | \(e^{i\pi/4}\) |
| 6 | 30 | 6 | \(\omega_8^6\) | \(-i\) |
| 7 | 35 | 3 | \(\omega_8^3\) | \(e^{i3\pi/4}\) |
Every term has amplitude \(1/\sqrt{8}\), so every \(\ket{y}\) is equally likely to be measured. The information about \(x = 5\) is entirely in the phases, invisible to measurement. You'd need QFT\(^\dagger\) to convert those phases back into a measurable spike at \(\ket{5}\).
To build intuition for what the QFT is "actually doing," map the classical Fourier Transform concepts onto the quantum setting:
| Classical FT | QFT |
|---|---|
| Time points \(t = 0, 1, \ldots, N-1\) | Computational basis states \(\ket{0}, \ket{1}, \ldots, \ket{N-1}\) |
| Signal value at time \(t\) | Amplitude \(\alpha_x\) of basis state \(\ket{x}\) |
| Frequency components | Output amplitudes \(\beta_y\) after QFT |
| Spike at frequency \(\nu\) | Large amplitude on \(\ket{y}\) — means "period \(N/y\) divides \(N\)" |
Take \(N = 8\) (3 qubits). Suppose your amplitudes repeat with period 4: the state has equal amplitude on \(\ket{0}\) and \(\ket{4}\), i.e. \(\ket{\psi} = \frac{1}{\sqrt{2}}(\ket{0} + \ket{4})\). The nonzero amplitudes are spaced 4 apart.
Apply the QFT. The output has spikes at multiples of \(N/r = 8/4 = 2\): you get equal amplitudes on \(\ket{0}, \ket{2}, \ket{4}, \ket{6}\). Measuring gives one of \(\{0, 2, 4, 6\}\) uniformly at random. Any of these outcomes tells you the spacing is 2, so \(r = N/\text{spacing} = 8/2 = 4\).
This is exactly the piano analogy: your "signal" has a repeating pattern with period 4, and the QFT produces a spike at the corresponding frequency.
Starting in Week 6, you'll see states written as \(\ket{5}\) instead of \(\ket{101}\). These are the same thing: \(\ket{k}\) means the computational basis state whose binary representation is \(k\).
The number of qubits is determined by context (usually \(n\) where \(N = 2^n\)). For \(N = 8\), \(\ket{5}\) is a 3-qubit state. For \(N = 16\), \(\ket{5} = \ket{0101}\) is a 4-qubit state.
In Deutsch-Jozsa and Bernstein-Vazirani, the oracle encoded information as \((-1)^{f(x)}\) — phases of either \(+1\) or \(-1\). That's the Hadamard world: two points on the unit circle (0 and \(\pi\)).
The QFT works with the full unit circle. A phase \(e^{i\theta}\) can be any point on the circle — any angle \(\theta \in [0, 2\pi)\). The \(N\)-th roots of unity \(\omega_N^k = e^{2\pi i k/N}\) are \(N\) equally spaced points on the unit circle.
More qubits = more points on the circle = finer phase resolution. The Hadamard was a Fourier transform that could only tell apart "in phase" vs "out of phase." The QFT can distinguish \(N\) different phases.
Here's the fundamental problem: if a basis state \(\ket{x}\) has amplitude \(\alpha_x = r \cdot e^{i\theta}\), then its measurement probability is \(|\alpha_x|^2 = r^2\). The phase \(e^{i\theta}\) vanishes — \(|e^{i\theta}|^2 = 1\) for any angle \(\theta\).
So if an oracle encodes its answer in the phases of your state, measuring directly gives you nothing — all phases produce the same probabilities. You need a Fourier transform (or Hadamard, in the simpler case) to convert those phase differences into amplitude differences that measurement can detect.
This is exactly why you needed the second \(H^{\otimes n}\) after the oracle in Deutsch-Jozsa and Bernstein-Vazirani. The oracle put the answer in the phases. The second Hadamard converted those phases into measurable amplitudes. The QFT does the same job for richer phase structures.
The QFT converts between two representations. Bar height = amplitude (visible to measurement). Bar color = phase (invisible to measurement).
Left: measure → get 0 with certainty. Right: all bars same height AND same color (phase +1). Pattern is "flat." Measure → get 0, 1, 2, or 3 each with 25%.
Same bar heights as QFT\(\ket{0}\) — measurement gives the same 25% each. But the colors are different: +1 → i → −1 → −i. The phases cycle through all 4 values once. That's "frequency 1."
Phases alternate: +1, −1, +1, −1. Only two colors — this is the Hadamard pattern (N=2 repeated). Frequency 2 = the phase flips twice across 4 slots.
QFT† reads the color pattern (+1, i, −1, −i), recognizes it as "frequency 1," and concentrates all amplitude on \(\ket{1}\). The invisible phases become a visible spike. Measure → get 1 with certainty.
QPE with the T-gate (θ = 1/8), 3 counting qubits. Left: 8 bars, equal height, 8 different colors — measuring gives a random 0–7. Right: QFT† reads the pattern (phases cycle once through all 8 positions = frequency 1), concentrates everything on \(\ket{1}\). Measure → 1 → θ = 1/8.
Quantum algorithms hide their answers in phase patterns (colors). Measurement can't see phases. QFT† converts the invisible color pattern into a visible height spike that measurement can read.
We now work with \(N = 2^n\) (since we have \(n\) qubits). The QFT on computational basis states is:
$$\text{QFT}\ket{x} = \frac{1}{\sqrt{2^n}} \sum_{y=0}^{2^n-1} e^{2\pi i xy/2^n} \ket{y}$$
The magic of the QFT circuit comes from rewriting this sum as a tensor product. Write \(x\) in binary as \(x = x_1 x_2 \ldots x_n\) (where \(x_1\) is the most significant bit), and use the binary fraction notation:
$$0.x_\ell x_{\ell+1} \ldots x_m = \frac{x_\ell}{2} + \frac{x_{\ell+1}}{4} + \cdots + \frac{x_m}{2^{m-\ell+1}}$$
Through algebraic manipulation (splitting the sum over \(y = y_1 y_2 \ldots y_n\) into \(n\) independent sums over each bit \(y_k\)), we arrive at the product representation:
$$\text{QFT}\ket{x_1 x_2 \ldots x_n} = \frac{1}{\sqrt{2^n}} \bigotimes_{k=1}^{n} \left(\ket{0} + e^{2\pi i \, 0.x_{n-k+1} \ldots x_n}\ket{1}\right)$$
Written out explicitly for each qubit:
$$\text{QFT}\ket{x} = \frac{1}{\sqrt{2^n}} \left(\ket{0} + e^{2\pi i \, 0.x_n}\ket{1}\right) \otimes \left(\ket{0} + e^{2\pi i \, 0.x_{n-1}x_n}\ket{1}\right) \otimes \cdots \otimes \left(\ket{0} + e^{2\pi i \, 0.x_1 x_2 \ldots x_n}\ket{1}\right)$$
Take \(x = 5 = 101\) in binary, so \(x_1 = 1, x_2 = 0, x_3 = 1\). Let's compute each qubit's phase:
Qubit 1 (output): phase = \(e^{2\pi i \cdot 0.x_3}\) = \(e^{2\pi i \cdot 0.1}\) = \(e^{2\pi i \cdot 1/2}\) = \(e^{i\pi}\) = \(-1\)
\(0.1\) in binary = \(1/2\). So this qubit is \(\frac{1}{\sqrt{2}}(\ket{0} + (-1)\ket{1}) = \ket{-}\).
Qubit 2 (output): phase = \(e^{2\pi i \cdot 0.x_2 x_3}\) = \(e^{2\pi i \cdot 0.01}\) = \(e^{2\pi i \cdot 1/4}\) = \(e^{i\pi/2}\) = \(i\)
\(0.01\) in binary = \(0/2 + 1/4 = 1/4\). So this qubit is \(\frac{1}{\sqrt{2}}(\ket{0} + i\ket{1}) = \ket{+i}\).
Qubit 3 (output): phase = \(e^{2\pi i \cdot 0.x_1 x_2 x_3}\) = \(e^{2\pi i \cdot 0.101}\) = \(e^{2\pi i \cdot 5/8}\) = \(e^{i5\pi/4}\)
\(0.101\) in binary = \(1/2 + 0/4 + 1/8 = 5/8\). This qubit sits at angle \(5\pi/4\) on the unit circle (225°).
Result: QFT\(\ket{101} = \frac{1}{\sqrt{8}} \ket{-} \otimes \ket{+i} \otimes (\ket{0} + e^{i5\pi/4}\ket{1})\)
Each qubit is independently in a superposition. The coarsest qubit (qubit 1) only sees the last bit of \(x\). The finest qubit (qubit 3) sees all three bits. This is the clock analogy: hour hand, minute hand, second hand.
Binary fractions work like decimal fractions, but each position is a power of 2 instead of 10:
| Binary | Calculation | Decimal |
|---|---|---|
| \(0.1\) | \(1/2\) | \(0.5\) |
| \(0.01\) | \(0/2 + 1/4\) | \(0.25\) |
| \(0.11\) | \(1/2 + 1/4\) | \(0.75\) |
| \(0.101\) | \(1/2 + 0/4 + 1/8\) | \(0.625\) |
| \(0.110\) | \(1/2 + 1/4 + 0/8\) | \(0.75\) |
In the product form, each qubit's phase is \(e^{2\pi i \cdot (\text{binary fraction})}\). The fraction determines the angle on the unit circle. For example, \(0.101 = 5/8\), so the angle is \(2\pi \cdot 5/8 = 5\pi/4\).
Each output qubit's state depends on the input \(x\) only through a phase rotation. The \(k\)-th output qubit gets a phase of \(e^{2\pi i \, x/2^k}\). Since each qubit can be prepared independently (up to controlled rotations from other qubits that determine the binary fraction), we can build the QFT with a circuit that acts qubit-by-qubit.
Without the product form, implementing the QFT would require a single \(2^n \times 2^n\) unitary — exponentially large. The product form reduces it to \(O(n^2)\) individual gates.
Start from the definition and write \(y\) in binary \(y = y_1 y_2 \ldots y_n\), so \(y = \sum_{k=1}^n y_k 2^{n-k}\):
$$\text{QFT}\ket{x} = \frac{1}{\sqrt{2^n}} \sum_{y=0}^{2^n-1} e^{2\pi i xy/2^n} \ket{y}$$
Substitute \(y = \sum_k y_k 2^{n-k}\):
$$= \frac{1}{\sqrt{2^n}} \sum_{y_1=0}^{1} \cdots \sum_{y_n=0}^{1} e^{2\pi i x \sum_k y_k 2^{n-k}/2^n} \ket{y_1 \ldots y_n}$$
The exponent factors: \(e^{2\pi i x \sum_k y_k / 2^k} = \prod_k e^{2\pi i x y_k / 2^k}\). And \(\ket{y_1 \ldots y_n} = \ket{y_1} \otimes \cdots \otimes \ket{y_n}\). So the whole sum factors into a tensor product:
$$= \frac{1}{\sqrt{2^n}} \bigotimes_{k=1}^{n} \left( \sum_{y_k=0}^{1} e^{2\pi i x y_k / 2^k} \ket{y_k} \right) = \frac{1}{\sqrt{2^n}} \bigotimes_{k=1}^{n} \left(\ket{0} + e^{2\pi i x/2^k}\ket{1}\right)$$
Finally, note that \(e^{2\pi i x/2^k}\) only depends on the last \(k\) bits of \(x\) (since \(e^{2\pi i \cdot \text{integer}} = 1\)), which gives us the binary fraction form \(0.x_{n-k+1}\ldots x_n\).
Think of the Fourier basis representation as a clock analogy:
This is like clock hands: the "seconds hand" (least significant Fourier qubit) moves slowly, while the "milliseconds hand" (most significant Fourier qubit) spins rapidly. The state \(\ket{0} + e^{2\pi i \cdot 0.x_n}\ket{1}\) only cares about the last bit of \(x\) — it flips between \(\ket{+}\) and \(\ket{-}\) with each increment. The deeper qubits encode finer and finer phase information.
For \(n = 3\) qubits (so \(N = 8\)), the product form says qubit \(k\) gets a phase of \(e^{2\pi i x / 2^k}\). The denominator \(2^k\) determines how many distinct positions that qubit can take — like a clock hand with \(2^k\) tick marks:
| Qubit | Phase formula | Distinct positions | Clock hand | Angles for \(x = 0, 1, \ldots, 7\) |
|---|---|---|---|---|
| Qubit 1 (output) | \(e^{2\pi i x / 2}\) | 2 | Hour hand | \(0, \pi, 0, \pi, 0, \pi, 0, \pi\) |
| Qubit 2 (output) | \(e^{2\pi i x / 4}\) | 4 | Minute hand | \(0, \frac{\pi}{2}, \pi, \frac{3\pi}{2}, 0, \frac{\pi}{2}, \pi, \frac{3\pi}{2}\) |
| Qubit 3 (output) | \(e^{2\pi i x / 8}\) | 8 | Second hand | \(0, \frac{\pi}{4}, \frac{\pi}{2}, \frac{3\pi}{4}, \pi, \frac{5\pi}{4}, \frac{3\pi}{2}, \frac{7\pi}{4}\) |
Qubit 1 (the "hour hand") has only 2 positions — it alternates \(+1, -1, +1, -1, \ldots\) as \(x\) increases. This is exactly what the Hadamard does. Qubit 2 (the "minute hand") has 4 positions — it cycles through \(1, i, -1, -i\). Qubit 3 (the "second hand") has 8 positions — it uses all 8th roots of unity and completes one full revolution as \(x\) goes from 0 to 7.
The coarser qubits carry the rough frequency information; the finer qubits carry the precision. Together they "read" the phase to \(n\) bits of accuracy.
The product representation tells us exactly how to build the circuit. We need two types of gates:
Notable special cases: \(R_1 = Z\), \(R_2 = S\) (phase gate), \(R_3 = T\) (\(\pi/8\) gate).
The circuit processes qubits from top to bottom. For an \(n\)-qubit input \(\ket{x_1 x_2 \ldots x_n}\):
For \(n = 3\), the circuit (before SWAPs) looks like:
|x₁⟩ ─── H ── R₂ ── R₃ ──────────────────── ×
│ │ │
|x₂⟩ ──────── •──── │──── H ── R₂ ────────── │
│ │ │
|x₃⟩ ───────────── •──────── •──── H ──── ×
Let's walk through what each gate does:
The total state (after SWAP) is:
$$\frac{1}{\sqrt{8}} \left(\ket{0} + e^{2\pi i \cdot 0.x_3}\ket{1}\right) \otimes \left(\ket{0} + e^{2\pi i \cdot 0.x_2 x_3}\ket{1}\right) \otimes \left(\ket{0} + e^{2\pi i \cdot 0.x_1 x_2 x_3}\ket{1}\right)$$
which matches the product representation of the QFT.
Input: \(x_1 = 1, x_2 = 0, x_3 = 1\). Trace each gate with actual numbers:
Key observation: When a control qubit is 0, the controlled-\(R_k\) doesn't fire — that bit contributes 0 to the binary fraction and the phase stays unchanged. This is how the circuit naturally builds up the binary fraction one bit at a time.
Total: \(n + (n-1) + \cdots + 1 + \lfloor n/2 \rfloor = \frac{n(n+1)}{2} + \lfloor n/2 \rfloor = O(n^2)\) gates.
Compare to the classical Fast Fourier Transform (FFT): \(O(N \log N) = O(n \cdot 2^n)\) operations. The QFT is exponentially faster.
The QFT is exponentially faster, but you can't read out the full Fourier-transformed amplitudes. Measuring gives you one sample from the output distribution. The power of the QFT comes from using it as a subroutine inside larger algorithms (like QPE or Shor's), where that single measurement reveals exactly the information you need.
The simplest case: \(\text{QFT}\ket{0} = \frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} e^{0} \ket{y} = \frac{1}{\sqrt{N}} \sum_y \ket{y} = \ket{+}^{\otimes n}\).
The QFT of \(\ket{0}\) is the uniform superposition. This makes sense: in the Fourier basis, the "zero frequency" component is the uniform (DC) signal.
Suppose we have a state that is an equal superposition over evenly spaced computational basis states with spacing \(r\) (assuming \(r\) divides \(N\)):
$$\ket{\psi_{\text{periodic}}} = \sqrt{\frac{r}{N}} \sum_{j=0}^{N/r-1} \ket{j \cdot r}$$
The QFT maps this to another periodic state with spacing \(N/r\):
$$\text{QFT}\ket{\psi_{\text{periodic}}} = \frac{1}{\sqrt{r}} \sum_{k=0}^{r-1} \ket{k \cdot N/r}$$
This is the quantum version of the classical fact that the Fourier transform of a periodic signal with period \(r\) has peaks at multiples of the fundamental frequency \(N/r\). Measuring the output gives a random multiple of \(N/r\), from which you can extract \(r\). This is exactly how Shor's algorithm finds the period of modular exponentiation.
Apply the QFT definition:
$$\text{QFT}\left(\sqrt{\frac{r}{N}} \sum_{j=0}^{N/r-1} \ket{jr}\right) = \sqrt{\frac{r}{N}} \cdot \frac{1}{\sqrt{N}} \sum_{j=0}^{N/r-1} \sum_{y=0}^{N-1} e^{2\pi i \cdot jr \cdot y/N} \ket{y}$$
Swap the sums and examine the inner sum over \(j\):
$$\sum_{j=0}^{N/r-1} e^{2\pi i \cdot jr \cdot y/N} = \sum_{j=0}^{N/r-1} \left(e^{2\pi i r y/N}\right)^j$$
This is a geometric series. It equals \(N/r\) when \(e^{2\pi i ry/N} = 1\) (i.e., when \(y\) is a multiple of \(N/r\)), and equals 0 otherwise (the terms cancel). So only the values \(y = k \cdot N/r\) for \(k = 0, 1, \ldots, r-1\) survive, each contributing \(N/r\). The overall coefficient becomes \(\sqrt{r/N} \cdot (1/\sqrt{N}) \cdot (N/r) = 1/\sqrt{r}\).
Since \(F_N\) is unitary, its inverse is its conjugate transpose \(F_N^\dagger = F_N^{-1}\):
$$\text{QFT}^{-1}\ket{y} = \text{QFT}^\dagger\ket{y} = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} e^{-2\pi i xy/N} \ket{x}$$
The only difference from the forward QFT is the sign of the exponent (minus instead of plus).
To get the inverse QFT circuit:
Equivalently: SWAPs first, then process qubits in reverse order (\(n\) to 1), with each rotation using \(-2\pi/2^k\) instead of \(+2\pi/2^k\).
In quantum algorithms, the QFT is typically used in its inverse form. The forward QFT encodes information into phases (done implicitly by the algorithm structure). The inverse QFT decodes those phases back into computational basis states that can be measured. Think of it as: the algorithm creates a state that "looks like" \(\text{QFT}\ket{\text{answer}}\), and then \(\text{QFT}^\dagger\) recovers \(\ket{\text{answer}}\).
We showed earlier that QFT\(\ket{5}\) produces 8 terms with phases \(\omega_8^0, \omega_8^5, \omega_8^2, \omega_8^7, \omega_8^4, \omega_8^1, \omega_8^6, \omega_8^3\). All terms equally likely — the number 5 is invisible to measurement.
Now apply QFT\(^\dagger\). It multiplies each \(\ket{y}\) by \(\omega_8^{-5y}\) (negated phases) and sums. For the amplitude of output \(\ket{5}\):
$$\text{amplitude of } \ket{5} = \frac{1}{8} \sum_{y=0}^{7} \omega_8^{5y} \cdot \omega_8^{-5y} = \frac{1}{8} \sum_{y=0}^{7} 1 = 1$$
For any other output \(\ket{m}\) where \(m \neq 5\):
$$\text{amplitude of } \ket{m} = \frac{1}{8} \sum_{y=0}^{7} \omega_8^{5y} \cdot \omega_8^{-my} = \frac{1}{8} \sum_{y=0}^{7} \omega_8^{(5-m)y} = 0$$
The sum is zero because the 8th roots of unity cancel (geometric series with \(r \neq 1\)). All amplitude concentrates on \(\ket{5}\). The inverse QFT converts 8 equally-probable-but-differently-phased terms into one certain outcome.
The Hadamard gate \(H\) is a Fourier transform over 2 elements. It can distinguish two phases: \(+1\) and \(-1\) (the two points where the unit circle intersects the real axis). If a qubit has phase \(+1\) on \(\ket{1}\), \(H\) maps it to \(\ket{0}\). If it has phase \(-1\), \(H\) maps it to \(\ket{1}\). Two possible inputs, two distinguishable outputs.
The QFT on \(n\) qubits is a Fourier transform over \(N = 2^n\) elements. It can distinguish \(N\) different phases: the \(N\)-th roots of unity, equally spaced around the full unit circle. Where \(H\) could only tell "+1 or -1?", the QFT can tell "\(\omega^0\) or \(\omega^1\) or \(\omega^2\) or \(\ldots\) or \(\omega^{N-1}\)?"
This is why QPE replaces the second \(H^{\otimes n}\) with \(\text{QFT}^\dagger\). The eigenvalue phases in QPE aren't limited to \(\pm 1\) — they can be any angle. You need the full QFT to decode them.
Before diving into the QPE circuit details, here's the high-level story of what happens and why the QFT is needed:
QPE's controlled-\(U\) gates naturally produce a state that looks like \(\text{QFT}\ket{\theta}\). Applying \(\text{QFT}^{-1}\) undoes that encoding and recovers \(\ket{\theta}\), which you can then measure.
In more detail, the pipeline has four stages:
Note that the forward QFT is never explicitly applied in QPE. The controlled-\(U\) gates implicitly create the QFT-encoded state through phase kickback. Only QFT\(^{-1}\) is applied as an actual circuit operation.
QPE is arguably the most important subroutine in quantum computing. It underlies Shor's factoring algorithm, quantum simulation, and many other applications.
Given: A unitary operator \(U\) and one of its eigenstates \(\ket{\psi}\) satisfying \(U\ket{\psi} = e^{2\pi i\theta}\ket{\psi}\).
Find: The eigenvalue phase \(\theta \in [0, 1)\).
We have access to controlled-\(U^{2^j}\) operations (controlled versions of \(U\) raised to powers of 2), and we use \(t\) "counting qubits" to store the estimate.
The circuit has two registers:
|0⟩ ─── H ────────────────── ● ────────────── QFT† ─── Measure
|0⟩ ─── H ──────────── ● ──── │ ────────────── QFT† ─── Measure
|0⟩ ─── H ────── ● ──── │ ──── │ ────────────── QFT† ─── Measure
│ │ │
|ψ⟩ ──────────── U¹ ── U² ── U⁴ ──────────────────────────────
The steps are:
We'll trace both the general formula and a concrete example (\(U = T\)-gate, \(\ket{\psi} = \ket{1}\), \(t = 3\) counting qubits, \(\theta = 1/8\)) side by side.
Step 1: Hadamard on counting register
$$\ket{0}^{\otimes t}\ket{\psi} \xrightarrow{H^{\otimes t}} \frac{1}{\sqrt{2^t}} \sum_{k=0}^{2^t-1} \ket{k} \ket{\psi}$$
T-gate example: \(\ket{000}\ket{1} \xrightarrow{H^{\otimes 3}} \frac{1}{\sqrt{8}}(\ket{0} + \ket{1} + \cdots + \ket{7})\ket{1}\). All 8 counting states in equal superposition, eigenstate \(\ket{1}\) sits untouched.
Step 2: Controlled-\(U\) operations
The controlled-\(U^{2^j}\) gate, controlled by counting qubit \(j\), acts as:
$$\ket{0}_j\ket{\psi} \to \ket{0}_j\ket{\psi} \qquad \ket{1}_j\ket{\psi} \to \ket{1}_j U^{2^j}\ket{\psi} = e^{2\pi i \theta \cdot 2^j} \ket{1}_j\ket{\psi}$$
Because \(\ket{\psi}\) is an eigenstate, \(U^{2^j}\ket{\psi} = (e^{2\pi i\theta})^{2^j}\ket{\psi} = e^{2\pi i \theta \cdot 2^j}\ket{\psi}\). The eigenstate is unchanged — all the information goes into the phase of the counting qubit. This is phase kickback again.
The T-gate has \(T\ket{1} = e^{i\pi/4}\ket{1} = e^{2\pi i/8}\ket{1}\), so \(\theta = 1/8\).
Each counting qubit starts in \(\frac{1}{\sqrt{2}}(\ket{0} + \ket{1})\). After the controlled-\(U^{2^j}\) with \(\ket{1}\) as eigenstate:
| Qubit \(j\) | Gate applied | Phase kicked back | Qubit state after |
|---|---|---|---|
| \(j = 0\) (bottom) | controlled-\(T^1\) | \(e^{2\pi i \cdot 1/8} = e^{i\pi/4}\) | \(\frac{1}{\sqrt{2}}(\ket{0} + e^{i\pi/4}\ket{1})\) |
| \(j = 1\) (middle) | controlled-\(T^2 = S\) | \(e^{2\pi i \cdot 2/8} = e^{i\pi/2} = i\) | \(\frac{1}{\sqrt{2}}(\ket{0} + i\ket{1})\) |
| \(j = 2\) (top) | controlled-\(T^4 = Z\) | \(e^{2\pi i \cdot 4/8} = e^{i\pi} = -1\) | \(\frac{1}{\sqrt{2}}(\ket{0} - \ket{1}) = \ket{-}\) |
Notice: \(T^2 = S\) and \(T^4 = Z\). The higher counting qubits apply bigger powers of \(U\), kicking back bigger multiples of \(\theta\).
After all \(t\) controlled operations, counting qubit \(j\) has picked up phase \(e^{2\pi i \theta \cdot 2^j}\) on its \(\ket{1}\) component. The combined state is:
$$\frac{1}{\sqrt{2^t}} \sum_{k=0}^{2^t-1} e^{2\pi i \theta k} \ket{k} \ket{\psi}$$
T-gate example: The counting register is \(\frac{1}{\sqrt{8}} \sum_{k=0}^{7} e^{2\pi i k/8} \ket{k}\). Write out: \(\frac{1}{\sqrt{8}}(\ket{0} + e^{i\pi/4}\ket{1} + e^{i\pi/2}\ket{2} + e^{i3\pi/4}\ket{3} + e^{i\pi}\ket{4} + e^{i5\pi/4}\ket{5} + e^{i3\pi/2}\ket{6} + e^{i7\pi/4}\ket{7})\). Every state equally likely to be measured — the answer is hidden in the phases.
The counting register is now in the state \(\frac{1}{\sqrt{2^t}} \sum_k e^{2\pi i \theta k} \ket{k}\). Compare with the QFT definition: \(\text{QFT}\ket{x} = \frac{1}{\sqrt{2^t}} \sum_k e^{2\pi i xk/2^t} \ket{k}\). These match when \(\theta = x/2^t\), i.e., when \(x = 2^t\theta\). So the counting register is in the state \(\text{QFT}\ket{2^t\theta}\)!
T-gate: \(2^t\theta = 8 \times 1/8 = 1\). The counting register is QFT\(\ket{1}\) = QFT\(\ket{001}\).
Step 3: Apply QFT\(^\dagger\) to the counting register
$$\text{QFT}^\dagger \left(\text{QFT}\ket{2^t\theta}\right) = \ket{2^t\theta}$$
The inverse QFT undoes the QFT, leaving us with the state \(\ket{2^t\theta}\) in the counting register.
T-gate: QFT\(^\dagger\)(QFT\(\ket{001}\)) = \(\ket{001}\). All 8 phases interfere — 7 cancel (destructive), 1 survives (constructive). The state collapses from "8 states with different phases" to "1 state with certainty."
Step 4: Measure the counting register
We measure and obtain the integer \(m = 2^t\theta\). Therefore:
$$\theta = \frac{m}{2^t}$$
T-gate: Measure \(\ket{001}\) → \(m = 1\). So \(\theta = 1/8\), and the eigenvalue is \(e^{2\pi i/8} = e^{i\pi/4}\). Done — we recovered the T-gate's eigenvalue phase with certainty.
If \(\theta\) can be written exactly as \(\theta = m/2^t\) for some integer \(m\), then the measurement gives \(m\) with probability 1. The QFT perfectly encodes and the QFT\(^\dagger\) perfectly decodes.
In general, \(2^t\theta\) may not be an integer. Then the state after QFT\(^\dagger\) is a superposition peaked around the nearest integer(s) to \(2^t\theta\):
Each additional counting qubit doubles the precision. With \(t\) counting qubits, you can resolve \(\theta\) to accuracy \(1/2^t\). But the circuit depth grows too: you need controlled-\(U^{2^{t-1}}\), which may require \(2^{t-1}\) applications of \(U\). So the total number of \(U\) applications is \(\sum_{j=0}^{t-1} 2^j = 2^t - 1\).
Let's work through a concrete example. Take \(U = T\) (the \(\pi/8\) gate) and \(\ket{\psi} = \ket{1}\).
$$T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}$$
The eigenvalues of \(T\): \(\ket{0}\) has eigenvalue 1 (i.e., \(\theta_0 = 0\)) and \(\ket{1}\) has eigenvalue \(e^{i\pi/4} = e^{2\pi i/8}\) (i.e., \(\theta_1 = 1/8\)).
Using \(t = 3\) counting qubits and \(\ket{\psi} = \ket{1}\):
\(\theta = 1/8 = 1/2^3\), which is exactly representable with 3 bits. With fewer counting qubits, we'd get a less precise estimate. With 2 qubits: \(2^2 \cdot 1/8 = 0.5\) — not an integer, so we'd get a probabilistic result (measuring either \(\ket{00}\) or \(\ket{01}\) with varying probabilities). With 1 qubit: \(2^1 \cdot 1/8 = 0.25\) — even worse.
At first glance, QPE seems circular: "we need to know the eigenstate \(\ket{\psi}\) and be able to implement controlled-\(U\), so don't we already know everything?" Not quite.
This is a common misconception. You don't need to know \(\theta\) to build controlled-\(U\). You need to be able to implement \(U\) as a circuit, which is a completely different thing. Examples:
Shor's algorithm for factoring \(N\):
QPE finds \(r\) by estimating the eigenvalues of the unitary \(U: \ket{y} \mapsto \ket{ay \bmod N}\). The eigenvalues are \(e^{2\pi i s/r}\) for \(s = 0, 1, \ldots, r-1\), so measuring gives an estimate of \(s/r\), from which \(r\) can be extracted using continued fractions.
The structure of Deutsch-Jozsa and Bernstein-Vazirani was:
\(H^{\otimes n} \;\to\; \text{Oracle} \;\to\; H^{\otimes n} \;\to\; \text{Measure}\)
QPE generalizes this to:
\(H^{\otimes t} \;\to\; \text{Controlled-}U^{2^j} \;\to\; \text{QFT}^\dagger \;\to\; \text{Measure}\)
The Hadamard gate \(H\) was the "baby QFT" — it's \(F_2\), the Fourier transform over two elements. In DJ/BV, the second \(H^{\otimes n}\) acted as the decoder. Now we use the full \(\text{QFT}^\dagger\) as the decoder. Same fundamental principle: encode information in phases, then decode with the (inverse) Fourier transform.
The progression of algorithms in this course follows a clear arc:
| Algorithm | Encoder | Phase Source | Decoder |
|---|---|---|---|
| Deutsch-Jozsa | \(H^{\otimes n}\) | Phase oracle \((-1)^{f(x)}\) | \(H^{\otimes n}\) |
| Bernstein-Vazirani | \(H^{\otimes n}\) | Phase oracle \((-1)^{s \cdot x}\) | \(H^{\otimes n}\) |
| Simon's | \(H^{\otimes n}\) | Boolean oracle + measurement | \(H^{\otimes n}\) |
| QPE / Shor's | \(H^{\otimes t}\) | Controlled-\(U^{2^j}\) (eigenvalue phases) | \(\text{QFT}^\dagger\) |
The key evolution: the first three algorithms use \(H^{\otimes n}\) as both encoder and decoder (because \(H = H^\dagger\), the Hadamard is its own inverse). QPE upgrades the decoder to the full inverse QFT, which can handle the richer phase structure of eigenvalue problems.