Complexity & Big-O

Computer science is, more than anything else, the science of cost. Big-O is the language we use to talk about that cost without getting tangled in hardware, language, or compiler details. This page builds the vocabulary from scratch, runs the proofs you need, and shows you how to read the cost off real code.

Prereq: Discrete math, basic programming Read time: ~30 min Interactive figures: 1 Code: Python

What you'll be able to do. By the end you'll glance at a snippet, identify the dominant operation, and write down a tight bound on its running time. You'll also know when that bound matters and when it lies.

1. Why complexity

Two algorithms can solve the same problem and one of them can be a million times faster on a million-element input. Not 10% faster. Not 2x. Six orders of magnitude. That gap doesn't come from clever assembly or a faster CPU; it comes from the shape of the algorithm — how the work it does grows as the input grows.

The central question of computer science, in one sentence, is:

How much does it cost to solve this problem, as a function of the size of the input?

Everything else — data structures, parallelism, caching, theory of computation — is a refinement of that question. Cost can mean time, memory, energy, network bandwidth, or any other resource. But the structure of the answer is always the same: a function of input size.

You care about this for two reasons. First, because the difference between $O(n)$ and $O(n^2)$ is the difference between "scales to billions of users" and "fell over at ten thousand." Second, because once you have the vocabulary, you can compare algorithms without running them. You don't need a benchmark to know that bubble sort will lose to merge sort on large inputs. You need an analysis.

The trap most people fall into. "My laptop is fast, I'll just measure it." Measuring works for small problems. It collapses the moment you change inputs, hardware, language, or scale. Asymptotic analysis gives you a guarantee that survives all of those.

2. The rules of the game

Before we can count operations, we need to agree on what counts as one. The model of computation pins this down. The default in algorithms textbooks is the RAM model (Random Access Machine):

Memory is an array of cells. Each cell holds an integer.
Reading a cell, writing a cell, adding two cells, comparing two cells — each costs 1.
Accessing memory by index is constant-time, no matter where in the array.
You don't pay for the size of the integers, as long as they fit in a "word."

The refinement that bounds the integer size is the word-RAM model: assume cells hold $w$ bits, where $w \ge \log_2 n$ (so a pointer to anywhere in the input fits in one cell). On a 64-bit machine, $w = 64$ and $n$ has to be enormous before this bites.

Here's the key thing this model gets right: we're not measuring wall-clock time on a specific machine. We're counting elementary operations. An elementary operation might take one nanosecond on your laptop and ten on a Raspberry Pi, but the count is the same. If algorithm A does $3n^2$ operations and algorithm B does $100 n$ operations, B wins for all $n > 34$, on every machine, in every language. That's what's portable.

And what counts as "the input"? The size of the input, written $n$, is the number of elementary items needed to describe it:

For an array, $n$ is the length.
For a graph, $n$ usually means the number of vertices and $m$ the number of edges (you'll see $|V|$ and $|E|$).
For a number $k$, $n$ is the number of bits needed to write $k$ down — so $n = \lceil \log_2 k \rceil$. This trips people up: a "small" number can have a "large" bit-length and vice versa.
For a matrix, $n$ might be the side length, with the input size being $n^2$ entries.

Why we ignore constants. A "constant factor" in front of the running time depends on the machine, the compiler, the language, the cache, and the phase of the moon. Asymptotic notation throws those away on purpose so the analysis stays portable. The price you pay is that asymptotic analysis is silent about small inputs — and you'll see in section 15 why that sometimes matters.

3. Big-O, formally

Now the definition. Read it slowly; it's the most important formula on the page.

f(n) = O(g(n)) \iff \exists\, c > 0,\, n_0 \in \mathbb{N} \text{ such that } 0 \le f(n) \le c\, g(n) \text{ for all } n \ge n_0.

Big-O

$f(n)$: The function you actually care about — usually the number of operations your algorithm performs on an input of size $n$.
$g(n)$: The simpler "reference" function you're comparing against — like $n$, $n^2$, $n \log n$, $2^n$.
$O(g(n))$: Pronounced "big-oh of g of n." A set of functions: all the $f$ that are eventually bounded above by some constant times $g$.
$\exists$: "There exists." Read as "you can find at least one."
$c$: A positive real constant. The constant absorbs everything machine-specific.
$n_0$: A threshold — a natural number after which the bound has to hold. The bound is allowed to fail for small $n$; we only care about what happens eventually.
$\iff$: "If and only if." The two sides mean exactly the same thing.

In English. "$f$ is $O(g)$" means: ignoring constant factors and small inputs, $f$ grows no faster than $g$. Big-O is an upper bound. It does not say $f$ is the same speed as $g$; it says $f$ is not asymptotically worse.

An immediate consequence: if $f(n) = 5n + 7$, then $f(n) = O(n)$, because once $n \ge 7$ you have $5n + 7 \le 5n + n = 6n$, so pick $c = 6$ and $n_0 = 7$. But it is also true that $f(n) = O(n^2)$ and $f(n) = O(2^n)$ — Big-O is an upper bound, and any sufficiently big function is an upper bound on a smaller one. People who say "my algorithm is $O(n)$" usually mean "it's tight at $n$" but that's not what the notation says.

Notation abuse alert. Everyone, including textbooks, writes $f(n) = O(g(n))$ even though Big-O is a set, so the honest version would be $f(n) \in O(g(n))$. The "$=$" is a one-way relation: you can write $5n + 7 = O(n)$ but not $O(n) = 5n + 7$. Live with it.

Two quick proofs to make this concrete.

Claim 1: $7 n^2 + 100 n + 4000 = O(n^2)$.

Proof. For $n \ge 1$, we have $n^2 \ge n$ and $n^2 \ge 1$, so

7 n^2 + 100 n + 4000 \;\le\; 7 n^2 + 100 n^2 + 4000 n^2 = 4107\, n^2.

Bounding a polynomial by its leading term

$7 n^2 + 100 n + 4000$: The original function — a quadratic with three terms.
$\le$: "Less than or equal to." We're bounding the original function by something simpler.
$4107\, n^2$: An upper bound that holds for all $n \ge 1$. The constant 4107 is ugly but legal — Big-O doesn't care.

Why this matters. This is the everyday move: "the lower-order terms are dominated by the highest-order term times some constant." Once you internalize it, you can read polynomial complexity off a snippet at a glance.

So $c = 4107$ and $n_0 = 1$ work. (You could be tighter; you don't need to be.)

Claim 2: $n^3 \ne O(n^2)$.

Proof (by contradiction). Suppose there were $c, n_0$ with $n^3 \le c\, n^2$ for all $n \ge n_0$. Divide both sides by $n^2$ (positive for $n \ge 1$) to get $n \le c$ for all $n \ge n_0$. But $n$ grows without bound, so no constant $c$ works. Contradiction. This is the typical "no constant can save you" argument.

4. Big-Theta and Big-Omega

Big-O alone isn't enough. It's an upper bound, so $5n = O(n^2)$ is technically correct but pedagogically gross — you've thrown away information. The richer vocabulary:

f(n) = \Omega(g(n)) \iff \exists\, c > 0,\, n_0 \text{ such that } f(n) \ge c\, g(n) \text{ for all } n \ge n_0.

Big-Omega — lower bound

$\Omega(g(n))$: Pronounced "big-omega of g of n." The set of functions that grow at least as fast as $g$, up to constants.
$f(n) \ge c\, g(n)$: Lower bound: for big enough $n$, $f$ is at least a constant fraction of $g$.
$c$: Positive constant, same role as in Big-O but pulling from below.
$n_0$: Threshold past which the bound holds.

The mirror. Big-O says "no worse than $g$." Big-Omega says "no better than $g$." Together they sandwich the function from above and below.

f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \text{ and } f(n) = \Omega(g(n)).

Big-Theta — tight bound

$\Theta(g(n))$: Pronounced "big-theta of g of n." The set of functions that grow at the same rate as $g$, within a constant factor.
$O(g(n))$: Upper bound from section 3.
$\Omega(g(n))$: Lower bound from above.

Why this is the one you usually want. When you say "merge sort is $\Theta(n \log n)$," you're saying merge sort takes $n \log n$ time up to constants — both above and below. You can't speed it up by more than a constant factor and you can't slow it down by more than a constant factor either. That's a real, falsifiable, useful claim.

An equivalent formulation that's sometimes easier to verify: $f(n) = \Theta(g(n))$ if there exist $c_1, c_2 > 0$ and $n_0$ such that

c_1\, g(n) \le f(n) \le c_2\, g(n) \quad \text{for all } n \ge n_0.

Sandwich form of Big-Theta

$c_1, c_2$: Two positive constants. $c_1$ is the lower-bound constant, $c_2$ the upper-bound constant.
$g(n)$: The reference growth rate.
$f(n)$: The function you're bounding from both sides.
$n_0$: Same threshold past which both inequalities hold.

Picture. Plot $c_1 g(n)$ and $c_2 g(n)$ as two curves. From $n_0$ onward, $f(n)$ stays trapped between them. That's "the same shape, give or take a scaling factor."

Two examples that should feel obvious by the end of the page.

$3 n^2 + 7 n - 5 = \Theta(n^2)$. Pick $c_1 = 1, c_2 = 4$, $n_0 = 10$.
$\log_2 n + \log_2 \log_2 n = \Theta(\log n)$. The doubly-log term is dwarfed by the single log.

5. Little-o and little-omega

Two more pieces of vocabulary that show up less often but matter when you read the theory papers. They're strict versions of $O$ and $\Omega$.

f(n) = o(g(n)) \iff \lim_{n \to \infty} \frac{f(n)}{g(n)} = 0.

Little-o — strict upper bound

$o(g(n))$: Pronounced "little-oh of g of n." A function $f$ is $o(g)$ if $g$ strictly dominates $f$ as $n$ grows — not just up to constants.
$\lim_{n \to \infty}$: "Limit as $n$ goes to infinity." See calculus for the formal definition.
$f(n) / g(n)$: The ratio. If it eventually becomes negligible, $f$ is little-o of $g$.

The difference from Big-O. $5n = O(n)$ but $5n \ne o(n)$, because the ratio is the constant $5$, which doesn't go to zero. By contrast $n = o(n^2)$, because $n / n^2 = 1/n \to 0$.

f(n) = \omega(g(n)) \iff \lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty.

Little-omega — strict lower bound

$\omega(g(n))$: Pronounced "little-omega of g of n." $f$ grows strictly faster than $g$.
$f(n) / g(n) \to \infty$: The ratio blows up. $f$ outruns $g$ by more than any constant factor.

Example. $n^2 = \omega(n \log n)$, because $n^2 / (n \log n) = n / \log n \to \infty$. So $n^2$ is strictly worse than $n \log n$.

The full hierarchy, in one line of mental shorthand:

$o \subset O$, $\omega \subset \Omega$, $\Theta = O \cap \Omega$, and $f = \Theta(g) \iff f = O(g) \text{ and } g = O(f)$.

6. Growth-rate plotter

The fastest way to internalize the hierarchy of growth rates is to see them plotted on the same axes. Toggle the curves below and slide $n$ to find out where each function takes over.

n max: 30

show:

Curves are clipped at the top of the plot. Slide n to see how quickly the worse classes leave the chart entirely.

Here's the lesson buried in that plot: the order of the curves doesn't change. $O(\log n)$ stays below $O(n)$ stays below $O(n \log n)$ stays below $O(n^2)$. Always. Once you cross the crossover point, the higher-order term dominates forever and the gap only widens. That's why we say "$n^2$ is asymptotically worse than $n \log n$" without needing to specify how big $n$ is.

7. Common growth classes

Here are the classes you'll meet over and over, with examples of algorithms that hit them.

Class	Name	Real-world example
$O(1)$	constant	Hash table lookup (amortized). Array index. Pushing to a stack.
$O(\log n)$	logarithmic	Binary search. Insert into a balanced BST. Find an element in a heap of fixed-rank ancestors.
$O(\sqrt{n})$	root	Trial division primality. Block decomposition (sqrt-decomposition).
$O(n)$	linear	Linear scan. Sum of an array. Reading a file once. KMP string match.
$O(n \log n)$	linearithmic	Merge sort, heap sort, quicksort (average). FFT.
$O(n^2)$	quadratic	Bubble sort. Insertion sort (worst case). All-pairs shortest path on a small graph. Naïve attention.
$O(n^3)$	cubic	Naïve matrix multiplication. Floyd–Warshall.
$O(n^k)$	polynomial	Generally "tractable" — the class P. The exponent matters, but it's a bounded constant.
$O(2^n)$	exponential	Brute-force SAT. Subset sum by enumeration. Traveling salesman by exhaustive search.
$O(n!)$	factorial	All permutations. Brute-force TSP that tries every order. Ranking $n$ items by checking all rankings.

Some intuition for why factorial blows up so fast: $10! = 3{,}628{,}800$ but $20! \approx 2.4 \times 10^{18}$. Going from $n = 10$ to $n = 20$ multiplied the work by a factor of $670$ billion. That's why "just enumerate all the permutations" is almost never the answer past $n \approx 12$.

The cliff between polynomial and exponential. P versus EXP is the most important boundary in the table. Polynomial-time algorithms scale; exponential-time ones don't, full stop. A $1.0001^n$ algorithm is asymptotically worse than $n^{100}$, but in practice both are intractable past tiny inputs. The boundary you actually live near is "polynomial with small exponent" vs everything else.

8. Why log n appears

$\log n$ shows up in complexity analysis with strange regularity. Why? Because every time you can throw away a constant fraction of your input in one step, you get a log.

Binary search. You have a sorted array of $n$ elements. Compare to the middle. Either you found it, or you recurse on a half — half is $n/2$ elements. After $k$ steps you have $n / 2^k$ left. Stop when that's 1: $n / 2^k = 1 \Rightarrow k = \log_2 n$. Total comparisons: $\log_2 n$.

Balanced BST. Same trick. The tree has depth $\log n$ because each level holds twice as many nodes as the one above. Walk from root to leaf and you've done $\log n$ comparisons.

Heap. Sift up or sift down — same depth, same logic.

And here's the formal reason base doesn't matter:

\log_a n = \frac{\log_b n}{\log_b a}.

Change of logarithm base

$\log_a n$: Logarithm of $n$ with base $a$.
$\log_b n$: Logarithm of $n$ with base $b$.
$\log_b a$: A constant (depending on $a$ and $b$, not on $n$).

Why this matters for Big-O. Changing the log base only multiplies by a constant. Big-O eats constants. So $O(\log_2 n) = O(\log_{10} n) = O(\ln n) = O(\log n)$ — they're all the same set. Most authors drop the base entirely.

The pattern: divide and conquer. Anytime you cut the work by a constant fraction at each step, the depth of the recursion is logarithmic. We'll see this again in the master theorem.

9. Computing complexity of code

Now the practical part. Given some code, what's its complexity? The recipe is short:

Identify the dominant operation — the one that runs the most.
Count how many times it runs as a function of $n$.
Drop constants and lower-order terms.

Example 1: a single loop.

O(n)

def total(xs):
    s = 0
    for x in xs:
        s += x
    return s

The body of the loop runs once per element. $n$ additions, each constant time. Total: $\Theta(n)$.

Example 2: nested loops.

O(n²)

def all_pairs(xs):
    for i in range(len(xs)):
        for j in range(len(xs)):
            do_something(xs[i], xs[j])

Outer loop: $n$ iterations. Inner loop: $n$ iterations per outer iteration. Body: constant. Total: $n \cdot n = \Theta(n^2)$.

Example 3: triangular nested loops.

O(n²)

def upper_pairs(xs):
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            do_something(xs[i], xs[j])

The inner loop runs $n - i - 1$ times. Total iterations: $\sum_{i=0}^{n-1} (n - i - 1) = \binom{n}{2} = \frac{n(n-1)}{2}$. That's still $\Theta(n^2)$, just with a constant of $1/2$ instead of $1$. Big-O eats the constant; the answer is the same.

Example 4: a loop that doubles.

O(log n)

def log_loop(n):
    i = 1
    while i < n:
        do_something(i)
        i *= 2

$i$ takes the values $1, 2, 4, 8, \dots, 2^k$ where $2^k \ge n$. So $k = \lceil \log_2 n \rceil$ iterations. Total: $\Theta(\log n)$.

Example 5: nested log and linear.

O(n log n)

def n_log_n(xs):
    for x in xs:
        i = 1
        while i < len(xs):
            do_something(x, i)
            i *= 2

Outer: $n$ iterations. Inner: $\log n$ iterations. Total: $n \cdot \log n = \Theta(n \log n)$.

The general rule. When loops are nested independently, multiply their bounds. When code blocks run in sequence, sum their bounds (and the sum collapses to the largest term). When you call a function, substitute its complexity in. Recursion is the only place where this gets tricky — see the next section.

Source — Complexity Examples

-- O(1): constant time, no loop
function first(array):
    return array[0]

-- O(log n): halve the search space each step
function binary_search(array, target):
    lo = 0, hi = len(array) - 1
    while lo <= hi:
        mid = (lo + hi) / 2
        if array[mid] == target: return mid
        if array[mid] < target: lo = mid + 1
        else: hi = mid - 1
    return -1

-- O(n): visit every element once
function linear_sum(array):
    total = 0
    for each x in array:
        total += x
    return total

-- O(n log n): outer n, inner log n
function n_log_n_example(array):
    for each x in array:           -- n times
        i = 1
        while i < len(array):      -- log n times
            do_work(x, i)
            i = i * 2

-- O(n^2): nested loops, each 1..n
function all_pairs(array):
    for i in 0..len(array):
        for j in 0..len(array):
            do_work(array[i], array[j])

from math import log2

def o_1_first(xs):
    """O(1) — constant regardless of input size."""
    return xs[0]

def o_logn_binary_search(xs, target):
    """O(log n) — halves the search space each iteration."""
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] == target:
            return mid
        elif xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def o_n_linear_sum(xs):
    """O(n) — one pass over all elements."""
    total = 0
    for x in xs:
        total += x
    return total

def o_n2_all_pairs(xs):
    """O(n^2) — every pair of elements."""
    count = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] < xs[j]:
                count += 1
    return count

def o_nlogn_example(xs):
    """O(n log n) — outer linear, inner logarithmic."""
    result = 0
    for x in xs:           # n iterations
        i = 1
        while i < len(xs):  # log n iterations
            result += x * i
            i *= 2
    return result

// O(1) — array index lookup
function first(xs) {
  return xs[0];
}

// O(log n) — binary search on sorted array
function binarySearch(xs, target) {
  let lo = 0, hi = xs.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (xs[mid] === target) return mid;
    if (xs[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}

// O(n) — single pass
function linearSum(xs) {
  return xs.reduce((s, x) => s + x, 0);
}

// O(n^2) — all pairs
function allPairs(xs) {
  let count = 0;
  for (let i = 0; i < xs.length; i++)
    for (let j = i + 1; j < xs.length; j++)
      if (xs[i] < xs[j]) count++;
  return count;
}

// O(n log n) — outer n, inner log n
function nLogN(xs) {
  let result = 0;
  for (const x of xs) {         // n iterations
    for (let i = 1; i < xs.length; i *= 2)  // log n
      result += x * i;
  }
  return result;
}

#include <stddef.h>

/* O(1) */
int first(int *xs) { return xs[0]; }

/* O(log n) — binary search */
int binary_search(int *xs, size_t n, int target) {
    size_t lo = 0, hi = n - 1;
    while (lo <= hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (xs[mid] == target) return (int)mid;
        if (xs[mid] < target) lo = mid + 1;
        else if (mid == 0) break;
        else hi = mid - 1;
    }
    return -1;
}

/* O(n) — linear scan */
long linear_sum(int *xs, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) total += xs[i];
    return total;
}

/* O(n^2) — all pairs */
long all_pairs(int *xs, size_t n) {
    long count = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (xs[i] < xs[j]) count++;
    return count;
}

/* O(n log n) — outer n, inner log n */
long n_log_n(int *xs, size_t n) {
    long result = 0;
    for (size_t i = 0; i < n; i++)          /* n */
        for (size_t j = 1; j < n; j *= 2)   /* log n */
            result += xs[i] * (long)j;
    return result;
}

#include <vector>
#include <algorithm>
using std::vector;

// O(1)
int first(const vector<int>& v) { return v[0]; }

// O(log n) — use STL lower_bound
int binarySearch(const vector<int>& v, int target) {
    auto it = std::lower_bound(v.begin(), v.end(), target);
    if (it != v.end() && *it == target)
        return (int)(it - v.begin());
    return -1;
}

// O(n)
long linearSum(const vector<int>& v) {
    long s = 0;
    for (int x : v) s += x;
    return s;
}

// O(n^2)
long allPairs(const vector<int>& v) {
    long count = 0;
    for (size_t i = 0; i < v.size(); i++)
        for (size_t j = i + 1; j < v.size(); j++)
            if (v[i] < v[j]) count++;
    return count;
}

// O(n log n) — outer n, inner log n
long nLogN(const vector<int>& v) {
    long result = 0;
    for (int x : v)
        for (size_t j = 1; j < v.size(); j *= 2)
            result += x * (long)j;
    return result;
}

import java.util.Arrays;

public class ComplexityExamples {

    // O(1)
    static int first(int[] xs) { return xs[0]; }

    // O(log n) — Arrays.binarySearch is O(log n)
    static int binarySearch(int[] xs, int target) {
        return Arrays.binarySearch(xs, target);
    }

    // O(n)
    static long linearSum(int[] xs) {
        long total = 0;
        for (int x : xs) total += x;
        return total;
    }

    // O(n^2)
    static long allPairs(int[] xs) {
        long count = 0;
        for (int i = 0; i < xs.length; i++)
            for (int j = i + 1; j < xs.length; j++)
                if (xs[i] < xs[j]) count++;
        return count;
    }

    // O(n log n) — outer n, inner log n
    static long nLogN(int[] xs) {
        long result = 0;
        for (int x : xs)
            for (int j = 1; j < xs.length; j *= 2)
                result += (long) x * j;
        return result;
    }
}

package main

// O(1)
func first(xs []int) int { return xs[0] }

// O(log n) — binary search
func binarySearch(xs []int, target int) int {
	lo, hi := 0, len(xs)-1
	for lo <= hi {
		mid := lo + (hi-lo)/2
		switch {
		case xs[mid] == target:
			return mid
		case xs[mid] < target:
			lo = mid + 1
		default:
			hi = mid - 1
		}
	}
	return -1
}

// O(n)
func linearSum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// O(n^2)
func allPairs(xs []int) int {
	count := 0
	for i := range xs {
		for j := i + 1; j < len(xs); j++ {
			if xs[i] < xs[j] {
				count++
			}
		}
	}
	return count
}

// O(n log n) — outer n, inner log n
func nLogN(xs []int) int {
	result := 0
	for _, x := range xs {
		for j := 1; j < len(xs); j *= 2 {
			result += x * j
		}
	}
	return result
}

10. Recurrence relations

Recursive algorithms get analyzed via recurrences — equations that express $T(n)$, the time to solve a problem of size $n$, in terms of the time to solve smaller problems. Three tools to crack them.

10.1 Iteration / unrolling

Take merge sort. It splits the input in half, recurses on each half, then merges in linear time:

T(n) = 2\, T(n/2) + n.

Merge sort recurrence

$T(n)$: The number of operations merge sort takes on an input of size $n$.
$2\, T(n/2)$: Two recursive calls, each on a half-sized input.
$n$: The merge step — linear in the input size.

How to unroll. Substitute the recurrence into itself. $T(n) = 2 T(n/2) + n = 2(2 T(n/4) + n/2) + n = 4 T(n/4) + 2n$. Keep going. After $k$ steps you have $T(n) = 2^k T(n/2^k) + k n$. Stop when $n / 2^k = 1$, so $k = \log_2 n$. Then $T(n) = n T(1) + n \log_2 n = \Theta(n \log n)$.

10.2 Recursion tree

Same idea, drawn as a picture. Draw a tree with the root labeled $n$ (the work at the top level), two children each labeled $n/2$, four grandchildren each labeled $n/4$, etc. At depth $k$ there are $2^k$ nodes each doing $n/2^k$ work, so each level does total work $n$. The depth is $\log_2 n$. Sum over all levels: $n \log_2 n = \Theta(n \log n)$. Same answer.

Recursion trees are especially useful when the work per level isn't constant — like $T(n) = T(n/2) + T(n/3) + n$, where the per-level work is geometrically decreasing.

10.3 Master theorem

For recurrences of a specific common shape, there's a recipe. Here's the formal statement.

T(n) = a\, T(n/b) + f(n), \quad a \ge 1,\, b > 1.

Master theorem setup

$T(n)$: The running time of the algorithm on input of size $n$.
$a$: The branching factor — how many subproblems each call creates.
$b$: The shrinkage factor — each subproblem has size $n/b$.
$f(n)$: The "combine" cost at this level — the work done outside the recursive calls (e.g. merging).
$n^{\log_b a}$: The "balancing exponent." It's the work the leaves of the recursion tree contribute. Compare $f(n)$ to this.

The intuition. The recursion tree has depth $\log_b n$, and at the bottom there are $a^{\log_b n} = n^{\log_b a}$ leaves, each doing constant work. So leaves contribute $\Theta(n^{\log_b a})$. The total is dominated by either the leaves or the per-level combine cost — whichever wins is the answer.

The three cases:

T(n) = \begin{cases} \Theta(n^{\log_b a}) & \text{if } f(n) = O(n^{\log_b a - \varepsilon}) \text{ for some } \varepsilon > 0, \\[4pt] \Theta(n^{\log_b a} \log n) & \text{if } f(n) = \Theta(n^{\log_b a}), \\[4pt] \Theta(f(n)) & \text{if } f(n) = \Omega(n^{\log_b a + \varepsilon}) \text{ and the regularity condition holds.} \end{cases}

Master theorem cases

Case 1: The leaves dominate. $f(n)$ is polynomially smaller than $n^{\log_b a}$, so the answer is $\Theta(n^{\log_b a})$.
Case 2: Each level does the same amount of work, total. The depth $\log n$ gets multiplied in.
Case 3: The top of the tree dominates. $f(n)$ is polynomially larger than $n^{\log_b a}$, so the cost of combining at the top dwarfs the leaves.
$\varepsilon$: Any positive constant. "Polynomially smaller" means smaller by at least an $n^\varepsilon$ factor.
Regularity condition: $a\, f(n/b) \le c\, f(n)$ for some $c < 1$ and all sufficiently large $n$. It's almost always satisfied for the $f(n)$ you'll meet in practice.

How to use it. Compute $\log_b a$. Compare $f(n)$ to $n^{\log_b a}$. The bigger one (with a polynomial gap) wins. If they're tied, multiply by $\log n$.

Example A. Merge sort: $T(n) = 2 T(n/2) + n$. So $a = 2$, $b = 2$, $f(n) = n$. $\log_b a = \log_2 2 = 1$, so $n^{\log_b a} = n$. We have $f(n) = n = \Theta(n^1)$ — case 2. Answer: $\Theta(n \log n)$. Matches the unrolling.

Example B. Binary search: $T(n) = T(n/2) + 1$. So $a = 1$, $b = 2$, $f(n) = 1$. $\log_b a = 0$, so $n^{\log_b a} = 1$. $f(n) = 1 = \Theta(1)$ — case 2. Answer: $\Theta(\log n)$.

Example C. Strassen's matrix multiplication: $T(n) = 7 T(n/2) + n^2$. So $a = 7$, $b = 2$, $f(n) = n^2$. $\log_2 7 \approx 2.807$, so $n^{\log_b a} \approx n^{2.807}$. $f(n) = n^2 = O(n^{2.807 - 0.5})$ — case 1. Answer: $\Theta(n^{\log_2 7}) \approx \Theta(n^{2.81})$. That's the famous beat-cubic result.

Example D. A "shrink and combine a lot" recurrence: $T(n) = 2 T(n/2) + n^2$. $a = 2, b = 2$, $\log_b a = 1$, $n^{\log_b a} = n$. $f(n) = n^2 = \Omega(n^{1 + 0.5})$ — case 3. Answer: $\Theta(n^2)$.

What the master theorem can't do. It needs $f(n)$ to be polynomially bounded relative to $n^{\log_b a}$. If $f(n) = n \log n$ and $n^{\log_b a} = n$, you're in the gap and the basic master theorem fails — you need an extended version (Akra–Bazzi). Also, the recurrence has to have the form $a T(n/b) + f(n)$ — uneven splits like $T(n) = T(n/3) + T(2n/3) + n$ need other tools.

Source — Recurrence Examples

-- Fibonacci: T(n) = 2*T(n-1) + O(1) → O(2^n)
function fib(n):
    if n <= 1: return n
    return fib(n-1) + fib(n-2)

-- Merge sort: T(n) = 2*T(n/2) + O(n) → O(n log n)
function merge_sort(array):
    if len(array) <= 1: return array
    mid = len(array) / 2
    left  = merge_sort(array[0..mid])
    right = merge_sort(array[mid..end])
    return merge(left, right)

function merge(left, right):
    result = []
    while left and right are non-empty:
        if left[0] <= right[0]:
            append left[0] to result, advance left
        else:
            append right[0] to result, advance right
    append remaining elements
    return result

-- Binary search: T(n) = T(n/2) + O(1) → O(log n)
function bin_search(array, lo, hi, target):
    if lo > hi: return -1
    mid = (lo + hi) / 2
    if array[mid] == target: return mid
    if array[mid] < target: return bin_search(array, mid+1, hi, target)
    else:                    return bin_search(array, lo, mid-1, target)

import sys
sys.setrecursionlimit(10000)

# Fibonacci: T(n) = 2*T(n-1) + O(1)  →  O(2^n)
# Each call spawns two more; the tree has 2^n leaves.
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

# Merge sort: T(n) = 2*T(n/2) + O(n)  →  O(n log n)  (Master theorem case 2)
def merge_sort(xs):
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left  = merge_sort(xs[:mid])
    right = merge_sort(xs[mid:])
    return merge(left, right)

def merge(a, b):
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out

# Recursive binary search: T(n) = T(n/2) + O(1)  →  O(log n)
def bin_search(xs, lo, hi, target):
    if lo > hi:
        return -1
    mid = (lo + hi) // 2
    if xs[mid] == target:
        return mid
    if xs[mid] < target:
        return bin_search(xs, mid + 1, hi, target)
    return bin_search(xs, lo, mid - 1, target)

// Fibonacci: T(n) = 2T(n-1) + O(1) → O(2^n)
function fib(n) {
  if (n <= 1) return n;
  return fib(n - 1) + fib(n - 2);
}

// Merge sort: T(n) = 2T(n/2) + O(n) → O(n log n)
function mergeSort(xs) {
  if (xs.length <= 1) return xs;
  const mid = xs.length >> 1;
  const left  = mergeSort(xs.slice(0, mid));
  const right = mergeSort(xs.slice(mid));
  return merge(left, right);
}
function merge(a, b) {
  const out = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length)
    out.push(a[i] <= b[j] ? a[i++] : b[j++]);
  return out.concat(a.slice(i), b.slice(j));
}

// Recursive binary search: T(n) = T(n/2) + O(1) → O(log n)
function binSearch(xs, lo, hi, target) {
  if (lo > hi) return -1;
  const mid = (lo + hi) >> 1;
  if (xs[mid] === target) return mid;
  if (xs[mid] < target)  return binSearch(xs, mid + 1, hi, target);
  return binSearch(xs, lo, mid - 1, target);
}

#include <stdlib.h>
#include <string.h>

/* Fibonacci: T(n) = 2T(n-1)+O(1) → O(2^n) */
long fib(int n) {
    if (n <= 1) return n;
    return fib(n-1) + fib(n-2);
}

/* Recursive binary search: T(n) = T(n/2)+O(1) → O(log n) */
int bin_search(int *xs, int lo, int hi, int target) {
    if (lo > hi) return -1;
    int mid = lo + (hi - lo) / 2;
    if (xs[mid] == target) return mid;
    if (xs[mid] < target)  return bin_search(xs, mid+1, hi, target);
    return bin_search(xs, lo, mid-1, target);
}

/* Merge: O(n) combine step for merge sort */
void merge(int *xs, int *tmp, int lo, int mid, int hi) {
    memcpy(tmp + lo, xs + lo, (hi - lo + 1) * sizeof(int));
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)
        xs[k++] = (tmp[i] <= tmp[j]) ? tmp[i++] : tmp[j++];
    while (i <= mid) xs[k++] = tmp[i++];
    while (j <= hi)  xs[k++] = tmp[j++];
}

/* Merge sort: T(n) = 2T(n/2)+O(n) → O(n log n) */
void merge_sort(int *xs, int *tmp, int lo, int hi) {
    if (lo >= hi) return;
    int mid = lo + (hi - lo) / 2;
    merge_sort(xs, tmp, lo, mid);
    merge_sort(xs, tmp, mid+1, hi);
    merge(xs, tmp, lo, mid, hi);
}

#include <vector>
using std::vector;

// Fibonacci: T(n) = 2T(n-1)+O(1) → O(2^n)
long fib(int n) {
    if (n <= 1) return n;
    return fib(n-1) + fib(n-2);
}

// Recursive binary search: T(n) = T(n/2)+O(1) → O(log n)
int binSearch(const vector<int>& xs, int lo, int hi, int target) {
    if (lo > hi) return -1;
    int mid = lo + (hi - lo) / 2;
    if (xs[mid] == target) return mid;
    if (xs[mid] < target)  return binSearch(xs, mid+1, hi, target);
    return binSearch(xs, lo, mid-1, target);
}

// Merge sort: T(n) = 2T(n/2)+O(n) → O(n log n)
vector<int> merge(const vector<int>& a, const vector<int>& b) {
    vector<int> out;
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
        out.push_back(a[i] <= b[j] ? a[i++] : b[j++]);
    out.insert(out.end(), a.begin()+i, a.end());
    out.insert(out.end(), b.begin()+j, b.end());
    return out;
}

vector<int> mergeSort(vector<int> xs) {
    if (xs.size() <= 1) return xs;
    size_t mid = xs.size() / 2;
    return merge(mergeSort({xs.begin(), xs.begin()+mid}),
                 mergeSort({xs.begin()+mid, xs.end()}));
}

import java.util.Arrays;

public class RecurrenceExamples {

    // Fibonacci: T(n) = 2T(n-1)+O(1) → O(2^n)
    static long fib(int n) {
        if (n <= 1) return n;
        return fib(n - 1) + fib(n - 2);
    }

    // Recursive binary search: T(n) = T(n/2)+O(1) → O(log n)
    static int binSearch(int[] xs, int lo, int hi, int target) {
        if (lo > hi) return -1;
        int mid = lo + (hi - lo) / 2;
        if (xs[mid] == target) return mid;
        if (xs[mid] < target)  return binSearch(xs, mid + 1, hi, target);
        return binSearch(xs, lo, mid - 1, target);
    }

    // Merge sort: T(n) = 2T(n/2)+O(n) → O(n log n)
    static int[] mergeSort(int[] xs) {
        if (xs.length <= 1) return xs;
        int mid = xs.length / 2;
        int[] left  = mergeSort(Arrays.copyOfRange(xs, 0, mid));
        int[] right = mergeSort(Arrays.copyOfRange(xs, mid, xs.length));
        return merge(left, right);
    }

    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length)
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }
}

package main

// Fibonacci: T(n) = 2T(n-1)+O(1) → O(2^n)
func fib(n int) int {
	if n <= 1 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

// Recursive binary search: T(n) = T(n/2)+O(1) → O(log n)
func binSearch(xs []int, lo, hi, target int) int {
	if lo > hi {
		return -1
	}
	mid := lo + (hi-lo)/2
	if xs[mid] == target {
		return mid
	}
	if xs[mid] < target {
		return binSearch(xs, mid+1, hi, target)
	}
	return binSearch(xs, lo, mid-1, target)
}

// Merge sort: T(n) = 2T(n/2)+O(n) → O(n log n)
func mergeSort(xs []int) []int {
	if len(xs) <= 1 {
		return xs
	}
	mid := len(xs) / 2
	return mergeFn(mergeSort(xs[:mid]), mergeSort(xs[mid:]))
}

func mergeFn(a, b []int) []int {
	out := make([]int, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			out = append(out, a[i]); i++
		} else {
			out = append(out, b[j]); j++
		}
	}
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

11. Amortized analysis

Sometimes the worst-case cost of a single operation is misleading. A dynamic array's append is $O(1)$ most of the time, but every so often it has to copy the whole array to a bigger buffer, which is $O(n)$. If you only look at the worst case, you'd think append is $O(n)$ — and you'd massively overestimate the cost of, say, building a list of a million items.

Amortized analysis is the trick of averaging over a sequence of operations. Not the average over random inputs (that's average-case analysis, see section 13). Averaging over a deterministic sequence of operations on a single data structure.

11.1 The dynamic array — the canonical example

You start with an array of capacity 1. Every time you append and the array is full, you allocate a new array of double the size, copy everything over, and free the old one. What's the total cost of $n$ appends?

Worst-case-per-op says: an append might trigger a copy, copy is $O(n)$, so $n$ appends is $O(n^2)$. Wrong. Most appends don't copy. Only the appends at sizes $1, 2, 4, 8, \dots, 2^k$ do, and each such copy costs that capacity. So the total copy work is

1 + 2 + 4 + \dots + n \;\le\; 2 n.

Geometric sum for dynamic array doubling

$1 + 2 + 4 + \dots + n$: The sizes at which a resize fires. Each term is the cost of one resize.
$2n$: The closed-form upper bound. A geometric series with ratio 2 sums to $2 \cdot$(its largest term) minus 1.

The result. Total work for $n$ appends is at most $2n + n = 3n$ — the resizing plus the actual appends. So amortized cost per append is $\Theta(1)$, even though one specific append might cost $\Theta(n)$. The doublings are rare enough that they don't accumulate.

11.2 Three methods

Aggregate method. Compute total cost over a sequence of $n$ operations, divide by $n$. What we just did.
Accounting method. Charge each operation a "fee" higher than its true cost. Save the surplus as "credit" stored on data items. When an expensive operation fires, pay for it with the saved credit. As long as credit never goes negative, the fee is a valid amortized bound.
Potential method. Define a potential function $\Phi$ over states of the data structure. The amortized cost of an operation is its real cost plus the change in potential: $\hat{c}_i = c_i + \Phi_i - \Phi_{i-1}$. Choose $\Phi$ so the amortized costs are easy to bound.

All three give the same answer when applied to dynamic array doubling: amortized $O(1)$ per append. The accounting method is most intuitive: charge each append a constant fee of, say, 3. The first unit pays for the actual append; the second unit gets stashed on the new element; the third unit gets stashed on a previously-existing element. When a doubling fires, every old element has a stashed credit that pays for its move. The math works out.

Source — Dynamic Array (Amortized)

-- Dynamic array that doubles capacity on overflow.
-- Amortized cost of append: O(1).

DynamicArray:
    buf = [None] * 1
    size = 0
    cap  = 1
    total_copies = 0    -- for demonstration

    function append(x):
        if size == cap:
            new_buf = [None] * (cap * 2)
            copy buf[0..size] into new_buf   -- costs `size` copies
            total_copies += size
            buf = new_buf
            cap *= 2
        buf[size] = x
        size += 1

-- After N appends, total_copies < 2*N because:
--   copies happen at sizes 1, 2, 4, 8, ..., N/2
--   sum = 1+2+4+...+N/2 = N-1 < N
-- So amortized copies per append < 2, i.e., O(1).

class DynamicArray:
    """Dynamic array with doubling strategy — demonstrates O(1) amortized append."""

    def __init__(self):
        self.cap = 1
        self.size = 0
        self.buf = [None] * self.cap
        self.total_copies = 0

    def append(self, x):
        if self.size == self.cap:
            new_buf = [None] * (self.cap * 2)
            for i in range(self.size):
                new_buf[i] = self.buf[i]
                self.total_copies += 1
            self.buf = new_buf
            self.cap *= 2
        self.buf[self.size] = x
        self.size += 1

    def __getitem__(self, i):
        return self.buf[i]

# Demonstration
N = 200_000
arr = DynamicArray()
for i in range(N):
    arr.append(i)

print(f"N appends:         {N}")
print(f"total copies:      {arr.total_copies}")
print(f"copies per append: {arr.total_copies / N:.4f}")
# Expected: copies per append ≈ 1.0  (geometric sum 1+2+4+...≈N)
# This confirms O(1) amortized cost per append.

class DynamicArray {
  constructor() {
    this.buf = new Array(1);
    this.size = 0;
    this.cap = 1;
    this.totalCopies = 0;
  }

  append(x) {
    if (this.size === this.cap) {
      const newBuf = new Array(this.cap * 2);
      for (let i = 0; i < this.size; i++) {
        newBuf[i] = this.buf[i];
        this.totalCopies++;
      }
      this.buf = newBuf;
      this.cap *= 2;
    }
    this.buf[this.size++] = x;
  }

  get(i) { return this.buf[i]; }
}

const N = 100_000;
const arr = new DynamicArray();
for (let i = 0; i < N; i++) arr.append(i);

console.log(`N appends:         ${N}`);
console.log(`total copies:      ${arr.totalCopies}`);
console.log(`copies per append: ${(arr.totalCopies / N).toFixed(4)}`);
// copies per append ≈ 1.0 — O(1) amortized confirmed

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int *buf;
    int  size, cap;
    long total_copies;
} DynArray;

void da_init(DynArray *da) {
    da->buf = malloc(sizeof(int));
    da->size = 0;
    da->cap  = 1;
    da->total_copies = 0;
}

void da_append(DynArray *da, int x) {
    if (da->size == da->cap) {
        int *new_buf = malloc(da->cap * 2 * sizeof(int));
        for (int i = 0; i < da->size; i++) {
            new_buf[i] = da->buf[i];
            da->total_copies++;
        }
        free(da->buf);
        da->buf = new_buf;
        da->cap *= 2;
    }
    da->buf[da->size++] = x;
}

int main(void) {
    DynArray da;
    da_init(&da);
    int N = 100000;
    for (int i = 0; i < N; i++) da_append(&da, i);
    printf("N=%d  copies=%ld  per_append=%.4f\n",
           N, da.total_copies, (double)da.total_copies / N);
    free(da.buf);
    return 0;
}

#include <iostream>
#include <memory>

class DynamicArray {
    std::unique_ptr<int[]> buf;
    int sz = 0, cap = 1;
public:
    long totalCopies = 0;

    DynamicArray() : buf(std::make_unique<int[]>(1)) {}

    void append(int x) {
        if (sz == cap) {
            auto newBuf = std::make_unique<int[]>(cap * 2);
            for (int i = 0; i < sz; i++) {
                newBuf[i] = buf[i];
                totalCopies++;
            }
            buf = std::move(newBuf);
            cap *= 2;
        }
        buf[sz++] = x;
    }

    int size() const { return sz; }
};

int main() {
    const int N = 100'000;
    DynamicArray da;
    for (int i = 0; i < N; i++) da.append(i);
    std::cout << "N=" << N
              << "  copies=" << da.totalCopies
              << "  per_append=" << (double)da.totalCopies / N << "\n";
}

public class DynamicArray {
    private int[] buf;
    private int size, cap;
    public long totalCopies;

    public DynamicArray() {
        buf = new int[1];
        size = 0;
        cap  = 1;
        totalCopies = 0;
    }

    public void append(int x) {
        if (size == cap) {
            int[] newBuf = new int[cap * 2];
            for (int i = 0; i < size; i++) {
                newBuf[i] = buf[i];
                totalCopies++;
            }
            buf = newBuf;
            cap *= 2;
        }
        buf[size++] = x;
    }

    public static void main(String[] args) {
        int N = 100_000;
        DynamicArray da = new DynamicArray();
        for (int i = 0; i < N; i++) da.append(i);
        System.out.printf("N=%d  copies=%d  per_append=%.4f%n",
                N, da.totalCopies, (double) da.totalCopies / N);
        // per_append ≈ 1.0 → O(1) amortized
    }
}

package main

import "fmt"

type DynamicArray struct {
	buf         []int
	size, cap_  int
	totalCopies int
}

func NewDynamicArray() *DynamicArray {
	return &DynamicArray{buf: make([]int, 1), cap_: 1}
}

func (da *DynamicArray) Append(x int) {
	if da.size == da.cap_ {
		newBuf := make([]int, da.cap_*2)
		for i := 0; i < da.size; i++ {
			newBuf[i] = da.buf[i]
			da.totalCopies++
		}
		da.buf = newBuf
		da.cap_ *= 2
	}
	da.buf[da.size] = x
	da.size++
}

func main() {
	N := 100_000
	da := NewDynamicArray()
	for i := 0; i < N; i++ {
		da.Append(i)
	}
	fmt.Printf("N=%d  copies=%d  per_append=%.4f\n",
		N, da.totalCopies, float64(da.totalCopies)/float64(N))
	// per_append ≈ 1.0 → O(1) amortized confirmed
}

12. Space complexity

Time isn't the only resource. Space — how much memory the algorithm uses — also grows with the input, and the bounds work the same way. You write $S(n) = O(g(n))$ and so on.

Two flavors:

Auxiliary space. The extra memory the algorithm uses, on top of the input. This is what most analyses report.
Total space. Auxiliary space plus the input. If you have to count the input (e.g. on a Turing machine model), use this.

Some examples:

Insertion sort sorts in place — auxiliary space $O(1)$.
Merge sort needs an extra buffer for the merge — auxiliary space $O(n)$.
Iterative binary search uses $O(1)$ extra space.
Recursive binary search uses $O(\log n)$ stack space because the recursion depth is $\log n$. Recursion has a memory cost. Every call frame is bytes on the stack.
Recursive depth-first search on a tree of $n$ nodes uses $O(h)$ stack space where $h$ is the height. Worst case (a degenerate "stick" tree) that's $O(n)$.

Time-space tradeoff. Many algorithms let you trade one for the other. Memoization spends memory to save time. Streaming algorithms throw away inputs to save memory. Knowing both bounds tells you which trade you can make.

Source — Space Complexity

-- In-place reverse: O(1) auxiliary space
function reverse_inplace(array):
    lo = 0, hi = len(array) - 1
    while lo < hi:
        swap array[lo] and array[hi]
        lo += 1, hi -= 1
    -- No extra array allocated — swaps use one temp variable.

-- Copy-and-reverse: O(n) auxiliary space
function reverse_copy(array):
    result = new array of same length
    for i in 0..len(array):
        result[i] = array[len(array) - 1 - i]
    return result
    -- Extra array of size n allocated.

-- Recursive binary search: O(log n) stack space
function rec_bsearch(array, lo, hi, target):
    if lo > hi: return -1
    mid = (lo + hi) / 2
    if array[mid] == target: return mid
    if array[mid] < target: return rec_bsearch(array, mid+1, hi, target)
    else:                    return rec_bsearch(array, lo, mid-1, target)
    -- Each call frame is O(1); depth is O(log n).

# In-place reverse — O(1) auxiliary space
def reverse_inplace(xs):
    lo, hi = 0, len(xs) - 1
    while lo < hi:
        xs[lo], xs[hi] = xs[hi], xs[lo]
        lo += 1; hi -= 1
    # No extra memory beyond two index variables.

# Copy-and-reverse — O(n) auxiliary space
def reverse_copy(xs):
    return xs[::-1]   # creates a new list of length n

# Iterative binary search — O(1) auxiliary space
def bsearch_iter(xs, target):
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] == target:  return mid
        if xs[mid] < target:   lo = mid + 1
        else:                  hi = mid - 1
    return -1

# Recursive binary search — O(log n) stack space
# Each frame holds lo, hi, mid: O(1) per frame.
# Recursion depth = O(log n).
def bsearch_rec(xs, lo, hi, target):
    if lo > hi:
        return -1
    mid = (lo + hi) // 2
    if xs[mid] == target:
        return mid
    if xs[mid] < target:
        return bsearch_rec(xs, mid + 1, hi, target)
    return bsearch_rec(xs, lo, mid - 1, target)

// In-place reverse — O(1) auxiliary space
function reverseInPlace(xs) {
  let lo = 0, hi = xs.length - 1;
  while (lo < hi) {
    [xs[lo], xs[hi]] = [xs[hi], xs[lo]];
    lo++; hi--;
  }
}

// Copy-and-reverse — O(n) auxiliary space
function reverseCopy(xs) {
  return xs.slice().reverse(); // new array of length n
}

// Iterative binary search — O(1) auxiliary space
function bsearchIter(xs, target) {
  let lo = 0, hi = xs.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (xs[mid] === target) return mid;
    if (xs[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}

// Recursive binary search — O(log n) stack space
function bsearchRec(xs, lo, hi, target) {
  if (lo > hi) return -1;
  const mid = (lo + hi) >> 1;
  if (xs[mid] === target) return mid;
  if (xs[mid] < target)  return bsearchRec(xs, mid + 1, hi, target);
  return bsearchRec(xs, lo, mid - 1, target);
}

#include <stdlib.h>
#include <string.h>

/* In-place reverse — O(1) auxiliary space */
void reverse_inplace(int *xs, int n) {
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        int tmp = xs[lo]; xs[lo] = xs[hi]; xs[hi] = tmp;
        lo++; hi--;
    }
}

/* Copy-and-reverse — O(n) auxiliary space */
int *reverse_copy(const int *xs, int n) {
    int *out = malloc(n * sizeof(int));
    for (int i = 0; i < n; i++)
        out[i] = xs[n - 1 - i];
    return out;  /* caller must free */
}

/* Iterative binary search — O(1) space */
int bsearch_iter(const int *xs, int n, int target) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (xs[mid] == target) return mid;
        if (xs[mid] < target)  lo = mid + 1;
        else                   hi = mid - 1;
    }
    return -1;
}

/* Recursive binary search — O(log n) stack space */
int bsearch_rec(const int *xs, int lo, int hi, int target) {
    if (lo > hi) return -1;
    int mid = lo + (hi - lo) / 2;
    if (xs[mid] == target) return mid;
    if (xs[mid] < target)  return bsearch_rec(xs, mid+1, hi, target);
    return bsearch_rec(xs, lo, mid-1, target);
}

#include <vector>
#include <algorithm>
using std::vector;

// In-place reverse — O(1) auxiliary space
void reverseInPlace(vector<int>& v) {
    std::reverse(v.begin(), v.end());
}

// Copy-and-reverse — O(n) auxiliary space
vector<int> reverseCopy(const vector<int>& v) {
    return vector<int>(v.rbegin(), v.rend());
}

// Iterative binary search — O(1) space
int bsearchIter(const vector<int>& v, int target) {
    int lo = 0, hi = (int)v.size() - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (v[mid] == target) return mid;
        if (v[mid] < target)  lo = mid + 1;
        else                  hi = mid - 1;
    }
    return -1;
}

// Recursive binary search — O(log n) stack space
int bsearchRec(const vector<int>& v, int lo, int hi, int target) {
    if (lo > hi) return -1;
    int mid = lo + (hi - lo) / 2;
    if (v[mid] == target) return mid;
    if (v[mid] < target)  return bsearchRec(v, mid+1, hi, target);
    return bsearchRec(v, lo, mid-1, target);
}

import java.util.Arrays;

public class SpaceComplexity {

    // In-place reverse — O(1) auxiliary space
    static void reverseInPlace(int[] xs) {
        for (int lo = 0, hi = xs.length - 1; lo < hi; lo++, hi--) {
            int tmp = xs[lo]; xs[lo] = xs[hi]; xs[hi] = tmp;
        }
    }

    // Copy-and-reverse — O(n) auxiliary space
    static int[] reverseCopy(int[] xs) {
        int[] out = new int[xs.length];
        for (int i = 0; i < xs.length; i++)
            out[i] = xs[xs.length - 1 - i];
        return out;
    }

    // Iterative binary search — O(1) space
    static int bsearchIter(int[] xs, int target) {
        int lo = 0, hi = xs.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (xs[mid] == target) return mid;
            if (xs[mid] < target)  lo = mid + 1;
            else                   hi = mid - 1;
        }
        return -1;
    }

    // Recursive binary search — O(log n) stack space
    static int bsearchRec(int[] xs, int lo, int hi, int target) {
        if (lo > hi) return -1;
        int mid = lo + (hi - lo) / 2;
        if (xs[mid] == target) return mid;
        if (xs[mid] < target)  return bsearchRec(xs, mid + 1, hi, target);
        return bsearchRec(xs, lo, mid - 1, target);
    }
}

package main

// In-place reverse — O(1) auxiliary space
func reverseInPlace(xs []int) {
	for lo, hi := 0, len(xs)-1; lo < hi; lo, hi = lo+1, hi-1 {
		xs[lo], xs[hi] = xs[hi], xs[lo]
	}
}

// Copy-and-reverse — O(n) auxiliary space
func reverseCopy(xs []int) []int {
	out := make([]int, len(xs))
	for i, x := range xs {
		out[len(xs)-1-i] = x
	}
	return out
}

// Iterative binary search — O(1) auxiliary space
func bsearchIter(xs []int, target int) int {
	lo, hi := 0, len(xs)-1
	for lo <= hi {
		mid := lo + (hi-lo)/2
		if xs[mid] == target {
			return mid
		} else if xs[mid] < target {
			lo = mid + 1
		} else {
			hi = mid - 1
		}
	}
	return -1
}

// Recursive binary search — O(log n) stack space
func bsearchRec(xs []int, lo, hi, target int) int {
	if lo > hi {
		return -1
	}
	mid := lo + (hi-lo)/2
	if xs[mid] == target {
		return mid
	}
	if xs[mid] < target {
		return bsearchRec(xs, mid+1, hi, target)
	}
	return bsearchRec(xs, lo, mid-1, target)
}

13. Best, average, worst

Big-O so far has been cagey about which input we're talking about. For a given $n$, your algorithm might be fast on some inputs and slow on others. Three standard ways to summarize:

Worst case. The maximum time over all inputs of size $n$. This is the default when nothing is specified.
Best case. The minimum time over all inputs of size $n$. Usually optimistic to the point of useless, but it can illustrate a point.
Average case. The expected time over a probability distribution of inputs of size $n$. Requires you to specify the distribution — usually "uniform random."

Quicksort is the classic illustration.

Worst case: $\Theta(n^2)$. Happens when every pivot is the smallest or largest element, so the partition is maximally unbalanced. On a sorted array with the naive "pick first element" pivot, that's exactly what you get.
Average case: $\Theta(n \log n)$. With a random pivot, the expected partition is balanced "enough" that the recursion depth is $O(\log n)$ on average.
Best case: $\Theta(n \log n)$. Even the best case still has to look at every element at every level.

Why is quicksort competitive in practice when its worst case is so bad? Because (a) the average case is $n \log n$ with small constants — much smaller than merge sort's, (b) you can randomize to get expected $n \log n$ on adversarial inputs, and (c) it sorts in place. Worst-case-per-op pessimism would tell you to never use it; reality says "it's the fastest comparison sort on most inputs." This is one of the cleanest examples of why "average case" matters as its own concept.

Source — Best/Average/Worst

-- Linear search
--   Best case:  O(1)  — target is at index 0
--   Worst case: O(n)  — target not present or at last index
function linear_search(array, target):
    for i in 0..len(array):
        if array[i] == target:
            return i
    return -1

-- Binary search (sorted array)
--   Best case:  O(1)  — target is the middle element on first check
--   Worst case: O(log n) — target not present; halve until empty
--   Average case: O(log n)
function binary_search(array, target):
    lo = 0, hi = len(array) - 1
    while lo <= hi:
        mid = (lo + hi) / 2
        if array[mid] == target: return mid
        if array[mid] < target:  lo = mid + 1
        else:                    hi = mid - 1
    return -1

import random, time

# Linear search
# Best:  O(1) — element at index 0
# Worst: O(n) — element absent or at last position
def linear_search(xs, target):
    for i, x in enumerate(xs):
        if x == target:
            return i
    return -1

# Binary search (sorted array)
# Best:  O(1)  — target is the midpoint
# Worst: O(log n) — target absent
# Average: O(log n)
def binary_search(xs, target):
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] == target:  return mid
        elif xs[mid] < target: lo = mid + 1
        else:                  hi = mid - 1
    return -1

# Demonstrate best vs worst
n = 10_000
xs = list(range(n))  # sorted

# Linear search: best case (target at front)
t0 = time.perf_counter()
for _ in range(1000): linear_search(xs, 0)
best = time.perf_counter() - t0

# Linear search: worst case (target absent)
t0 = time.perf_counter()
for _ in range(1000): linear_search(xs, n + 1)
worst = time.perf_counter() - t0

print(f"linear_search  best={best*1e3:.2f}ms  worst={worst*1e3:.2f}ms")
print(f"ratio worst/best ≈ {worst/best:.0f}  (should be ≈ n = {n})")

// Linear search — best O(1), worst O(n)
function linearSearch(xs, target) {
  for (let i = 0; i < xs.length; i++)
    if (xs[i] === target) return i;
  return -1;
}

// Binary search — best O(1), worst O(log n)
function binarySearch(xs, target) {
  let lo = 0, hi = xs.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (xs[mid] === target) return mid;
    if (xs[mid] < target)  lo = mid + 1;
    else                   hi = mid - 1;
  }
  return -1;
}

// Quick benchmark
const n = 10_000;
const xs = Array.from({length: n}, (_, i) => i);

const t = (fn) => { const t0 = performance.now(); fn(); return performance.now() - t0; };

const reps = 1000;
const bestLinear  = t(() => { for (let i=0;i<reps;i++) linearSearch(xs, 0); });
const worstLinear = t(() => { for (let i=0;i<reps;i++) linearSearch(xs, n+1); });
const bsearch     = t(() => { for (let i=0;i<reps;i++) binarySearch(xs, n+1); });

console.log(`linear best:  ${bestLinear.toFixed(2)}ms`);
console.log(`linear worst: ${worstLinear.toFixed(2)}ms`);
console.log(`binary worst: ${bsearch.toFixed(2)}ms`);

#include <stdio.h>

/* Linear search — best O(1), worst O(n) */
int linear_search(const int *xs, int n, int target) {
    for (int i = 0; i < n; i++)
        if (xs[i] == target) return i;
    return -1;
}

/* Binary search — best O(1), worst O(log n) */
int binary_search(const int *xs, int n, int target) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (xs[mid] == target) return mid;
        if (xs[mid] < target)  lo = mid + 1;
        else                   hi = mid - 1;
    }
    return -1;
}

int main(void) {
    int n = 10000;
    int xs[10000];
    for (int i = 0; i < n; i++) xs[i] = i;

    /* Best case: target at index 0 */
    int r1 = linear_search(xs, n, 0);
    /* Worst case: target absent */
    int r2 = linear_search(xs, n, n + 1);
    /* Binary always O(log n) */
    int r3 = binary_search(xs, n, n + 1);

    printf("linear best (idx 0): %d\n", r1);
    printf("linear worst (absent): %d\n", r2);
    printf("binary worst (absent): %d\n", r3);
    return 0;
}

#include <vector>
#include <algorithm>
#include <iostream>
using std::vector;

// Linear search — best O(1), worst O(n)
int linearSearch(const vector<int>& v, int target) {
    for (int i = 0; i < (int)v.size(); i++)
        if (v[i] == target) return i;
    return -1;
}

// Binary search — best O(1), worst O(log n)
// (std::lower_bound is also O(log n) on random-access iterators)
int binarySearch(const vector<int>& v, int target) {
    auto it = std::lower_bound(v.begin(), v.end(), target);
    if (it != v.end() && *it == target)
        return (int)(it - v.begin());
    return -1;
}

int main() {
    int n = 10000;
    vector<int> v(n);
    for (int i = 0; i < n; i++) v[i] = i;

    std::cout << "linear search (best, idx 0):  " << linearSearch(v, 0)   << "\n";
    std::cout << "linear search (worst, absent): " << linearSearch(v, n+1) << "\n";
    std::cout << "binary search (absent):        " << binarySearch(v, n+1) << "\n";
}

import java.util.Arrays;

public class BestAvgWorst {

    // Linear search — best O(1), worst O(n)
    static int linearSearch(int[] xs, int target) {
        for (int i = 0; i < xs.length; i++)
            if (xs[i] == target) return i;
        return -1;
    }

    // Binary search — best O(1), worst O(log n)
    static int binarySearch(int[] xs, int target) {
        int lo = 0, hi = xs.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (xs[mid] == target) return mid;
            if (xs[mid] < target)  lo = mid + 1;
            else                   hi = mid - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        int n = 10_000;
        int[] xs = new int[n];
        for (int i = 0; i < n; i++) xs[i] = i;

        System.out.println("linear best  (idx 0):  " + linearSearch(xs, 0));
        System.out.println("linear worst (absent): " + linearSearch(xs, n + 1));
        System.out.println("binary worst (absent): " + binarySearch(xs, n + 1));
        // Arrays.binarySearch is the JDK O(log n) implementation:
        System.out.println("JDK binary   (absent): " + Arrays.binarySearch(xs, n + 1));
    }
}

package main

import "fmt"

// Linear search — best O(1), worst O(n)
func linearSearch(xs []int, target int) int {
	for i, x := range xs {
		if x == target {
			return i
		}
	}
	return -1
}

// Binary search — best O(1), worst O(log n)
func binarySearch(xs []int, target int) int {
	lo, hi := 0, len(xs)-1
	for lo <= hi {
		mid := lo + (hi-lo)/2
		if xs[mid] == target {
			return mid
		} else if xs[mid] < target {
			lo = mid + 1
		} else {
			hi = mid - 1
		}
	}
	return -1
}

func main() {
	n := 10_000
	xs := make([]int, n)
	for i := range xs {
		xs[i] = i
	}
	fmt.Println("linear best  (idx 0):", linearSearch(xs, 0))
	fmt.Println("linear worst (absent):", linearSearch(xs, n+1))
	fmt.Println("binary worst (absent):", binarySearch(xs, n+1))
}

14. Lower bounds

Big-O bounds the running time of one specific algorithm. Lower bounds are different and more powerful: they bound the running time of any possible algorithm for a problem. They tell you "no one will ever do better than this, no matter how clever they are."

The most famous lower bound in CS is the comparison-sort bound: any sorting algorithm that only inspects elements via comparisons must, in the worst case, do $\Omega(n \log n)$ comparisons. Here's the argument.

14.1 Decision trees

Model a comparison-based sorting algorithm as a binary tree. Each internal node is a comparison "is $a_i < a_j$?" with two outgoing edges, "yes" and "no." Each leaf is a final permutation of the input. Running the algorithm is following a path from root to leaf.

For the algorithm to be correct, it must produce a different output for every distinct input ordering. There are $n!$ orderings, so the tree must have at least $n!$ leaves.

A binary tree with $L$ leaves has depth at least $\log_2 L$. So the depth — which is the worst-case number of comparisons — is at least:

\log_2 (n!) \;=\; \sum_{k=1}^n \log_2 k \;\ge\; \int_1^n \log_2 x \, dx \;=\; n \log_2 n - n \log_2 e + 1.

Stirling-style bound on $\log(n!)$

$\log_2 (n!)$: The depth lower bound — at least this many comparisons in the worst case.
$\sum_{k=1}^n \log_2 k$: $\log(n!)$ written out as a sum: $\log 1 + \log 2 + \dots + \log n$.
$\int_1^n \log_2 x \, dx$: An integral approximation. The sum of $\log k$ is at least the integral of $\log x$ from 1 to $n$. See calculus.
$n \log_2 n - n \log_2 e + 1$: The closed form of the integral. The leading term is $n \log_2 n$.

Punchline. $\log_2(n!) = \Theta(n \log n)$ by Stirling's approximation. So any comparison-based sort does at least $\Omega(n \log n)$ comparisons. Merge sort, heap sort, randomized quicksort all hit this bound up to constants — they're asymptotically optimal.

You can sort faster than $n \log n$ if you cheat the model: counting sort, radix sort, and bucket sort all run in $O(n)$ when the keys come from a small or structured universe. They aren't comparison sorts — they peek at the actual values, not just their relative order. The lower bound is for comparison sorts only.

The decision-tree argument is your introduction to information-theoretic lower bounds. The general pattern: every "decision" your algorithm makes carries a bounded amount of information (1 bit for a comparison). To distinguish among $K$ possible answers, you need $\log_2 K$ bits. So at least $\log_2 K$ comparisons. Powerful and reusable.

Source — Sorting Lower Bound Demo

-- Instrumented comparison sort: count comparisons.
-- Lower bound says worst case >= log2(n!) comparisons.

function instrumented_sort(array):
    comparisons = 0
    -- run any comparison sort; intercept each compare:
    -- whenever we check array[i] < array[j]:
    --     comparisons += 1
    return sorted_array, comparisons

-- Information-theoretic argument:
-- n! possible orderings of n elements
-- each comparison (yes/no) gives 1 bit of information
-- need log2(n!) bits to identify the right permutation
-- log2(n!) ≈ n*log2(n) - n/ln(2)  [Stirling]
-- so any comparison sort must make Omega(n log n) comparisons

function stirling_lower_bound(n):
    -- log2(n!) >= n*log2(n) - n*log2(e)
    return n * log2(n) - n * log2(e)

import math, random

# Instrumented merge sort — counts comparisons
comparisons = 0

def merge_sort(xs):
    global comparisons
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left  = merge_sort(xs[:mid])
    right = merge_sort(xs[mid:])
    return merge(left, right)

def merge(a, b):
    global comparisons
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        comparisons += 1   # one comparison per iteration
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out

# Stirling lower bound: log2(n!) >= n*log2(n) - n*log2(e)
def lower_bound(n):
    if n < 2: return 0
    return n * math.log2(n) - n * math.log2(math.e)

# Measure and compare
for n in [8, 16, 32, 64, 128, 256]:
    comparisons = 0
    xs = list(range(n)); random.shuffle(xs)
    merge_sort(xs)
    lb = lower_bound(n)
    print(f"n={n:4d}  comparisons={comparisons:5d}"
          f"  lower_bound={lb:6.1f}  ratio={comparisons/lb:.3f}")
# ratio stays near 1.0–1.5: comparisons are Theta(n log n)

let comparisons = 0;

function mergeSort(xs) {
  if (xs.length <= 1) return xs;
  const mid = xs.length >> 1;
  return merge(mergeSort(xs.slice(0, mid)), mergeSort(xs.slice(mid)));
}

function merge(a, b) {
  const out = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    comparisons++;
    out.push(a[i] <= b[j] ? a[i++] : b[j++]);
  }
  return out.concat(a.slice(i), b.slice(j));
}

// log2(n!) lower bound (Stirling)
function lowerBound(n) {
  return n * Math.log2(n) - n * Math.LOG2E;
}

// Shuffle helper
function shuffle(xs) {
  for (let i = xs.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [xs[i], xs[j]] = [xs[j], xs[i]];
  }
}

for (const n of [8, 16, 32, 64, 128, 256]) {
  const xs = Array.from({length: n}, (_, i) => i);
  shuffle(xs);
  comparisons = 0;
  mergeSort(xs);
  const lb = lowerBound(n);
  console.log(`n=${n}  cmps=${comparisons}  lb=${lb.toFixed(1)}  ratio=${(comparisons/lb).toFixed(3)}`);
}

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static long comparisons = 0;

void merge(int *xs, int *tmp, int lo, int mid, int hi) {
    for (int k = lo; k <= hi; k++) tmp[k] = xs[k];
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi) {
        comparisons++;
        xs[k++] = (tmp[i] <= tmp[j]) ? tmp[i++] : tmp[j++];
    }
    while (i <= mid) xs[k++] = tmp[i++];
    while (j <= hi)  xs[k++] = tmp[j++];
}

void merge_sort(int *xs, int *tmp, int lo, int hi) {
    if (lo >= hi) return;
    int mid = lo + (hi - lo) / 2;
    merge_sort(xs, tmp, lo, mid);
    merge_sort(xs, tmp, mid + 1, hi);
    merge(xs, tmp, lo, mid, hi);
}

double lower_bound(int n) {
    /* log2(n!) >= n*log2(n) - n*log2(e) */
    return n * log2(n) - n * M_LOG2E;
}

int main(void) {
    int sizes[] = {8, 16, 32, 64, 128, 256};
    for (int s = 0; s < 6; s++) {
        int n = sizes[s];
        int *xs = malloc(n * sizeof(int));
        int *tmp = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) xs[i] = i;
        /* shuffle */
        for (int i = n - 1; i > 0; i--) {
            int j = rand() % (i + 1);
            int t = xs[i]; xs[i] = xs[j]; xs[j] = t;
        }
        comparisons = 0;
        merge_sort(xs, tmp, 0, n - 1);
        printf("n=%3d  cmps=%4ld  lb=%6.1f  ratio=%.3f\n",
               n, comparisons, lower_bound(n),
               comparisons / lower_bound(n));
        free(xs); free(tmp);
    }
}

#include <vector>
#include <cmath>
#include <iostream>
#include <algorithm>
using std::vector;

long comparisons = 0;

vector<int> merge(const vector<int>& a, const vector<int>& b) {
    vector<int> out;
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        comparisons++;
        out.push_back(a[i] <= b[j] ? a[i++] : b[j++]);
    }
    out.insert(out.end(), a.begin()+i, a.end());
    out.insert(out.end(), b.begin()+j, b.end());
    return out;
}

vector<int> mergeSort(vector<int> xs) {
    if (xs.size() <= 1) return xs;
    size_t mid = xs.size() / 2;
    return merge(mergeSort({xs.begin(), xs.begin()+mid}),
                 mergeSort({xs.begin()+mid, xs.end()}));
}

double lowerBound(int n) {
    return n * std::log2(n) - n * M_LOG2E;
}

int main() {
    for (int n : {8, 16, 32, 64, 128, 256}) {
        vector<int> xs(n); std::iota(xs.begin(), xs.end(), 0);
        std::shuffle(xs.begin(), xs.end(), std::mt19937{42});
        comparisons = 0;
        mergeSort(xs);
        double lb = lowerBound(n);
        std::cout << "n=" << n << "  cmps=" << comparisons
                  << "  lb=" << lb << "  ratio=" << comparisons/lb << "\n";
    }
}

public class LowerBoundDemo {
    static long comparisons = 0;

    static int[] mergeSort(int[] xs) {
        if (xs.length <= 1) return xs;
        int mid = xs.length / 2;
        int[] left  = mergeSort(java.util.Arrays.copyOfRange(xs, 0, mid));
        int[] right = mergeSort(java.util.Arrays.copyOfRange(xs, mid, xs.length));
        return merge(left, right);
    }

    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            comparisons++;
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    static double lowerBound(int n) {
        return n * (Math.log(n) / Math.log(2)) - n / Math.log(2);
    }

    public static void main(String[] args) {
        for (int n : new int[]{8, 16, 32, 64, 128, 256}) {
            int[] xs = new int[n];
            for (int i = 0; i < n; i++) xs[i] = n - i;  // reverse sorted
            comparisons = 0;
            mergeSort(xs);
            double lb = lowerBound(n);
            System.out.printf("n=%3d  cmps=%4d  lb=%6.1f  ratio=%.3f%n",
                    n, comparisons, lb, comparisons / lb);
        }
    }
}

package main

import (
	"fmt"
	"math"
)

var comparisons int64

func mergeSort(xs []int) []int {
	if len(xs) <= 1 {
		return xs
	}
	mid := len(xs) / 2
	return mergeFn(mergeSort(xs[:mid]), mergeSort(xs[mid:]))
}

func mergeFn(a, b []int) []int {
	out := make([]int, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		comparisons++
		if a[i] <= b[j] {
			out = append(out, a[i]); i++
		} else {
			out = append(out, b[j]); j++
		}
	}
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

func lowerBound(n int) float64 {
	fn := float64(n)
	return fn*math.Log2(fn) - fn*math.Log2(math.E)
}

func main() {
	for _, n := range []int{8, 16, 32, 64, 128, 256} {
		xs := make([]int, n)
		for i := range xs { xs[i] = n - i } // reverse sorted
		comparisons = 0
		mergeSort(xs)
		lb := lowerBound(n)
		fmt.Printf("n=%3d  cmps=%4d  lb=%6.1f  ratio=%.3f\n",
			n, comparisons, lb, float64(comparisons)/lb)
	}
}

15. Practical considerations

Big-O is a beautiful tool and it lies to you in three predictable ways. Knowing the lies is part of growing up as a programmer.

Lie 1: constants don't matter. They do, just not asymptotically. Insertion sort is $\Theta(n^2)$ and merge sort is $\Theta(n \log n)$, but for $n < 30$ or so, insertion sort is faster — its constants are tiny and its loops are cache-friendly. Real merge sort implementations switch to insertion sort on small subarrays. The $n \log n$ "wins" only past the crossover.

Lie 2: all operations are equally cheap. The RAM model says memory access is constant time. On real hardware, an L1 cache hit is ~4 cycles and a main-memory miss is ~300. The same algorithm with the same Big-O can be 10× faster if it's cache-friendly. Linked lists are $O(1)$ insert and $O(n)$ traversal — the same as arrays — but the constant is much worse because of pointer chasing. This is why std::vector is often faster than std::list even when the list "should" win on paper.

Lie 3: $n$ goes to infinity. Asymptotic analysis is a statement about the limit. Your $n$ is 10,000, not infinity. An $O(\log n)$ algorithm with a constant of 1000 might lose to an $O(n)$ algorithm with a constant of 1 for any $n$ you're likely to see. People love using fancy data structures because they're $O(\log n)$. Sometimes a flat array and a linear scan crushes them on small $n$.

The pragmatic stance. Use Big-O to rule out algorithms that obviously won't scale. Use a profiler and a benchmark to choose between algorithms in the same complexity class. The two are complementary; either alone is dangerous.

16. Empirical benchmark

Here's a Python script you can run on your laptop. It times three algorithms — linear search, sorting, and a quadratic all-pairs comparison — at increasing sizes and prints the ratio of consecutive runs. You should see the ratios converge to the predicted growth factors: $\sim$2 for linear, $\sim$2 for $n \log n$ (the log term is small), and $\sim$4 for quadratic when $n$ doubles.

Source — Empirical Benchmarks

-- Time three functions at sizes 1000, 2000, 4000, 8000, 16000.
-- Print the ratio of consecutive measurements.
-- Expected ratios when n doubles:
--   O(n)      ~2.0
--   O(n log n) ~2.1  (log term grows slowly)
--   O(n^2)    ~4.0

function linear_sum(array):        -- O(n)
    s = 0
    for each x: s += x
    return s

function sort(array):               -- O(n log n)
    return sorted copy of array

function all_pairs_count(array):   -- O(n^2)
    c = 0
    for each pair (i,j) with i < j:
        if array[i] < array[j]: c += 1
    return c

function timed(fn, array, repeat=3):
    best = infinity
    repeat times:
        t0 = clock()
        fn(array)
        best = min(best, clock() - t0)
    return best

for each fn in [linear_sum, sort, all_pairs_count]:
    prev = null
    for each n in [1000, 2000, 4000, 8000, 16000]:
        data = random array of size n
        t = timed(fn, data)
        ratio = t/prev if prev else "--"
        print(n, t, ratio)
        prev = t

import time, random

def linear_sum(xs):
    s = 0
    for x in xs: s += x
    return s

def sort_copy(xs):
    return sorted(xs)

def all_pairs_count(xs):
    n, c = len(xs), 0
    for i in range(n):
        for j in range(i + 1, n):
            if xs[i] < xs[j]: c += 1
    return c

def timed(fn, xs, repeat=3):
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(xs)
        best = min(best, time.perf_counter() - t0)
    return best

if __name__ == "__main__":
    sizes = [1000, 2000, 4000, 8000, 16000]
    fns = [("O(n)     linear_sum", linear_sum),
           ("O(nlogn) sort_copy",  sort_copy),
           ("O(n^2)   all_pairs",  all_pairs_count)]

    for name, fn in fns:
        print(f"\n{name}")
        prev = None
        for n in sizes:
            xs = [random.random() for _ in range(n)]
            t = timed(fn, xs)
            ratio = f"x{t/prev:5.2f}" if prev else "     "
            print(f"  n={n:6d}  t={t*1e3:8.2f} ms  {ratio}")
            prev = t
    # Expected ratios when n doubles:
    #   O(n)      ~2.0
    #   O(nlogn)  ~2.1
    #   O(n^2)    ~4.0

// Run in Node.js — uses performance.now() for timing.
// Expected ratios when n doubles: O(n)~2, O(nlogn)~2.1, O(n^2)~4

function linearSum(xs) {
  return xs.reduce((s, x) => s + x, 0);
}

function sortCopy(xs) {
  return xs.slice().sort((a, b) => a - b);
}

function allPairsCount(xs) {
  let c = 0;
  for (let i = 0; i < xs.length; i++)
    for (let j = i + 1; j < xs.length; j++)
      if (xs[i] < xs[j]) c++;
  return c;
}

function timed(fn, xs, repeat = 3) {
  let best = Infinity;
  for (let i = 0; i < repeat; i++) {
    const t0 = performance.now();
    fn(xs);
    best = Math.min(best, performance.now() - t0);
  }
  return best;
}

const sizes = [500, 1000, 2000, 4000, 8000];
const fns = [["O(n)     linearSum",   linearSum],
             ["O(nlogn) sortCopy",    sortCopy],
             ["O(n^2)   allPairs",    allPairsCount]];

for (const [name, fn] of fns) {
  console.log(`\n${name}`);
  let prev = null;
  for (const n of sizes) {
    const xs = Array.from({length: n}, () => Math.random());
    const t = timed(fn, xs);
    const ratio = prev ? `x${(t/prev).toFixed(2)}` : "    ";
    console.log(`  n=${n.toString().padStart(5)}  t=${t.toFixed(2).padStart(7)}ms  ${ratio}`);
    prev = t;
  }
}

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

double linear_sum(double *xs, int n) {
    double s = 0; for (int i=0;i<n;i++) s+=xs[i]; return s;
}

int cmp_dbl(const void *a, const void *b) {
    double x = *(double*)a, y = *(double*)b;
    return (x>y)-(x<y);
}
void sort_copy(double *xs, double *out, int n) {
    for(int i=0;i<n;i++) out[i]=xs[i];
    qsort(out, n, sizeof(double), cmp_dbl);
}

long all_pairs(double *xs, int n) {
    long c=0;
    for(int i=0;i<n;i++) for(int j=i+1;j<n;j++) if(xs[i]<xs[j]) c++;
    return c;
}

double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec*1e3 + ts.tv_nsec/1e6;
}

int main(void) {
    int sizes[] = {500, 1000, 2000, 4000, 8000};
    for (int s=0; s<5; s++) {
        int n = sizes[s];
        double *xs = malloc(n*sizeof(double));
        double *out = malloc(n*sizeof(double));
        for(int i=0;i<n;i++) xs[i]=(double)rand()/RAND_MAX;

        double t0=now_ms(); linear_sum(xs,n); double tl=now_ms()-t0;
        t0=now_ms(); sort_copy(xs,out,n);     double ts=now_ms()-t0;
        t0=now_ms(); all_pairs(xs,n);          double tq=now_ms()-t0;

        printf("n=%5d  linear=%.2fms  sort=%.2fms  allpairs=%.2fms\n",
               n, tl, ts, tq);
        free(xs); free(out);
    }
}

#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
#include <random>
using namespace std;
using ms = chrono::duration<double, milli>;

double linearSum(const vector<double>& v) {
    double s = 0; for (double x : v) s += x; return s;
}
vector<double> sortCopy(vector<double> v) {
    sort(v.begin(), v.end()); return v;
}
long allPairs(const vector<double>& v) {
    long c = 0;
    for (int i=0;i<(int)v.size();i++)
        for (int j=i+1;j<(int)v.size();j++)
            if (v[i]<v[j]) c++;
    return c;
}

template<class F> double timed(F fn) {
    auto t0 = chrono::high_resolution_clock::now();
    fn();
    return ms(chrono::high_resolution_clock::now()-t0).count();
}

int main() {
    mt19937 rng(42);
    uniform_real_distribution<double> dist(0,1);
    for (int n : {500, 1000, 2000, 4000, 8000}) {
        vector<double> v(n);
        generate(v.begin(), v.end(), [&]{ return dist(rng); });
        double tl = timed([&]{ linearSum(v); });
        double ts = timed([&]{ sortCopy(v);  });
        double tq = timed([&]{ allPairs(v);  });
        cout << "n=" << n << "  linear=" << tl << "ms"
             << "  sort=" << ts << "ms"
             << "  allpairs=" << tq << "ms\n";
    }
}

import java.util.Arrays;
import java.util.Random;

public class Benchmark {
    static double linearSum(double[] xs) {
        double s = 0; for (double x : xs) s += x; return s;
    }
    static double[] sortCopy(double[] xs) {
        double[] out = xs.clone(); Arrays.sort(out); return out;
    }
    static long allPairs(double[] xs) {
        long c = 0;
        for (int i=0;i<xs.length;i++)
            for (int j=i+1;j<xs.length;j++)
                if (xs[i]<xs[j]) c++;
        return c;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int[] sizes = {500, 1000, 2000, 4000, 8000};
        for (int n : sizes) {
            double[] xs = new double[n];
            for (int i=0;i<n;i++) xs[i] = rng.nextDouble();

            long t0; double tl, ts, tq;
            t0=System.nanoTime(); linearSum(xs); tl=(System.nanoTime()-t0)/1e6;
            t0=System.nanoTime(); sortCopy(xs);  ts=(System.nanoTime()-t0)/1e6;
            t0=System.nanoTime(); allPairs(xs);  tq=(System.nanoTime()-t0)/1e6;

            System.out.printf("n=%5d  linear=%.2fms  sort=%.2fms  allpairs=%.2fms%n",
                    n, tl, ts, tq);
        }
    }
}

package main

import (
	"fmt"
	"math/rand"
	"sort"
	"time"
)

func linearSum(xs []float64) float64 {
	s := 0.0; for _, x := range xs { s += x }; return s
}
func sortCopy(xs []float64) []float64 {
	out := make([]float64, len(xs))
	copy(out, xs)
	sort.Float64s(out)
	return out
}
func allPairs(xs []float64) int {
	c := 0
	for i := range xs {
		for j := i + 1; j < len(xs); j++ {
			if xs[i] < xs[j] { c++ }
		}
	}
	return c
}

func bench(fn func(), n int) time.Duration {
	t0 := time.Now(); fn(); return time.Since(t0)
}

func main() {
	for _, n := range []int{500, 1000, 2000, 4000, 8000} {
		xs := make([]float64, n)
		for i := range xs { xs[i] = rand.Float64() }

		tl := bench(func() { linearSum(xs) }, n)
		ts := bench(func() { sortCopy(xs)  }, n)
		tq := bench(func() { allPairs(xs)  }, n)

		fmt.Printf("n=%5d  linear=%v  sort=%v  allpairs=%v\n",
			n, tl.Round(time.Microsecond),
			ts.Round(time.Microsecond),
			tq.Round(time.Microsecond))
	}
}

The first script shows you that asymptotic predictions are real and measurable. The doubling ratio is the experiment that turns Big-O from a hand-wave into a quantitative claim.

17. Connection to ML

If you came here from the AI/ML track, here's why complexity analysis matters in modern machine learning. It's the same math, just at terrifying scale.

Training cost is a complexity question. The Chinchilla compute formula from foundation models is

C \;\approx\; 6 \cdot N \cdot D.

Training compute

$C$: Total floating-point operations needed to train a transformer.
$N$: Parameter count.
$D$: Number of training tokens.
$6$: The constant comes from 2 multiply-adds per parameter in the forward pass plus 4 in the backward.

Read this as Big-O. Training is $\Theta(N D)$ — linear in parameters and linear in data. That's the tractable, well-behaved direction. When you double $D$ or double $N$, you double cost. There's nothing exponential here, which is why scaling worked.

Attention is the wall. Self-attention compares every token in the context to every other token. For a sequence of length $L$, that's $\Theta(L^2)$ pairs. Inside a single transformer layer, attention is $\Theta(L^2 \cdot d)$ for embedding dimension $d$. The forward pass through all $\ell$ layers is

\Theta(\ell \cdot L^2 \cdot d).

Attention cost

$\ell$: Number of transformer layers (e.g. 32, 80).
$L$: Sequence length — the number of tokens in the context.
$d$: Embedding dimension — the width of the model.
$L^2$: The number of attention pairs. This is the term that bites.

Why this is "the wall." Doubling the context window quadruples the compute and quadruples the memory. Going from 4k to 32k tokens is an $8 \times 8 = 64$-fold cost increase. The entire long-context research agenda — Flash Attention, sparse attention, linear attention, ring attention, Mamba — exists to fight this $L^2$. None of them have fully won.

The general rule: anywhere you see "$L^2$" in machine learning, someone is trying to make it $L \log L$ or even $L$. The history of every long-context paper is the history of attacking a quadratic bound.

And it's not just attention. KV cache size is $O(L)$ per layer, total inference memory is $O(\ell L d)$, training memory is dominated by activations at $O(\ell L d)$. The complexity story is everywhere.

Source — ML Complexity Examples

-- Naive matrix multiply: O(n^3)
function matmul(A, B, n):
    C = zero matrix n x n
    for i in 0..n:
        for j in 0..n:
            for k in 0..n:
                C[i][j] += A[i][k] * B[k][j]
    return C

-- Strassen (conceptual): O(n^(log2 7)) ≈ O(n^2.807)
-- Splits each n×n matrix into four (n/2)×(n/2) blocks.
-- Uses 7 recursive multiplications instead of 8.
-- T(n) = 7*T(n/2) + O(n^2)  →  O(n^(log2 7)) by master theorem case 1.

-- Transformer forward pass FLOP estimate
-- Inputs: layers=L, seq_len=S, embed_dim=D, vocab=V
function estimate_flops(L, S, D, V):
    -- Attention per layer: 2 * S^2 * D  (Q*K^T and softmax*V)
    attn = 2 * S * S * D
    -- FFN per layer (hidden = 4D): 2 * S * D * 4D + 2 * S * 4D * D
    ffn  = 16 * S * D * D
    -- Unembedding: S * D * V
    unembed = S * D * V
    return L * (attn + ffn) + unembed
    -- Dominant term: L * S^2 * D when S is large  → O(L S^2 D)

# Naive matrix multiply: O(n^3)
def matmul_naive(A, B):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

# Strassen's algorithm: O(n^log2(7)) ≈ O(n^2.807)
# Only worth implementing for large n; shown here for illustration.
def strassen(A, B):
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    # split into 2x2 blocks of (n/2)x(n/2) sub-matrices
    h = n // 2
    def sub(X, r, c): return [[X[r+i][c+j] for j in range(h)] for i in range(h)]
    def add(X, Y):    return [[X[i][j]+Y[i][j] for j in range(h)] for i in range(h)]
    def sub2(X, Y):   return [[X[i][j]-Y[i][j] for j in range(h)] for i in range(h)]
    A11,A12,A21,A22 = sub(A,0,0),sub(A,0,h),sub(A,h,0),sub(A,h,h)
    B11,B12,B21,B22 = sub(B,0,0),sub(B,0,h),sub(B,h,0),sub(B,h,h)
    M1 = strassen(add(A11,A22), add(B11,B22))
    M2 = strassen(add(A21,A22), B11)
    M3 = strassen(A11, sub2(B12,B22))
    M4 = strassen(A22, sub2(B21,B11))
    M5 = strassen(add(A11,A12), B22)
    M6 = strassen(sub2(A21,A11), add(B11,B12))
    M7 = strassen(sub2(A12,A22), add(B21,B22))
    # 7 recursive calls vs 8 for naive → exponent drops from 3 to log2(7)≈2.807
    C = [[0.0]*n for _ in range(n)]
    for i in range(h):
        for j in range(h):
            C[i][j]     = M1[i][j]+M4[i][j]-M5[i][j]+M7[i][j]
            C[i][j+h]   = M3[i][j]+M5[i][j]
            C[i+h][j]   = M2[i][j]+M4[i][j]
            C[i+h][j+h] = M1[i][j]-M2[i][j]+M3[i][j]+M6[i][j]
    return C

# Transformer forward pass FLOP estimate: O(L * S^2 * D) dominant for large S
def transformer_flops(layers, seq_len, embed_dim, vocab_size):
    S, D, L, V = seq_len, embed_dim, layers, vocab_size
    attn_per_layer = 2 * S * S * D          # QK^T and softmax*V
    ffn_per_layer  = 2 * S * D * (4*D) * 2  # two linear layers, hidden=4D
    unembed        = S * D * V
    total = L * (attn_per_layer + ffn_per_layer) + unembed
    return total

# Example: GPT-2 small (12 layers, 1024 ctx, 768 dim, 50257 vocab)
flops = transformer_flops(12, 1024, 768, 50257)
print(f"GPT-2 small forward FLOPs ≈ {flops:.2e}")

// Naive matrix multiply: O(n^3)
function matmulNaive(A, B, n) {
  const C = Array.from({length: n}, () => new Float64Array(n));
  for (let i = 0; i < n; i++)
    for (let j = 0; j < n; j++)
      for (let k = 0; k < n; k++)
        C[i][j] += A[i][k] * B[k][j];
  return C;
}

// Transformer forward pass FLOP estimate
// Dominant term: L * S^2 * D when S is large → O(L S^2 D)
function transformerFlops(layers, seqLen, embedDim, vocabSize) {
  const [L, S, D, V] = [layers, seqLen, embedDim, vocabSize];
  const attnPerLayer = 2 * S * S * D;       // QK^T + softmax*V
  const ffnPerLayer  = 2 * S * D * (4*D) * 2; // hidden = 4D
  const unembed      = S * D * V;
  return L * (attnPerLayer + ffnPerLayer) + unembed;
}

// Benchmark naive matmul growth: expect ratio ~8 when n doubles (O(n^3))
console.log("Matrix multiply timing (O(n^3) — ratio ~8 when n doubles):");
let prev = null;
for (const n of [32, 64, 128]) {
  const A = Array.from({length:n}, () => Array.from({length:n}, Math.random));
  const B = Array.from({length:n}, () => Array.from({length:n}, Math.random));
  const t0 = performance.now();
  matmulNaive(A, B, n);
  const t = performance.now() - t0;
  const ratio = prev ? `x${(t/prev).toFixed(1)}` : "  --";
  console.log(`  n=${n}  t=${t.toFixed(1)}ms  ${ratio}`);
  prev = t;
}

// GPT-2 small estimate
const gpt2Flops = transformerFlops(12, 1024, 768, 50257);
console.log(`\nGPT-2 small forward FLOPs ~ ${gpt2Flops.toExponential(2)}`);

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Naive matrix multiply: O(n^3) */
void matmul(double *A, double *B, double *C, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            C[i*n+j] = 0;
            for (int k = 0; k < n; k++)
                C[i*n+j] += A[i*n+k] * B[k*n+j];
        }
}

/* Transformer FLOP estimate */
long long transformer_flops(int L, int S, int D, int V) {
    long long attn   = 2LL * S * S * D;
    long long ffn    = 2LL * S * D * (4*D) * 2;
    long long unemb  = (long long)S * D * V;
    return L * (attn + ffn) + unemb;
}

double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec*1e3 + ts.tv_nsec/1e6;
}

int main(void) {
    printf("Naive matmul — O(n^3), ratio ~8 when n doubles:\n");
    double prev = -1;
    for (int n = 32; n <= 128; n *= 2) {
        double *A = calloc(n*n, sizeof(double));
        double *B = calloc(n*n, sizeof(double));
        double *C = calloc(n*n, sizeof(double));
        for (int i=0;i<n*n;i++) { A[i]=(double)rand()/RAND_MAX; B[i]=(double)rand()/RAND_MAX; }
        double t0 = now_ms();
        matmul(A, B, C, n);
        double t = now_ms() - t0;
        if (prev > 0)
            printf("  n=%3d  t=%.2fms  ratio=x%.1f\n", n, t, t/prev);
        else
            printf("  n=%3d  t=%.2fms\n", n, t);
        prev = t;
        free(A); free(B); free(C);
    }
    long long flops = transformer_flops(12, 1024, 768, 50257);
    printf("\nGPT-2 small forward FLOPs ~ %lld (%.2e)\n", flops, (double)flops);
}

#include <vector>
#include <chrono>
#include <iostream>
#include <random>
using namespace std;
using ms = chrono::duration<double, milli>;

// Naive matrix multiply: O(n^3)
vector<vector<double>> matmul(
    const vector<vector<double>>& A,
    const vector<vector<double>>& B, int n)
{
    vector<vector<double>> C(n, vector<double>(n, 0));
    for (int i=0;i<n;i++)
        for (int k=0;k<n;k++)
            for (int j=0;j<n;j++)   // loop order ijk is cache-friendlier
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// Transformer FLOP estimate: dominant term O(L*S^2*D)
long long transformerFlops(int L, int S, int D, int V) {
    long long attn  = 2LL*S*S*D;
    long long ffn   = 2LL*S*D*(4*D)*2;
    long long unemb = (long long)S*D*V;
    return L*(attn+ffn)+unemb;
}

int main() {
    mt19937 rng(42);
    uniform_real_distribution<double> dist(0,1);
    double prev = -1;
    for (int n : {32, 64, 128}) {
        vector<vector<double>> A(n, vector<double>(n)), B(n, vector<double>(n));
        for (auto& row : A) generate(row.begin(), row.end(), [&]{return dist(rng);});
        for (auto& row : B) generate(row.begin(), row.end(), [&]{return dist(rng);});
        auto t0 = chrono::high_resolution_clock::now();
        matmul(A, B, n);
        double t = ms(chrono::high_resolution_clock::now()-t0).count();
        if (prev > 0)
            cout << "n=" << n << "  t=" << t << "ms  ratio=x" << t/prev << "\n";
        else
            cout << "n=" << n << "  t=" << t << "ms\n";
        prev = t;
    }
    cout << "GPT-2 forward FLOPs ~ " << transformerFlops(12,1024,768,50257) << "\n";
}

public class MLComplexity {

    // Naive matrix multiply: O(n^3)
    static double[][] matmul(double[][] A, double[][] B, int n) {
        double[][] C = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    C[i][j] += A[i][k] * B[k][j];
        return C;
    }

    // Transformer forward pass FLOP estimate: O(L * S^2 * D) dominant
    static long transformerFlops(int L, int S, int D, int V) {
        long attn  = 2L * S * S * D;
        long ffn   = 2L * S * D * (4 * D) * 2;
        long unemb = (long) S * D * V;
        return L * (attn + ffn) + unemb;
    }

    public static void main(String[] args) {
        System.out.println("Naive matmul — O(n^3), ratio ~8 when n doubles:");
        double prev = -1;
        for (int n : new int[]{32, 64, 128}) {
            double[][] A = new double[n][n], B = new double[n][n];
            for (int i=0;i<n;i++) for (int j=0;j<n;j++) { A[i][j]=Math.random(); B[i][j]=Math.random(); }
            long t0 = System.nanoTime();
            matmul(A, B, n);
            double t = (System.nanoTime() - t0) / 1e6;
            if (prev > 0) System.out.printf("  n=%3d  t=%.1fms  ratio=x%.1f%n", n, t, t/prev);
            else           System.out.printf("  n=%3d  t=%.1fms%n", n, t);
            prev = t;
        }
        System.out.printf("%nGPT-2 forward FLOPs ~ %.2e%n",
                (double) transformerFlops(12, 1024, 768, 50257));
    }
}

package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Naive matrix multiply: O(n^3)
func matmul(A, B [][]float64, n int) [][]float64 {
	C := make([][]float64, n)
	for i := range C { C[i] = make([]float64, n) }
	for i := 0; i < n; i++ {
		for k := 0; k < n; k++ {
			for j := 0; j < n; j++ {
				C[i][j] += A[i][k] * B[k][j]
			}
		}
	}
	return C
}

// Transformer FLOP estimate: dominant term O(L*S^2*D)
func transformerFlops(L, S, D, V int) int64 {
	attn  := int64(2 * S * S * D)
	ffn   := int64(2 * S * D * (4 * D) * 2)
	unemb := int64(S) * int64(D) * int64(V)
	return int64(L)*(attn+ffn) + unemb
}

func main() {
	fmt.Println("Naive matmul — O(n^3), ratio ~8 when n doubles:")
	var prev float64 = -1
	for _, n := range []int{32, 64, 128} {
		A := make([][]float64, n)
		B := make([][]float64, n)
		for i := range A {
			A[i] = make([]float64, n)
			B[i] = make([]float64, n)
			for j := range A[i] { A[i][j] = rand.Float64(); B[i][j] = rand.Float64() }
		}
		t0 := time.Now()
		matmul(A, B, n)
		t := time.Since(t0).Seconds() * 1000
		if prev > 0 {
			fmt.Printf("  n=%3d  t=%.1fms  ratio=x%.1f\n", n, t, t/prev)
		} else {
			fmt.Printf("  n=%3d  t=%.1fms\n", n, t)
		}
		prev = t
	}
	flops := transformerFlops(12, 1024, 768, 50257)
	fmt.Printf("\nGPT-2 forward FLOPs ~ %.2e\n", float64(flops))
}

18. Summary

Complexity is the science of how cost scales with input size. Big-O is its language.
$f(n) = O(g(n))$ means there exist $c$ and $n_0$ with $f(n) \le c\, g(n)$ for all $n \ge n_0$. Big-O is an upper bound.
$\Omega$ is a lower bound, $\Theta$ is a tight bound, and little-$o$ / little-$\omega$ are strict versions.
The growth-rate hierarchy — $1 \prec \log n \prec n \prec n \log n \prec n^2 \prec 2^n \prec n!$ — is invariant; the order doesn't change as $n$ grows.
$\log n$ shows up whenever you can throw away a constant fraction of your input per step. The base doesn't matter in Big-O.
To analyze code: identify the dominant operation, count it as a function of $n$, drop constants and lower-order terms.
Recurrence relations describe recursive algorithms; the master theorem cracks the common $T(n) = a T(n/b) + f(n)$ form.
Amortized analysis averages cost over a sequence of operations. Dynamic array doubling gives $O(1)$ amortized append.
Worst case is the default, average case requires a distribution, best case is usually useless. Quicksort is the canonical reminder that all three exist.
Lower bounds (like the comparison-sort $\Omega(n \log n)$ via decision trees) tell you what no algorithm can beat.
In practice, constants and cache effects matter a lot. Use Big-O to rule out the obviously bad and a profiler to choose among the rest.
In ML, training is $\Theta(N D)$ — well-behaved. Attention is $\Theta(L^2)$ in sequence length — the current wall.

Complexity & Big-O

1. Why complexity

2. The rules of the game

3. Big-O, formally

4. Big-Theta and Big-Omega

5. Little-o and little-omega

6. Growth-rate plotter

7. Common growth classes

8. Why log n appears

9. Computing complexity of code

10. Recurrence relations

10.1 Iteration / unrolling

10.2 Recursion tree

10.3 Master theorem

11. Amortized analysis

11.1 The dynamic array — the canonical example

11.2 Three methods

12. Space complexity

13. Best, average, worst

14. Lower bounds

14.1 Decision trees

15. Practical considerations

16. Empirical benchmark

17. Connection to ML

18. Summary

Further reading

See also

Discrete math

Calculus

Algorithms

Theoretical CS

Foundation models