Discrete Mathematics
Sets, logic, graphs, counting, and proofs — the grammar that every CS algorithm, database query, network protocol, and correctness proof is written in. If calculus is the math of continuous change, discrete math is the math of things you can point at one by one.
1. Why discrete math
Computers can't really deal with continuous things. A float that looks like $0.1$ is actually one of a finite set of representable patterns. A pixel sits on an integer grid. A database row either exists or it doesn't. Every object a computer touches is discrete — finite, countable, separable from its neighbors by a nonzero gap.
Discrete math is the body of math designed for that world. It's where you learn to:
- Count things precisely — "how many ways can a load balancer assign 100 requests to 5 servers?"
- Reason with truth and falsity — "if the user is logged in and is an admin, then show the button."
- Describe structures made of connections — "who follows whom, what depends on what, which page links to which."
- Prove a program is correct, not just test it on three inputs and hope.
If you've ever felt that programming runs on vibes and test cases, discrete math is the subject that teaches you the alternative. Nothing on this page requires calculus. Nothing requires you to remember trigonometry. You need to know what a variable is and what it means to add and multiply. That's it.
The five movements you'll work through:
- Sets. Collections of things.
- Logic. Rules for combining true/false statements.
- Relations and functions. Ways of pairing things up.
- Counting. How many ways can something happen.
- Graphs. Things connected by lines.
Everything else — recurrences, number theory, Boolean algebra — is built on top of those five.
2. Vocabulary cheat-sheet
Scan this once. Come back to it whenever you see a symbol you don't recognise.
- $\in$ — "is an element of." $3 \in \{1,2,3\}$.
- $\subseteq$ — "is a subset of." $\{1,2\} \subseteq \{1,2,3\}$.
- $\cup$ — union. $A \cup B$ is everything in either $A$ or $B$.
- $\cap$ — intersection. $A \cap B$ is everything in both.
- $\emptyset$ — the empty set, the set with no elements.
- $\wedge, \vee, \neg$ — logical AND, OR, NOT.
- $\Rightarrow, \Leftrightarrow$ — "implies," "if and only if."
- $\forall, \exists$ — "for all," "there exists."
- $\binom{n}{k}$ — "n choose k," the number of $k$-element subsets of an $n$-element set.
- $n!$ — factorial, $n \cdot (n-1) \cdot (n-2) \cdots 1$.
- $\bmod$ — remainder after division. $7 \bmod 3 = 1$.
- $|A|$ — size (cardinality) of a set. $|\{a,b,c\}| = 3$.
3. Sets
A set is an unordered collection of distinct things. Those things are called its elements. There are two ways to specify a set. By listing:
A set by listing
$A = \{1, 2, 3, 4\}$
- $A$
- The name we're giving this set. Sets are usually named with capital letters.
- $\{\ldots\}$
- Curly braces enclose the list of elements. Order doesn't matter — $\{1,2,3,4\} = \{4,3,2,1\}$ — and duplicates don't count — $\{1,2,2,3\} = \{1,2,3\}$.
- $1, 2, 3, 4$
- The four elements of $A$. Each is a separate "thing" in the set.
Analogy. Think of a set like a bag of marbles: you care about which marbles are in it, not the order they were dropped in, and having the same marble twice is meaningless — it's still the same marble.
Or by description (set-builder notation), which you read as "the set of all $x$ such that ...":
Set-builder notation
$B = \{x \in \mathbb{Z} \mid x > 0 \text{ and } x < 10\}$
- $B$
- The set we're defining. It ends up being $\{1,2,3,4,5,6,7,8,9\}$.
- $x$
- A placeholder for "a typical element." We're saying: "$x$ is in $B$ exactly when the condition on the right holds."
- $\mathbb{Z}$
- The set of all integers: $\ldots, -2, -1, 0, 1, 2, \ldots$ The blackboard-bold Z stands for German Zahlen, "numbers."
- $\mid$
- Read as "such that." Separates the variable being defined (left) from the conditions it must meet (right).
- $x > 0 \text{ and } x < 10$
- The conditions $x$ must satisfy to belong to $B$. Any integer strictly between 0 and 10.
Why you care. Set-builder notation is the math cousin of a database WHERE clause. "All users such that their signup date is after 2024 and their plan is pro." Same idea.
Standard number sets
These names recur everywhere. Memorise them once:
- $\mathbb{N}$ — natural numbers. Depending on the author, either $\{0, 1, 2, \ldots\}$ or $\{1, 2, 3, \ldots\}$. In CS we usually include 0.
- $\mathbb{Z}$ — integers. All of $\mathbb{N}$ plus the negatives.
- $\mathbb{Q}$ — rationals. Fractions $p/q$ with $p, q \in \mathbb{Z}$ and $q \neq 0$.
- $\mathbb{R}$ — real numbers. The full number line.
- $\mathbb{C}$ — complex numbers. Of the form $a + bi$ where $i^2 = -1$.
Operations on sets
Union
$A \cup B = \{x \mid x \in A \text{ or } x \in B\}$
- $\cup$
- Union. The set of elements in either $A$, or $B$, or both.
- $x \in A$
- "$x$ is an element of $A$."
- or
- Inclusive "or" — not exclusive. If $x$ is in both, it's still in the union (exactly once, because sets don't have duplicates).
Example. If $A = \{1,2,3\}$ and $B = \{3,4,5\}$, then $A \cup B = \{1,2,3,4,5\}$.
Intersection
$A \cap B = \{x \mid x \in A \text{ and } x \in B\}$
- $\cap$
- Intersection. The elements in both $A$ and $B$ at once.
- and
- Logical AND — both conditions must hold.
Example. With $A$ and $B$ as above, $A \cap B = \{3\}$. If two sets share no elements, $A \cap B = \emptyset$, and we call them disjoint.
Set difference
$A \setminus B = \{x \mid x \in A \text{ and } x \notin B\}$
- $\setminus$
- Set difference. Elements in $A$ but not in $B$. Some authors write $A - B$.
- $x \notin B$
- "$x$ is not an element of $B$."
Example. $\{1,2,3\} \setminus \{3,4,5\} = \{1,2\}$. Used constantly when writing "new users this month" or "unprocessed items."
If there's a fixed universe $U$ that we're working inside of, the complement of $A$ is everything in the universe but not in $A$: $A^c = U \setminus A$.
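These operations map directly onto Python's built-in `set` type. A quick sketch using the running example sets from above (the universe `U` is an assumption added here to illustrate the complement):

```python
# Set operations with Python's built-in set type.
A = {1, 2, 3}
B = {3, 4, 5}
U = set(range(10))           # assumed universe: {0, 1, ..., 9}

print(A | B)                 # union: {1, 2, 3, 4, 5}
print(A & B)                 # intersection: {3}
print(A - B)                 # set difference: {1, 2}
print(U - A)                 # complement of A within U
print(A.isdisjoint({7, 8}))  # True: no shared elements
```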
Subsets and power sets
$A$ is a subset of $B$, written $A \subseteq B$, if every element of $A$ is also in $B$. The empty set $\emptyset$ is a subset of every set, including itself, and every set is a subset of itself. If $A \subseteq B$ but $A \neq B$, we write $A \subsetneq B$ and call $A$ a proper subset.
The power set $\mathcal{P}(A)$ is the set of all subsets of $A$:
Power set
$\mathcal{P}(A) = \{S \mid S \subseteq A\}$
- $\mathcal{P}(A)$
- The power set of $A$. A set whose elements are themselves sets.
- $S$
- A typical element of $\mathcal{P}(A)$ — that is, any subset of $A$.
- $S \subseteq A$
- The condition: $S$ is a subset of $A$.
Size fact. If $A$ has $n$ elements, $\mathcal{P}(A)$ has $2^n$ elements — one subset per "yes/no" choice per element. If $A = \{a,b,c\}$ then $\mathcal{P}(A) = \{\emptyset, \{a\}, \{b\}, \{c\}, \{a,b\}, \{a,c\}, \{b,c\}, \{a,b,c\}\}$ — that's $2^3 = 8$.
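The size fact is easy to check by generating a power set directly. A small sketch using `itertools.combinations`, enumerating subsets size by size:

```python
from itertools import combinations

def power_set(s):
    """All subsets of s, as frozensets, from the empty set up to s itself."""
    elems = list(s)
    return [frozenset(c) for k in range(len(elems) + 1)
            for c in combinations(elems, k)]

P = power_set({"a", "b", "c"})
print(len(P))  # 2**3 = 8 subsets, matching the size fact
```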
Cartesian product
Cartesian product
$A \times B = \{(a, b) \mid a \in A,\ b \in B\}$
- $A \times B$
- The Cartesian product. A set of ordered pairs.
- $(a, b)$
- An ordered pair. Unlike sets, order matters: $(a,b) \neq (b,a)$ in general.
- $a \in A, b \in B$
- The first coordinate comes from $A$, the second from $B$.
Where you've seen this. $\mathbb{R} \times \mathbb{R} = \mathbb{R}^2$ is the 2D plane. A database JOIN is literally a filtered Cartesian product. If $|A| = m$ and $|B| = n$, then $|A \times B| = mn$.
4. Propositional logic
A proposition is a statement that's either true or false — no in-between. "It's raining." "$2 + 2 = 5$." "The user is logged in." Not propositions: questions, commands, opinions, or anything whose truth isn't well-defined.
We write propositions with lowercase letters: $p$, $q$, $r$. And we combine them with connectives.
The connectives
- Negation: $\neg p$ ("not $p$"). True when $p$ is false.
- Conjunction: $p \wedge q$ ("$p$ and $q$"). True only when both are true.
- Disjunction: $p \vee q$ ("$p$ or $q$"). True when at least one is true. This is inclusive or — if both are true, the result is still true.
- Implication: $p \Rightarrow q$ ("if $p$ then $q$"). False only when $p$ is true and $q$ is false. We'll explain the weird case below.
- Biconditional: $p \Leftrightarrow q$ ("$p$ if and only if $q$," often "$p$ iff $q$"). True when $p$ and $q$ have the same truth value.
Truth tables
The only tool you need to evaluate any logical expression. A truth table lists every combination of truth values for the input propositions, then fills in the output column by column. Here's the full table for the binary connectives:
| $p$ | $q$ | $p \wedge q$ | $p \vee q$ | $p \Rightarrow q$ | $p \Leftrightarrow q$ |
|---|---|---|---|---|---|
| T | T | T | T | T | T |
| T | F | F | T | F | F |
| F | T | F | T | T | F |
| F | F | F | F | T | T |
Most people get tripped up by two rows:
Row 4: $F \vee F = F$. Obvious, but remember: inclusive or still needs at least one input to be true.
Rows 3 and 4: $F \Rightarrow q$ is always T. The implication $p \Rightarrow q$ is only false when you promised $q$ would follow from $p$ and then $p$ happened but $q$ didn't. If $p$ never happened in the first place, you never broke a promise. "If it rains, I'll bring an umbrella" is not falsified by a sunny day.
Logical equivalence
Two formulas are logically equivalent, written $\equiv$, if they have the same truth value for every assignment of their variables. You check this by building a truth table for each and comparing columns.
The most useful equivalences — worth memorising — are De Morgan's laws:
De Morgan's laws
$\neg(p \wedge q) \equiv \neg p \vee \neg q \qquad\qquad \neg(p \vee q) \equiv \neg p \wedge \neg q$
- $\neg$
- Logical NOT.
- $\wedge$
- Logical AND.
- $\vee$
- Logical OR.
- $\equiv$
- Logical equivalence — both sides are true under exactly the same assignments.
In plain English. "It's not the case that it's both raining and cold" means "it's not raining or it's not cold." And "it's not the case that it's either raining or cold" means "it's not raining and not cold." The NOT distributes over the connective, and the connective flips along the way.
Why you care. Every time you refactor an if condition by pushing a ! inside, you're applying De Morgan. These laws are also the foundation of circuit simplification in hardware design: rewriting an AND-based expression as an OR-based one (or vice versa) can save gates.
Other important equivalences:
- Distributivity: $p \wedge (q \vee r) \equiv (p \wedge q) \vee (p \wedge r)$, and symmetrically for $\vee$ over $\wedge$.
- Implication as disjunction: $p \Rightarrow q \equiv \neg p \vee q$. Check it against the truth table — it matches.
- Contrapositive: $p \Rightarrow q \equiv \neg q \Rightarrow \neg p$. This is the single most useful rewriting in proofs.
- Double negation: $\neg \neg p \equiv p$.
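All of these equivalences can be verified mechanically by brute force over every truth assignment, which is exactly what a truth table does. A sketch in Python:

```python
from itertools import product

def implies(p, q):
    # p => q is false only when p is true and q is false
    return (not p) or q

def equivalent(f, g, nvars=2):
    """True when two boolean functions agree on every assignment."""
    return all(f(*vals) == g(*vals)
               for vals in product([True, False], repeat=nvars))

# De Morgan's laws
assert equivalent(lambda p, q: not (p and q), lambda p, q: (not p) or (not q))
assert equivalent(lambda p, q: not (p or q),  lambda p, q: (not p) and (not q))
# Implication as disjunction, and the contrapositive
assert equivalent(lambda p, q: implies(p, q), lambda p, q: (not p) or q)
assert equivalent(lambda p, q: implies(p, q), lambda p, q: implies(not q, not p))
print("all equivalences verified")
```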
Tautologies and contradictions
A formula that's true for every assignment of its variables is a tautology. For example, $p \vee \neg p$ — "either it's raining or it isn't." A formula that's false for every assignment is a contradiction. For example, $p \wedge \neg p$.
A formula that's true under at least one assignment is satisfiable. The problem of "given this formula, is it satisfiable?" is SAT — the original NP-complete problem, and the thing every modern hardware verifier ends up solving internally.
5. Predicate logic
Propositional logic can only talk about whole statements. It can't say "everyone in this room has a laptop" without inventing one proposition per person. That's where predicates and quantifiers come in.
A predicate is a statement with one or more blanks. "$x$ is even" is a predicate, written $E(x)$. Plug in a value and you get a proposition. $E(4)$ is true, $E(5)$ is false.
Quantifiers
A quantifier says how a predicate should be evaluated over a whole set. There are two:
Quantifiers
$\forall x \in S,\ P(x) \qquad\qquad \exists x \in S,\ P(x)$
- $\forall$
- The universal quantifier, "for all." The statement $\forall x \in S,\ P(x)$ says every element of $S$ satisfies $P$.
- $\exists$
- The existential quantifier, "there exists." The statement $\exists x \in S,\ P(x)$ says at least one element of $S$ satisfies $P$.
- $x$
- The variable being quantified — a placeholder ranging over $S$.
- $S$
- The domain of discourse. The set we're ranging over. Without this, the statement is meaningless: "for all $x$" over what?
- $P(x)$
- The predicate — the condition that may or may not hold for each $x$.
Read them aloud. $\forall$ is "for every." $\exists$ is "there is some." If you can rephrase a specification in English with "for every" and "there is some," you can translate it directly into predicate logic. Database ORMs, type systems, and formal verification all live on this vocabulary.
Negation rules
Negating a quantified statement flips the quantifier and negates the predicate:
Negation of quantifiers
$\neg(\forall x \in S,\ P(x)) \equiv \exists x \in S,\ \neg P(x) \qquad \neg(\exists x \in S,\ P(x)) \equiv \forall x \in S,\ \neg P(x)$
- $\forall$ flips to $\exists$
- The negation of "all swans are white" is "there's at least one non-white swan" — not "no swans are white."
- $\exists$ flips to $\forall$
- The negation of "there's some bug" is "every piece of code is bug-free."
- $\neg P(x)$
- The condition itself also gets negated, because once you've flipped the outer quantifier, you need the inner statement to be the opposite.
Why you care. Every "not all" and "not exists" sentence your coworker writes in a spec is a quantifier negation. If you can't do these flips automatically, you'll argue about bugs that don't exist.
Worked example
Take the statement "every student passed every exam." With $S$ the set of students and $E$ the set of exams, we'd write:
Double universal
$\forall s \in S,\ \forall e \in E,\ \text{Passed}(s, e)$
- $\forall s \in S$
- For every student $s$ in the set of students.
- $\forall e \in E$
- ...and for every exam $e$ in the set of exams...
- $\text{Passed}(s, e)$
- ...the predicate "student $s$ passed exam $e$" holds.
Negation. "Not everyone passed everything" becomes $\exists s \in S,\ \exists e \in E,\ \neg\text{Passed}(s,e)$ — "there's some student $s$ and some exam $e$ such that $s$ did not pass $e$." Notice how each $\forall$ flipped to $\exists$ and the predicate picked up a $\neg$.
Order matters with nested quantifiers of different types. "$\forall x,\ \exists y,\ y > x$" says every number has something bigger than it — true in $\mathbb{N}$. "$\exists y,\ \forall x,\ y > x$" says there's a single number bigger than everything — false in $\mathbb{N}$. The words look similar; the meanings are not.
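Over a finite domain, Python's `all()` and `any()` play the roles of $\forall$ and $\exists$. A sketch of the student/exam example (the names and grade data are made up for illustration):

```python
students = ["ana", "bo", "chen"]
exams = ["midterm", "final"]
# Hypothetical grade data: the set of (student, exam) pairs that passed
passed = {("ana", "midterm"), ("ana", "final"),
          ("bo", "midterm"),  ("bo", "final"),
          ("chen", "midterm")}   # chen did not pass the final

# forall s, forall e, Passed(s, e)
everyone_passed_everything = all(
    (s, e) in passed for s in students for e in exams)

# Negation: exists s, exists e, NOT Passed(s, e)
someone_failed_something = any(
    (s, e) not in passed for s in students for e in exams)

print(everyone_passed_everything)  # False
print(someone_failed_something)    # True
# The negation law: the two must always disagree
assert everyone_passed_everything == (not someone_failed_something)
```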
6. Proof techniques
A proof is a chain of logical steps leading from stuff you're given (or stuff you've already proved) to the thing you want. Four patterns do 95% of the work.
Direct proof
Assume the hypothesis, march forward, arrive at the conclusion. "If $n$ is even, then $n^2$ is even." Assume $n$ is even, so $n = 2k$ for some integer $k$. Then $n^2 = (2k)^2 = 4k^2 = 2(2k^2)$, which is twice an integer, hence even. Done.
Contrapositive
To prove $p \Rightarrow q$, prove $\neg q \Rightarrow \neg p$ instead. These are logically equivalent (we saw this earlier), but sometimes the contrapositive is much easier to attack.
"If $n^2$ is even, then $n$ is even." Going direct is awkward. The contrapositive — "if $n$ is odd, then $n^2$ is odd" — is immediate: $n = 2k+1$, so $n^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1$, odd. Done.
Proof by contradiction
Assume the opposite of what you want to prove. Derive a contradiction (something both true and false, or something that contradicts a known fact). Conclude that your assumption must have been wrong.
Classic example. $\sqrt{2}$ is irrational. Assume for contradiction that $\sqrt{2} = p/q$ for integers $p$ and $q$ with no common factor. Squaring: $2 = p^2/q^2$, so $p^2 = 2q^2$. That means $p^2$ is even, so $p$ is even (by the previous proof), so $p = 2m$, so $p^2 = 4m^2 = 2q^2$, so $q^2 = 2m^2$, so $q$ is even too. But we said $p$ and $q$ had no common factor, and now they both have 2 as a factor. Contradiction. So $\sqrt{2}$ is not rational.
Proof by induction
The workhorse for any statement about "all natural numbers $n$." Three parts:
- Base case. Show the statement holds for $n = 0$ (or $n = 1$, whichever is your starting value).
- Inductive step. Assume it holds for some arbitrary $n = k$. Show it then holds for $n = k+1$.
- Conclude. Since it holds for the base case, it holds for the next one, and the next one, and so on — it holds for every $n$.
The classic. Prove that $1 + 2 + 3 + \cdots + n = \dfrac{n(n+1)}{2}$ for every positive integer $n$.
Gauss's sum
$\sum_{i=1}^{n} i = \dfrac{n(n+1)}{2}$
- $\sum_{i=1}^{n} i$
- "Sum of $i$ from $i=1$ to $i=n$" — that is, $1 + 2 + 3 + \cdots + n$. The big sigma is just compact notation for "add these things up."
- $n$
- The upper bound. A natural number we'll induct on.
- $\dfrac{n(n+1)}{2}$
- The closed-form value. No sum, no loop — just multiply and divide.
Legend. Gauss supposedly found this aged 8 when a teacher asked the class to add $1$ through $100$ as busywork. Gauss noticed that pairing $1 + 100$, $2 + 99$, $3 + 98$, all give 101, and there are 50 pairs — so the answer is $50 \cdot 101 = 5050$. The formula generalises that trick.
Base case ($n=1$). Left side: $1$. Right side: $\dfrac{1 \cdot 2}{2} = 1$. Match.
Inductive step. Assume the formula holds for $n = k$:
Inductive hypothesis
$1 + 2 + \cdots + k = \dfrac{k(k+1)}{2}$
- $k$
- An arbitrary positive integer for which we're assuming the formula is already true. This assumption is called the inductive hypothesis.
You don't have to prove this line. You get to assume it — that's the whole point of induction.
We want to show the formula holds for $n = k+1$, i.e., that $1 + 2 + \cdots + k + (k+1) = \dfrac{(k+1)(k+2)}{2}$. Starting from the left:
Inductive step
$\underbrace{(1 + \cdots + k)}_{=k(k+1)/2} + (k+1) = \dfrac{k(k+1)}{2} + (k+1) = \dfrac{k(k+1) + 2(k+1)}{2} = \dfrac{(k+1)(k+2)}{2}$
- $\underbrace{(1+\cdots+k)}_{=k(k+1)/2}$
- Applying the inductive hypothesis: we replace the sum $1 + \cdots + k$ with the closed form that we're assuming works.
- $(k+1)$
- The new term we're adding — the one we need to account for to go from $n=k$ to $n=k+1$.
- $\dfrac{k(k+1) + 2(k+1)}{2}$
- Common denominator — we multiplied $(k+1)$ top and bottom by 2 to match.
- $\dfrac{(k+1)(k+2)}{2}$
- Factor out $(k+1)$. This is exactly the formula at $n = k+1$. Done.
The induction metaphor. Imagine an infinite row of dominoes. The base case is "I've knocked over the first one." The inductive step is "whenever domino $k$ falls, it hits domino $k+1$." Together, every domino falls.
Combined with the base case, by induction the formula holds for all $n \geq 1$. $\blacksquare$
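Induction proves the formula for every $n$; a quick numerical spot-check of the closed form against the literal sum builds confidence that no algebra slipped:

```python
def gauss(n):
    """Closed form for 1 + 2 + ... + n."""
    return n * (n + 1) // 2

# Compare against the literal sum for a range of n
for n in range(1, 200):
    assert sum(range(1, n + 1)) == gauss(n)

print(gauss(100))  # 5050, the schoolroom legend
```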
7. Relations and functions
A relation between two sets $A$ and $B$ is just a subset of $A \times B$ — a set of pairs. The pair $(a,b)$ being in the relation means "$a$ is related to $b$." Written $a \sim b$ or $a R b$.
"$x$ is less than $y$" is a relation on $\mathbb{Z} \times \mathbb{Z}$. "$u$ follows $v$" is a relation on users. "$x$ divides $y$" is a relation on the integers. A database table with two foreign keys is a relation.
Properties of relations on a single set
When $A = B$, we can ask whether a relation has the following properties:
- Reflexive: $a \sim a$ for every $a$. ("Everybody is equal to themselves.")
- Symmetric: if $a \sim b$ then $b \sim a$. ("Friendship on a platform without directed follows.")
- Antisymmetric: if $a \sim b$ and $b \sim a$ then $a = b$. ("Less-than-or-equal: the only way two things can relate in both directions is if they're equal.")
- Transitive: if $a \sim b$ and $b \sim c$ then $a \sim c$. ("Ancestor of" is transitive; "parent of" is not.)
Equivalence relations
A relation that's reflexive, symmetric, and transitive is called an equivalence relation. These are the ones that partition a set into disjoint blocks — each element belongs to exactly one block of things that are equivalent to it.
"Has the same remainder when divided by 5" is an equivalence relation on the integers. It partitions $\mathbb{Z}$ into 5 blocks, one for each possible remainder: $\{\ldots, -10, -5, 0, 5, 10, \ldots\}$, $\{\ldots, -9, -4, 1, 6, 11, \ldots\}$, and so on. This is the basis of modular arithmetic, which we'll get to in §13.
Partial orders
A relation that's reflexive, antisymmetric, and transitive is a partial order. "$\leq$" on $\mathbb{Z}$ is one. "$\subseteq$" on sets is another. It's called partial because two elements might not be comparable at all: for example, $\{1,2\}$ and $\{2,3\}$ are incomparable under $\subseteq$, since neither is a subset of the other. When every pair is comparable, we call it a total order.
Functions
A function $f: A \to B$ is a special kind of relation: each element of $A$ is related to exactly one element of $B$. We write $f(a) = b$ to mean "the unique $b$ that $a$ maps to."
- $A$ is the domain. Every element of the domain has to map somewhere.
- $B$ is the codomain. Candidates to be mapped to.
- The image (or range) is $\{f(a) \mid a \in A\}$ — the actual outputs that appear. Not always all of $B$.
Three special kinds:
- Injective (one-to-one). Different inputs go to different outputs. $f(a_1) = f(a_2) \Rightarrow a_1 = a_2$. Nothing in $B$ gets hit twice.
- Surjective (onto). Every element of $B$ is hit by something. The image is $B$.
- Bijective. Both injective and surjective. A perfect pairing between $A$ and $B$. This is the rigorous notion of "the two sets have the same size," and it's how we make sense of comparing infinite sets — if there's a bijection from $\mathbb{N}$ to $\mathbb{Q}$ (there is), then $\mathbb{N}$ and $\mathbb{Q}$ have the "same" infinity.
Programmer's note. A Python dict is a function from its keys to its values. An injective function is a hash where no two keys map to the same value. A surjective function "covers" its codomain. A bijection is invertible — you can run it backwards to recover the input from the output.
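The programmer's note can be made concrete. A sketch of injectivity and surjectivity checks on a Python dict (the example mapping is made up):

```python
def is_injective(f: dict) -> bool:
    # No two keys share a value
    return len(set(f.values())) == len(f)

def is_surjective(f: dict, codomain: set) -> bool:
    # Every element of the codomain is hit
    return set(f.values()) == codomain

f = {1: "a", 2: "b", 3: "c"}
assert is_injective(f) and is_surjective(f, {"a", "b", "c"})

# A bijection can be inverted by swapping keys and values
inverse = {v: k for k, v in f.items()}
assert inverse["b"] == 2
print("bijection verified and inverted")
```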
8. Counting & combinatorics
Combinatorics is the subfield of discrete math that asks "how many ways?" It's the foundation of probability, where "how many ways" turns into "what fraction of the possible worlds," and of algorithm analysis, where "how many steps" turns into big-$O$.
Two fundamental rules
Rule of sum. If there are $m$ ways to do thing A and $n$ ways to do thing B, and you must do exactly one of them, there are $m + n$ ways total (assuming the options don't overlap).
Rule of product. If there are $m$ ways to do A and $n$ ways to do B, and you do both independently, there are $m \cdot n$ ways in sequence.
Example: picking a password with one letter (26 options) followed by one digit (10 options) gives $26 \cdot 10 = 260$ possibilities. Picking either a letter or a digit gives $26 + 10 = 36$.
Permutations
A permutation of a set is an ordering of its elements. The number of ways to arrange all $n$ elements in a line is $n$ factorial:
Factorial
$n! = n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1$
- $n!$
- "$n$ factorial" — the product of all positive integers up to $n$. By convention $0! = 1$, an empty product (there's exactly one way to arrange zero things).
- $n$
- A non-negative integer, the number of things you're arranging.
Why multiply. There are $n$ choices for the first slot, then $n-1$ for the second (you can't reuse the one you picked), then $n-2$, etc. Rule of product gives $n!$. For 10 items, that's already $10! = 3{,}628{,}800$. Factorials explode fast.
If you only want to pick and arrange $k$ out of $n$, you get the partial permutation count:
Permutations of $n$ taken $k$ at a time
$P(n, k) = \dfrac{n!}{(n-k)!}$
- $P(n, k)$
- The number of ordered arrangements of $k$ distinct items chosen from a pool of $n$. Sometimes written $_{n}P_k$ or $n^{\underline{k}}$ (the "falling factorial").
- $n$
- Size of the pool.
- $k$
- How many you're picking (and arranging), with $0 \leq k \leq n$.
- $(n-k)!$
- The tail we're cutting off — it cancels all the factors from $(n-k)$ down to $1$ that we don't use.
Example. How many ways to award gold, silver, and bronze from 8 runners? $P(8, 3) = 8 \cdot 7 \cdot 6 = 336$.
Combinations
When order does not matter — you just want to pick $k$ items out of $n$ as an unordered set — you get combinations:
$n$ choose $k$
$\binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$
- $\binom{n}{k}$
- "$n$ choose $k$." The number of $k$-element subsets of an $n$-element set. This is the single most common object in combinatorics.
- $n!$
- All orderings of the $n$ items.
- $k!$
- Internal orderings of the chosen $k$ that we're not distinguishing.
- $(n-k)!$
- Internal orderings of the unchosen $n - k$ that we're also not distinguishing.
Where it comes from. We divide $P(n,k)$ by $k!$ because picking $\{A, B, C\}$ in any of 3! = 6 orders should count as one combination, not six. Concrete: $\binom{5}{2} = 10$ ways to pick 2 out of 5. Check: $\{AB, AC, AD, AE, BC, BD, BE, CD, CE, DE\}$.
Combinations satisfy a simple symmetry: $\binom{n}{k} = \binom{n}{n-k}$ — picking what to include is the same as picking what to exclude. And they obey Pascal's identity:
Pascal's identity
$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$
- $\binom{n}{k}$
- Number of $k$-subsets of $\{1, 2, \ldots, n\}$.
- $\binom{n-1}{k-1}$
- Number of those subsets that include element $n$ — pick $k-1$ more from the remaining $n-1$.
- $\binom{n-1}{k}$
- Number of those subsets that exclude element $n$ — pick all $k$ from the remaining $n-1$.
Why you care. This is how Pascal's triangle is built — each entry is the sum of the two above it. It's also the classic example of a recurrence you'd solve with dynamic programming.
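Pascal's identity really is a dynamic-programming recurrence. A sketch that builds each row of the triangle from the identity alone, then checks it against the factorial formula via `math.comb`:

```python
from math import comb

def pascal_row(n):
    """Row n of Pascal's triangle, built purely from Pascal's identity."""
    row = [1]
    for _ in range(n):
        # Each interior entry is the sum of the two entries above it
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

# The recurrence and the factorial formula agree
for n in range(10):
    assert pascal_row(n) == [comb(n, k) for k in range(n + 1)]

print(pascal_row(5))  # [1, 5, 10, 10, 5, 1]
```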
The binomial theorem
Combinations show up as the coefficients when you expand $(x+y)^n$:
Binomial theorem
$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k}\, x^{n-k} y^{k}$
- $(x+y)^n$
- An algebraic expression raised to the $n$-th power.
- $\sum_{k=0}^{n}$
- Sum from $k = 0$ to $k = n$ — $n+1$ terms total.
- $\binom{n}{k}$
- The binomial coefficient — how many ways to choose $k$ copies of $y$ (and the remaining $n-k$ copies of $x$) when expanding the product $(x+y)(x+y)\cdots(x+y)$ with $n$ factors.
- $x^{n-k} y^{k}$
- The term you get when you've chosen $k$ factors to contribute $y$ and the other $n-k$ to contribute $x$.
Intuition. Every term in the expansion is one way of picking $y$ or $x$ from each of the $n$ factors. The number of ways to get exactly $k$ $y$'s is $\binom{n}{k}$. That's where the coefficients come from — it's a counting fact, not algebra magic. Setting $x = y = 1$ gives $2^n = \sum_k \binom{n}{k}$, which is another proof that $|\mathcal{P}(A)| = 2^{|A|}$.
Pigeonhole principle
The silliest-looking theorem in math, and one of the most useful: if you put more than $n$ objects into $n$ boxes, at least one box contains two or more objects.
It sounds like a tautology. It is. But it's shocking how often it's the argument you need. Examples: in any group of 367 people, at least two share a birthday (366 possible birthdays). Any list of $n+1$ distinct integers from $\{1, 2, \ldots, 2n\}$ contains two that sum to $2n+1$. Any sequence of $n^2 + 1$ distinct numbers has a monotonic subsequence of length $n+1$ (harder — Erdős-Szekeres).
Inclusion-exclusion
To count the size of a union, you add the parts and subtract the double-counted overlaps:
Inclusion-exclusion (two sets)
$|A \cup B| = |A| + |B| - |A \cap B|$
- $|A \cup B|$
- The number of elements in $A$ or $B$ (or both).
- $|A| + |B|$
- Naive sum. Double-counts anything in $A \cap B$ once extra.
- $|A \cap B|$
- The overlap — the thing we over-counted. Subtract it.
Three sets. It generalises: $|A \cup B \cup C| = |A|+|B|+|C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|$. Alternating signs. This is the machine you use when counting "things with at least one of these properties."
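A sketch verifying the three-set version on concrete sets (multiples of 2, 3, and 5 below 50, an arbitrary choice):

```python
A = set(range(0, 50, 2))   # multiples of 2 below 50
B = set(range(0, 50, 3))   # multiples of 3 below 50
C = set(range(0, 50, 5))   # multiples of 5 below 50

lhs = len(A | B | C)
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))

print(lhs, rhs)
assert lhs == rhs   # the alternating-sign formula counts the union exactly
```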
9. Interactive combinatorics lab
(Interactive figure: sliders for $n$ and $k$, a small Pascal's triangle with the entry $\binom{n}{k}$ highlighted, and readouts of $P(n,k)$, $\binom{n}{k}$, and $n!$. Use it to see how fast these numbers grow.)
10. Graph theory
A graph is a set of things with lines between some of them. Formally:
A graph
$G = (V, E)$
- $G$
- The graph itself.
- $V$
- The set of vertices (or nodes). The "things."
- $E$
- The set of edges. Each edge is a pair of vertices. In an undirected graph, edges are unordered pairs $\{u, v\}$. In a directed graph (digraph), they're ordered pairs $(u, v)$ — there's a source and a target.
Examples you already know. A social network (people + friendships). A road map (intersections + streets). A web (pages + links). A codebase dependency graph (modules + imports). Git history (commits + parent pointers). Every time you've said "which is connected to which," you've been talking about a graph.
Variants
- Undirected. Edges are symmetric. Friendship.
- Directed. Edges have a direction. Who follows whom. Web links.
- Weighted. Each edge has a number attached — a distance, cost, or capacity. Road distance. Network latency.
- Simple. No self-loops (edges from a vertex to itself) and no duplicate edges.
- Multigraph. Loops and duplicates allowed.
Representations
Two standard ways to store a graph in memory:
Adjacency matrix. An $n \times n$ matrix $M$ with $M_{ij} = 1$ if there's an edge from vertex $i$ to vertex $j$, and $0$ otherwise (or the edge weight instead of 1). Cost: $O(n^2)$ memory, even for sparse graphs. Lookup of a single edge is $O(1)$. Great when $n$ is small and the graph is dense.
Adjacency list. For each vertex $v$, a list of its neighbours. Cost: $O(|V| + |E|)$ memory. Iterating the neighbours of $v$ is $O(\deg(v))$. Great when $n$ is big and the graph is sparse — almost always the right choice in practice.
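A sketch of both representations on a small made-up undirected graph, with a degree count thrown in (notice that the degrees sum to twice the number of edges, since each edge has two ends):

```python
# A small undirected graph on vertices 0..3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency list: dict mapping each vertex to its set of neighbours
adj = {v: set() for v in range(n)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)          # undirected: store both directions

# Adjacency matrix: n x n grid of 0/1 entries
matrix = [[1 if w in adj[v] else 0 for w in range(n)] for v in range(n)]
print(matrix[2])           # row for vertex 2: [1, 1, 0, 1]

deg = {v: len(adj[v]) for v in range(n)}
print(deg)                 # {0: 2, 1: 2, 2: 3, 3: 1}
assert sum(deg.values()) == 2 * len(edges)  # degrees sum to 2|E|
```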
Basic vocabulary
- Degree of vertex $v$, written $\deg(v)$: the number of edges touching $v$. In a directed graph you split into $\deg^-(v)$ (in-degree) and $\deg^+(v)$ (out-degree).
- Path: a sequence of vertices where consecutive ones are connected by edges. A simple path has no repeated vertices.
- Cycle: a path that starts and ends at the same vertex.
- Connected: there's a path between every pair of vertices (in the undirected case). A disconnected graph splits into connected components.
- Tree: a connected, acyclic undirected graph.
- Bipartite: vertices split into two groups $U$ and $W$, with all edges going between $U$ and $W$ (never within). Models two-sided matching — students and schools, jobs and machines.
Trees
Trees are a special case of graphs so important that we give them their own name and theorems. A rooted tree picks one vertex as the root; the rest hang down in parent-child relationships. Every file system you've ever used is a rooted tree. Every DOM tree. Every AST in a compiler. Every Huffman code.
A tree on $n$ vertices has exactly $n-1$ edges. Add one more edge and you must create a cycle. Remove one edge and the tree disconnects. That's the "just barely connected" characterisation.
11. Important graph facts
Handshake lemma
Handshake lemma
$\sum_{v \in V} \deg(v) = 2|E|$
- $\sum_{v \in V}$
- Sum over every vertex in the graph.
- $\deg(v)$
- The degree of vertex $v$ — how many edges are incident to it.
- $|E|$
- The number of edges in the graph.
Why? Each edge has two endpoints. When you sum degrees, every edge gets counted twice — once for each end. So the sum of all degrees is exactly $2|E|$. Named for the party trick: in any room, the total number of "hands shaken so far" counts each handshake twice. A quick corollary — the number of odd-degree vertices in any graph is even.
Trees have $n-1$ edges
Any tree on $n$ vertices has $n - 1$ edges. The proof is a one-liner by induction on $n$: the base case is trivial, and removing a leaf (a degree-1 vertex) of a tree on $n$ vertices leaves a tree on $n - 1$ vertices. By the inductive hypothesis that smaller tree has $n - 2$ edges; we removed 1, so the original had $n - 1$.
Euler's formula for planar graphs
A graph is planar if you can draw it on paper without any edges crossing. For a connected planar graph:
Euler's formula
$V - E + F = 2$
- $V$
- Number of vertices.
- $E$
- Number of edges.
- $F$
- Number of faces — the regions the drawing carves the plane into, including the infinite "outside" region.
- $2$
- The constant. Always $2$, for any connected planar graph, no matter how complicated.
Try it. A triangle has $V = 3$, $E = 3$, $F = 2$ (inside and outside): $3 - 3 + 2 = 2$. A cube, drawn flat, has $V = 8$, $E = 12$, $F = 6$: $8 - 12 + 6 = 2$. Euler noticed this in 1758 and it's still the easiest way to prove facts like "you can't draw $K_5$ (the complete graph on 5 vertices) without edge crossings."
12. Recurrence relations
A recurrence relation defines a sequence by saying what each term is in terms of earlier ones. You've seen this every time you've written a recursive function.
The most famous recurrence:
Fibonacci
$F_n = F_{n-1} + F_{n-2}, \qquad F_0 = 0,\ F_1 = 1$
- $F_n$
- The $n$-th Fibonacci number. $F_0 = 0, F_1 = 1, F_2 = 1, F_3 = 2, F_4 = 3, F_5 = 5, F_6 = 8, \ldots$
- $F_{n-1} + F_{n-2}$
- The recurrence — each term is the sum of the two before it.
- $F_0 = 0, F_1 = 1$
- The initial conditions. Without them, the recurrence has infinitely many solutions.
Closed form. Recurrences can often be "solved" to a closed-form expression that computes $F_n$ directly. For Fibonacci the answer is $F_n = \frac{1}{\sqrt{5}} \left[ \left(\frac{1+\sqrt{5}}{2}\right)^n - \left(\frac{1-\sqrt{5}}{2}\right)^n \right]$ — yes, the golden ratio, which appears here for reasons having nothing to do with art.
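A sketch comparing the recurrence (memoised, so it runs in linear time rather than exponential) against the closed form; `round` absorbs floating-point error, which is safe for small $n$:

```python
from functools import lru_cache
from math import sqrt

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n               # initial conditions: F0 = 0, F1 = 1
    return fib(n - 1) + fib(n - 2)

def fib_closed(n):
    """Binet's closed form, rounded to the nearest integer."""
    phi = (1 + sqrt(5)) / 2    # the golden ratio
    psi = (1 - sqrt(5)) / 2
    return round((phi**n - psi**n) / sqrt(5))

for n in range(30):
    assert fib(n) == fib_closed(n)

print(fib(20))  # 6765
```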
Linear homogeneous recurrences
Fibonacci is a special case of a general pattern. A linear homogeneous recurrence with constant coefficients of order $k$ looks like:
Linear recurrence

$$a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_k a_{n-k}$$

- $a_n$
- The $n$-th term of the sequence you're defining.
- $c_1, c_2, \ldots, c_k$
- Constants (don't depend on $n$). "Coefficients."
- $a_{n-1}, \ldots, a_{n-k}$
- The previous $k$ terms. Linear means each appears to the first power; homogeneous means there's no extra constant or function of $n$ stuck on.
How you solve it. Guess $a_n = r^n$ for some $r$. Plug in, divide through by $r^{n-k}$, and you get a polynomial in $r$ — the characteristic equation. Its roots give the general form of the closed-form solution. This is the discrete-math analogue of solving a linear differential equation, and if you've seen that, the two procedures are basically identical.
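Here is the procedure carried out end-to-end on a made-up order-2 example (the recurrence $a_n = 5a_{n-1} - 6a_{n-2}$ is chosen so the roots come out as nice integers):

```python
# Solve a_n = 5*a_{n-1} - 6*a_{n-2}, with a_0 = 0, a_1 = 1.
# Guess a_n = r^n; dividing by r^(n-2) gives r^2 - 5r + 6 = 0, roots r = 2, 3.
# General solution A*2^n + B*3^n. The initial conditions give the system
# A + B = 0 and 2A + 3B = 1, so A = -1, B = 1: a_n = 3^n - 2^n.

def by_recurrence(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, 5 * b - 6 * a
    return a

def closed_form(n):
    return 3**n - 2**n

assert all(by_recurrence(n) == closed_form(n) for n in range(20))
```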
Divide-and-conquer recurrences
Recurrences also show up in algorithm analysis, where they describe the running time of divide-and-conquer algorithms. A classic form:
Master-theorem form

$$T(n) = a\,T(n/b) + f(n)$$

- $T(n)$
- The time it takes to solve a problem of size $n$.
- $a$
- Number of subproblems we recurse into.
- $b$
- Factor by which each subproblem's size shrinks.
- $f(n)$
- The "splitting and combining" work we do outside of the recursive calls.
Example. Mergesort splits an array into two halves ($a=2, b=2$), sorts each recursively, and merges in $f(n) = O(n)$ time: $T(n) = 2 T(n/2) + O(n)$. The master theorem gives a closed-form for the solution (here: $O(n \log n)$). We leave the statement and proof to the complexity page.
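You can watch the $O(n \log n)$ answer emerge by unrolling the recurrence numerically: the ratio $T(n) / (n \log_2 n)$ flattens toward a constant as $n$ grows (the base case $T(1) = 1$ is an arbitrary choice; any constant gives the same asymptotics):

```python
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def T(n):
    # Exact unrolling of T(n) = 2*T(n/2) + n, with T(1) = 1.
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n

for m in (10, 14, 18):
    n = 2 ** m
    print(n, round(T(n) / (n * log2(n)), 3))  # ratio settles toward 1
```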
13. Number theory basics
Number theory is the study of the integers — their divisibility, factorizations, primes, and modular behaviour. It looks very "pure" until you realise it's the mathematical foundation of every cryptosystem your browser uses to talk to banks.
Divisibility
We say $a$ divides $b$, written $a \mid b$, if $b = ak$ for some integer $k$. So $3 \mid 12$ (because $12 = 3 \cdot 4$), but $5 \nmid 12$. "Divides" is a relation on $\mathbb{Z}$, and on the positive integers it's a partial order.
GCD and the Euclidean algorithm
The greatest common divisor $\gcd(a, b)$ is the largest positive integer that divides both $a$ and $b$. Every first-year student is taught to find it by factoring both numbers — which doesn't scale past small numbers. The algorithm that does scale is over 2000 years old:
Euclidean algorithm

$$\gcd(a, b) = \gcd(b, a \bmod b)$$

- $\gcd(a, b)$
- Greatest common divisor of $a$ and $b$.
- $a \bmod b$
- The remainder when $a$ is divided by $b$. E.g. $17 \bmod 5 = 2$.
- $\gcd(a, 0) = a$
- Base case. Every positive $a$ divides itself, and every integer divides $0$, so their gcd is $a$.
Why it works. Any common divisor of $a$ and $b$ is also a common divisor of $b$ and $a - kb$ for any integer $k$. Taking $k = \lfloor a/b \rfloor$ gives the remainder. So you replace the pair with a smaller pair and repeat. It converges in $O(\log(\min(a,b)))$ steps — shockingly fast.
Example. $\gcd(48, 18) = \gcd(18, 12) = \gcd(12, 6) = \gcd(6, 0) = 6$.
Primes
A prime is a positive integer $p > 1$ whose only positive divisors are $1$ and $p$ itself. Primes are the multiplicative atoms of the integers — the Fundamental Theorem of Arithmetic says every integer $n > 1$ has a unique factorisation into primes (up to order). There are infinitely many primes (Euclid's proof: if only finitely many, multiply them all together and add 1 — the result has no prime factor on your list).
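The Fundamental Theorem in executable form, via trial division (fine for the sizes you'd explore by hand, hopeless for cryptographic ones):

```python
def factorize(n):
    # Prime factorisation by trial division, smallest factor first.
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:            # whatever survives the loop is itself prime
        factors.append(n)
    return factors

assert factorize(360) == [2, 2, 2, 3, 3, 5]   # 2^3 * 3^2 * 5
assert factorize(97) == [97]                  # a prime factors as itself
```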
Modular arithmetic
Arithmetic "mod $n$" throws away multiples of $n$:
Modular congruence

$$a \equiv b \pmod n \iff n \mid (a - b)$$

- $a \equiv b \pmod n$
- "$a$ is congruent to $b$ modulo $n$." They leave the same remainder when divided by $n$.
- $n \mid (a - b)$
- The definition: $n$ divides their difference. E.g. $17 \equiv 2 \pmod 5$ because $5 \mid 15$.
Clock arithmetic. The clock on your wall is $\mathbb{Z} / 12\mathbb{Z}$ — addition modulo 12. "5 hours after 9 o'clock" is $9 + 5 \equiv 2 \pmod{12}$, which is 2 o'clock. Every programming language's % operator computes this, at least for non-negative operands; languages disagree about the sign of the result when an operand is negative (Python floors, C truncates).
The integers modulo $n$ form a finite set $\{0, 1, 2, \ldots, n-1\}$. You can add, subtract, and multiply within this set. When $n$ is prime, you can also divide — every non-zero element has a multiplicative inverse. That fact is what makes $\mathbb{Z} / p\mathbb{Z}$ a field, and finite fields are where cryptography lives.
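Brute force makes the prime-vs-composite contrast visible (the helper name `inverses` and its $O(n^2)$ search are ours, purely for illustration):

```python
def inverses(n):
    # Map each a in {1, ..., n-1} to its inverse mod n, when one exists.
    return {a: b for a in range(1, n) for b in range(1, n) if (a * b) % n == 1}

# n = 7 is prime: every non-zero element is invertible.
assert sorted(inverses(7)) == [1, 2, 3, 4, 5, 6]
# n = 6 is composite: 2, 3, 4 share a factor with 6 and have no inverse.
assert sorted(inverses(6)) == [1, 5]
```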
Tiny intro to RSA
RSA, the public-key cryptosystem at the heart of HTTPS, TLS, and PGP, works like this (very simplified):
- Pick two huge primes $p$ and $q$ — say, 1024 bits each. Let $n = pq$.
- Pick a small exponent $e$ (often 65537). Compute $d$ such that $ed \equiv 1 \pmod{(p-1)(q-1)}$ using the extended Euclidean algorithm.
- Publish $(n, e)$. Keep $d$ secret.
- To encrypt a message $m$: send $c = m^e \bmod n$. To decrypt: compute $c^d \bmod n = m$.
The security rests on one conjecture: factoring $n$ back into $p$ and $q$ is hard for numbers that big. Every theorem used in RSA — Fermat's little theorem, Euler's totient — is elementary discrete math you could prove in a weekend. The whole internet's secure channels are secured by theorems that were pure curiosities in 1640.
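A toy run of the four-step recipe above, with primes small enough to read. Real keys use primes thousands of bits long, and `pow(e, -1, phi)` (Python 3.8+'s built-in modular inverse) stands in for the extended Euclidean algorithm:

```python
p, q = 61, 53                 # "huge" primes, textbook-sized here
n = p * q                     # 3233, the public modulus
phi = (p - 1) * (q - 1)       # 3120
e = 17                        # public exponent, coprime to phi
d = pow(e, -1, phi)           # private exponent: e*d ≡ 1 (mod phi)

m = 1234                      # message, must be < n
c = pow(m, e, n)              # encrypt with the public key (n, e)
assert pow(c, d, n) == m      # decrypt with the private d recovers m
```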
14. Boolean algebra
Propositional logic, rewritten in the language of algebra — with $0$ for false, $1$ for true, $+$ for OR, $\cdot$ for AND, and overline for NOT. George Boole introduced this in 1854. Claude Shannon noticed in his 1937 master's thesis that switches in an electrical circuit obeyed exactly these rules, and the entire field of digital logic design was born.
The laws you already know, in the new notation:
- Commutative: $a + b = b + a$, $\ a \cdot b = b \cdot a$.
- Associative: $(a + b) + c = a + (b + c)$, and the same for $\cdot$.
- Distributive: $a \cdot (b + c) = a \cdot b + a \cdot c$, and — this one is new if you're coming from regular algebra — $a + (b \cdot c) = (a+b) \cdot (a+c)$.
- Identity: $a + 0 = a$, $\ a \cdot 1 = a$.
- Complement: $a + \bar a = 1$, $\ a \cdot \bar a = 0$.
- De Morgan: $\overline{a + b} = \bar a \cdot \bar b$, $\ \overline{a \cdot b} = \bar a + \bar b$.
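All of these laws can be checked by brute force over every 0/1 assignment, which is exactly what a truth table does:

```python
from itertools import product

for a, b in product((0, 1), repeat=2):
    # De Morgan, with | as OR, & as AND, and (1 - x) as NOT.
    assert (1 - (a | b)) == ((1 - a) & (1 - b))
    assert (1 - (a & b)) == ((1 - a) | (1 - b))

for a, b, c in product((0, 1), repeat=3):
    # The "new" distributive law: + distributes over . as well.
    assert (a | (b & c)) == ((a | b) & (a | c))
```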
Every digital circuit you've ever touched — the ALU in your CPU, the address decoder in your RAM, the carry logic in your adder — is a Boolean expression rendered in silicon. Simplifying that expression using the laws above literally saves transistors, which saves area, heat, and cost. Tools like Karnaugh maps and Quine-McCluskey minimisation automate this, but they're mechanical rewrites using the identities above.
Propositional logic, Boolean algebra, and digital circuits are the same structure under three different notations. A truth table is a specification; a Boolean formula is a compact description; a circuit is a physical implementation. Going between them is the job description of every hardware designer.
15. Code
A tour of the standard discrete-math building blocks in Python: a graph class, breadth-first and depth-first traversal, the Euclidean GCD, and a few combinatorial helpers. Everything here is stdlib-only.
from collections import defaultdict, deque
class Graph:
    # Adjacency-list representation — cheap for sparse graphs.
    def __init__(self, directed=False):
        self.adj = defaultdict(set)
        self.directed = directed

    def add_edge(self, u, v):
        self.adj[u].add(v)
        if not self.directed:
            self.adj[v].add(u)

    def neighbors(self, u):
        return sorted(self.adj[u])

    def bfs(self, start):
        # Breadth-first: visits by distance from start. O(V + E).
        seen = {start}
        order = []
        q = deque([start])
        while q:
            u = q.popleft()
            order.append(u)
            for v in self.neighbors(u):
                if v not in seen:
                    seen.add(v)
                    q.append(v)
        return order

    def dfs(self, start):
        # Depth-first: dives as deep as it can before backtracking. O(V + E).
        seen, order = set(), []
        def visit(u):
            if u in seen:
                return
            seen.add(u)
            order.append(u)
            for v in self.neighbors(u):
                visit(v)
        visit(start)
        return order
# Example: a small social graph
g = Graph()
for u, v in [("a","b"), ("a","c"), ("b","d"),
             ("c","d"), ("d","e"), ("e","f")]:
    g.add_edge(u, v)
print(g.bfs("a")) # → ['a', 'b', 'c', 'd', 'e', 'f']
print(g.dfs("a")) # → ['a', 'b', 'd', 'c', 'e', 'f']
def gcd(a, b):
    # Euclidean algorithm — the original asymptotically fast algorithm, ~300 BCE.
    while b:
        a, b = b, a % b
    return a

def extended_gcd(a, b):
    # Returns (g, x, y) where g = gcd(a,b) and a*x + b*y = g.
    # The x is what you need to invert a modulo b — the RSA key-generation step.
    if b == 0:
        return a, 1, 0
    g, x1, y1 = extended_gcd(b, a % b)
    return g, y1, x1 - (a // b) * y1

def modinv(a, n):
    # Modular inverse: the x such that a*x ≡ 1 (mod n). Exists iff gcd(a,n) = 1.
    g, x, _ = extended_gcd(a, n)
    if g != 1:
        raise ValueError(f"{a} has no inverse mod {n}")
    return x % n

def is_prime(n):
    # Trial division — fine up to ~10^12. For real crypto, use Miller-Rabin.
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True
print(gcd(48, 18)) # 6
print(modinv(3, 11)) # 4 (because 3*4 = 12 ≡ 1 mod 11)
print([p for p in range(2, 30) if is_prime(p)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
from math import factorial
def perm(n, k):
    # Ordered arrangements: P(n, k) = n! / (n-k)!
    return factorial(n) // factorial(n - k)

def comb(n, k):
    # Unordered choices: C(n, k) = n! / (k! (n-k)!)
    return factorial(n) // (factorial(k) * factorial(n - k))

def pascal_row(n):
    # n-th row of Pascal's triangle via Pascal's identity — no factorials needed.
    row = [1]
    for k in range(n):
        row.append(row[-1] * (n - k) // (k + 1))
    return row

def subsets(xs):
    # Yield every subset of xs — there are 2^|xs| of them.
    if not xs:
        yield []
        return
    head, *tail = xs
    for s in subsets(tail):
        yield s
        yield [head] + s
print(perm(8, 3)) # 336
print(comb(5, 2)) # 10
print(pascal_row(6)) # [1, 6, 15, 20, 15, 6, 1]
print(list(subsets(["a","b","c"])))
# every subset of {a,b,c}
16. Connections to everything else
Discrete math isn't a subject you learn and put down. It's the language that every CS and ML page on this site is written in. The bridges worth walking:
Probability
Probability starts with counting. The probability of an event is (favourable outcomes) divided by (total outcomes), and both those are combinatorial questions. Every permutation and combination formula shows up again there, usually with a $1/n!$ in front.
Complexity
Every "how many operations does this algorithm do?" question is a counting problem. Recurrences, sums of geometric series, and big-$O$ analysis all live at the boundary between discrete math and CS.
Algorithms
Graph algorithms — BFS, DFS, Dijkstra, minimum spanning trees, topological sort — are a direct application of the graph vocabulary on this page. You can't read the algorithms chapter without knowing what a degree, a path, or a cycle is.
Theoretical CS
Proofs are how theoretical CS works. Every theorem about Turing machines, automata, and decidability is proved using the techniques in §6. If you skipped the proof section, you'll bounce off theoretical CS fast.
17. Further reading
Textbooks
- Kenneth H. Rosen — Discrete Mathematics and Its Applications. The standard undergraduate textbook. Encyclopedic, boringly reliable, covers everything on this page and more. If you want one book on your shelf, this is it.
- Ralph P. Grimaldi — Discrete and Combinatorial Mathematics: An Applied Introduction. A little friendlier than Rosen, strong on the combinatorics chapters.
- Ronald Graham, Donald Knuth, Oren Patashnik — Concrete Mathematics. The delightful, sometimes terrifying textbook that came out of Knuth's "Mathematics for Computer Science" course at Stanford. Heavy on summations, generating functions, and asymptotics. This is the book you read after Rosen if you want to actually become dangerous.
- Eric Lehman, F. Thomson Leighton, Albert R. Meyer — Mathematics for Computer Science. Free online from MIT OCW. Written for the MIT 6.042 course, emphasising proofs, probability, and graphs. Excellent if you prefer a CS-first framing.
- Oscar Levin — Discrete Mathematics: An Open Introduction. Free, under an open license. Shorter than Rosen, with interactive online exercises.
Papers and originals
- Euler (1736) — Solutio problematis ad geometriam situs pertinentis. The paper that started graph theory — the Seven Bridges of Königsberg.
- Boole (1854) — An Investigation of the Laws of Thought. Where Boolean algebra came from.
- Shannon (1937) — A Symbolic Analysis of Relay and Switching Circuits. Shannon's master's thesis — arguably the most influential master's thesis ever written. Boolean algebra meets hardware.
- Rivest, Shamir, Adleman (1978) — A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. The original RSA paper. Short, readable, and the math on this page is enough to follow it.