Biochemistry

Biochemistry is organic chemistry running inside a cell, at body temperature, catalyzed by machines that can hit $10^{17}$-fold rate accelerations and single-molecule selectivity. Once you know the twenty amino acids, how they fold into proteins, what enzymes do to rate constants, and how electrons flow from food to ATP, every headline about gene therapy, cancer metabolism, GLP-1 agonists, or synthetic biology stops being jargon and starts being chemistry you can follow.

Prereq: organic chemistry, equilibrium, kinetics · Read time: ~50 min · Interactive figures: 1 · Code: NumPy / SciPy

1. Why biochemistry matters — and why you should care

Everything in a cell is an organic molecule doing organic chemistry. The twist is that a cell runs thousands of reactions at once, each one tuned, each one talking to the others, and each one so selective that the organic chemist in the lab can only dream of matching it. A single enzyme molecule in your liver can break down 40 million hydrogen peroxides per second without breaking a sweat. A ribosome can pick the correct amino acid out of a cytoplasmic soup of twenty choices with an error rate below 1 in 10,000. Your mitochondria pull electrons off sugar and nudge them down a chain of four protein complexes to make a proton gradient that drives a rotary turbine, and that turbine spits out ATP at tens of rotations per second per molecule.

This is chemistry, and it is astonishing chemistry. It is also the chemistry that every modern medicine touches in some way. GLP-1 agonists (Ozempic, Wegovy) are peptide drugs that mimic a human hormone. Imatinib (Gleevec) is a small-molecule kinase inhibitor that revolutionized chronic myeloid leukemia treatment. PCSK9 inhibitors that lower LDL cholesterol are antibodies. Cas9 gene editors are RNA-guided DNA endonucleases. mRNA vaccines are in vitro transcribed (IVT) RNA wrapped in a lipid nanoparticle. Every one of those products only makes sense if you understand the biochemistry underneath.

THE PUNCHLINE

Biochemistry is built on three layers. First, molecules: proteins, nucleic acids, carbohydrates, lipids, and small-molecule metabolites. Second, catalysis: enzymes that accelerate reactions by $10^6$ to $10^{17}$ and choose one substrate out of many. Third, networks: the metabolic pathways and regulatory circuits that connect it all. If you learn the vocabulary of each layer and the rules that connect them, the whole subject becomes coherent.

Concrete reasons to care, whether you are a clinician, a drug designer, a synthetic biologist, a data scientist working on biology, or an investor trying to read a biotech prospectus:

This page teaches biochemistry from zero. You need the organic basics (functional groups, hydrogen bonds, acids/bases) and a feel for equilibrium. We build the rest.

2. Vocabulary cheat sheet

Skim. Every term gets a fuller treatment below.

| Symbol / term | Read as | Means |
|---|---|---|
| $K_m$ | "K-m" or Michaelis constant | Substrate concentration at which an enzyme works at half its maximum rate. A proxy for binding affinity. |
| $V_{max}$ | "V-max" | Maximum rate the enzyme can achieve when saturated with substrate. |
| $k_{cat}$ | "k-cat" or turnover number | Number of substrate molecules converted per active site per second at saturation. |
| $k_{cat}/K_m$ | "catalytic efficiency" | How good the enzyme is at finding and processing substrate. Diffusion limit $\approx 10^8$-$10^9$ M$^{-1}$s$^{-1}$. |
| 1°, 2°, 3°, 4° | "primary through quaternary" | Four levels of protein structure: sequence, local folds, 3D tertiary, and multi-chain assembly. |
| mRNA, tRNA, rRNA | "messenger, transfer, ribosomal RNA" | Three RNA families that do the heavy lifting of translation. |
| ATP, NADH, NADPH, FADH$_2$ | "A-T-P, N-A-D-H, etc." | Currency molecules: energy and electrons that every pathway trades in. |
| Central dogma | "central dogma" | DNA → RNA → protein. Information flows forward (with some exceptions). |
| Glycolysis, TCA, OxPhos | | The three stages of oxidative fuel metabolism. |
| $\Delta G^{\circ\prime}$ | "delta-G-standard-prime" | Standard free energy change at biological conditions (pH 7, 1 M, 25°C). |

3. Amino acids — the building blocks

All proteins are built from the same twenty amino acids. Every amino acid shares the same backbone — an $\alpha$-carbon with an $-NH_3^+$ (at physiological pH), a $-COO^-$, an H, and a side chain $R$ — and differs only in $R$.

$$ \underset{\text{amine}}{H_3N^+}\!-\!\overset{\displaystyle R}{\underset{\displaystyle H}{C}}\!-\!\underset{\text{carboxylate}}{COO^-}. $$

The amino acid zwitterion

$H_3N^+$
Protonated amino group. At pH 7 the amine is protonated ($pK_a \approx 9$) and the acid is deprotonated ($pK_a \approx 2$).
$COO^-$
Deprotonated carboxylate.
$R$
The side chain. Twenty common choices, from a simple H (glycine) to complex aromatic rings (tryptophan).
$\alpha$-C
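The central carbon. Except in glycine, it is a stereocenter. All chiral amino acids in proteins are L, which corresponds to the S configuration for every one except cysteine (R, because the sulfur raises the side chain's CIP priority).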

Why the zwitterion. At physiological pH (7.4), every amino acid is a dipolar ion with a positive and a negative group. That makes them very water-soluble and gives them a high melting point — they behave more like inorganic salts than like typical organic molecules of similar size.

The twenty side chains are usually grouped by chemistry:

When two amino acids join, the amine of one attacks the carboxyl carbon of the other, water leaves, and a peptide bond (an amide) forms. A chain of peptide-linked amino acids is a polypeptide; once it folds into a defined 3D shape with a job to do, it's a protein. Peptide bonds are planar, trans-biased, and rigid, so the backbone can rotate around only two angles per residue, $\phi$ and $\psi$. That restriction is what makes Ramachandran plots such a useful analytical tool.

4. Protein structure — four levels

A folded protein is described at four nested levels:

  1. Primary (1°) — the linear sequence of amino acids, N-terminus to C-terminus. A protein of $N$ residues has $20^N$ possible sequences; insulin has 51 residues, hemoglobin has 574, titin has ~34,000.
  2. Secondary (2°) — local regular structures stabilized by backbone H-bonds. The two main patterns are the $\alpha$-helix (every residue H-bonds to the one four positions away, 3.6 residues per turn) and the $\beta$-sheet (adjacent strands H-bond to each other, either parallel or antiparallel).
  3. Tertiary (3°) — the full 3D fold of a single polypeptide chain, driven by hydrophobic collapse (nonpolar side chains bury inside), H-bonds, salt bridges, and occasionally disulfide bonds.
  4. Quaternary (4°) — the assembly of two or more folded chains into a larger complex. Hemoglobin is a tetramer (2$\alpha$ + 2$\beta$). GroEL is a 14-mer. The ribosome is a 50+ subunit assembly of RNA and proteins.

Anfinsen's principle. For many small proteins, the native fold is determined entirely by the primary sequence: denature in urea, remove the urea, and the protein refolds correctly. Folding is a thermodynamic search for a global minimum, not a choreographed program. Anfinsen won the 1972 Nobel for showing this with ribonuclease A.

How folding works (the paradox that wasn't)

A polypeptide with 100 residues has 200 backbone angles (one $\phi$ and one $\psi$ per residue). With 3 allowed values per angle, that is $3^{200} \approx 10^{95}$ possible conformations. If the chain visited each one for a picosecond, the search would take about $10^{83}$ seconds, vastly longer than the age of the universe (about $4 \times 10^{17}$ s). Yet real proteins fold in milliseconds. This is Levinthal's paradox, and the resolution is that folding is not a random search: the energy landscape is a funnel that biases the chain toward the native state at every step.
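The Levinthal estimate is easy to check by hand. The sketch below redoes the arithmetic in log10 units, using the numbers from the paragraph above (100 residues, two angles per residue, 3 choices per angle, 1 ps per conformation); the age of the universe (about 13.8 billion years) is an approximate outside figure used only for comparison.

```python
import math

# Assumptions from the text: 100 residues, two backbone angles (phi, psi)
# per residue, 3 allowed values per angle, 1 picosecond per conformation.
n_residues = 100
angles_per_residue = 2
choices_per_angle = 3
seconds_per_state = 1e-12

# Work in log10 so the huge numbers stay readable.
log10_states = n_residues * angles_per_residue * math.log10(choices_per_angle)
log10_search_seconds = log10_states + math.log10(seconds_per_state)

# Age of the universe, roughly 13.8 billion years (approximate).
age_universe_seconds = 13.8e9 * 365.25 * 24 * 3600
log10_age = math.log10(age_universe_seconds)

print(f"conformations ~ 10^{log10_states:.1f}")
print(f"search time   ~ 10^{log10_search_seconds:.1f} s")
print(f"universe age  ~ 10^{log10_age:.1f} s")
```

The gap of more than sixty orders of magnitude is the point: no exhaustive search could work, so the landscape itself must guide the chain.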

Modern structure prediction, led by AlphaFold 2 (2020) and its successors, has essentially solved the single-chain structure problem for proteins with evolutionary relatives in the database. A multiple sequence alignment plus a transformer is enough, because co-evolving residue pairs leak 3D information that the model exploits. For novel proteins without homologs, single-sequence language-model predictors such as ESM-2/ESMFold now fill in, and generative tools such as RFdiffusion have made de novo protein design routine.

5. Enzymes and Michaelis-Menten kinetics

An enzyme is a biological catalyst — almost always a folded protein, occasionally an RNA. It speeds a specific reaction without being consumed, and it often selects one substrate out of a large pool. Rate accelerations routinely reach $10^6$-$10^{12}$ and in extreme cases ($OMP$ decarboxylase) $10^{17}$.

The standard model of enzyme kinetics was written down by Leonor Michaelis and Maud Menten in 1913:

$$ E + S \;\underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}}\; ES \;\overset{k_{\text{cat}}}{\to}\; E + P. $$

The Michaelis-Menten mechanism

$E$
Free enzyme.
$S$
Substrate — what the enzyme will transform.
$ES$
Enzyme-substrate complex. The substrate is bound in the active site, pre-positioned for reaction.
$P$
Product. Released after the chemistry happens.
$k_1, k_{-1}$
Forward and reverse rate constants for binding. Fast on the timescale of catalysis for most enzymes.
$k_{\text{cat}}$
Turnover number — rate constant for the chemistry step once the substrate is bound.

Why this minimal mechanism is enough. Real enzymes have many steps (binding, induced fit, chemistry, product release). But under the rapid-equilibrium or steady-state approximation, such multi-step schemes still reduce to the same two-parameter rate law with a $K_m$ and a $V_{\max}$.

Assume a steady state: $[ES]$ is constant after a short initial transient. Set formation equal to decay, solve for $[ES]$, plug into $v = k_{\text{cat}}[ES]$, and you get the Michaelis-Menten equation:

$$ v \;=\; \frac{V_{\max}\,[S]}{K_m + [S]}, \qquad V_{\max} = k_{\text{cat}}[E]_T, \qquad K_m = \frac{k_{-1} + k_{\text{cat}}}{k_1}. $$

The Michaelis-Menten equation

$v$
Initial rate of product formation, in M/s.
$V_{\max}$
Maximum rate when every enzyme molecule is saturated with substrate.
$K_m$
Michaelis constant — the substrate concentration at which $v = V_{\max}/2$. Units of molarity.
$[S]$
Current substrate concentration.
$[E]_T$
Total enzyme concentration (bound + free).
$k_{\text{cat}}$
Turnover number — how many reactions each active site performs per second at saturation. Catalase: $\sim 4 \times 10^7$. Carbonic anhydrase: $\sim 10^6$.
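The steady-state step can be written out explicitly. What follows is a sketch of the standard derivation using only the symbols defined above, plus the enzyme conservation relation $[E]_T = [E] + [ES]$, which the definition of $[E]_T$ implies.

$$
\begin{aligned}
\frac{d[ES]}{dt} &= k_1[E][S] - (k_{-1} + k_{\text{cat}})[ES] = 0, \\
[E] &= [E]_T - [ES], \\
\Rightarrow\quad [ES] &= \frac{[E]_T\,[S]}{\dfrac{k_{-1} + k_{\text{cat}}}{k_1} + [S]} = \frac{[E]_T\,[S]}{K_m + [S]}, \\
v &= k_{\text{cat}}[ES] = \frac{k_{\text{cat}}[E]_T\,[S]}{K_m + [S]} = \frac{V_{\max}\,[S]}{K_m + [S]}.
\end{aligned}
$$

The first line sets formation of $ES$ equal to its two decay routes; substituting the conservation relation and solving for $[ES]$ is what produces the combination $(k_{-1} + k_{\text{cat}})/k_1$ that we call $K_m$.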

Shape of the curve. At low $[S] \ll K_m$, $v \approx (V_{\max}/K_m)[S]$ — linear in substrate, as if the enzyme were a second-order catalyst. At high $[S] \gg K_m$, $v \approx V_{\max}$ — the enzyme is saturated and no matter how much more substrate you add, nothing happens faster. The crossover between the two regimes sits at $[S] = K_m$. A low $K_m$ means the enzyme is saturated even at very dilute substrate — common for high-affinity enzymes like hexokinase ($K_m \sim 0.1$ mM for glucose).
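The two regimes are easy to see numerically. This is a minimal sketch with illustrative parameters (the values of `Vmax` and `Km` are assumptions, not measurements); it compares the full rate law with the low-substrate linear approximation at three substrate levels.

```python
def mm_rate(S, Vmax, Km):
    """Michaelis-Menten rate law: v = Vmax * S / (Km + S)."""
    return Vmax * S / (Km + S)

# Illustrative parameters (assumed): Vmax in uM/s, Km in uM.
Vmax, Km = 5.0, 10.0

for S in (0.01 * Km, Km, 100 * Km):
    v = mm_rate(S, Vmax, Km)
    linear_estimate = (Vmax / Km) * S  # valid only when S << Km
    print(f"[S] = {S:7.1f} uM   v = {v:.3f} uM/s   (Vmax/Km)[S] = {linear_estimate:.3f}")
```

At $[S] = 0.01\,K_m$ the linear estimate is within about 1% of the true rate; at $[S] = 100\,K_m$ the rate has flattened to about 99% of $V_{\max}$ while the linear estimate overshoots by two orders of magnitude.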

The quantity $k_{\text{cat}}/K_m$ is the catalytic efficiency — how good the enzyme is at finding substrate, binding it, and converting it. The diffusion limit is about $10^8$-$10^9$ M$^{-1}$s$^{-1}$: beyond that, the substrate and enzyme can't collide any faster in water. A handful of enzymes (catalase, triose phosphate isomerase, fumarase) are at or near this limit. They are as fast as physics allows.
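To make the comparison with the diffusion limit concrete, here is a small sketch that computes $k_{\text{cat}}/K_m$ and checks it against the lower end of the range quoted above. The two enzymes and their parameters are hypothetical, chosen only to land on either side of the threshold.

```python
# Lower end of the diffusion-limit range quoted in the text, in M^-1 s^-1.
DIFFUSION_LIMIT = 1e8

def catalytic_efficiency(kcat_per_s, Km_molar):
    """Return kcat/Km in M^-1 s^-1 (kcat in 1/s, Km in mol/L)."""
    return kcat_per_s / Km_molar

def near_diffusion_limit(kcat_per_s, Km_molar):
    """True if kcat/Km reaches the lower end of the diffusion-limit range."""
    return catalytic_efficiency(kcat_per_s, Km_molar) >= DIFFUSION_LIMIT

# Hypothetical enzymes: (kcat in 1/s, Km in M). Not measured values.
enzymes = {
    "hypothetical_fast": (1e6, 1e-3),
    "hypothetical_slow": (10.0, 1e-4),
}

for name, (kcat, Km) in enzymes.items():
    eff = catalytic_efficiency(kcat, Km)
    flag = near_diffusion_limit(kcat, Km)
    print(f"{name}: kcat/Km = {eff:.1e} M^-1 s^-1, near limit: {flag}")
```

Note that the units matter: $K_m$ must be in mol/L for the result to be comparable with the M$^{-1}$s$^{-1}$ limit.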

Inhibition

Drugs that target enzymes work by inhibiting them. Three classic patterns:

6. Interactive: Michaelis-Menten plotter

Slide $K_m$ and $V_{\max}$ and watch the rate-vs-substrate curve. Compare the standard MM plot with a Lineweaver-Burk (double reciprocal) plot, where the curve becomes a straight line — historically how experimentalists extracted parameters, before nonlinear fitting was cheap.

[Interactive figure: Michaelis-Menten curve, with sliders for $V_{\max}$ (default 5.0 μM/s) and $K_m$ (default 10.0 μM). Dashed line = $V_{\max}$. Pink dot = ($K_m$, $V_{\max}/2$).]
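For readers without the interactive figure, the double-reciprocal idea can be sketched in a few lines. Taking reciprocals of the rate law gives $1/v = (K_m/V_{\max})(1/[S]) + 1/V_{\max}$, a straight line in $1/[S]$. The example below generates noise-free synthetic data from $V_{\max} = 5$ μM/s and $K_m = 10$ μM and recovers both parameters from the line; the substrate grid is an arbitrary choice.

```python
import numpy as np

# Noise-free synthetic data (assumed parameters: Vmax = 5 uM/s, Km = 10 uM).
Vmax_true, Km_true = 5.0, 10.0
S = np.array([1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0])  # uM
v = Vmax_true * S / (Km_true + S)                        # uM/s

# Lineweaver-Burk: 1/v = (Km/Vmax) * (1/S) + 1/Vmax, a straight line in 1/S.
slope, intercept = np.polyfit(1.0 / S, 1.0 / v, 1)

Vmax_lb = 1.0 / intercept   # the y-intercept is 1/Vmax
Km_lb = slope * Vmax_lb     # the slope is Km/Vmax
print(f"Vmax = {Vmax_lb:.2f} uM/s, Km = {Km_lb:.2f} uM")
```

With real, noisy data the reciprocal transform inflates errors at low $[S]$, which is one reason direct nonlinear fitting is now preferred.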

Things to try:

7. DNA, RNA, and the central dogma

DNA and RNA are polymers of nucleotides. A nucleotide has three pieces: a five-carbon sugar (deoxyribose in DNA, ribose in RNA), a phosphate group, and one of four nitrogenous bases. DNA uses A, T, G, C. RNA uses A, U (instead of T), G, C. The bases stack on the inside of a double helix, pairing A–T (or A–U) via two hydrogen bonds and G–C via three. The sugar-phosphate backbone runs on the outside, polyanionic.
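The pairing rules translate directly into code. This sketch builds the partner strand of a short DNA sequence and counts the hydrogen bonds in the duplex, using the two-versus-three bond rule stated above. It assumes the usual antiparallel arrangement, so the partner is returned as a reverse complement; the example sequence is hypothetical.

```python
# Watson-Crick partners and hydrogen bonds per base pair, as stated in the text.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def reverse_complement(dna):
    """Return the partner strand, written 5'->3' (strands are antiparallel)."""
    return "".join(DNA_PAIR[base] for base in reversed(dna))

def duplex_h_bonds(dna):
    """Count Watson-Crick hydrogen bonds in the duplex formed by `dna`."""
    return sum(H_BONDS[base] for base in dna)

seq = "ATGCGC"  # hypothetical example sequence
print(reverse_complement(seq))
print(duplex_h_bonds(seq))
```

GC-rich duplexes carry more hydrogen bonds per base pair, which is one reason they tend to melt at higher temperatures.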

$$ \underbrace{\text{DNA}}_{\text{storage}} \;\overset{\text{transcription}}{\longrightarrow}\; \underbrace{\text{mRNA}}_{\text{working copy}} \;\overset{\text{translation}}{\longrightarrow}\; \underbrace{\text{protein}}_{\text{machine}}. $$

The central dogma

DNA
Long-term information storage. Double-stranded, stable, heritable.
Transcription
RNA polymerase reads one DNA strand and synthesizes a complementary RNA copy.
mRNA
Single-stranded message carrying the protein-coding sequence. Short-lived.
Translation
The ribosome reads the mRNA three bases at a time (a codon) and attaches the corresponding amino acid to a growing polypeptide.
Protein
The machine that folds, moves, catalyzes, and builds — the doer of cellular work.
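Transcription can be sketched the same way as base pairing. The helper names below are hypothetical. One function builds the mRNA by complementing a template strand (with U in place of T), and the other takes the shortcut of swapping T for U in the coding strand. Both should give the same message.

```python
# RNA base that pairs with each DNA template base (U replaces T in RNA).
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe_template(template_3to5):
    """Template strand given 3'->5' yields the mRNA written 5'->3'."""
    return "".join(RNA_PAIR[base] for base in template_3to5)

def transcribe_coding(coding_5to3):
    """The mRNA matches the coding strand, with U in place of T."""
    return coding_5to3.replace("T", "U")

template = "TACAAACAC"  # hypothetical template strand, written 3'->5'
coding = "ATGTTTGTG"    # its complementary coding strand, written 5'->3'

print(transcribe_template(template))
print(transcribe_coding(coding))
```

The resulting message reads as three codons, AUG, UUU, and GUG, which the ribosome would translate one at a time.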

Why the dogma is "almost" universal. Retroviruses like HIV run DNA ← RNA via reverse transcriptase. Prions replicate in a protein-only cycle. Some RNA viruses replicate without ever going through DNA. The core flow DNA → RNA → protein holds in every organism, but information can also go backward (RT) or stay as RNA.

Three practical consequences for the modern medicine you care about:

8. Metabolism — glycolysis, TCA, oxidative phosphorylation

Food has chemical energy stored in C–H and C–C bonds. Metabolism is the orderly process of extracting that energy and converting it into ATP. For glucose, the pathway has three stages:

  1. Glycolysis: cytosolic. Glucose ($C_6$) is split into two pyruvates ($C_3$). Net yield: 2 ATP and 2 NADH per glucose. Works aerobically or anaerobically.
  2. TCA cycle (Krebs / citric acid cycle): mitochondrial matrix. Pyruvate is decarboxylated to an acetyl group, which is attached to coenzyme A to form acetyl-CoA (yielding 1 NADH per pyruvate). Each acetyl group is fed into a cycle that releases 2 CO$_2$, 3 NADH, 1 FADH$_2$, and 1 GTP per turn.
  3. Oxidative phosphorylation: inner mitochondrial membrane. NADH and FADH$_2$ deliver electrons to a chain of four protein complexes. Complexes I, III, and IV pump protons across the membrane, building an electrochemical gradient. ATP synthase (Complex V) lets the protons flow back through a rotor, coupling flow to ATP synthesis.

Net from one glucose (aerobic): ~30 ATP. Anaerobic (fermentation): just 2 ATP. That is why you can sprint faster than your oxygen supply allows, but not for long: lactate accumulates, pH drops, and the machine stalls.
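The ~30 ATP figure can be rebuilt from the carrier counts. The tally below is a sketch under stated assumptions: ATP yields of 2.5 per NADH and 1.5 per FADH$_2$ are commonly used textbook estimates, not values given in this text; the 2 NADH from converting two pyruvates to acetyl-CoA are included; and GTP is counted as ATP. The optional argument models cytosolic NADH entering the chain through a shuttle that delivers electrons at the FADH$_2$ level.

```python
# Substrate-level ATP (or GTP) and reduced carriers produced per glucose.
stages = {
    "glycolysis":             {"ATP": 2, "NADH": 2, "FADH2": 0},
    "pyruvate -> acetyl-CoA": {"ATP": 0, "NADH": 2, "FADH2": 0},  # 2 pyruvates
    "TCA cycle (2 turns)":    {"ATP": 2, "NADH": 6, "FADH2": 2},  # GTP as ATP
}

# Assumed ATP yield per carrier (textbook estimates, not from this text).
ATP_PER_NADH = 2.5
ATP_PER_FADH2 = 1.5

def total_atp(cytosolic_nadh_yield=ATP_PER_NADH):
    """Sum ATP per glucose; glycolytic NADH may be worth less after shuttling."""
    total = 0.0
    for name, s in stages.items():
        nadh_yield = cytosolic_nadh_yield if name == "glycolysis" else ATP_PER_NADH
        total += s["ATP"] + s["NADH"] * nadh_yield + s["FADH2"] * ATP_PER_FADH2
    return total

print(total_atp())                  # full value for cytosolic NADH
print(total_atp(ATP_PER_FADH2))     # shuttle delivers at the FADH2 level
```

Under these assumptions the tally lands between 30 and 32 ATP per glucose, which is where the "~30" in the text comes from.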

$$ C_6H_{12}O_6 + 6\,O_2 \;\to\; 6\,CO_2 + 6\,H_2O, \qquad \Delta G^{\circ\prime} = -2870 \text{ kJ/mol}. $$

Total combustion of glucose (biochemical standard)

$C_6H_{12}O_6$
Glucose, the universal fuel molecule.
$6 O_2$
Oxygen — the terminal electron acceptor in aerobic metabolism.
$6 CO_2$
Carbon dioxide — the oxidized carbon waste.
$6 H_2O$
Water.
$\Delta G^{\circ\prime}$
Biochemical standard free energy change at pH 7, 25°C, 1 M reactants. Roughly -686 kcal/mol.

Energy budget. ATP hydrolysis releases about 30 kJ/mol under standard biochemical conditions ($\Delta G^{\circ\prime} \approx -30$ kJ/mol). So the ~2870 kJ/mol from glucose combustion could in principle make about 95 ATP per glucose. Real cells capture ~30, an efficiency of roughly 31% ($30 \times 30 / 2870$), on par with a good car engine. The rest becomes heat, which is partly why animals stay warm.
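The budget is a two-line calculation, sketched here with the standard-state figures quoted in the text (2870 kJ/mol for glucose, 30 kJ/mol per ATP, about 30 ATP captured). Treat it as a rough estimate: actual free energies inside a cell differ from standard values.

```python
# Standard-state magnitudes quoted in the text (kJ/mol).
glucose_combustion = 2870.0
atp_hydrolysis = 30.0
atp_captured = 30  # approximate aerobic yield per glucose

theoretical_max_atp = glucose_combustion / atp_hydrolysis
efficiency = atp_captured * atp_hydrolysis / glucose_combustion

print(f"theoretical max: {theoretical_max_atp:.1f} ATP per glucose")
print(f"capture efficiency: {efficiency:.1%}")
```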

Why the machinery is built this way

Two design principles explain most of metabolism. First, stepwise oxidation: burning glucose in one step would release 2870 kJ of heat and make no ATP. Chopping it into dozens of small steps, each releasing a manageable amount of energy, lets enzymes couple those steps to ATP synthesis and capture the energy chemically. Second, redox currencies: NADH and FADH$_2$ act as portable electron carriers, ferrying reducing equivalents from many catabolic reactions to a single, dedicated machinery (the electron transport chain) that does the final oxidation to water.

9. ATP, NADH, and the cofactors

A handful of small molecules do most of the cell's chemical work. They are called cofactors or coenzymes, and they are the currencies every metabolic pathway trades in:

Most vitamins are cofactors or cofactor precursors. Vitamin deficiencies are therefore cofactor deficiencies, which show up as metabolic stalls at specific steps. Scurvy is a deficiency of vitamin C (needed for collagen hydroxylation). Beriberi is a deficiency of thiamine (pyruvate dehydrogenase and the TCA cycle stall). Pellagra is a deficiency of niacin (NAD$^+$ falls). Pernicious anemia is a B12 deficiency (methionine synthase and methylmalonyl-CoA mutase fail).

Why ATP is "energy currency"

The $\Delta G^{\circ\prime}$ for ATP → ADP + P$_i$ is about $-30$ kJ/mol. That is enough to drive an otherwise unfavorable reaction if the two are coupled through a shared intermediate. Classic example: phosphorylating glucose to glucose-6-phosphate has $\Delta G^{\circ\prime} = +14$ kJ/mol on its own. Couple it to ATP hydrolysis and the net is $+14 - 30 = -16$ kJ/mol, which is spontaneous. Hexokinase catalyzes the coupled reaction by transferring the terminal phosphoryl group of ATP directly to glucose, so no free phosphate intermediate ever exists and the transfer happens in a single enzymatic step.
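Free energies add, and equilibrium constants multiply. This sketch applies $K = e^{-\Delta G^{\circ\prime}/RT}$ to the numbers above to show how far coupling shifts the equilibrium. The temperature of 25°C is the standard-state value; the function name is hypothetical.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)
T = 298.15    # 25 C, the standard-state temperature

def keq_from_dg(dg_kj_per_mol):
    """Equilibrium constant from a standard free energy change."""
    return math.exp(-dg_kj_per_mol / (R * T))

dg_glucose_phosphorylation = 14.0  # kJ/mol, unfavorable on its own
dg_atp_hydrolysis = -30.0          # kJ/mol
dg_coupled = dg_glucose_phosphorylation + dg_atp_hydrolysis

k_alone = keq_from_dg(dg_glucose_phosphorylation)
k_coupled = keq_from_dg(dg_coupled)

print(f"dG coupled = {dg_coupled:.0f} kJ/mol")
print(f"K alone    = {k_alone:.2e}")
print(f"K coupled  = {k_coupled:.2e}")
print(f"shift      = {k_coupled / k_alone:.2e}-fold")
```

The shift factor equals the equilibrium constant for ATP hydrolysis itself, which is the quantitative meaning of "energy currency": spending one ATP multiplies the equilibrium constant of the coupled reaction by roughly $10^5$.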

10. Biochemistry in code

Two things: a Michaelis-Menten fit using nonlinear least squares and a simple translation routine that converts an mRNA sequence into amino acids.

biochemistry primitives

```python
import numpy as np
from scipy.optimize import curve_fit

# ---------- Michaelis-Menten fit ----------
def mm(S, Vmax, Km):
    """Michaelis-Menten rate law: v = Vmax * S / (Km + S)."""
    return Vmax * S / (Km + S)

# Synthetic data from a fictional enzyme assay
S_obs = np.array([0.5, 1, 2, 5, 10, 20, 50, 100])   # uM
v_obs = np.array([0.31, 0.56, 0.93, 1.75, 2.45, 3.12, 3.65, 3.85])  # uM/s

# Nonlinear least squares; p0 is the initial guess for (Vmax, Km).
popt, _ = curve_fit(mm, S_obs, v_obs, p0=[4, 5])
Vmax_fit, Km_fit = popt
print(f"Vmax ≈ {Vmax_fit:.2f} uM/s,  Km ≈ {Km_fit:.2f} uM")

# kcat = Vmax / [E]_T. With [E]_T = 10 nM = 0.01 uM, kcat comes out in 1/s.
E_total_uM = 10e-3
kcat = Vmax_fit / E_total_uM
# Km is in uM, so convert it to M before reporting kcat/Km in 1/(M·s).
print(f"kcat/Km (assuming [E]_T = 10 nM) ≈ {kcat / (Km_fit * 1e-6):.2e} / (M·s)")

# ---------- mRNA translation ----------
# Codon table abbreviated: 16 of the 61 sense codons, plus the 3 stop codons.
CODON = {
    "UUU":"F","UUC":"F","UUA":"L","UUG":"L",
    "CUU":"L","CUC":"L","CUA":"L","CUG":"L",
    "AUU":"I","AUC":"I","AUA":"I","AUG":"M",
    "GUU":"V","GUC":"V","GUA":"V","GUG":"V",
    "UAA":"*","UAG":"*","UGA":"*",   # stop codons
    # ... (45 more sense codons omitted)
}

def translate(mrna):
    """Translate codon by codon from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i+3]
        aa = CODON.get(codon, "?")   # "?" marks codons missing from the table
        if aa == "*": break
        protein.append(aa)
    return "".join(protein)

print(translate("AUGUUUGUG"))  # MFV
```

```python
import math

# Same MM curve, no dependencies.
def mm_rate(S, Vmax, Km):
    return Vmax * S / (Km + S)

def iptg_induction_profile(t, K, n=2):
    # Hill function for inducer-driven gene expression.
    # Returns a fraction between 0 and 1; n is the Hill coefficient.
    return (t ** n) / (K ** n + t ** n)

def gibbs_from_K(K, T=310):
    # dG = -RT ln K, with R in kJ/(mol·K). Biological T: 37 C = 310 K
    return -8.314e-3 * T * math.log(K)   # kJ/mol

# Example: a reaction with Keq = 1000 under cellular conditions
print(f"dG = {gibbs_from_K(1000):.1f} kJ/mol")  # ~ -17.8
print(f"v at S=Km: {mm_rate(10, 5, 10):.2f}")         # Vmax/2 = 2.5
```

Two practical notes:

11. Cheat sheet

Amino acid backbone

$H_3N^+$-C(R)(H)-$COO^-$

Zwitterion at pH 7. R is the side chain.

Protein levels

1° sequence → 2° helix/sheet → 3° fold → 4° assembly

Anfinsen: 1° → 3° is deterministic (for most small proteins).

Michaelis-Menten

$v = V_{\max}[S]/(K_m + [S])$

Rectangular hyperbola. $K_m$ is the half-saturation $[S]$.

Catalytic efficiency

$k_{\text{cat}}/K_m$

Diffusion limit ~$10^8$-$10^9$ M$^{-1}$s$^{-1}$.

Central dogma

DNA → RNA → protein

With RT exceptions, and prion/RNA corner cases.

Glucose yield

~30 ATP aerobic / 2 ATP anaerobic

Glycolysis + TCA + OxPhos vs fermentation.

ATP hydrolysis

$\Delta G^{\circ\prime} \approx -30$ kJ/mol

Couples uphill reactions to make them spontaneous.

Redox carriers

NADH / NADPH / FADH$_2$

Catabolism / biosynthesis / flavoenzymes.

Hemoglobin

Tetramer ($\alpha_2\beta_2$), cooperative $O_2$ binding

Canonical quaternary structure example.

Codon

Three bases → one amino acid

64 codons, 20 amino acids + 3 stops. Degenerate.

See also

Organic chemistry

Every biomolecule is a bag of functional groups. Imines, esters, amides, and thioesters run the cell's chemistry.

Chemical equilibrium

Protein folding, substrate binding, and hemoglobin-O$_2$ cooperativity are all equilibrium problems with a biological flavor.

Kinetics

Michaelis-Menten is kinetics with a bound-complex twist. Enzyme inhibition is kinetics with a drug in the mix.

Quantum chemistry

Computational chemists use DFT and QM/MM to predict enzyme transition states and inhibitor binding energies.

AI: Foundation models for biology

AlphaFold, ESM, and RFdiffusion are protein-structure prediction and design systems trained on the same data you just learned to read.

Further reading

  • Jeremy M. Berg, John L. Tymoczko, and Lubert Stryer — Biochemistry (9th ed., W.H. Freeman, 2019). The modern standard textbook; Chapter 8 on enzymes is the clearest MM treatment in print.
  • David L. Nelson and Michael M. Cox — Lehninger Principles of Biochemistry (8th ed., 2021). An alternative standard, with more metabolism detail.
  • Bruce Alberts et al. — Molecular Biology of the Cell (7th ed., Garland, 2022). The cell-biology companion. Structure, signaling, trafficking.
  • Athel Cornish-Bowden — Fundamentals of Enzyme Kinetics (4th ed., Wiley, 2012). The deep dive on Michaelis-Menten and its many refinements.
  • John Jumper et al. — "Highly accurate protein structure prediction with AlphaFold." Nature 596, 583-589 (2021). The paper that changed protein structural biology.
  • Protein Data Bank — rcsb.org. The worldwide archive of experimentally determined biomolecular structures.

Next up: Quantum Chemistry

Now that you have seen how biomolecules react under enzymatic catalysis, the natural next question is "how do we predict the electronic structure and reaction energetics from first principles?" That is what quantum chemistry delivers.