Biochemistry
Biochemistry is organic chemistry running inside a cell, at body temperature, catalyzed by machines that can hit $10^{17}$-fold rate accelerations and single-molecule selectivity. Once you know the twenty amino acids, how they fold into proteins, what enzymes do to rate constants, and how electrons flow from food to ATP, every headline about gene therapy, cancer metabolism, GLP-1 agonists, or synthetic biology stops being jargon and starts being chemistry you can follow.
1. Why biochemistry matters — and why you should care
Everything in a cell is an organic molecule doing organic chemistry. The twist is that a cell runs thousands of reactions at once, each one tuned, each one talking to the others, and each one so selective that the organic chemist in the lab can only dream of matching it. A single enzyme molecule in your liver can break down 40 million hydrogen peroxides per second without breaking a sweat. A ribosome can pick the correct amino acid out of a cytoplasmic soup of twenty choices with an error rate below 1 in 10,000. Your mitochondria pull electrons off sugar and nudge them down a chain of four protein complexes to make a proton gradient that drives a rotary turbine, and that turbine spits out ATP at tens of rotations per second per molecule.
This is chemistry, and it is astonishing chemistry. It is also the chemistry that every modern medicine touches in some way. GLP-1 agonists (Ozempic, Wegovy) are peptide drugs that mimic a human hormone. Imatinib/Gleevec is a small-molecule kinase inhibitor that revolutionized chronic myeloid leukemia treatment. PCSK9 inhibitors that lower LDL cholesterol are antibodies. Cas9 gene editors are RNA-guided DNA endonucleases. mRNA vaccines are an IVT-synthesized nucleic acid wrapped in a lipid shell. Every one of those products only makes sense if you understand the biochemistry underneath.
Biochemistry is built on three layers. First, molecules: proteins, nucleic acids, carbohydrates, lipids, and small-molecule metabolites. Second, catalysis: enzymes that accelerate reactions by $10^6$ to $10^{17}$ and choose one substrate out of many. Third, networks: the metabolic pathways and regulatory circuits that connect it all. If you learn the vocabulary of each layer and the rules that connect them, the whole subject becomes coherent.
Concrete reasons to care, whether you are a clinician, a drug designer, a synthetic biologist, a data scientist working on biology, or an investor trying to read a biotech prospectus:
- Drug discovery. Most drugs target an enzyme, a receptor, or a transporter. Understanding $K_m$, $V_{max}$, and competitive versus non-competitive inhibition is the minimum needed to read a kinetics paper.
- Clinical medicine. Almost every disease is a metabolic or signaling dysfunction. Diabetes is a glucose-handling failure. Cancer is, among other things, a metabolic rewiring. Inborn errors of metabolism are missing enzymes. Cardiovascular disease is lipid biochemistry.
- Genomics and RNA biology. Sequencing, CRISPR editing, and mRNA therapeutics are all chemistry on nucleic acids.
- Environmental and industrial biotech. Fermentations, biofuels, bioplastics, and enzymatic manufacturing of pharmaceuticals all run on metabolic engineering.
- AI for biology. AlphaFold, RFdiffusion, and ESM all learn from protein sequence and structure data. The better your biochemistry intuition, the better you can interpret their outputs.
This page teaches biochemistry from zero. You need the organic basics (functional groups, hydrogen bonds, acids/bases) and a feel for equilibrium. We build the rest.
2. Vocabulary cheat sheet
Skim. Every term gets a fuller treatment below.
| Symbol / term | Read as | Means |
|---|---|---|
| $K_m$ | "K-m" or Michaelis constant | Substrate concentration at which an enzyme works at half its maximum rate. A rough proxy for binding affinity (it equals the true dissociation constant only when $k_{cat} \ll k_{-1}$). |
| $V_{max}$ | "V-max" | Maximum rate the enzyme can achieve when saturated with substrate. |
| $k_{cat}$ | "k-cat" or turnover number | Number of substrate molecules converted per active site per second at saturation. |
| $k_{cat}/K_m$ | "catalytic efficiency" | How good the enzyme is at finding and processing substrate. Diffusion limit $\approx 10^8$-$10^9$ M$^{-1}$s$^{-1}$. |
| 1°, 2°, 3°, 4° | "primary through quaternary" | Four levels of protein structure: sequence, local folds, 3D tertiary, and multi-chain assembly. |
| mRNA, tRNA, rRNA | "messenger, transfer, ribosomal RNA" | Three RNA families that do the heavy lifting of translation. |
| ATP, NADH, NADPH, FADH$_2$ | "A-T-P, N-A-D-H, etc." | Currency molecules — energy and electrons that every pathway trades in. |
| Central dogma | "central dogma" | DNA → RNA → protein. Information flows forward (with some exceptions). |
| Glycolysis, TCA, OxPhos | — | The three stages of oxidative fuel metabolism. |
| $\Delta G^{\circ\prime}$ | "delta-G-standard-prime" | Standard free energy change at biological conditions (pH 7, 1 M, 25°C). |
3. Amino acids — the building blocks
All proteins are built from the same twenty amino acids. Every amino acid shares the same backbone — an $\alpha$-carbon with an $-NH_3^+$ (at physiological pH), a $-COO^-$, an H, and a side chain $R$ — and differs only in $R$.
The amino acid zwitterion
$$H_3N^+ {-} C_\alpha H(R) {-} COO^-$$
- $H_3N^+$: Protonated amino group. At pH 7 the amine is protonated ($pK_a \approx 9$) and the acid is deprotonated ($pK_a \approx 2$).
- $COO^-$: Deprotonated carboxylate.
- $R$: The side chain. Twenty common choices, from a simple H (glycine) to complex aromatic rings (tryptophan).
- $\alpha$-C: The central carbon. Except in glycine, it is a stereocenter. All natural amino acids in proteins are L (equivalent to S for most).
Why the zwitterion. At physiological pH (7.4), every amino acid is a dipolar ion with a positive and a negative group. That makes them very water-soluble and gives them a high melting point — they behave more like inorganic salts than like typical organic molecules of similar size.
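The zwitterion claim can be checked with the Henderson-Hasselbalch relation. The sketch below uses the approximate backbone $pK_a$ values quoted above (about 9 for the amine, about 2 for the carboxyl). The function name `fraction_protonated` is illustrative, and real $pK_a$ values vary somewhat between amino acids.

```python
def fraction_protonated(pH, pKa):
    """Fraction of a titratable group carrying its proton (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10 ** (pH - pKa))

PH = 7.4            # physiological pH
PKA_AMINE = 9.0     # approximate backbone amine pKa (from the text)
PKA_CARBOXYL = 2.0  # approximate backbone carboxyl pKa (from the text)

amine_charged = fraction_protonated(PH, PKA_AMINE)               # fraction as -NH3+
carboxyl_charged = 1.0 - fraction_protonated(PH, PKA_CARBOXYL)   # fraction as -COO-

print(f"amine as NH3+:    {amine_charged:.3f}")
print(f"carboxyl as COO-: {carboxyl_charged:.6f}")
```

Both groups come out more than 97% ionized at pH 7.4, which is why the dipolar form dominates in solution.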
The twenty side chains are usually grouped by chemistry:
- Hydrophobic aliphatic: Gly, Ala, Val, Leu, Ile, Met, Pro. These go to the interior of folded proteins.
- Aromatic: Phe, Tyr, Trp. Tyr and Trp absorb UV at 280 nm — used to quantify proteins.
- Polar uncharged: Ser, Thr, Cys, Asn, Gln. Hydrogen-bonding. Cys can form disulfide bridges.
- Positively charged (basic): Lys, Arg, His. His has a $pK_a$ near 6, crucial for acid-base catalysis.
- Negatively charged (acidic): Asp, Glu. Carboxylate side chains; often in enzyme active sites.
When two amino acids join, the amine of one attacks the carboxyl carbon of the other, water leaves, and a peptide bond (an amide) forms. A chain of peptide-linked amino acids is a polypeptide; once it folds into a defined 3D shape with a job to do, it's a protein. Peptide bonds are planar, trans-biased, and rigid, so the backbone can only rotate around two angles per residue, $\phi$ and $\psi$. That restriction is what makes Ramachandran plots such a useful analytical tool.
4. Protein structure — four levels
A folded protein is described at four nested levels:
- Primary (1°) — the linear sequence of amino acids, N-terminus to C-terminus. A protein of $N$ residues has $20^N$ possible sequences; insulin has 51 residues, hemoglobin has 574, titin has ~34,000.
- Secondary (2°) — local regular structures stabilized by backbone H-bonds. The two main patterns are the $\alpha$-helix (every residue H-bonds to the one four positions away, 3.6 residues per turn) and the $\beta$-sheet (adjacent strands H-bond to each other, either parallel or antiparallel).
- Tertiary (3°) — the full 3D fold of a single polypeptide chain, driven by hydrophobic collapse (nonpolar side chains bury inside), H-bonds, salt bridges, and occasionally disulfide bonds.
- Quaternary (4°) — the assembly of two or more folded chains into a larger complex. Hemoglobin is a tetramer (2$\alpha$ + 2$\beta$). GroEL is a 14-mer. The ribosome is a 50+ subunit assembly of RNA and proteins.
How folding works (the paradox that wasn't)
A polypeptide with 100 residues and 3 choices per $\phi/\psi$ has $3^{200} \approx 10^{95}$ possible conformations. If it visited each one for a picosecond, it would take longer than the age of the universe to find the native state. Yet real proteins fold in milliseconds. This is Levinthal's paradox, and the resolution is that folding is not a random search — the energy landscape is a funnel that biases the chain toward the native state at every step.
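The arithmetic behind Levinthal's paradox is easy to reproduce. This sketch takes the text's assumptions (100 residues, two backbone angles each, three states per angle, one picosecond per conformation) and an assumed round value for the age of the universe, and works in powers of ten.

```python
import math

RESIDUES = 100
ANGLES_PER_RESIDUE = 2        # phi and psi
STATES_PER_ANGLE = 3          # the text's simplifying assumption
SECONDS_PER_STATE = 1e-12     # one picosecond per conformation
AGE_OF_UNIVERSE_S = 4.35e17   # about 13.8 billion years (assumed round value)

# Work in log10 so the numbers stay readable.
log10_conformations = RESIDUES * ANGLES_PER_RESIDUE * math.log10(STATES_PER_ANGLE)
log10_search_seconds = log10_conformations + math.log10(SECONDS_PER_STATE)
log10_universe_ages = log10_search_seconds - math.log10(AGE_OF_UNIVERSE_S)

print(f"conformations     ~ 10^{log10_conformations:.1f}")
print(f"exhaustive search ~ 10^{log10_search_seconds:.1f} s")
print(f"which is          ~ 10^{log10_universe_ages:.1f} ages of the universe")
```

The exact exponents depend on the assumed number of states per angle, but any plausible choice leaves the exhaustive search hopelessly slow, which is the point of the paradox.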
Modern structure prediction, led by AlphaFold 2 (2020) and its successors, has essentially solved the single-chain protein structure problem for proteins with evolutionary relatives in the database. A multiple sequence alignment plus a transformer is enough: co-evolving residue pairs leak 3D information that the model exploits. For proteins without homologs, single-sequence predictors such as ESMFold (built on the ESM-2 language model) fill in, and generative tools such as RFdiffusion have made de novo protein design routine.
5. Enzymes and Michaelis-Menten kinetics
An enzyme is a biological catalyst: almost always a folded protein, occasionally an RNA. It speeds a specific reaction without being consumed, and it often selects one substrate out of a large pool. Rate accelerations routinely reach $10^6$ to $10^{12}$ and, in extreme cases such as OMP decarboxylase, $10^{17}$.
The standard model of enzyme kinetics was written down by Leonor Michaelis and Maud Menten in 1913:
The Michaelis-Menten mechanism
$$E + S \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} ES \xrightarrow{k_{\text{cat}}} E + P$$
- $E$: Free enzyme.
- $S$: Substrate, the molecule the enzyme will transform.
- $ES$: Enzyme-substrate complex. The substrate is bound in the active site, pre-positioned for reaction.
- $P$: Product. Released after the chemistry happens.
- $k_1, k_{-1}$: Forward and reverse rate constants for binding. Fast on the timescale of catalysis for most enzymes.
- $k_{\text{cat}}$: Turnover number, the rate constant for the chemistry step once the substrate is bound.
Why this minimal mechanism is enough. Real enzymes have many steps (binding, induced fit, chemistry, product release). But under the rapid-equilibrium or steady-state approximation, the overall rate law for most of them still reduces to two parameters, a $K_m$ and a $V_{\max}$.
Assume a steady state: $[ES]$ is constant after a short initial transient. Set formation equal to decay, solve for $[ES]$, plug into $v = k_{\text{cat}}[ES]$, and you get the Michaelis-Menten equation:
The Michaelis-Menten equation
$$v = \frac{V_{\max}[S]}{K_m + [S]}, \qquad V_{\max} = k_{\text{cat}}[E]_T$$
- $v$: Initial rate of product formation, in M/s.
- $V_{\max}$: Maximum rate when every enzyme molecule is saturated with substrate.
- $K_m$: Michaelis constant, the substrate concentration at which $v = V_{\max}/2$. Units of molarity.
- $[S]$: Current substrate concentration.
- $[E]_T$: Total enzyme concentration (bound + free).
- $k_{\text{cat}}$: Turnover number, how many reactions each active site performs per second at saturation. Catalase: $\sim 4 \times 10^7$ s$^{-1}$. Carbonic anhydrase: $\sim 10^6$ s$^{-1}$.
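The steady-state argument can be written out step by step. This is a sketch of the standard derivation using the symbols defined above; $[E]$ denotes free enzyme, so $[E]_T = [E] + [ES]$.

```latex
\begin{aligned}
\frac{d[ES]}{dt} &= k_1[E][S] - (k_{-1} + k_{\text{cat}})[ES] = 0 \\
k_1\,([E]_T - [ES])\,[S] &= (k_{-1} + k_{\text{cat}})\,[ES] \\
[ES] &= \frac{[E]_T\,[S]}{K_m + [S]}, \qquad K_m \equiv \frac{k_{-1} + k_{\text{cat}}}{k_1} \\
v = k_{\text{cat}}[ES] &= \frac{k_{\text{cat}}[E]_T\,[S]}{K_m + [S]} = \frac{V_{\max}[S]}{K_m + [S]}, \qquad V_{\max} = k_{\text{cat}}[E]_T
\end{aligned}
```

Setting $[S] = K_m$ in the last line gives $v = V_{\max}/2$, which matches the definition of $K_m$ above.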
Shape of the curve. At low $[S] \ll K_m$, $v \approx (V_{\max}/K_m)[S]$: the rate is linear in substrate, as if enzyme and substrate were reacting in a simple second-order process. At high $[S] \gg K_m$, $v \approx V_{\max}$: the enzyme is saturated, and adding more substrate no longer increases the rate. The crossover between the two regimes sits at $[S] = K_m$. A low $K_m$ means the enzyme is saturated even at very dilute substrate, which is common for high-affinity enzymes like hexokinase ($K_m \sim 0.1$ mM for glucose).
The quantity $k_{\text{cat}}/K_m$ is the catalytic efficiency — how good the enzyme is at finding substrate, binding it, and converting it. The diffusion limit is about $10^8$-$10^9$ M$^{-1}$s$^{-1}$: beyond that, the substrate and enzyme can't collide any faster in water. A handful of enzymes (catalase, triose phosphate isomerase, fumarase) are at or near this limit. They are as fast as physics allows.
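The two limiting regimes and the efficiency check can be tried numerically. The sketch below uses a hypothetical enzyme (the values of `VMAX`, `KM`, `kcat`, and `Km_molar` are made up for illustration, not taken from any assay) and compares $k_{\text{cat}}/K_m$ against the $10^8$ to $10^9$ M$^{-1}$s$^{-1}$ window quoted above.

```python
def mm_rate(S, Vmax, Km):
    """Michaelis-Menten initial rate."""
    return Vmax * S / (Km + S)

def catalytic_efficiency(kcat, Km_molar):
    """kcat/Km in M^-1 s^-1 (kcat in s^-1, Km in M)."""
    return kcat / Km_molar

# Hypothetical enzyme, not from any real assay.
VMAX = 1.0   # arbitrary rate units
KM = 10.0    # same concentration units as S

low_S, high_S = 0.01 * KM, 100 * KM
linear_estimate = (VMAX / KM) * low_S

print(mm_rate(low_S, VMAX, KM), linear_estimate)   # nearly equal when S << Km
print(mm_rate(high_S, VMAX, KM))                   # approaches Vmax when S >> Km
print(mm_rate(KM, VMAX, KM))                       # exactly Vmax / 2

# Hypothetical kcat and Km chosen to land inside the diffusion-limited window.
eff = catalytic_efficiency(kcat=4e3, Km_molar=1e-5)
print(f"kcat/Km = {eff:.1e} M^-1 s^-1, in diffusion window: {1e8 <= eff <= 1e9}")
```

Note that $K_m$ must be in molar units before dividing, otherwise the efficiency is off by the unit conversion factor.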
Inhibition
Drugs that target enzymes work by inhibiting them. Three classic patterns:
- Competitive — inhibitor binds the same site as the substrate. Raises apparent $K_m$, leaves $V_{\max}$ unchanged (enough substrate still wins). Statins are competitive inhibitors of HMG-CoA reductase.
- Non-competitive — inhibitor binds an allosteric site and lowers apparent $V_{\max}$ without changing $K_m$. You lose active enzyme no matter how much substrate you pile on.
- Uncompetitive — inhibitor binds only the $ES$ complex. Lowers both $K_m$ and $V_{\max}$. Lithium (in treating bipolar disorder) is thought to work partly this way on IMPase.
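The three patterns above can be summarized as changes to the apparent $K_m$ and $V_{\max}$. This is a minimal sketch of the textbook models, assuming a single inhibitor at concentration `I` with inhibition constant `Ki`; the scaling factor $\alpha = 1 + [I]/K_i$ and the function names are conventions introduced here, not taken from the text.

```python
def apparent_parameters(mode, Vmax, Km, I, Ki):
    """Return (Vmax_app, Km_app) for the textbook inhibition models.

    I and Ki share the same concentration units; alpha = 1 + I/Ki.
    """
    alpha = 1.0 + I / Ki
    if mode == "competitive":
        return Vmax, Km * alpha            # Km rises, Vmax unchanged
    if mode == "noncompetitive":           # pure non-competitive case
        return Vmax / alpha, Km            # Vmax falls, Km unchanged
    if mode == "uncompetitive":
        return Vmax / alpha, Km / alpha    # both fall by the same factor
    raise ValueError(f"unknown inhibition mode: {mode}")

def rate(S, Vmax, Km):
    return Vmax * S / (Km + S)

# Hypothetical numbers: Vmax = 4, Km = 5, inhibitor present at its Ki.
for mode in ("competitive", "noncompetitive", "uncompetitive"):
    Vapp, Kapp = apparent_parameters(mode, Vmax=4.0, Km=5.0, I=2.0, Ki=2.0)
    print(f"{mode:15s} Vmax_app={Vapp:.2f}  Km_app={Kapp:.2f}  v(S=5)={rate(5.0, Vapp, Kapp):.2f}")
```

Real inhibitors are often mixed, with different constants for binding to $E$ and to $ES$, so treat these three cases as limiting models.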
6. Interactive: Michaelis-Menten plotter
Slide $K_m$ and $V_{\max}$ and watch the rate-vs-substrate curve. Compare the standard MM plot with a Lineweaver-Burk (double reciprocal) plot, where the curve becomes a straight line — historically how experimentalists extracted parameters, before nonlinear fitting was cheap.
Michaelis-Menten curve. Dashed line = $V_{\max}$. Pink dot = ($K_m$, $V_{\max}/2$).
Things to try:
- Drop $K_m$ to 1 μM — the curve is now nearly saturated at a few μM, characteristic of a high-affinity enzyme.
- Raise $K_m$ to 50 μM — linear behavior persists well into the plot.
- Double $V_{\max}$ — the curve scales vertically without changing its shape. Doubling enzyme concentration has exactly this effect.
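A static stand-in for the plotter: the sketch below tabulates the Michaelis-Menten curve and its double-reciprocal transform for one hypothetical parameter setting (`VMAX` and `KM` are arbitrary illustrative values), then recovers both parameters from the Lineweaver-Burk line.

```python
def mm_rate(S, Vmax, Km):
    return Vmax * S / (Km + S)

VMAX, KM = 4.0, 5.0                      # hypothetical settings (uM/s, uM)
substrate = [0.5, 1, 2, 5, 10, 20, 50]   # uM

print("  [S]      v   1/[S]    1/v")
for S in substrate:
    v = mm_rate(S, VMAX, KM)
    print(f"{S:5.1f} {v:6.3f} {1 / S:7.3f} {1 / v:6.3f}")

# In double-reciprocal form, 1/v = (Km/Vmax) * (1/[S]) + 1/Vmax is a straight line.
# With noise-free data, two points are enough to recover both parameters.
x1, x2 = 1 / substrate[0], 1 / substrate[-1]
y1 = 1 / mm_rate(substrate[0], VMAX, KM)
y2 = 1 / mm_rate(substrate[-1], VMAX, KM)
slope = (y2 - y1) / (x2 - x1)        # equals Km / Vmax
intercept = y1 - slope * x1          # equals 1 / Vmax
Vmax_from_line = 1 / intercept
Km_from_line = slope * Vmax_from_line
print(f"recovered Vmax = {Vmax_from_line:.2f}, Km = {Km_from_line:.2f}")
```

Changing `KM` to 1 or 50 and rerunning reproduces the first two experiments in the list above; doubling `VMAX` doubles every value in the `v` column.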
7. DNA, RNA, and the central dogma
DNA and RNA are polymers of nucleotides. A nucleotide has three pieces: a five-carbon sugar (deoxyribose in DNA, ribose in RNA), a phosphate group, and one of four nitrogenous bases. DNA uses A, T, G, C. RNA uses A, U (instead of T), G, C. The bases stack on the inside of a double helix, pairing A–T (or A–U) via two hydrogen bonds and G–C via three. The sugar-phosphate backbone runs on the outside, polyanionic.
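The pairing rules translate directly into code. A small sketch, assuming an uppercase DNA string with only A, T, G, and C; it also relies on the standard fact that the two strands run antiparallel, which is why the complement is reversed. The function names are illustrative.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}   # bonds each base makes with its partner

def reverse_complement(dna):
    """Sequence of the opposite strand, written 5'->3'."""
    return "".join(PAIR[b] for b in reversed(dna))

def duplex_h_bonds(dna):
    """Total Watson-Crick hydrogen bonds when `dna` pairs with its complement."""
    return sum(H_BONDS[b] for b in dna)

seq = "ATGGCC"   # arbitrary example sequence
print(reverse_complement(seq))   # GGCCAT
print(duplex_h_bonds(seq))       # 2+2+3+3+3+3 = 16
```

GC-rich duplexes carry more hydrogen bonds per base pair, one reason they tend to be harder to separate.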
The central dogma
$$\text{DNA} \xrightarrow{\text{transcription}} \text{mRNA} \xrightarrow{\text{translation}} \text{protein}$$
- DNA: Long-term information storage. Double-stranded, stable, heritable.
- Transcription: RNA polymerase reads one DNA strand and synthesizes a complementary RNA copy.
- mRNA: Single-stranded message carrying the protein-coding sequence. Short-lived.
- Translation: The ribosome reads the mRNA three bases at a time (a codon) and attaches the corresponding amino acid to a growing polypeptide.
- Protein: The machine that folds, moves, catalyzes, and builds; the doer of cellular work.
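The transcription and codon-reading steps can be sketched in a few lines. This assumes the template strand is supplied in the 3'→5' direction so that the RNA comes out 5'→3'; the sequence and function names are hypothetical.

```python
COMPLEMENT_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3to5):
    """RNA (5'->3') made from a DNA template strand given 3'->5'."""
    return "".join(COMPLEMENT_RNA[b] for b in template_3to5)

def codons(mrna):
    """Split an mRNA into complete three-base codons, dropping any remainder."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]

template = "TACAAACAC"   # hypothetical template strand, written 3'->5'
mrna = transcribe(template)
print(mrna)              # AUGUUUGUG
print(codons(mrna))      # ['AUG', 'UUU', 'GUG']
```

Each codon is then looked up in a codon table to pick the amino acid, which is the translation step.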
Why the dogma is "almost" universal. Retroviruses like HIV run DNA ← RNA via reverse transcriptase. Prions replicate in a protein-only cycle. Some RNA viruses replicate without ever going through DNA. The core flow DNA → RNA → protein holds in every organism, but information can also go backward (RT) or stay as RNA.
Three practical consequences for the modern medicine you care about:
- Sequencing. Modern short-read (Illumina) and long-read (PacBio, Nanopore) sequencing decode DNA and RNA with base-level accuracy. A human genome now costs under \$1000 and assembles overnight.
- mRNA therapeutics. COVID-19 vaccines are mRNA encoding the spike protein, wrapped in a lipid nanoparticle, injected into muscle, taken up by cells, translated into spike antigen — and then the mRNA is degraded. It is short-lived by design.
- CRISPR-Cas9. A bacterial RNA-guided DNA nuclease repurposed to cut nearly any target sequence that sits next to a short PAM motif, in any genome. The guide RNA base-pairs with the target DNA; Cas9 cleaves. Base editing (no double-strand cut) and prime editing (rewrite) are refinements.
8. Metabolism — glycolysis, TCA, oxidative phosphorylation
Food has chemical energy stored in C–H and C–C bonds. Metabolism is the orderly process of extracting that energy and converting it into ATP. For glucose, the pathway has three stages:
- Glycolysis — cytosolic. Glucose ($C_6$) is split into two pyruvates ($C_3$). Net yield: 2 ATP and 2 NADH per glucose. Works aerobically or anaerobically.
- TCA cycle (Krebs / citric acid cycle), in the mitochondrial matrix. Pyruvate is decarboxylated and its acetyl group is handed to coenzyme A, forming acetyl-CoA. Each acetyl group is fed into a cycle that releases 2 CO$_2$, 3 NADH, 1 FADH$_2$, and 1 GTP per turn.
- Oxidative phosphorylation — inner mitochondrial membrane. NADH and FADH$_2$ deliver electrons to a chain of four protein complexes. Complexes I, III, and IV pump protons across the membrane, building an electrochemical gradient. ATP synthase (Complex V) lets the protons flow back through a rotor, coupling flow to ATP synthesis.
Net from one glucose (aerobic): ~30 ATP. Anaerobic (fermentation): just 2 ATP. Which is why you can sprint without breathing but not for long — you accumulate lactate, pH drops, and the machine stalls.
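The ~30 ATP figure can be reconstructed by tallying carriers. This sketch takes the glycolysis and TCA yields from the list above and adds two assumptions not stated in the text: the 2 NADH made when the two pyruvates become acetyl-CoA, and commonly quoted ATP-per-carrier ratios (2.5 per NADH, 1.5 per FADH$_2$). The cytosolic NADH value depends on which shuttle carries its electrons into the mitochondrion.

```python
# Carriers produced per glucose. Glycolysis and TCA numbers follow the text;
# the 2 NADH from pyruvate -> acetyl-CoA are a standard addition (assumption).
SUBSTRATE_LEVEL = 2 + 2          # 2 ATP (glycolysis) + 2 GTP (two TCA turns)
NADH_CYTOSOLIC = 2               # glycolysis
NADH_MITOCHONDRIAL = 2 + 2 * 3   # pyruvate dehydrogenase + two TCA turns
FADH2 = 2 * 1                    # two TCA turns

# Assumed ATP yield per carrier; commonly quoted consensus values.
ATP_PER_NADH = 2.5
ATP_PER_FADH2 = 1.5

def atp_per_glucose(atp_per_cytosolic_nadh):
    """Total ATP, given how cytosolic NADH electrons enter the chain."""
    return (SUBSTRATE_LEVEL
            + NADH_CYTOSOLIC * atp_per_cytosolic_nadh
            + NADH_MITOCHONDRIAL * ATP_PER_NADH
            + FADH2 * ATP_PER_FADH2)

print(atp_per_glucose(1.5))   # glycerol-phosphate shuttle: 30.0
print(atp_per_glucose(2.5))   # malate-aspartate shuttle:   32.0
```

The spread between 30 and 32 is why textbooks quote the yield as approximate.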
Total combustion of glucose (biochemical standard)
$$C_6H_{12}O_6 + 6\,O_2 \longrightarrow 6\,CO_2 + 6\,H_2O, \qquad \Delta G^{\circ\prime} \approx -2870\ \text{kJ/mol}$$
- $C_6H_{12}O_6$: Glucose, the universal fuel molecule.
- $6 O_2$: Oxygen, the terminal electron acceptor in aerobic metabolism.
- $6 CO_2$: Carbon dioxide, the oxidized carbon waste.
- $6 H_2O$: Water.
- $\Delta G^{\circ\prime}$: Biochemical standard free energy change at pH 7, 25°C, 1 M reactants. Roughly $-2870$ kJ/mol ($-686$ kcal/mol).
Energy budget. ATP hydrolysis has a standard free energy change of about $-30$ kJ/mol (in a living cell, where ATP is held far from equilibrium, the value is closer to $-50$ kJ/mol). Using the standard figure, the ~2870 kJ/mol from glucose combustion could in principle make about 95 ATP per glucose. Real cells capture ~30, roughly a 31% efficiency, on par with a good car engine. The rest becomes heat, which is partly why animals stay warm.
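The budget is a two-line calculation. The sketch below uses the round figures from the text; with 30 kJ/mol per ATP the captured fraction comes out near 31%, and using 30.5 kJ/mol nudges it toward 32%.

```python
DG_GLUCOSE = 2870.0   # kJ/mol released by full oxidation (from the text)
DG_ATP = 30.0         # kJ/mol per ATP, the round standard figure used in the text
ATP_CAPTURED = 30     # approximate aerobic yield per glucose

max_atp = DG_GLUCOSE / DG_ATP
efficiency = ATP_CAPTURED * DG_ATP / DG_GLUCOSE

print(f"theoretical ceiling: {max_atp:.1f} ATP per glucose")
print(f"captured fraction:   {efficiency:.1%}")
```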
Why the machinery is built this way
Two design principles explain most of metabolism. First, stepwise oxidation: burning glucose in one step would release 2870 kJ of heat and make no ATP. Chopping it into dozens of small steps, each releasing a manageable amount of energy, lets enzymes couple those steps to ATP synthesis and capture the energy chemically. Second, redox currencies: NADH and FADH$_2$ act as portable electron carriers, ferrying reducing equivalents from many catabolic reactions to a single, dedicated machinery (the electron transport chain) that does the final oxidation to water.
9. ATP, NADH, and the cofactors
A handful of small molecules do most of the cell's chemical work. They are called cofactors or coenzymes, and they are the currencies every metabolic pathway trades in:
- ATP (adenosine triphosphate), the energy currency. Hydrolysis of the terminal phosphate has $\Delta G^{\circ\prime} \approx -30$ kJ/mol (more negative still at cellular concentrations), enough to drive many reactions that are otherwise uphill.
- NAD$^+$ / NADH — two-electron redox carrier. NADH carries hydride equivalents from catabolism to the electron transport chain.
- NADP$^+$ / NADPH, the same redox chemistry as NAD$^+$ with one extra phosphate on the adenosine ribose, but used in biosynthesis (reductive anabolism) rather than catabolism.
- FAD / FADH$_2$ — two-electron redox carrier tightly bound to many enzymes (flavoproteins).
- Coenzyme A (CoA) — carries acyl groups, especially acetyl. Acetyl-CoA is the central hub of carbon metabolism.
- Pyridoxal phosphate (PLP, vitamin B6) — shuttles amino groups in transaminases via imine chemistry.
- Thiamine pyrophosphate (TPP, vitamin B1) — catalyzes decarboxylation of $\alpha$-keto acids.
- Biotin (vitamin B7) — carries carboxyl groups in carboxylation reactions.
Almost every vitamin is the precursor of a cofactor. Vitamin deficiencies are cofactor deficiencies, which are metabolic stalls at specific steps. Scurvy is deficiency of vitamin C (needed for collagen hydroxylation). Beriberi is deficiency of thiamine (TCA cycle stalls). Pellagra is deficiency of niacin (NAD$^+$ falls). Pernicious anemia is B12 deficiency (methionine synthase and methylmalonyl-CoA mutase fail).
Why ATP is "energy currency"
The $\Delta G^{\circ\prime}$ for ATP → ADP + P$_i$ is about $-30$ kJ/mol. That is enough to drive an otherwise unfavorable reaction if the two are coupled through a shared intermediate. Classic example: phosphorylating glucose to glucose-6-phosphate has $\Delta G^{\circ\prime} = +14$ kJ/mol on its own. Couple it to ATP hydrolysis and the net is $-16$ kJ/mol — spontaneous. Hexokinase catalyzes the coupled reaction so no free phosphate intermediate ever exists; the reaction goes through a single-transition-state enzymatic step.
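The coupling arithmetic, and what it means for the equilibrium position, can be checked directly. This sketch uses the two $\Delta G^{\circ\prime}$ values from the text and the relation $K_{eq} = e^{-\Delta G^{\circ\prime}/RT}$ at 25°C; the function name `keq_from_dg` is illustrative.

```python
import math

R = 8.314e-3   # kJ/(mol*K)
T = 298.15     # K, the 25 C reference temperature for standard free energies

def keq_from_dg(dg_kj_per_mol, temperature=T):
    """Equilibrium constant from a standard free energy change."""
    return math.exp(-dg_kj_per_mol / (R * temperature))

dg_phosphorylation = +14.0   # glucose + Pi -> G6P + H2O (from the text)
dg_atp_hydrolysis = -30.0    # ATP + H2O -> ADP + Pi (from the text)
dg_coupled = dg_phosphorylation + dg_atp_hydrolysis   # glucose + ATP -> G6P + ADP

print(f"coupled dG    = {dg_coupled:+.0f} kJ/mol")
print(f"Keq uncoupled = {keq_from_dg(dg_phosphorylation):.1e}")
print(f"Keq coupled   = {keq_from_dg(dg_coupled):.1e}")
```

Coupling shifts the equilibrium constant by roughly five orders of magnitude, from strongly favoring free glucose to strongly favoring glucose-6-phosphate.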
10. Biochemistry in code
Two things: a Michaelis-Menten fit using nonlinear least squares and a simple translation routine that converts an mRNA sequence into amino acids.
```python
import numpy as np
from scipy.optimize import curve_fit

# ---------- Michaelis-Menten fit ----------
def mm(S, Vmax, Km):
    """Michaelis-Menten rate law: v = Vmax * S / (Km + S)."""
    return Vmax * S / (Km + S)

# Synthetic data from a fictional enzyme assay
S_obs = np.array([0.5, 1, 2, 5, 10, 20, 50, 100])                    # uM
v_obs = np.array([0.31, 0.56, 0.93, 1.75, 2.45, 3.12, 3.65, 3.85])  # uM/s

# Nonlinear least squares on the raw data; p0 is the initial guess [Vmax, Km]
popt, _ = curve_fit(mm, S_obs, v_obs, p0=[4, 5])
Vmax_fit, Km_fit = popt
print(f"Vmax ≈ {Vmax_fit:.2f} uM/s, Km ≈ {Km_fit:.2f} uM")

# kcat = Vmax / [E]_T; convert Km from uM to M so kcat/Km comes out in 1/(M·s)
E_total_uM = 10e-3               # assumed [E]_T = 10 nM, expressed in uM
kcat = Vmax_fit / E_total_uM     # 1/s
print(f"kcat/Km (assuming [E]_T = 10 nM) ≈ {kcat / (Km_fit * 1e-6):.2e} / (M·s)")

# ---------- mRNA translation ----------
# Codon table abbreviated to 16 sense codons plus the three stop codons.
CODON = {
    "UUU":"F","UUC":"F","UUA":"L","UUG":"L",
    "CUU":"L","CUC":"L","CUA":"L","CUG":"L",
    "AUU":"I","AUC":"I","AUA":"I","AUG":"M",
    "GUU":"V","GUC":"V","GUA":"V","GUG":"V",
    "UAA":"*","UAG":"*","UGA":"*",
    # ... (45 more sense codons omitted)
}

def translate(mrna):
    """Translate an mRNA string codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i+3]
        aa = CODON.get(codon, "?")   # "?" marks codons missing from the abbreviated table
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("AUGUUUGUG"))  # MFV
```
```python
import math

# Same MM curve, no dependencies.
def mm_rate(S, Vmax, Km):
    return Vmax * S / (Km + S)

def iptg_induction_profile(t, K, n=2):
    # Hill function for inducer-driven gene expression
    return (t ** n) / (K ** n + t ** n)

def gibbs_from_K(K, T=310):
    # Biological T: 37 C = 310 K
    return -8.314e-3 * T * math.log(K)  # kJ/mol

# Example: a reaction with Keq = 1000 under cellular conditions
print(f"dG = {gibbs_from_K(1000):.1f} kJ/mol")   # ~ -17.8
print(f"v at S=Km: {mm_rate(10, 5, 10):.2f}")    # Vmax/2 = 2.5
```
Two practical notes:
- Lineweaver-Burk is a historical crutch. The double-reciprocal plot $1/v$ vs $1/[S]$ gives a straight line whose intercepts encode $V_{\max}$ and $K_m$, but it distorts errors at low $[S]$. Modern practice is nonlinear least squares directly on the raw data.
- Biopython (`pip install biopython`) has full codon tables, sequence alignment, and structure parsers built in. Use it for anything involving real sequences.
11. Cheat sheet
| Item | Summary |
|---|---|
| Amino acid backbone | $H_3N^+$-C(R)(H)-$COO^-$ |
| Protein levels | 1° sequence → 2° helix/sheet → 3° fold → 4° assembly |
| Michaelis-Menten | $v = V_{\max}[S]/(K_m + [S])$ |
| Catalytic efficiency | $k_{\text{cat}}/K_m$ |
| Central dogma | DNA → RNA → protein |
| Glucose yield | ~30 ATP aerobic / 2 ATP anaerobic |
| ATP hydrolysis | $\Delta G^{\circ\prime} \approx -30$ kJ/mol |
| Redox carriers | NADH / NADPH / FADH$_2$ |
| Hemoglobin | Tetramer ($\alpha_2\beta_2$), cooperative $O_2$ binding |
| Codon | Three bases → one amino acid |
See also
- Organic chemistry: Every biomolecule is a bag of functional groups. Imines, esters, amides, and thioesters run the cell's chemistry.
- Chemical equilibrium: Protein folding, substrate binding, and hemoglobin-O$_2$ cooperativity are all equilibrium problems with a biological flavor.
- Kinetics: Michaelis-Menten is kinetics with a bound-complex twist. Enzyme inhibition is kinetics with a drug in the mix.
- Quantum chemistry: Computational chemists use DFT and QM/MM to predict enzyme transition states and inhibitor binding energies.
- AI: Foundation models for biology: AlphaFold, ESM, and RFdiffusion are protein-structure prediction and design systems trained on the same data you just learned to read.
Further reading
- Jeremy M. Berg, John L. Tymoczko, and Lubert Stryer. Biochemistry (9th ed., W.H. Freeman, 2019). The modern standard textbook; Chapter 8 on enzymes is the clearest MM treatment in print.
- David L. Nelson and Michael M. Cox. Lehninger Principles of Biochemistry (8th ed., 2021). An alternative standard, with more metabolism detail.
- Bruce Alberts et al. Molecular Biology of the Cell (7th ed., W. W. Norton, 2022). The cell-biology companion. Structure, signaling, trafficking.
- Athel Cornish-Bowden. Fundamentals of Enzyme Kinetics (4th ed., Wiley, 2012). The deep dive on Michaelis-Menten and its many refinements.
- John Jumper et al. "Highly accurate protein structure prediction with AlphaFold." Nature 596, 583-589 (2021). The paper that changed protein structural biology.
- Protein Data Bank, rcsb.org. The worldwide archive of experimentally determined biomolecular structures.