Organic Chemistry

A working tour of the chemistry of carbon — the element that builds every drug, every polymer, every fuel, and every living cell. Once you can name a compound, identify its functional groups, spot its stereocenters, and push the electrons of its reactions, organic chemistry stops being memorization and starts being a reasoning system with rules.

Prereq: bonding, equilibrium · Read time: ~45 min · Interactive figures: 1 · Code: RDKit / SMILES

1. Why carbon is special — and why you should care

Look at the label of any prescription bottle. The molecules listed are carbon skeletons decorated with a handful of functional groups. Look at a plastic bag, a nylon jacket, a gasoline pump, a vitamin tablet, a lithium battery separator, or the viral proteins your flu shot teaches your immune system to recognize: all carbon. Biology runs on carbon. Industrial chemistry is mostly carbon. A large part of pharmaceutical research is essentially "which carbon skeleton, with which functional groups, in which three-dimensional arrangement, binds best to the target protein." That is organic chemistry.

Carbon earns this unique status because of a few quiet properties. It forms four strong covalent bonds of comparable energy to itself and to almost every other nonmetal. Those bonds are stable at biological and industrial temperatures but not so stable that they can't be rearranged with the right reagent. Carbon can form long chains, rings, branches, double bonds, triple bonds, and aromatic systems, giving an almost unlimited structural vocabulary. Nitrogen, oxygen, sulfur, and phosphorus slot into that vocabulary as "functional groups" — reactive handles that let you convert one molecule into another on demand.

THE PUNCHLINE

Organic chemistry is four things in a trench coat: a naming system (IUPAC), a catalog of functional groups (the reactive handles), a language of stereochemistry (where atoms sit in 3D), and a library of mechanisms (how electrons flow when reactions run). Master those four and you can read any paper in the pharma or materials literature.

Why care, whether you are a drug designer, a battery chemist, a polymer engineer, a clinician, or a PM trying to understand a biotech pitch? A few concrete reasons:

Drug discovery. Lead optimization is organic chemistry — swap a methyl for a fluorine, change an ester to an amide, add a stereocenter, see what happens to binding and metabolism.
Materials. Conjugated polymers in OLED displays, the electrolyte in a lithium battery, the resin in a 3D printer — all organic.
Clinical pharmacology. Drug metabolism is mostly oxidation and conjugation of functional groups. Understanding first-pass metabolism and half-life means understanding which bond the liver is going to break first.
Environmental chemistry. Persistent organic pollutants, atmospheric oxidation of hydrocarbons, and microplastic degradation all hinge on organic mechanisms.
Synthesis planning. Retrosynthesis — working backward from a target to cheap starting materials — is how every total synthesis and every API (active pharmaceutical ingredient) gets made.

This page teaches organic chemistry from zero. You need to know what a covalent bond is, what "electron pair" means, and roughly what an equilibrium constant is. Everything else we build.

2. Vocabulary cheat sheet

Skim this now. Each term gets a full treatment below.

Symbol / term	Read as	Means
$\text{C}_n\text{H}_{2n+2}$	"alkane formula"	General formula for a saturated, acyclic hydrocarbon (alkane).
sp$^3$, sp$^2$, sp	"s-p-three, s-p-two, s-p"	Hybridization states of carbon — tetrahedral, trigonal planar, and linear.
R–	"R group"	A generic placeholder for "some alkyl/aryl chain." Like an $x$ in algebra.
–OH, –NH$_2$, –COOH	"hydroxyl, amine, carboxyl"	Three of the most common functional groups.
S$_N$1, S$_N$2	"S-N-one, S-N-two"	Unimolecular and bimolecular nucleophilic substitution — two ways a leaving group departs and a nucleophile takes its place.
E1, E2	"E-one, E-two"	Unimolecular and bimolecular elimination — making a double bond by kicking out a leaving group and a proton.
R/S	"R or S"	Cahn-Ingold-Prelog descriptors for a chiral center — right-hand or left-hand spatial arrangement.
cis/trans (Z/E)	"cis/trans" or "Z/E"	Geometric isomers across a double bond — same-side vs opposite-side.
$pK_a$	"p-K-a"	Acid strength on a log scale. Crucial for predicting acid/base reactivity.
retron	"retron"	A substructure in the target molecule that suggests a specific disconnection in retrosynthesis.

One convention. We draw organic molecules as skeletal structures: carbons are unlabeled vertices and line-ends, and hydrogens on carbon are implicit. A zig-zag of four lines is butane. A hexagon with a circle is benzene. A skeletal drawing packs enormous information into a tiny picture once you learn to read it.

3. Alkanes, alkenes, alkynes — the carbon skeletons

Start with the simplest organic molecules: compounds of only carbon and hydrogen. These come in three flavors based on the bonds between carbons.

Alkanes: only single C–C bonds. Saturated. General formula $C_n H_{2n+2}$ for a straight or branched chain, $C_n H_{2n}$ for a single ring (a cycloalkane). Methane ($CH_4$), ethane ($C_2H_6$), propane, butane, and so on. Carbon is sp$^3$ hybridized, tetrahedral, with ~109.5° bond angles. Relatively unreactive: think natural gas, wax, motor oil.
Alkenes: at least one C=C double bond. General formula $C_n H_{2n}$ for an acyclic alkene with one double bond; each additional double bond removes two more hydrogens, so butadiene is $C_4H_6$. Ethylene ($C_2H_4$), propylene, butadiene. The sp$^2$ carbons are planar, bond angle ~120°. The $\pi$ bond is reactive, which makes alkenes a standard starting point for building new C–C and C–X bonds.
Alkynes: at least one C$\equiv$C triple bond. General formula $C_n H_{2n-2}$ for an acyclic alkyne with one triple bond. Acetylene ($C_2H_2$), propyne. sp carbons are linear, 180°.

The degree of unsaturation ($\pi$ bonds + rings) of a hydrocarbon is

$$\text{DoU} \;=\; \frac{2n+2-m}{2}.$$

Degree of unsaturation

$n$: Number of carbon atoms in the molecule.
$m$: Number of hydrogen atoms. For compounds that are not pure hydrocarbons, add the number of halogens and subtract the number of nitrogens; oxygen and sulfur do not change the count.
$2n + 2$: The maximum hydrogen count for a fully saturated acyclic alkane.
DoU: How many H$_2$ "units" you are missing compared with that maximum. Each double bond or ring accounts for one; a triple bond accounts for two.

Practical use. If a drug candidate has a molecular formula from mass spec and you compute DoU = 7, you know you have some combination of rings and $\pi$ bonds adding up to seven. Benzene alone is DoU 4 (three double bonds + one ring). A fused bicyclic aromatic like naphthalene ($C_{10}H_8$) is DoU 7 (five double bonds + two rings). Mass spec plus DoU is a quick first sanity check on any proposed structure.
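As a rough illustration of the formula above, here is a minimal Python sketch that reads a molecular formula string and returns its DoU. The helper names `parse_formula` and `dou_from_formula` are made up for this example, and the parser assumes a flat formula with no parentheses, charges, or isotope labels.

```python
import re

def parse_formula(formula: str) -> dict[str, int]:
    """Count atoms in a flat molecular formula such as 'C10H8' or 'C6H5Cl'."""
    counts: dict[str, int] = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def dou_from_formula(formula: str) -> float:
    """DoU = (2n + 2 - m) / 2, with m = H + halogens - N. O and S are ignored."""
    atoms = parse_formula(formula)
    n = atoms.get("C", 0)
    halogens = sum(atoms.get(x, 0) for x in ("F", "Cl", "Br", "I"))
    m = atoms.get("H", 0) + halogens - atoms.get("N", 0)
    return (2 * n + 2 - m) / 2

for name, formula in [("hexane", "C6H14"), ("benzene", "C6H6"),
                      ("naphthalene", "C10H8"), ("chlorobenzene", "C6H5Cl")]:
    print(f"{name:14s} {formula:8s} DoU = {dou_from_formula(formula):g}")
```

A half-integer result usually signals a radical, an ion, or a mistyped formula, which is itself a useful check.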

Why alkenes are reactive and alkanes are not

The $\pi$ electrons of a C=C double bond sit above and below the internuclear axis, exposed and polarizable. An electrophile — something electron-hungry — can grab them. A C–C single bond, by contrast, has its electron density buried between the nuclei, and is essentially inert under mild conditions. This is the whole reason alkenes are the workhorses of organic synthesis: they are reactive enough to transform but stable enough to handle.

RULE OF THUMB

If a molecule has a $\pi$ bond, a lone pair, or a polar bond, chemists can do something to it. If it has only C–C and C–H single bonds, it is inert to almost everything except free-radical chemistry and combustion.

4. IUPAC nomenclature — naming with rules

IUPAC names encode structure in a strict grammar. Once you learn the grammar, a name like "(2R,3S)-3-amino-2-hydroxy-4-methylpentanoic acid" stops being gibberish and starts being a complete recipe for drawing the molecule. Here are the rules for a straight-chain organic compound, in order:

Find the longest carbon chain containing the principal functional group. Its length gives the root: meth (1), eth (2), prop (3), but (4), pent (5), hex (6), hept (7), oct (8), non (9), dec (10).
Choose the suffix for the highest-priority functional group. Carboxylic acid (-oic acid) > ester (-oate) > amide (-amide) > nitrile (-nitrile) > aldehyde (-al) > ketone (-one) > alcohol (-ol) > amine (-amine) > alkene (-ene) > alkane (-ane).
Number the chain so the principal group gets the lowest locant. Ties? Go to the next-highest group, then substituents.
Prefix all substituents alphabetically, each with its locant. Multiplying prefixes (di, tri, tetra) don't count for alphabetization. So "3-ethyl-2,2-dimethylpentane" sorts e before m.
Stereo descriptors (R/S, E/Z) go at the very front in parentheses.

An example. Take a molecule with six carbons in its longest chain, a carboxylic acid on one end, a methyl branch on carbon 3, and an OH on carbon 4. Acid wins; number from the acid end so it is C1; name is "4-hydroxy-3-methylhexanoic acid." You just built a complete systematic name from the rules alone, with no trivial name to memorize.

Test you can do it. Draw 3-chloro-2,2-dimethylbutane. It's a four-carbon chain (butane), with two methyls both on C2, and a chlorine on C3. The structure is $CH_3 \text{-} C(CH_3)_2 \text{-} CHCl \text{-} CH_3$. Numbering from the other end would give the locants 2,3,3, which lose to 2,2,3 at the first point of difference. If you got that, you can read names going forward.
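The alphabetization and lowest-locant rules above are mechanical enough to sketch in a few lines of Python. This is a toy under simplifying assumptions, not a real name generator: it handles only simple substituents, and the names `substituent_prefix` and `lowest_locants` are invented for the illustration.

```python
MULTIPLIERS = {1: "", 2: "di", 3: "tri", 4: "tetra"}

def substituent_prefix(substituents: dict[str, list[int]]) -> str:
    """Build the prefix part of a name from {substituent: [locants]}.

    Sorting happens on the bare substituent name, so the multiplying
    prefix (di, tri, ...) added afterwards never affects the order.
    """
    parts = []
    for name in sorted(substituents):
        locants = sorted(substituents[name])
        locant_text = ",".join(str(i) for i in locants)
        parts.append(f"{locant_text}-{MULTIPLIERS[len(locants)]}{name}")
    return "-".join(parts)

def lowest_locants(a: list[int], b: list[int]) -> list[int]:
    """Pick the locant set that is lower at the first point of difference."""
    return min(sorted(a), sorted(b))

print(substituent_prefix({"methyl": [2, 2], "ethyl": [3]}) + "pentane")
# Numbering isooctane from either end: 2,2,4 beats 2,4,4.
print(lowest_locants([2, 4, 4], [2, 2, 4]))
```

Python compares lists element by element, which happens to be exactly the "first point of difference" test.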

5. Functional groups — the reactive handles

A functional group is a small cluster of atoms that shows up in many different molecules and behaves about the same everywhere. That's the key insight: if you know how an ester reacts, you know how every ester reacts, modulo steric and electronic tweaks. There are maybe fifteen functional groups that cover 95% of what you'll meet in a first course, and about seven that show up in most drug molecules.

Group	Formula	Example	Key behavior
Alkane	R–H	Hexane	Inert. Solvent, fuel.
Alkene	R$_2$C=CR$_2$	Ethylene	Electron-rich $\pi$ bond; adds electrophiles.
Alkyne	RC$\equiv$CR	Acetylene	Like alkene but more so; terminal alkynes are weakly acidic.
Alcohol	R–OH	Ethanol	H-bond donor/acceptor. Weakly acidic ($pK_a \approx 16$).
Ether	R–O–R	Diethyl ether	Lewis base, inert solvent.
Amine (1°/2°/3°)	RNH$_2$, R$_2$NH, R$_3$N	Methylamine	Weakly basic (pK$_b \approx 4$ for aliphatic). Nucleophile.
Aldehyde	R–CHO	Acetaldehyde	Electrophilic carbonyl. Easily oxidized to acid.
Ketone	R–CO–R	Acetone	Electrophilic carbonyl. Not easily oxidized.
Carboxylic acid	R–COOH	Acetic acid	Acidic ($pK_a \approx 5$). Forms esters and amides.
Ester	R–COO–R	Ethyl acetate	Hydrolyzes to acid + alcohol. Many drugs are prodrug esters.
Amide	R–CO–NR$_2$	Acetamide	Not basic (N lone pair delocalized into C=O). Peptide backbone.
Nitrile	R–C$\equiv$N	Acetonitrile	Electrophilic C; hydrolyzes to amide, then acid.
Halide	R–X (X = F, Cl, Br, I)	Chloroform	Leaving group; substrate for S$_N$/E reactions.
Aromatic	Ph–	Benzene	$\pi$-delocalized ring. Undergoes EAS, not addition.

The carbonyl group — the single most important handle

Half of organic synthesis revolves around C=O chemistry. The carbonyl carbon is electrophilic because oxygen is more electronegative and pulls electron density toward itself, leaving a partial positive on carbon. Nucleophiles attack carbon; electrophiles grab oxygen's lone pair. Aldehydes, ketones, esters, amides, acids, acid chlorides, and anhydrides all share this reactivity pattern and interconvert among themselves.

$$R\text{-}CHO + H_2N\text{-}R' \;\rightleftharpoons\; R\text{-}CH(OH)\text{-}NHR' \;\rightleftharpoons\; R\text{-}CH=N\text{-}R' + H_2O.$$

Imine (Schiff base) formation

$R\text{-}CHO$: A generic aldehyde. The $R$ is any alkyl or aryl group.
$H_2N\text{-}R'$: A primary amine.
$R\text{-}CH(OH)\text{-}NHR'$: The unstable tetrahedral intermediate (a "hemiaminal") with OH and NHR' both on the former carbonyl carbon.
$R\text{-}CH=N\text{-}R'$: The imine product — a carbon-nitrogen double bond, also called a Schiff base.
$\rightleftharpoons$: Every step is reversible; imine formation is driven by removing water.

Why you care. Imine formation is how pyridoxal phosphate (the active form of vitamin B6) shuttles amino groups in transaminases. It's how aldehyde-bearing labels are attached to lysine side chains on proteins. It's how reductive amination (the pharma industry's bread-and-butter amine-synthesis method) starts. One equilibrium, a hundred applications.

6. Isomerism and chirality — 3D matters

Two molecules with the same molecular formula but different structures are isomers. There are three broad types, in increasing subtlety:

Structural isomers — different connectivity. Butane vs isobutane: same $C_4H_{10}$, different skeleton.
Geometric (cis/trans or E/Z) isomers — same connectivity but different geometry across a rigid bond (usually C=C). Cis-2-butene vs trans-2-butene. These interconvert only by breaking the $\pi$ bond, which needs heat or light.
Optical isomers (enantiomers) — same connectivity and geometry, but mirror images that cannot be superimposed. Different hands of the same molecule. They interconvert only by breaking and remaking bonds at the chiral center.

A stereocenter (or chiral center) is a carbon bonded to four different groups. Four different groups means there are exactly two three-dimensional arrangements — like your left and right hands. The Cahn-Ingold-Prelog rules give you an unambiguous label:

Rank the four substituents by atomic number at the first point of difference. Higher Z wins. Break ties by the next shell of atoms.
Orient the molecule so the lowest-priority group points away from you.
Trace 1 → 2 → 3 around the front. Clockwise = R (rectus). Counterclockwise = S (sinister).
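Once the four substituents are ranked, the "orient and trace" step reduces to the sign of a scalar triple product. The sketch below assumes you already have 3D coordinates and the priority order (ranking by atomic number is the hard part and is not attempted here); `cip_label` is a name invented for this example.

```python
def cip_label(p1, p2, p3, p4):
    """Return 'R' or 'S' from the 3D positions of the four substituents,
    given in CIP priority order (p1 highest, p4 lowest)."""
    # Vectors from the lowest-priority group to the other three.
    a = [p1[i] - p4[i] for i in range(3)]
    b = [p2[i] - p4[i] for i in range(3)]
    c = [p3[i] - p4[i] for i in range(3)]
    # Scalar triple product a . (b x c): a signed volume.
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    volume = sum(a[i] * cross[i] for i in range(3))
    if volume == 0:
        raise ValueError("substituents are coplanar; no handedness")
    # With group 4 pointing away from the viewer, a positive volume means
    # 1 -> 2 -> 3 runs counterclockwise (S); negative means clockwise (R).
    return "S" if volume > 0 else "R"

# Idealized tetrahedron: group 4 points down the -z axis, away from a viewer at +z.
low = (0.0, 0.0, -1.0)
g1 = (0.943, 0.0, 0.333)
g2 = (-0.471, 0.816, 0.333)
g3 = (-0.471, -0.816, 0.333)

print(cip_label(g1, g2, g3, low))  # counterclockwise seen from +z
print(cip_label(g1, g3, g2, low))  # swapping two groups flips the label
```

Swapping any two substituents flips the sign of the volume, which is the algebraic version of "exchange two groups and you get the enantiomer."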

WHY THIS MATTERS

In the late 1950s, thalidomide was sold as a racemic mixture (50/50 R and S). The R enantiomer is the sedative; the S enantiomer is the one linked to severe birth defects. Same molecular formula, same connectivity, same functional groups, different hand. (The two enantiomers interconvert in the body, so dosing pure R would not have avoided the harm.) Since then, regulators expect chiral drugs to be characterized enantiomer by enantiomer, and many are sold as single enantiomers. "Enantiopure synthesis" is a multi-billion-dollar industry.

Optical rotation and specific rotation

Enantiomers rotate the plane of polarized light equally but in opposite directions. The measurable quantity is the specific rotation:

$$[\alpha]^{T}_\lambda \;=\; \frac{\alpha}{c \cdot \ell}.$$

Specific rotation

$[\alpha]^{T}_\lambda$: Specific rotation at temperature $T$ and wavelength $\lambda$ (usually the sodium D line at 589 nm), in deg·mL·g$^{-1}$·dm$^{-1}$.
$\alpha$: Measured rotation of the plane of polarized light, in degrees, on a polarimeter.
$c$: Concentration of the sample, in g/mL.
$\ell$: Path length of the polarimeter cell, in dm (decimeters — the instrument is weird and historical).

Why the quirky units. The decimeter convention comes from 19th-century sugar chemistry, where tube lengths were standardized to 1 dm. Modern chemists inherited the units even though nobody uses decimeters for anything else. The upshot: specific rotation is a molecular fingerprint of enantiomeric identity. Sucrose is $+66.4°$; hydrolysis splits it into glucose ($+52.7°$) and fructose ($-92°$), and the 1:1 mixture is net levorotatory (about $-20°$). That flip in the sign of rotation is why the product is called "invert sugar."

If a synthesis gives you an "enantiomeric excess" (ee) of 80%, it means you have 90% of one enantiomer and 10% of the other (the difference is 80%, hence the name). Drug regulators typically want >99% ee.
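The arithmetic behind specific rotation and ee is short enough to check by hand, but a small sketch makes the unit conventions explicit. The function names are made up for this example, and the polarimeter reading used below is illustrative rather than measured data.

```python
def specific_rotation(alpha_deg: float, conc_g_per_ml: float, path_dm: float) -> float:
    """[alpha] = alpha / (c * l), with c in g/mL and l in dm."""
    return alpha_deg / (conc_g_per_ml * path_dm)

def ee_from_fractions(major_pct: float, minor_pct: float) -> float:
    """Enantiomeric excess in percent from the two enantiomer percentages."""
    return 100.0 * (major_pct - minor_pct) / (major_pct + minor_pct)

def fractions_from_ee(ee_pct: float) -> tuple[float, float]:
    """Invert the definition: ee -> (major %, minor %)."""
    major = (100.0 + ee_pct) / 2.0
    return major, 100.0 - major

# A 0.10 g/mL sucrose solution in a 1 dm cell that reads +6.64 degrees:
print(f"[alpha] = {specific_rotation(6.64, 0.10, 1.0):.1f}")

# 80% ee corresponds to a 90:10 mixture, and back again.
print(fractions_from_ee(80.0))
print(f"ee = {ee_from_fractions(90.0, 10.0):.0f}%")
```

In practice ee is usually measured by chiral chromatography rather than polarimetry, because rotation is not always linear in composition.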

7. Reaction mechanisms — pushing electrons

Every organic reaction is a choreographed dance of electron pairs. Curly-arrow notation tracks the dance: a curly arrow starts at an electron pair (a lone pair, a $\pi$ bond, or a $\sigma$ bond) and ends where that pair is going. Learn to push arrows and you can predict reactions you have never seen.

Nucleophilic substitution: S$_N$1 vs S$_N$2

A nucleophile (Nu:, electron-rich) attacks a carbon bearing a leaving group (LG, electron-poor). The leaving group departs with the bonding pair. There are two distinct pathways.

S$_N$2 is concerted: the nucleophile attacks from the back side as the leaving group departs, passing through a single five-coordinate transition state. The backside attack inverts the stereochemistry at carbon — "Walden inversion" — like an umbrella flipping in the wind. Rate is first-order in nucleophile and first-order in substrate:

$$\text{rate}_{S_N 2} \;=\; k_2 \, [\text{Nu}] \, [\text{R-LG}].$$

S$_N$2 rate law

$k_2$: Second-order rate constant, units M$^{-1}$ s$^{-1}$. Depends on temperature, solvent, and identity of substrate/nucleophile.
$[\text{Nu}]$: Molar concentration of the nucleophile.
$[\text{R-LG}]$: Molar concentration of the substrate (the carbon with the leaving group).
rate: Rate of product formation, M/s. Doubling either concentration doubles the rate.

Intuition. Both pieces must meet at the same place at the same time, so the rate scales with how likely they are to collide — the product of the two concentrations. Backside attack is hindered by bulky substituents, so S$_N$2 strongly prefers primary (1°) and methyl substrates over tertiary (3°) ones.

S$_N$1 is stepwise. First, the leaving group leaves on its own, making a carbocation intermediate. Then the nucleophile attacks the flat, sp$^2$ carbocation from either face. The rate depends only on the rate-determining ionization, so it is first-order in substrate alone and zero-order in nucleophile:

$$\text{rate}_{S_N 1} \;=\; k_1 \, [\text{R-LG}].$$

S$_N$1 rate law

$k_1$: First-order rate constant, units s$^{-1}$. Strongly dependent on carbocation stability and solvent polarity.
$[\text{R-LG}]$: Molar concentration of substrate. Nucleophile concentration does not appear.

What the zero-order in Nu tells you. The slow step is the substrate falling apart on its own. Whatever the nucleophile concentration, the substrate can't break down any faster. Stability of the carbocation is everything: tertiary > secondary > primary > methyl. Polar protic solvents (water, alcohols) stabilize the developing cation and speed S$_N$1 up. S$_N$1 also racemizes the stereocenter because the flat cation can be attacked from either face.
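The two rate laws are easiest to tell apart by changing one concentration at a time. The sketch below does that numerically; the rate constants `K1` and `K2` are made-up values chosen only to give readable numbers.

```python
def rate_sn2(k2: float, nu: float, substrate: float) -> float:
    """rate = k2 [Nu] [R-LG], in M/s."""
    return k2 * nu * substrate

def rate_sn1(k1: float, substrate: float, nu: float = 0.0) -> float:
    """rate = k1 [R-LG], in M/s. [Nu] is accepted but deliberately unused."""
    return k1 * substrate

K2 = 1.0e-3  # M^-1 s^-1, hypothetical
K1 = 2.0e-5  # s^-1, hypothetical
SUBSTRATE = 0.5  # M

for nu in (0.1, 0.2, 0.4):
    print(f"[Nu] = {nu:.1f} M   "
          f"SN2 rate = {rate_sn2(K2, nu, SUBSTRATE):.2e} M/s   "
          f"SN1 rate = {rate_sn1(K1, SUBSTRATE, nu):.2e} M/s")
```

The S$_N$2 column doubles each time the nucleophile concentration doubles, while the S$_N$1 column does not move. Running that experiment in the lab is the classic way to assign a mechanism.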

Elimination: E1 vs E2

Instead of replacing the leaving group with a nucleophile, you can kick out both the leaving group and a proton from the neighboring (β) carbon, forming a double bond. Same two mechanisms:

E2 — concerted, one transition state. Base grabs β-H, the $\sigma$ bond goes into a new C=C $\pi$ bond, leaving group departs all at once. Requires antiperiplanar geometry between the departing H and LG (they must be on opposite sides of the C–C bond). Follows Zaitsev's rule: the more substituted (more stable) alkene is favored.
E1 — stepwise. Leaving group leaves first, making the carbocation. Base then deprotonates a β-H. Competes with S$_N$1 on tertiary substrates; usually you get a mixture.

CHOOSING BETWEEN THE FOUR

Primary substrate + good nucleophile: S$_N$2. Tertiary substrate + polar protic solvent: S$_N$1/E1 mixture. Strong bulky base: E2. Low-basicity nucleophile, cold: S$_N$2. The four mechanisms compete; temperature, solvent, base strength, and substrate steric bulk decide which one wins.

Electrophilic addition to alkenes

Alkenes react with electrophiles by adding across the double bond. Markovnikov's rule says: "the rich get richer" — when HX adds across an unsymmetrical alkene, the H goes to the carbon that already has more H's, so the X ends up on the more substituted carbon (where the more stable carbocation forms). Anti-Markovnikov outcomes happen under radical conditions (HBr with peroxides) or via hydroboration-oxidation.
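Markovnikov's rule is a one-line comparison, which the following sketch spells out. The function name `hx_addition` and the C1/C2 labels are invented for the example, and the `radical` flag stands in for the HBr-with-peroxides case only.

```python
def hx_addition(h_on_c1: int, h_on_c2: int, radical: bool = False) -> tuple[str, str]:
    """Return (carbon that receives H, carbon that receives X) for HX + C1=C2.

    h_on_c1 / h_on_c2 are the hydrogens already attached to each alkene carbon.
    """
    if h_on_c1 == h_on_c2:
        raise ValueError("equal H counts: the rule does not pick a side")
    more_h, fewer_h = ("C1", "C2") if h_on_c1 > h_on_c2 else ("C2", "C1")
    if radical:
        # Anti-Markovnikov: X ends up on the carbon with more hydrogens.
        return fewer_h, more_h
    # Markovnikov: H goes where the H's are, X to the more substituted carbon.
    return more_h, fewer_h

# Propene, CH2=CH-CH3: C1 carries 2 H, C2 carries 1 H.
print(hx_addition(2, 1))                # ionic HBr -> Br on C2 (2-bromopropane)
print(hx_addition(2, 1, radical=True))  # HBr + peroxides -> Br on C1 (1-bromopropane)
```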

Electrophilic aromatic substitution (EAS)

Benzene's $\pi$ electrons are stabilized by delocalization — the ring would rather keep its aromaticity than add across a $\pi$ bond. So benzene substitutes an H with an electrophile (E$^+$) rather than adding. The mechanism is:

E$^+$ attacks a carbon of the ring, making a resonance-stabilized "arenium ion" intermediate (also called a Wheland intermediate). Aromaticity is temporarily broken.
A base plucks off the proton on that same carbon, restoring aromaticity. Overall: H replaced by E.

Classic EAS reactions: nitration (HNO$_3$/H$_2$SO$_4$ → E$^+$ = NO$_2^+$), halogenation (Br$_2$/FeBr$_3$), Friedel-Crafts alkylation/acylation (RCl or RCOCl + AlCl$_3$), sulfonation. Substituents already on the ring direct the new group: electron-donors (–OH, –NH$_2$, –OR, –R) are ortho/para directors and activators; electron-withdrawers (–NO$_2$, –CN, –COR) are meta directors and deactivators. Halogens are the weird exception: ortho/para directors but deactivators.
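The directing rules in the paragraph above amount to a lookup table. A minimal sketch follows, using only the substituents named in the text; the names `EAS_EFFECTS` and `favored_positions` are hypothetical, and ring positions are numbered with the existing substituent at position 1.

```python
# (directing effect, effect on rate) for a substituent already on the ring
EAS_EFFECTS = {
    "OH":  ("ortho/para", "activating"),
    "NH2": ("ortho/para", "activating"),
    "OR":  ("ortho/para", "activating"),
    "R":   ("ortho/para", "activating"),
    "NO2": ("meta", "deactivating"),
    "CN":  ("meta", "deactivating"),
    "COR": ("meta", "deactivating"),
    # Halogens: the exception, ortho/para directing yet deactivating.
    "F":   ("ortho/para", "deactivating"),
    "Cl":  ("ortho/para", "deactivating"),
    "Br":  ("ortho/para", "deactivating"),
    "I":   ("ortho/para", "deactivating"),
}

RING_POSITIONS = {"ortho/para": [2, 4, 6], "meta": [3, 5]}

def favored_positions(substituent: str) -> tuple[list[int], str]:
    """Return (ring positions favored for the incoming electrophile, rate effect)."""
    director, effect = EAS_EFFECTS[substituent]
    return RING_POSITIONS[director], effect

for group in ("OH", "NO2", "Cl"):
    positions, effect = favored_positions(group)
    print(f"-{group}: new group goes to positions {positions} ({effect})")
```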

8. Interactive: S$_N$1 vs S$_N$2 vs E1 vs E2 selector

Slide the four knobs — substrate class, nucleophile strength, base strength, and solvent — and watch the prediction shift among the four pathways. The underlying logic is a simple rule table based on energetics.

Pick conditions and the bar chart shows the rough fraction of each mechanism predicted by the standard selectivity rules.

Things to try:

Set primary substrate with a strong nucleophile — S$_N$2 dominates.
Set tertiary with weak nucleophile in polar protic solvent — S$_N$1/E1 mixture.
Switch base to bulky tBuO$^-$ on a secondary or tertiary substrate — E2 takes over.
Primary substrate with bulky base — still E2, because steric hindrance favors deprotonation over backside attack.
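For readers without the interactive figure, the rules of thumb above can be approximated as a small decision function. This is a sketch of the textbook heuristics, not the page's actual widget logic: the function name, the input labels, and the handling of secondary substrates are all assumptions made for illustration.

```python
def predict_pathways(substrate: str, nucleophile: str, base: str, solvent: str) -> list[str]:
    """Return the pathway(s) the rules of thumb favor.

    substrate:   "primary", "secondary", or "tertiary"
    nucleophile: "strong" or "weak"
    base:        "weak", "strong", or "bulky" (strong and hindered, e.g. tBuO-)
    solvent:     "polar protic" or "polar aprotic"
    """
    if base == "bulky":
        # Hindered strong bases deprotonate rather than attack carbon.
        return ["E2"]
    if substrate == "primary":
        # Backside attack is easy; a primary carbocation is too unstable to form.
        return ["SN2"]
    if base == "strong":
        # Secondary and tertiary substrates with a strong base eliminate.
        return ["E2"]
    if substrate == "tertiary":
        # Backside attack is blocked, so ionization is the only route left.
        return ["SN1", "E1"]
    # Secondary substrate with a weak base: nucleophile and solvent break the tie.
    if nucleophile == "strong" and solvent == "polar aprotic":
        return ["SN2"]
    if nucleophile == "weak" and solvent == "polar protic":
        return ["SN1", "E1"]
    return ["SN2", "SN1", "E1"]  # mixed; these rules do not pick a winner

print(predict_pathways("primary", "strong", "weak", "polar aprotic"))
print(predict_pathways("tertiary", "weak", "weak", "polar protic"))
print(predict_pathways("secondary", "strong", "bulky", "polar aprotic"))
```

Real selectivity is a matter of degree, so treat the output as the likely major pathway rather than an exclusive one.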

9. Retrosynthesis — working backward from the target

Retrosynthesis is how real chemists plan syntheses. Instead of starting from cheap feedstocks and hoping, you draw the target molecule and ask: "What bond could I disconnect to arrive at simpler precursors?" Each disconnection is a hypothetical reverse reaction. You repeat until the precursors are all commercially available.

The notation uses a double-shafted open arrow ⇒ for a retrosynthetic step (as opposed to → for a forward reaction). The pieces you get after a disconnection are called synthons: idealized fragments that you then map to real reagents called synthetic equivalents. A "d$^1$" synthon like "acyl anion" corresponds to a real reagent like a lithiated 1,3-dithiane; a Grignard reagent, by contrast, is the synthetic equivalent of a simple alkyl or aryl carbanion.

The retrosynthetic rule. For every functional group in the target, ask: which reaction that I know how to run forward would make this group from something simpler? That reaction defines the disconnection. Repeat until you hit a starting material.

Example. Target: 2-phenylbutanoic acid, $PhCH(C_2H_5)COOH$.

Candidate disconnections: acid ⇒ nitrile (forward: hydrolysis), acid ⇒ ester (forward: hydrolysis), or $\alpha$-alkylation of an enolate. Pick the enolate route as a concrete example.
Disconnect the C–C bond between the $\alpha$ carbon (the one bearing the phenyl group) and the CH$_2$ of the ethyl group. The synthons are a carbanion $\alpha$ to the carboxyl (an enolate equivalent, a d$^2$ synthon) and an ethyl cation, whose synthetic equivalent is an ethyl halide.
Real reagents: phenylacetic acid (or its ester), a strong non-nucleophilic base to form the enolate (LDA, NaH), and ethyl iodide as the alkylating agent.
Forward synthesis: phenylacetic acid → enolate → alkylate with EtI → 2-phenylbutanoic acid. Three steps on paper, one round in a flask.

Modern retrosynthesis is increasingly automated. Tools like the Chematica / Synthia software from Bartosz Grzybowski's group, and ML systems from Segler, Waller, Coley, and others (2018 onward), can propose multi-step routes to complex targets and rank them by feasibility. Underneath, they still run on the principle you just learned: disconnect at a functional group, check that the forward reaction is known, recurse.
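The "disconnect, check the forward reaction, recurse" loop can be shown with a toy planner. Everything here is a stand-in: `DISCONNECTIONS` is a two-entry hand-written table rather than a reaction database, `PURCHASABLE` is a hypothetical catalog, and molecules are plain name strings instead of structures. Real tools search vastly larger rule sets and score the routes.

```python
# target -> list of (forward reaction, precursors); hand-written for illustration
DISCONNECTIONS = {
    "2-phenylbutanoic acid": [("enolate alkylation", ["phenylacetic acid", "ethyl iodide"])],
    "phenylacetic acid": [("nitrile hydrolysis", ["benzyl cyanide"])],
}

# Hypothetical catalog of starting materials we are willing to buy.
PURCHASABLE = {"benzyl cyanide", "ethyl iodide"}

def plan(target: str, depth: int = 0, max_depth: int = 5):
    """Return forward steps as (reaction, precursors, product), or None if no route."""
    if target in PURCHASABLE:
        return []  # nothing to make
    if depth >= max_depth:
        return None
    for reaction, precursors in DISCONNECTIONS.get(target, []):
        steps = []
        for precursor in precursors:
            sub_route = plan(precursor, depth + 1, max_depth)
            if sub_route is None:
                break  # this disconnection dead-ends; try the next one
            steps.extend(sub_route)
        else:
            # Every precursor was solved: append this step last (forward order).
            return steps + [(reaction, precursors, target)]
    return None

for reaction, precursors, product in plan("2-phenylbutanoic acid"):
    print(f"{' + '.join(precursors)}  --[{reaction}]-->  {product}")
```

The recursion stops in one of two ways: every leaf is purchasable (a route) or some branch has no known disconnection (no route).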

10. Organic chemistry in code — SMILES and RDKit

Chemists represent molecules on a computer using SMILES strings (Simplified Molecular Input Line Entry System). "CCO" is ethanol. "c1ccccc1" is benzene. "CC(=O)Oc1ccccc1C(=O)O" is aspirin. A SMILES is a linear encoding of the molecular graph that any serious chemistry library — RDKit, OpenBabel, ChemAxon — can parse in microseconds.

organic chemistry primitives

from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors, rdMolDescriptors

# Parse a SMILES into an RDKit molecule
aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
print("formula:", rdMolDescriptors.CalcMolFormula(aspirin))  # C9H8O4
print(f"MW: {Descriptors.MolWt(aspirin):.2f}")          # 180.16
print(f"logP: {Descriptors.MolLogP(aspirin):.2f}")        # Crippen estimate, about 1.3 (experimental ~1.19)
print(f"HBD: {Descriptors.NumHDonors(aspirin)}")          # 1
print(f"HBA: {Descriptors.NumHAcceptors(aspirin)}")        # 3

# Count functional groups using SMARTS patterns
patterns = {
    "carboxylic_acid": "C(=O)[OH]",
    "ester":           "C(=O)O[C,c]",
    "amide":           "C(=O)N",
    "aromatic_ring":   "c1ccccc1",
}
for name, smarts in patterns.items():
    patt = Chem.MolFromSmarts(smarts)
    n = len(aspirin.GetSubstructMatches(patt))
    print(f"  {name}: {n}")

# Generate a 3D conformer and optimize with MMFF
mol3d = Chem.AddHs(aspirin)
AllChem.EmbedMolecule(mol3d, randomSeed=42)
AllChem.MMFFOptimizeMolecule(mol3d)
print("num atoms (incl. H):", mol3d.GetNumAtoms())

# Assign CIP stereochemistry labels (R/S, E/Z).
# The SMILES must carry a stereo tag (@ or @@); without one there is no
# configuration to label and the loop below prints nothing.
ibuprofen = Chem.MolFromSmiles("C[C@H](C(=O)O)c1ccc(cc1)CC(C)C")  # (S)-ibuprofen
Chem.AssignStereochemistry(ibuprofen, cleanIt=True, force=True)
for atom in ibuprofen.GetAtoms():
    if atom.HasProp("_CIPCode"):
        print(f"atom {atom.GetIdx()} ({atom.GetSymbol()}): {atom.GetProp('_CIPCode')}")

# A hand-rolled degree-of-unsaturation calculator and Lipinski filter.

def degree_of_unsaturation(C, H, N=0, X=0):
    # X = total halogens; ignore O and S
    return (2 * C + 2 - H - X + N) // 2

print("benzene DoU:", degree_of_unsaturation(6, 6))          # 4
print("caffeine DoU:", degree_of_unsaturation(8, 10, N=4))    # 6

def lipinski_violations(mw, logP, hbd, hba):
    # Classic Rule of Five for oral bioavailability (Lipinski 1997).
    violations = 0
    if mw > 500: violations += 1
    if logP > 5: violations += 1
    if hbd > 5: violations += 1
    if hba > 10: violations += 1
    return violations

# Aspirin passes (0 violations); cyclosporine (MW ~1200) violates at least the MW and HBA limits.
print("aspirin violations:", lipinski_violations(180.16, 1.19, 1, 3))

A few practical notes about how computational organic chemistry actually works in industry:

SMILES, InChI, and SMARTS are the three standard string representations. SMILES for humans and inputs, InChI for database deduplication, SMARTS for substructure search and rules (the patterns above).
Tanimoto similarity over Morgan fingerprints is the default for "find molecules like this one." Values above 0.4-0.5 usually mean meaningful resemblance; above 0.85 often means effectively the same scaffold.
Conformer generation with RDKit's ETKDG algorithm plus MMFF or UFF optimization gets you reasonable 3D geometries for docking. For anything serious (binding energies, transition states), you move to quantum chemistry. See quantum chemistry.
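Tanimoto similarity itself is simple set arithmetic once each molecule has been reduced to a set of "on" fingerprint bits. The sketch below shows only that arithmetic; the bit sets are invented toy values, not real Morgan fingerprints, and `tanimoto` is a hand-rolled helper rather than a library call.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Shared on-bits divided by total distinct on-bits (Jaccard index)."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(bits_a | bits_b)

# Toy fingerprints: each integer stands for one substructure bit that is set.
query = {3, 17, 42, 88, 101, 250}
close_analog = {3, 17, 42, 88, 101, 300}
unrelated = {5, 9, 42, 777}

print(f"query vs close analog: {tanimoto(query, close_analog):.2f}")
print(f"query vs unrelated:    {tanimoto(query, unrelated):.2f}")
```

In a real workflow the bit sets would come from a fingerprinting routine in a cheminformatics library, and the thresholds quoted above depend on which fingerprint and radius you use.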

11. Cheat sheet

Hybridization

sp$^3$ tetrahedral 109.5°, sp$^2$ planar 120°, sp linear 180°

Single, double, triple bonds.

Degree of unsaturation

$(2n+2-m)/2$

Rings + $\pi$ bonds from formula alone.

IUPAC priority

acid > ester > amide > nitrile > aldehyde > ketone > alcohol > amine

Highest gets the suffix and the lowest locant.

Carbonyl reactivity

Electrophilic at C, basic at O

Nucleophiles attack C; acids/H-bonds activate the C=O.

R/S assignment

Rank 1-4 by Z, orient #4 away, trace 1→2→3

CW = R, CCW = S.

S$_N$1 vs S$_N$2

S$_N$1: stepwise, 3° favored, racemizes

S$_N$2: concerted, 1° favored, inverts (Walden).

Markovnikov

"H goes where the H's are"

HX across alkene → more stable carbocation intermediate.

EAS directing

Donors → o/p activators. Acceptors → m deactivators. Halogens → o/p deactivators.

Halogens are the exception to the donor/acceptor split.

Retrosynthesis

Disconnect at functional groups; map synthons to reagents

⇒ arrow is reverse of →.

Lipinski Ro5

MW ≤ 500, logP ≤ 5, HBD ≤ 5, HBA ≤ 10

Rough oral-bioavailability filter.

See also

Bonding

Chemical equilibrium

Biochemistry

Quantum chemistry

Kinetics

Further reading