UPDATED Apr 11 2026 21:30

The AI/ML Landscape

A pragmatic tour for programmers who shipped code yesterday and want to ship AI tomorrow. Hover any dotted term. Click any Deep Dive → to jump into a full textbook-level lesson with math and interactive figures.

> TIMELINE · 1940 – 2026   FROM PERCEPTRON TO REASONING MODELS


The 2026 State of AI // big picture

If you've been writing code but not tracking AI research, here's the headline: the field spent 70 years slogging toward general-purpose machines that can read, write, see, and reason. In the last ~3 years that finally clicked, and now a single architecture — the transformer — powers almost everything you hear about: ChatGPT, Claude, Gemini, Stable Diffusion, GitHub Copilot, Midjourney, Sora.

The 60-second summary

2012 — Neural networks beat humans at image recognition. Everyone pivots to deep learning.
2017 — Google publishes Attention Is All You Need. The transformer is born.
2020 — OpenAI releases GPT-3. It can write essays. Scaling becomes the strategy.
2022 — ChatGPT launches. AI becomes a consumer product overnight.
2024 — Reasoning models (o1, Claude extended thinking) solve PhD-level problems.
2026 — Agents, multimodal everything, and a Cambrian explosion of open-source models.

Who the major players are

OpenAI

GPT series, DALL·E, Sora, o-series reasoning. Frontier lab. Closed weights.

frontier · closed

Anthropic

Claude family. Focus on safety, long context, and agentic coding.

frontier · closed

Google DeepMind

Gemini, AlphaFold, Imagen, Veo. Deep research bench, full stack down to TPUs.

frontier · closed

Meta AI (FAIR)

Llama series. Meta is the quiet giant of open-weight models — they ship the best ones.

open · research

xAI

Grok series. Notable for massive compute scale (Colossus cluster).

frontier

Mistral, DeepSeek, Qwen, Z.AI

European + Chinese open-weight labs shipping extremely capable small models fast.

open · rising

NVIDIA

Not a model lab — the picks-and-shovels company. They make the GPUs everyone else trains on.

infra

Hugging Face

The GitHub of ML. Hosts hundreds of thousands of open models, datasets, and demos.

platform

What the field actually looks like in 2026

  • LLMs eat the world. Most "AI startup" pitches are thin wrappers around OpenAI/Anthropic APIs. The real product is distribution + UX + domain data.
  • Classical ML is still everywhere. Your credit score, ad targeting, spam filter, and Uber's surge pricing are not running GPT-4. They're XGBoost or logistic regression.
  • Open source is catching up fast. Llama 4, DeepSeek-V3/R1, Qwen 3, and Mistral models now match GPT-4-class performance, and you can run them on your own hardware.
  • Inference cost is the new game. Training the model is a one-time expense; serving it to millions of users is the forever bill.
  • Agents & tool use are the current frontier. Models that can browse, call APIs, write and run code, and chain actions — not just chat.
  • Reasoning models changed the plot. OpenAI's o1/o3 and Anthropic's Claude extended thinking spend "thinking time" before answering.

Foundations & Vocabulary // must-know terms

Before diving in, here's the vocabulary that shows up in every AI blog post.

The core loop

  1. Training — Show the model lots of examples. It adjusts its weights to make better predictions.
  2. Inference — Use the trained model to make predictions on new data.

Types of learning

Supervised

Input→output pairs. "1M labeled cat/dog photos."

Unsupervised

No labels. Find structure on its own (clustering).

Self-supervised

The model creates its own labels. LLMs do this: "predict the next word."

Reinforcement Learning

Agent takes actions, gets rewards. AlphaGo, RLHF.

How a neural net actually learns

  • Loss function — how wrong was that prediction?
  • Activation function Deep Dive → — the nonlinearity that makes depth meaningful.
  • Gradient descent Deep Dive → — take a small step toward lower loss.
  • Backpropagation Deep Dive → — the algorithm that efficiently computes those gradients.
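
To make that loop concrete, here is a minimal sketch in plain Python: gradient descent on a one-parameter model y = w·x with mean squared error. The data, learning rate, and step count are illustrative choices, not anything prescribed above.

```python
# Gradient descent on a 1-parameter model: fit y = w * x to data
# by repeatedly stepping w against the gradient of squared-error loss.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relation: y = 2x

def loss(w):
    # mean squared error over the dataset
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w):
    # d/dw of mean squared error: 2 * mean(x * (w*x - y))
    return 2 * sum(x * (w * x - y) for x, y in data) / len(data)

w = 0.0
lr = 0.05  # learning rate: size of each step toward lower loss
for step in range(200):
    w -= lr * grad(w)

print(round(w, 3))  # converges toward 2.0
```

Swap the scalar w for millions of weights and hand the gradient computation to backpropagation, and this is the whole training loop.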

Classical ML // 1950s – today

The unsexy stuff that runs half of production systems.

Rule of thumb

If your data is tabular, start with XGBoost or LightGBM. A neural net will almost never beat it.

Linear / Logistic Regression

1800s

Still the first thing to try. Interpretable weights you can show to your lawyer.

Decision Trees

1960s

A flowchart learned from data. Alone they overfit; combined in forests they dominate.

Random Forests

Breiman, 2001

Hundreds of trees on random subsets. Robust, hard to break.

XGBoost / LightGBM

Chen & Guestrin, 2016

The king of tabular data. Won basically every Kaggle competition for years.

production

SVM

Cortes & Vapnik, 1995

Finds the best separating hyperplane. Pre-deep-learning champion.

legacy

k-NN

1960s

Find the k closest examples and vote. No training step.
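
As a sketch of how little machinery k-NN needs, here is a plain-Python version; the 2-D points and "cat"/"dog" labels are made up for illustration.

```python
# k-nearest-neighbors: classify a point by majority vote of the
# k closest labeled examples. No training step, just distance lookups.
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of ((x, y), label) pairs; query: (x, y)
    dist = lambda p: (p[0] - query[0]) ** 2 + (p[1] - query[1]) ** 2
    nearest = sorted(train, key=lambda ex: dist(ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "cat"), ((0, 1), "cat"), ((1, 0), "cat"),
         ((5, 5), "dog"), ((5, 6), "dog"), ((6, 5), "dog")]
print(knn_predict(train, (1, 1)))  # cat
print(knn_predict(train, (5, 4)))  # dog
```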

k-Means

1957, Lloyd

Unsupervised clustering. Customer segmentation.

PCA

Pearson, 1901

Squash high-dim data into fewer dims. Used everywhere.

Neural Networks // 1943 – 1986

The idea of a "neural network" started as a wild simplification of how brain neurons work. It took 40+ years to make them actually useful.

The Perceptron (1957) Deep Dive →

Frank Rosenblatt at Cornell built a machine that could learn to classify images — an actual hardware device, the Mark I Perceptron, funded by the US Navy. It's the ancestor of everything on this page.

▸ Interactive Perceptron — step through the forward pass of a single neuron (z = w·x + b, ŷ = σ(z)) on AND/OR gate inputs; edit any input or weight to see different behavior. [interactive demo]

Multi-Layer Perceptron (MLP) & Backprop (1986) Deep Dive →

Stack multiple layers. In 1986, Rumelhart, Hinton, and Williams published backpropagation — the single most important algorithm in modern AI. It runs the chain rule of calculus backward through the network to compute gradients for every weight. Click the Deep Dive link above for a full textbook treatment with derivation, worked examples, and interactive step-through of a training iteration.

Deep Neural Networks (DNN)

"Deep" is just marketing for "lots of layers." The reason they weren't used earlier: you need huge datasets and huge compute. It took until the mid-2000s — when GPUs became programmable and the internet produced massive image datasets — for deep learning to become practical.

Convolutional Neural Networks (CNN) Deep Dive →

Yann LeCun invented CNNs in 1989 to read handwritten zip codes. The insight: instead of connecting every pixel to every neuron, slide a small kernel across the image. Same weights reused → efficient, and captures the fact that "an edge is an edge regardless of where it is."

▸ Interactive Convolution — slide a 3×3 kernel across a 5×5 input one position at a time, with preset kernels (edge detector, blur, sharpen, Sobel X). Output[r,c] = Σ input[r+i, c+j] × kernel[i,j]. [interactive demo]
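
The sliding-kernel formula can be written directly as a few loops; the 5×5 input and Sobel-style kernel below are illustrative values, and real libraries implement the same cross-correlation with heavy optimization.

```python
# 2D convolution (really cross-correlation, as in most DL libraries):
# slide a 3x3 kernel over a 5x5 input, reusing the same weights everywhere.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

# A vertical edge: left half bright, right half dark.
image = [[1, 1, 0, 0, 0]] * 5
sobel_x = [[1, 0, -1],   # responds to horizontal intensity change
           [2, 0, -2],
           [1, 0, -1]]
out = conv2d(image, sobel_x)
print(out[0])  # strongest response where the edge sits
```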

RNN, LSTM & GRU — handling sequences Deep Dive →

CNNs are great for images but don't handle sequential data. RNNs process tokens one at a time while maintaining a hidden state. Hochreiter & Schmidhuber's 1997 LSTM solved the vanishing-gradient problem of basic RNNs. Dominated NLP until transformers killed them in 2017.

Autoencoders

A network trained to compress its input into a small bottleneck and then reconstruct it. Variational Autoencoders (VAE, 2013) are the generative cousin — they sit inside every modern diffusion model.

GANs — generative adversarial networks Deep Dive →

Ian Goodfellow, 2014. Two networks: a generator tries to make fake images, a discriminator tries to spot the fakes. They improve together. By 2022, diffusion models had taken over the image-generation crown.

Word embeddings Deep Dive →

Mikolov's Word2Vec (2013) showed that training a shallow network to predict word co-occurrences produces vector spaces where king − man + woman ≈ queen. Every LLM still starts by mapping tokens through an embedding matrix — it's the bridge between discrete symbols and the geometry of meaning.
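
A toy sketch of that vector arithmetic, using hand-made 3-d vectors rather than real Word2Vec output (the numbers are invented so the analogy lands exactly):

```python
# Toy 3-d "embeddings" (hand-made for illustration, not trained vectors):
# one axis roughly encodes royalty, one gender, one humanness.
import math

emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.1],
    "man":   [0.1, 0.8, 0.9],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# king - man + woman, computed component-wise
target = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]

best = max(emb, key=lambda word: cosine(emb[word], target))
print(best)  # queen
```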

Attention & Transformers Deep Dive → // 2017 – the big bang

In 2017, eight researchers at Google Brain published "Attention Is All You Need." Everything you know as "modern AI" is built on it.

The insight: self-attention

Instead of sequential processing, let every token look at every other token directly. The model computes three vectors per token:

  • Query (Q) — "what am I looking for?"
  • Key (K) — "what do I contain?"
  • Value (V) — "what information do I carry?"

Attention score(i,j) = softmax over j of (Q_i · K_j / √d_k), and token i's output is the score-weighted sum of the value vectors.

▸ Interactive Self-Attention — step through the attention computation for the query token "cat" in "The cat sat" (d_k = 2), with editable Q, K, V. [interactive demo]
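
The Q/K/V recipe above fits in a few lines for a single query; the vectors below are illustrative numbers, not trained weights.

```python
# Scaled dot-product attention for a single query, d_k = 2.
# score_j = softmax(q . k_j / sqrt(d_k)); output = sum_j score_j * v_j
import math

def attend(q, keys, values):
    d_k = len(q)
    logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
              for k in keys]
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    out = [sum(s * v[i] for s, v in zip(scores, values))
           for i in range(len(values[0]))]
    return scores, out

# Toy Q/K/V for "The cat sat" (illustrative numbers, not trained weights)
q_cat = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
scores, out = attend(q_cat, keys, values)
print([round(s, 3) for s in scores])  # sums to 1.0
```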

Three flavors

Encoder-only

BERT (2018). Good for classification, search, embeddings.

Decoder-only

GPT, Claude, Llama. Generates one token at a time.

dominant

Encoder-Decoder

T5, BART. Read everything in, generate everything out.

Large Language Models (LLMs) // 2018 – now

Take a decoder transformer. Make it huge. Train it on a trillion words by playing "guess the next token." You get GPT.
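
The training objective is easy to demystify at toy scale: a bigram lookup table trained on a made-up nine-word corpus plays "guess the next token". An LLM replaces the table with a transformer and the corpus with a trillion words; the objective is the same.

```python
# "Predict the next token" at toy scale: count bigrams in a tiny corpus,
# then predict the most frequent follower. LLM pretraining is this idea
# with a transformer instead of a lookup table.
from collections import defaultdict, Counter

corpus = "the cat sat on the mat the cat ate".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):  # training: tally each pair
    follows[prev][nxt] += 1

def predict_next(token):
    # inference: return the most common continuation seen in training
    return follows[token].most_common(1)[0][0]

print(predict_next("the"))  # "cat" (seen twice, vs "mat" once)
```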

GPT-1 (2018)

OpenAI · 117M

Proved the recipe: pretrain + fine-tune.

BERT (2018)

Google · 340M

Bidirectional encoder. Ran inside Google Search.

GPT-2 (2019)

OpenAI · 1.5B

Could write coherent paragraphs.

GPT-3 (2020)

OpenAI · 175B

Few-shot learning worked. Everything changed.

ChatGPT (2022)

OpenAI · RLHF

100M users in 2 months.

inflection

GPT-4 (2023)

OpenAI · ~1.7T MoE

First model that felt like it was reasoning.

Claude (2023-2026)

Anthropic

Long context, strong coding, constitutional AI.

Llama (2023-2026)

Meta · open

Best open-weight frontier models.

open

o1 / o3 (2024-25)

OpenAI

Reasoning models. Chain of thought at training time.

frontier

DeepSeek-R1 (2025)

DeepSeek · open

Open reasoning at o1 quality.

open

Multimodal Models & VLMs // 2021 – now

CLIP (OpenAI, 2021) trained on 400M image-caption pairs and learned a shared embedding space for images and text. Today every frontier model is multimodal — paste a screenshot, ask a question.

Diffusion Models Deep Dive → // 2020 – now

Train a model to denoise an image by one small step. At generation time, start with pure noise and run the denoiser ~50 times. What emerges is a coherent image, guided by your text prompt.

▸ Interactive Diffusion — step t from 0 to 1000 to watch a clean image degrade via the forward noising process x_t = √ᾱ_t · x_0 + √(1-ᾱ_t) · ε under a linear β schedule. [interactive demo]
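
The forward-noising formula has a closed form you can evaluate directly. This sketch tracks a single scalar "pixel" under an assumed linear β schedule from 1e-4 to 0.02 (a common choice, not the only one).

```python
# Forward diffusion in closed form: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps,
# where abar_t is the cumulative product of (1 - beta) under a linear schedule.
import math, random

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # linear beta
alpha_bar = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bar.append(prod)

def noise(x0, t, eps):
    return math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1 - alpha_bar[t]) * eps

x0 = 1.0
eps = random.gauss(0, 1)
for t in [0, 250, 500, 999]:
    # the signal coefficient shrinks toward 0 as t grows: mostly noise by t=999
    print(t, round(math.sqrt(alpha_bar[t]), 3))
```

Generation runs this backward: the trained denoiser removes a little of ε at each step, starting from pure noise.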

Agents, RAG & Tool Use // 2023 – now

The current frontier isn't bigger models — it's letting models take actions. Read files, call APIs, browse the web, run code.

RAG — Retrieval Augmented Generation

Split docs into chunks → embed them → store in a vector DB → on query, retrieve relevant chunks → put them in the prompt. Almost every "chat with your PDFs" product is this.
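
The whole pipeline can be sketched with bag-of-words counts standing in for a learned embedding model and a plain list standing in for the vector DB; real systems swap in both, but the retrieve-then-stuff shape is identical.

```python
# Toy RAG retrieval: bag-of-words counts stand in for a learned embedding
# model, a list stands in for the vector DB. Real systems swap both parts.
import math
from collections import Counter

docs = [
    "the transformer uses self attention over tokens",
    "xgboost dominates tabular data competitions",
    "diffusion models denoise from pure noise",
]

def embed(text):
    return Counter(text.split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]  # these chunks would be stuffed into the prompt

print(retrieve("how does attention work in a transformer"))
```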

Tool use / function calling

Give the model a list of functions. It outputs structured JSON asking for a call. Your code runs the function and feeds the result back. Repeat.
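
A skeleton of that loop, with a hard-coded stub in place of the model; the JSON shape here is illustrative, not any particular vendor's schema.

```python
# Function-calling loop skeleton. The "model" here is a stub that emits a
# structured call; the JSON shape is illustrative, not any vendor's schema.
import json

def get_weather(city):                       # a tool the model may request
    return {"city": city, "temp_c": 21}

TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    # A real LLM would decide this; we hard-code one tool call then a reply.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": json.dumps({"name": "get_weather",
                                         "arguments": {"city": "Paris"}})}
    return {"content": "It is 21 C in Paris."}

messages = [{"role": "user", "content": "Weather in Paris?"}]
while True:
    reply = fake_model(messages)
    if "tool_call" not in reply:             # model produced a final answer
        break
    call = json.loads(reply["tool_call"])
    result = TOOLS[call["name"]](**call["arguments"])
    messages.append({"role": "tool", "content": json.dumps(result)})

print(reply["content"])
```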

Agents

LLM in a loop: decide what to do → execute → observe → repeat until done. Claude Code, Cursor, Devin.

MCP — Model Context Protocol

Anthropic's 2024 open standard for exposing tools/data to any AI client. The "USB-C for LLMs."

Emerging Technologies // the 2025–26 frontier

Where the research edge lives right now. These are the ideas currently reshaping what a "model" even is — sparse routing, learned reasoning, linear-time sequence mixers, on-device trillions, and the alignment scaffolding that has to keep up.

Mixture of Experts (MoE) Deep Dive →

Instead of running every parameter on every token, a router sends each token to k out of N "expert" sub-networks. The model can have a trillion parameters but only activate ~30B per token. This is how Mixtral, DeepSeek-V3, and GPT-4-class systems get huge effective capacity at serving-time cost closer to a much smaller dense model. Key problems: load balancing (don't starve experts), router stability, and all-to-all communication.
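
A minimal sketch of top-k routing, with trivial scale-by-a-constant "experts" and a hand-set router; real MoE layers use learned linear routers and full MLP experts.

```python
# Top-k expert routing: a router scores experts per token, only the best
# k run, and their outputs are mixed by the softmaxed router scores.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# 4 "experts", each just scales its input (stand-ins for sub-networks)
experts = [lambda x: 1 * x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
router_w = [0.1, 0.9, 0.8, 0.2]  # per-expert affinity for this token

def moe(x, k=2):
    scores = [w * x for w in router_w]          # router logits
    top = sorted(range(len(experts)),
                 key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])   # renormalize over chosen k
    # only k of the 4 experts ever execute:
    return sum(g * experts[i](x) for g, i in zip(gates, top))

print(round(moe(1.0), 3))
```

The serving-cost win is the `top` slice: parameters exist for every expert, but compute is spent only on k of them per token.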

Reasoning & Test-Time Compute Deep Dive →

OpenAI's o1 (2024) and DeepSeek-R1 (2025) broke a quiet assumption: that a model's "thinking" happens only during pretraining. Now models are trained — usually via RL on verifiable answers — to generate long chains of thought before answering. Spending more compute at inference time on search, self-critique, and process-reward guidance gives big accuracy jumps on math, code, and science. It's an entirely new axis of scaling.

State Space Models & Mamba Deep Dive →

Transformers are O(n²) in sequence length. State Space Models (S4, then Mamba-1 and Mamba-2) replace attention with a selective linear recurrence that runs in O(n) and scans very long contexts cheaply. They rival Transformers on language and dominate on audio, DNA, and very long-context tasks. Hybrids (Jamba, Zamba, Samba) mix SSM blocks with a few attention layers for the best of both.
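
The recurrence itself is tiny; this sketch uses fixed scalar coefficients, whereas Mamba's "selective" trick makes the coefficients functions of the input (omitted here).

```python
# The O(n) recurrence at the heart of state space models:
# h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.
# One left-to-right pass with constant-size state, versus attention's
# all-pairs O(n^2) comparison.
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    h, ys = 0.0, []
    for x in xs:               # a single scan over the sequence
        h = a * h + b * x      # old context decays, new input mixes in
        ys.append(c * h)
    return ys

ys = ssm_scan([1.0, 0.0, 0.0, 0.0])
print([round(y, 3) for y in ys])  # an impulse decays geometrically
```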

Foundation Models Deep Dive →

Pretrain once on a massive, diverse corpus; adapt everywhere. The term was coined at Stanford (2021) to name the shift: GPT, Claude, Gemini, Llama, DINOv2, SAM, Whisper are all bases that downstream apps build on. The key results are scaling laws (Kaplan 2020, Chinchilla 2022) telling you how to trade off parameters vs. tokens vs. compute, and emergent capabilities that only appear past certain scales.

Retrieval-Augmented Generation (RAG) Deep Dive →

The most boringly practical technique on this list, and probably the most widely deployed. Embed documents into a vector space, retrieve the top-k most similar chunks at query time, stuff them in the prompt. 2025 improvements: hybrid dense+sparse retrieval, learned rerankers, HyDE, GraphRAG, agentic multi-hop retrieval, and long-context models that let you skip retrieval for small corpora entirely.

Neuro-Symbolic AI Deep Dive →

The attempt to marry neural perception (fast, fuzzy, learned) with symbolic reasoning (slow, precise, rule-based). Think DeepMind's AlphaGeometry solving IMO problems, or differentiable theorem provers. The bet is that pure scaling will hit a wall on tasks needing verified logical steps, and that hybrid systems — neural front-ends feeding a symbolic solver — will close the gap.

Edge & On-Device AI Deep Dive →

Phones, laptops, and microcontrollers are now running 3B–8B parameter models locally. The enabling tricks: 4-bit and 2-bit quantization (GPTQ, AWQ, bitsandbytes), knowledge distillation into small student models, structured pruning, and speculative decoding. Apple Intelligence, Gemini Nano, Phi-3-mini, and Llama-3.2-1B all live here. Private by construction, zero latency, zero per-token cost.
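
A sketch of the simplest form of the first trick, affine quantization: map each float weight to one of 2⁴ = 16 integer levels and back. Real GPTQ/AWQ schemes add per-group scales and error correction, none of which is shown here.

```python
# 4-bit affine quantization: map floats to 16 integer levels and back.
# This is the core trick that shrinks an 8B model to fit on a phone.
def quantize(weights, bits=4):
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (2 ** bits - 1)   # width of one quantization step
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize(q, scale, lo):
    return [qi * scale + lo for qi in q]

weights = [-0.51, -0.12, 0.0, 0.33, 0.49]
q, scale, lo = quantize(weights)
restored = dequantize(q, scale, lo)
err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)              # small integers (0..15) instead of 32-bit floats
print(round(err, 4))  # reconstruction error bounded by scale/2
```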

AI Safety & Alignment Deep Dive →

How do you make a model that's not just capable but aligned with what its users actually want, and honest about what it doesn't know? The toolkit: RLHF (2022), Constitutional AI and DPO (2023), Constitutional Classifiers and deliberative alignment (2024–25), interpretability via sparse autoencoders (Anthropic's 2024 dictionary-learning work), and systematic red-teaming. This is the field that has to keep pace with everything else on this list.

AGI & the Singularity // the endgame Deep Dive →

In 2022 "AGI" was a word you whispered at parties to sound interesting. In 2026 it's a line item in Microsoft's contract with OpenAI, a KPI on Demis Hassabis's performance review, and the subject of Senate hearings. This section is the map.

Before you read

Everything below this line is a mix of reported fact (labs, funding, products) and expert speculation (timelines, takeoff, impacts). Sources are linked so you can check the receipts. No one actually knows when or if AGI arrives — be suspicious of anyone, including the labs, who says otherwise.

What counts as AGI?

There is no universally agreed definition, which is why headlines feel contradictory. Four influential ones:

OpenAI's charter (2018)

"Highly autonomous systems that outperform humans at most economically valuable work." Explicitly an economic bar, not a cognitive one.

economic

DeepMind's levels (2024)

A 6-level scale from "No AI" → "Superhuman" with a cross-axis for narrow vs general. Their paper (Morris et al.) puts Gemini & GPT-4 at Level 1 General — "emerging AGI."

technical

Metaculus / Karnofsky "transformative AI"

An AI that causes a transition comparable to the agricultural or industrial revolution. Deliberately agnostic about internals.

impact

The Turing Test (obsolete)

Already passed in limited form by GPT-4 class models in controlled studies (Jones & Bergen 2024). Nobody serious uses it as a finish line anymore.

historical

Mapped onto today's systems: frontier LLMs are better than most humans at many discrete tasks (coding contests, competition math, the bar exam) while still failing at things a 5-year-old finds easy (robust physical common sense, long-horizon planning without tools, genuinely novel research). That mismatch is exactly why the "AGI or not?" debate never resolves.

The labs racing to build it

Every serious frontier lab has AGI (or some synonym — "powerful AI", "superintelligence", "transformative AI") as an explicit, stated goal. Below is the 2026 field.

OpenAI

San Francisco · founded 2015

Sam Altman's stated mission is AGI that "benefits all of humanity." 2026 roadmap: GPT-5 (released mid-2025), o-series reasoning models, and the "Stargate" compute build-out announced Jan 2025 — a $500B joint venture with Oracle, SoftBank, and MGX to build US AI infrastructure.

frontier · $500B Stargate

Anthropic

San Francisco · founded 2021

Founded by ex-OpenAI safety staff (Dario & Daniela Amodei). Mission is "powerful AI done safely." Publicly bets that frontier capabilities must be developed at the safety frontier to steer the field. Dario's "Machines of Loving Grace" (Oct 2024) argues capable AI could compress 50–100 years of biomedical progress into 5–10.

frontier · safety-first

Google DeepMind

London + Mountain View · merged 2023

Demis Hassabis (Nobel 2024 for AlphaFold) has said AGI is a 5–10 year horizon. Full stack: TPUs, Gemini 2.x, AlphaFold, AlphaProof, AlphaGeometry, Project Astra (universal assistant).

frontier · full-stack

xAI

Austin · founded 2023

Musk's post-OpenAI lab. Grok series. Memphis "Colossus" cluster went from 0 → 100k H100s in 122 days (2024) and is being expanded to 1M GPUs. Explicitly framed as a race to AGI.

frontier · Colossus 1M

Meta AI (FAIR + GenAI)

Menlo Park

Yann LeCun leads FAIR; Alexandr Wang now runs a new "Superintelligence Labs" unit after Meta's $14B investment in Scale AI (June 2025). Llama 4 family is open-weight. LeCun is the loudest AGI skeptic at a frontier lab — his bet is on world-model-based "objective-driven" AI, not pure LLMs.

open · world models

Safe Superintelligence (SSI)

Palo Alto + Tel Aviv · founded 2024

Ilya Sutskever (ex-OpenAI co-founder + chief scientist) with Daniel Gross and Daniel Levy. One product only: "safe superintelligence." Raised $1B at a $5B valuation in Sep 2024; reportedly raising again at ~$30B in 2025.

stealth · superintelligence

DeepSeek

Hangzhou · founded 2023

Spun out of quant fund High-Flyer. Shipped DeepSeek-V3 (Dec 2024) and R1 reasoning model (Jan 2025) with reported training cost ~$6M — an order of magnitude below US frontier labs. Set off the "DeepSeek moment" market panic in Jan 2025.

China · open

Microsoft AI

Redmond

Mustafa Suleyman (ex-DeepMind, ex-Inflection) leads MAI. Primary vehicle is still the OpenAI partnership ($13B+ invested), but Microsoft is building its own frontier model capacity. The OpenAI contract has a clause that cuts Microsoft's access the moment OpenAI's board declares AGI achieved — the most-discussed corporate clause in tech.

frontier

Mistral AI

Paris · founded 2023

Europe's flagship. Co-founded by ex-DeepMind and ex-Meta researchers. Mix of open and commercial models. Not explicitly an "AGI lab" — more of a sovereign-AI play — but in the frontier conversation.

Europe

Zhipu / Moonshot / Qwen (Alibaba)

China

The "Chinese tigers." GLM-4, Kimi, and Qwen 3 are all frontier-adjacent and mostly open-weight. Under US export controls on advanced GPUs, they've leaned into efficiency — which the DeepSeek papers then weaponized.

China

Google / Microsoft / Amazon — hyperscalers

infrastructure

Not "labs" strictly but they own the compute. In 2025 the big four hyperscalers combined committed over $300B in capex, most of it AI-related. Without them the labs above cannot train.

infra

NVIDIA

Santa Clara

The one company that wins either way. Crossed $3T market cap in 2024 on the back of H100/H200/B100/GB200 demand. Also ships its own foundation models (Nemotron, Cosmos world models) to keep the stack sticky.

infra · picks-and-shovels

How do we measure progress?

The benchmarks that used to matter (GLUE, SuperGLUE, MMLU) are saturated — frontier models score above 90% and the remaining errors are often benchmark mistakes, not model mistakes. The 2025–26 yardsticks are harder:

  • ARC-AGI-2 (Chollet's puzzles) — abstraction & reasoning on novel tasks. o3 hit 75.7% on ARC-AGI-1 in Dec 2024 (vs. ~10% for GPT-4). ARC-AGI-2 was designed to be harder and frontier scores remain below human baselines.
  • Humanity's Last Exam (Scale + CAIS, 2025) — 3000+ questions at the frontier of human expertise across physics, math, law, classical languages. Frontier models currently score in the teens.
  • FrontierMath (Epoch AI, 2024) — research-level math by Fields medalists. o3 went from 2% → 25.2% in a single quarter, shocking even the authors.
  • SWE-bench Verified — real GitHub issues. Claude 3.7 / 4 class models cleared 70%+ in 2025 — the "AI can fix tickets" threshold most engineering managers watch.
  • GPQA Diamond — grad-level physics, chem, bio. Saturating above 85%.
  • OSWorld / WebArena / AgentBench — can the model actually use a computer? These are the benchmarks AGI economists care about because they're closest to "most economically valuable work."
  • The RE-Bench / METR evaluations — time-horizon tasks. METR's 2025 paper found the length of software tasks frontier models can complete reliably is roughly doubling every 7 months — one of the few "Moore's law"-like trends in the field.

The Singularity — where the term comes from

The word predates ChatGPT by half a century. Five milestones:

  • 1958 — John von Neumann (as reported by Stanislaw Ulam) mused about "an essential singularity in the history of the race beyond which human affairs, as we know them, could not continue."
  • 1965 — I. J. Good publishes Speculations Concerning the First Ultraintelligent Machine. His famous sentence: "The first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control." This is the intelligence-explosion thesis: once an AI can design better AIs, recursive self-improvement diverges.
  • 1993 — Vernor Vinge's essay The Coming Technological Singularity (NASA VISION-21). Vinge fixes the popular image of the term and predicts superhuman intelligence "within thirty years," a deadline that 2023 neither cleanly confirmed nor cleanly refuted.
  • 2005 — Ray Kurzweil's The Singularity Is Near. Kurzweil predicts human-level AI by 2029 and the singularity by 2045, based on extrapolating Moore's law and his "law of accelerating returns." His 2029 prediction has held up remarkably well.
  • 2014 — Nick Bostrom's Superintelligence puts x-risk squarely into mainstream discourse and reframes the question from "when" to "how do we make it go well."

Intelligence explosion, in one equation

If the speed at which AI improves itself is proportional to its current intelligence I, you get dI/dt ∝ I, which integrates to I(t) = I₀·e^(kt) — exponential. If it's superlinear — dI/dt ∝ I² — you get hyperbolic growth, which reaches infinity in finite time. That finite-time blowup is the mathematical heart of "singularity." Good 1965 assumed superlinear; modern treatments (Davidson, Roodman, Christiano) use much more conservative assumptions and get slower — but still dramatic — curves.
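
You can see both regimes by Euler-integrating dI/dt = k·I^p with illustrative constants: p = 1 grows exponentially forever, while p = 2 races past any finite cap shortly after the analytic blowup time t* = 1/(k·I₀).

```python
# Euler-integrate dI/dt = k * I**p. p = 1 gives exponential growth;
# p = 2 is superlinear and blows up in finite time (t* = 1/(k*I0)).
def grow(p, k=1.0, i0=1.0, dt=1e-4, t_max=2.0, cap=1e9):
    i, t = i0, 0.0
    while t < t_max and i < cap:
        i += k * i ** p * dt
        t += dt
    return t, i

t_lin, i_lin = grow(p=1)   # reaches t_max with I ~ e^2, about 7.39
t_sup, i_sup = grow(p=2)   # hits the cap shortly after t = 1
print(round(i_lin, 2), round(t_sup, 2))
```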

Takeoff scenarios

Once you stipulate that an intelligence explosion is physically possible, the question becomes: how fast? Three archetypes dominate the debate:

Fast takeoff ("hard")

Days to months from roughly-human to vastly-superhuman. Single system, single actor. Associated with Yudkowsky / early MIRI. The scariest scenario and, most researchers now believe, the least likely given the compute-bound nature of training.

days–weeks

Slow takeoff ("soft")

Years. Capability grows incrementally, diffuses across labs, gets integrated into the economy. What Holden Karnofsky calls the "most likely" path, and roughly what we've seen 2022–26.

years

No-takeoff / plateau

Current architectures hit a wall. Scaling hits data limits, reasoning hits verification limits, agents hit reliability limits. Proponents include LeCun and many ML academics. Consistent with the 2024–25 observation that raw pre-training gains are slowing.

plateau

Christiano's influential 2018 "Takeoff speeds" post argues the right question isn't fast vs. slow but whether there's a "discontinuity" — a sudden jump larger than the previous jumps. Post-2022, most practitioners say "no discontinuity yet" — GPT-4 → Claude 3.5 → o1 → o3 is a fast slope but not a step-function.

Societal impacts — the short, medium, and long run

Labor & the economy

The first-order prediction is that any task expressible as text, code, or image is now price-deflationary. The empirical work catching up to this is messy but pointed:

  • Eloundou, Manning, Mishkin, Rock (OpenAI/UPenn, 2023) — "GPTs are GPTs". Estimates 80% of the US workforce could have ≥10% of tasks affected and 19% could have ≥50% affected. Highest exposure: information-processing and writing-heavy jobs.
  • Brynjolfsson, Li, Raymond (NBER 2023) — deployment of an LLM assistant in a call center raised the productivity of the least experienced workers by 34% while barely moving expert productivity. Suggests AI compresses the skill distribution.
  • Acemoglu (2024) — dissenting view. Estimates macro GDP effects will be modest (~1% over a decade) because most jobs don't collapse into their "AI-exposed" tasks cleanly.
  • Goldman Sachs (2023) — headline number of "300M jobs exposed" globally. Widely cited, widely misread — exposure ≠ replacement.
  • IMF (2024) — 40% of global employment is exposed to AI, 60% in advanced economies; developing economies less exposed but also less able to capture productivity gains.

The honest summary: white-collar entry-level work is the first thing to feel it. If you managed junior analysts / copywriters / L1 support / junior devs, your 2026 team looks different than your 2022 team.

Governance & regulation

  • EU AI Act — entered into force Aug 2024; risk-tiered with an explicit "general-purpose AI" category plus extra obligations for "systemic" models (the Llama / GPT-4 tier). Most provisions kicking in 2025–26.
  • US executive action — Biden's Oct 2023 EO was rescinded by Trump in Jan 2025 and replaced by a pro-competition framing via the AI Action Plan (July 2025). Federal regulation remains sparse; state-level action (California SB 53, Colorado AI Act) is filling the gap.
  • UK AI Safety Institute (now AI Security Institute) and the Seoul Declaration (2024) — frontier labs commit to pre-deployment safety testing of the most capable models.
  • China — algorithm registration regime (2022) and generative AI interim measures (2023). Mandatory model registration, content labeling.
  • Bletchley → Seoul → Paris Summits — the AI Safety Summit series. The Paris summit (Feb 2025) pivoted hard from "safety" to "action" and "opportunity" — signaling a mood shift.

Information & epistemics

Two early effects are measurable. First, the cost of generating plausible text, images, audio, and video has collapsed — so has the trust floor of any unverified media. Second, search is being restructured around answer engines (ChatGPT Search, Perplexity, Google AI Overviews), which is already changing publisher traffic patterns. Early studies on AI-generated content in social media show mixed polarization effects — not the apocalypse some predicted, but not nothing.

Science

This is where the concrete wins are accumulating. AlphaFold 2 and 3 have predicted structures for 200M+ proteins. AlphaProof & AlphaGeometry 2 achieved silver-medal performance at IMO 2024. FunSearch (DeepMind, 2023) found new results in the cap set problem. DeepMind's weather model GraphCast beats the ECMWF's operational model on most metrics. These are real, non-speculative. If "AI-for-science" is where AGI pays its rent, the rent is starting to show up.

Timelines & predictions — what the people closest to it say

Methodology note

The quotes below are primary-source-verifiable but age poorly, and all have been subject to revision. When in doubt, check the original transcript — frontier-lab CEOs have institutional reasons to make their timelines sound imminent (fundraising), and academics have reasons to sound long (credibility). Read accordingly.

  • Dario Amodei (Anthropic) — "Powerful AI" as soon as 2026–2027 (stated 2024). Machines of Loving Grace, Oct 2024; defines "powerful" as ≥ Nobel-winner at most cognitive tasks.
  • Sam Altman (OpenAI) — "We are now confident we know how to build AGI as we have traditionally understood it"; AGI in "a few thousand days" (Jan 2025). Reflections blog post; notably vague on definition.
  • Demis Hassabis (DeepMind) — AGI within 5–10 years (2024). Repeated at multiple venues; cautious about hype.
  • Elon Musk (xAI) — AI smarter than any individual human by end of 2025; smarter than all humans combined by 2029 (2024). Musk's past AI timelines have been consistently too optimistic.
  • Geoffrey Hinton — 5–20 years, with a nontrivial chance AI takes over (2023). Resigned from Google May 2023 to speak freely about risks; Nobel 2024.
  • Yann LeCun (Meta) — current LLMs will not reach AGI; new architectures needed; a decade+ away (2023–26). The loudest in-house skeptic.
  • Ray Kurzweil — human-level AI by 2029, singularity by 2045 (2005, restated 2024). Predictions from The Singularity Is Near; the 2029 date has aged well.
  • Metaculus community — ~50% by 2031 for "weak AGI", down from 2050 in 2020 (live). Question 5121; definitions matter, so read the resolution criteria.
  • AI Impacts survey (Grace et al.) — aggregate 2023 survey of ~2,700 AI researchers: 50% chance of "high-level machine intelligence" by 2047, 13 years earlier than the 2022 survey (Jan 2024). arXiv:2401.02843.
  • AI 2027 scenario (Kokotajlo, Alexander, et al.) — detailed fictional timeline in which superhuman AI arrives late 2027 via a recursive self-improvement loop inside an automated AI research lab (Apr 2025). ai-2027.com; explicitly a scenario, not a forecast, but widely read inside labs.
  • Epoch AI — compute needed for transformative AI could be reached by ~2030 at current capex trends; data may bottleneck first (2024). epochai.org; probably the most technically careful public forecaster.

Existential risk — the "what if it goes wrong" question

The x-risk argument, in three sentences: (1) we don't know how to robustly specify human values in an objective function; (2) a sufficiently capable optimizer pursuing any misspecified objective will by default acquire resources and resist shutdown (Omohundro 2008, "basic AI drives"); (3) if that optimizer is much smarter than us, course-correcting becomes very hard. The classic popular treatment is Bostrom's Superintelligence (2014); the technical-alignment literature traces to Yudkowsky, Soares, Armstrong, Russell.

CAIS statement (May 2023)

"Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." Signed by Hinton, Bengio, Altman, Hassabis, Amodei and ~1000 others. The moment x-risk went mainstream.

Bengio's "International AI Safety Report" (2025)

UK-commissioned, 100+ contributors. Concludes capabilities are advancing faster than our ability to verify safety properties, and that several plausible pathways to catastrophic harm exist.

Anthropic's "Core Views on AI Safety"

Published 2023. Three scenarios: optimistic, pessimistic, pragmatic. Company operates as if pessimistic scenario might be true.

Skeptics

LeCun, Ng, Mitchell, Marcus argue current x-risk framing overweights theoretical scenarios vs. near-term harms (bias, misuse, concentration of power). Their preferred framing: "AI ethics" > "AI safety."

How to think about this without losing your mind

Nobody who tells you they are certain — in either direction — has good epistemics about AGI. The useful stance is probabilistic: what's your P(transformative AI before 2035)? P(it goes well | transformative AI)? Use those to decide what to work on, and then do the object-level work. Panic and dismissal both feel productive and neither is.
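The two probabilities in that stance multiply into one action-relevant number. A toy sketch with placeholder values (the numbers below are illustrative, not estimates from this page):

```python
# Toy decomposition of the probabilistic stance above.
# Both inputs are illustrative placeholders; plug in your own.
p_tai_by_2035 = 0.5          # P(transformative AI before 2035)
p_goes_well_given_tai = 0.8  # P(it goes well | transformative AI)

# P(TAI arrives by 2035 AND it goes badly) = P(TAI) * (1 - P(well | TAI))
p_bad = p_tai_by_2035 * (1 - p_goes_well_given_tai)
print(f"P(TAI by 2035 and it goes badly) = {p_bad:.2f}")  # 0.10
```

Whether 10% feels like a reason to panic or a reason to get to work is exactly the object-level question the paragraph points at.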

For a fuller treatment with interactive takeoff-speed simulations and side-by-side model comparisons, see the three deep-dive pages:

  • Deep Dive → AGI — the goal: definitions, benchmark trajectories, the current state of play.
  • Deep Dive → The Singularity: Good's 1965 thesis, Kurzweil's curves, takeoff-speed math, an interactive intelligence-explosion simulator.
  • Deep Dive → AI & Society: labor, governance, epistemics, x-risk — with sources you can click through.

Frameworks & Tooling // what you'll actually touch

PyTorch

Meta, 2016

What the vast majority of research code, and much of production, uses.

use this

JAX

Google, 2018

Functional, compiler-first. Powers Gemini, AlphaFold.

Hugging Face Transformers

One-line access to thousands of pretrained models.

use this

vLLM

Berkeley, 2023

High-throughput LLM serving. Standard for self-hosted.

prod

llama.cpp

Gerganov, 2023

Pure C++ inference. Runs on anything.

local

Ollama

One command to download and run local models; think brew install for LLMs.

local

Claude Agent SDK / OpenAI Agents SDK

First-party SDKs. The path of least resistance for agents in 2026.

rising

pgvector

Postgres extension. The default vector DB.

default

Hardware & Inference // the real bottleneck

NVIDIA H100/B200/GB300 dominate training. Google TPUs are the second option. Groq and Cerebras do blazing-fast inference. Apple Silicon's unified memory lets you run 30B+ models on a MacBook.

Quantization — run trained FP16 models in 8/4/2-bit. A 70B model goes from 140GB → 40GB VRAM. Formats: GGUF (llama.cpp), AWQ/GPTQ (vLLM), FP8/NVFP4 (native H100+).
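The memory arithmetic behind that claim is simple: weight bytes ≈ parameter count × bits per weight / 8. A minimal sketch (runtime overhead for KV cache and quantization metadata varies widely; it is ignored here):

```python
# Rough VRAM estimate for holding a model's weights at a given bit width.
# Ignores KV cache, activations, and quant metadata, which add more on top.

def weight_gb(n_params_billion: float, bits: int) -> float:
    """GB needed just for the weights."""
    return n_params_billion * 1e9 * bits / 8 / 1e9  # simplifies to B * bits / 8

for bits in (16, 8, 4, 2):
    print(f"70B model @ {bits:>2}-bit: ~{weight_gb(70, bits):.0f} GB of weights")
# FP16 gives 140 GB; 4-bit gives 35 GB, which lands near the ~40 GB
# figure above once runtime overhead is added back.
```

The same arithmetic explains the Apple Silicon point: a 30B model at 4-bit is ~15 GB of weights, which fits comfortably in a 32 GB unified-memory MacBook.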

Model Rankings (April 2026) // speed + quality

Leaderboards are gamed, benchmarks leak. Treat as a snapshot.

Frontier LLMs — general reasoning quality

Model | Lab | Open? | Notes
Claude Opus 4.6 | Anthropic | No | SOTA coding & long context.
GPT-5 / o4 | OpenAI | No | Reasoning model; strongest math/science.
Gemini 3 Ultra | Google | No | 2M+ context, multimodal native.
Grok 4 | xAI | Partial | Real-time web, fewer filters.
DeepSeek-R2 | DeepSeek | Yes | Open reasoning.
Llama 4 Behemoth | Meta | Yes | Best open frontier.

Speed — tokens / second

Provider | Tier | Speed
Groq / Cerebras | Llama 70B | 1000+ tok/s
SambaNova | Llama 405B | ~400 tok/s
Claude Haiku | small | ~200 tok/s
Reasoning models | o4, Opus 4.6 | ~60 tok/s
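Those throughput numbers translate directly into user-facing latency: generation time ≈ output tokens / (tok/s), plus time-to-first-token. A quick sketch (TTFT is ignored here; it typically adds a fraction of a second to a couple of seconds depending on prompt length and provider):

```python
# What tokens/second means for wall-clock time on a 1000-token answer.
# Time-to-first-token is ignored; this is pure generation throughput.

def seconds_for(output_tokens: int, tok_per_s: float) -> float:
    return output_tokens / tok_per_s

for name, speed in [("Groq / Cerebras", 1000), ("SambaNova", 400),
                    ("small model", 200), ("reasoning model", 60)]:
    print(f"{name:>16}: {seconds_for(1000, speed):5.1f} s per 1000 tokens")
```

The gap matters for product design: a 1-second answer feels interactive, while a reasoning model grinding for 15+ seconds needs streaming output or a progress indicator.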

Quick Glossary // cheat sheet

AI / ML / DL

Artificial Intelligence / Machine Learning / Deep Learning. Concentric: DL ⊂ ML ⊂ AI.

NN / CNN / RNN

Neural Net / Convolutional / Recurrent.

Transformer

The architecture. Uses self-attention.

LLM / VLM / SLM

Large / Vision / Small Language Model.

Token

Word-piece the model processes.

Context window

Max tokens visible at once.

RLHF / DPO

Alignment methods using human preferences.

RAG

Retrieval + LLM.

MoE

Mixture of Experts. Sparse scaling.

CoT

Chain of Thought reasoning.

LoRA / QLoRA

Cheap fine-tuning adapters.

MCP

Model Context Protocol. Tool-use standard.

Further reading & deep dives

External resources