The 2026 State of AI // big picture
If you've been writing code but not tracking AI research, here's the headline: the field spent 70 years slogging toward general-purpose machines that can read, write, see, and reason. In the last ~3 years that finally clicked, and now a single architecture — the transformer — powers almost everything you hear about: ChatGPT, Claude, Gemini, Stable Diffusion, GitHub Copilot, Midjourney, Sora.
2012 — AlexNet wins the ImageNet competition by a huge margin. Everyone pivots to deep learning.
2017 — Google publishes Attention Is All You Need. The transformer is born.
2020 — OpenAI releases GPT-3. It can write essays. Scaling becomes the strategy.
2022 — ChatGPT launches. AI becomes a consumer product overnight.
2024 — Reasoning models (o1, Claude extended thinking) solve PhD-level problems.
2026 — Agents, multimodal everything, and a Cambrian explosion of open-source models.
Who the major players are
OpenAI
GPT series, DALL·E, Sora, o-series reasoning. Frontier lab. Closed weights.
Anthropic
Claude family. Focus on safety, long context, and agentic coding.
Google DeepMind
Gemini, AlphaFold, Imagen, Veo. Deep research bench, full stack down to TPUs.
Meta AI (FAIR)
Llama series. Meta is the quiet giant of open-weight models — they ship the best ones.
xAI
Grok series. Notable for massive compute scale (Colossus cluster).
Mistral, DeepSeek, Qwen, Z.AI
European + Chinese open-weight labs shipping extremely capable small models fast.
NVIDIA
Not a model lab — the picks-and-shovels company. They make the GPUs everyone else trains on.
Hugging Face
The GitHub of ML. Hosts hundreds of thousands of open models, datasets, and demos.
What the field actually looks like in 2026
- LLMs eat the world. Most "AI startup" pitches are thin wrappers around OpenAI/Anthropic APIs. The real product is distribution + UX + domain data.
- Classical ML is still everywhere. Your credit score, ad targeting, spam filter, and Uber's surge pricing are not running GPT-4. They're XGBoost or logistic regression.
- Open source is catching up fast. Llama 4, DeepSeek-V3/R1, Qwen 3, and Mistral models now deliver GPT-4-class performance on hardware you control.
- Inference cost is the new game. Training the model is a one-time expense; serving it to millions of users is the forever bill.
- Agents & tool use are the current frontier. Models that can browse, call APIs, write and run code, and chain actions — not just chat.
- Reasoning models changed the plot. OpenAI's o1/o3 and Anthropic's Claude extended thinking spend "thinking time" before answering.
Foundations & Vocabulary // must-know terms
Before diving in, here's the vocabulary that shows up in every AI blog post.
The core loop
- Training — Show the model lots of examples. It adjusts its weights to make better predictions.
- Inference — Use the trained model to make predictions on new data.
Types of learning
Supervised
Input→output pairs. "1M labeled cat/dog photos."
Unsupervised
No labels. Find structure on its own (clustering).
Self-supervised
The model creates its own labels. LLMs do this: "predict the next word."
Reinforcement Learning
Agent takes actions, gets rewards. AlphaGo, RLHF.
How a neural net actually learns
- Loss function — how wrong was that prediction?
- Activation function Deep Dive → — the nonlinearity that makes depth meaningful.
- Gradient descent Deep Dive → — take a small step toward lower loss.
- Backpropagation Deep Dive → — the algorithm that efficiently computes those gradients.
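The loop in those four bullets fits in a few lines. A pure-Python sketch fitting y = 2x with a single weight; the dataset and learning rate are illustrative choices, not from any library:

```python
# Gradient descent on one weight w, fitting y = 2x with squared loss.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (x, y) pairs with y = 2x
w = 0.0          # start from a bad guess
lr = 0.05        # learning rate (step size)

for step in range(200):
    grad = 0.0
    for x, y in data:
        pred = w * x
        # d/dw of (pred - y)^2 is 2 * (pred - y) * x  (the chain rule;
        # backpropagation is this same bookkeeping, automated across layers)
        grad += 2 * (pred - y) * x
    grad /= len(data)        # mean gradient over the dataset
    w -= lr * grad           # take a small step toward lower loss

print(round(w, 3))  # converges to ~2.0
```

Swap the one weight for millions and the hand-derived gradient for backprop, and this is the whole training loop.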
Classical ML // 1950s – today
The unsexy stuff that runs half of production systems.
If your data is tabular, start with XGBoost or LightGBM. A neural net will almost never beat it.
Linear / Logistic Regression
Still the first thing to try. Interpretable weights you can show to your lawyer.
Decision Trees
A flowchart learned from data. Alone they overfit; combined in forests they dominate.
Random Forests
Hundreds of trees on random subsets. Robust, hard to break.
XGBoost / LightGBM
The king of tabular data. Won basically every Kaggle competition for years.
SVM
Finds the best separating hyperplane. Pre-deep-learning champion.
k-NN
Find the k closest examples and vote. No training step.
k-Means
Unsupervised clustering. Customer segmentation.
PCA
Squash high-dim data into fewer dims. Used everywhere.
Neural Networks // 1943 – 1986
The idea of a "neural network" started as a wild simplification of how brain neurons work. It took 40+ years to make them actually useful.
The Perceptron (1957) Deep Dive →
Frank Rosenblatt at Cornell built a machine that could learn to classify images — an actual hardware device, the Mark I Perceptron, funded by the US Navy. It's the ancestor of everything on this page.
Multi-Layer Perceptron (MLP) & Backprop (1986) Deep Dive →
Stack multiple layers. In 1986, Rumelhart, Hinton, and Williams published backpropagation — the single most important algorithm in modern AI. It runs the chain rule of calculus backward through the network to compute gradients for every weight. Click the Deep Dive link above for a full textbook treatment with derivation, worked examples, and interactive step-through of a training iteration.
Deep Neural Networks (DNN)
"Deep" is just marketing for "lots of layers." The reason they weren't used earlier: you need huge datasets and huge compute. It took until the mid-2000s — when GPUs became programmable and the internet produced massive image datasets — for deep learning to become practical.
Convolutional Neural Networks (CNN) Deep Dive →
Yann LeCun invented CNNs in 1989 to read handwritten zip codes. The insight: instead of connecting every pixel to every neuron, slide a small kernel across the image. Same weights reused → efficient, and captures the fact that "an edge is an edge regardless of where it is."
RNN, LSTM & GRU — handling sequences Deep Dive →
CNNs are great for images but don't handle sequential data. RNNs process tokens one at a time while maintaining a hidden state. Hochreiter & Schmidhuber's 1997 LSTM solved the vanishing-gradient problem of basic RNNs. Dominated NLP until transformers killed them in 2017.
Autoencoders
A network trained to compress its input into a small bottleneck and then reconstruct it. Variational Autoencoders (VAE, 2013) are the generative cousin — they sit inside every modern diffusion model.
GANs — generative adversarial networks Deep Dive →
Ian Goodfellow, 2014. Two networks: a generator tries to make fake images, a discriminator tries to spot the fakes. They improve together. By 2022, diffusion models had taken over the image-generation crown.
Word embeddings Deep Dive →
Mikolov's Word2Vec (2013) showed that training a shallow network to predict word co-occurrences produces vector spaces where king − man + woman ≈ queen. Every LLM still starts by mapping tokens through an embedding matrix — it's the bridge between discrete symbols and the geometry of meaning.
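A hand-built toy shows the arithmetic. These 2-d vectors (features: royalty, maleness) are invented for the example; real Word2Vec vectors have hundreds of learned dimensions:

```python
# Toy 2-d "embeddings": [royalty, maleness]. Hand-picked for illustration.
king  = [1, 1]
queen = [1, 0]
man   = [0, 1]
woman = [0, 0]

# king - man + woman, component-wise
result = [k - m + w for k, m, w in zip(king, man, woman)]
print(result == queen)  # the analogy holds exactly in this toy space
```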
Attention & Transformers Deep Dive → // 2017 – the big bang
In 2017, eight researchers at Google Brain published "Attention Is All You Need." Everything you know as "modern AI" is built on it.
The insight: self-attention
Instead of sequential processing, let every token look at every other token directly. The model computes three vectors per token:
- Query (Q) — "what am I looking for?"
- Key (K) — "what do I contain?"
- Value (V) — "what information do I carry?"
Attention score(i,j) = softmax(Q_i · K_j / √d_k), and the output is the score-weighted sum of values.
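The formula above, as a runnable sketch. The Q/K/V numbers are made up for illustration; in a real transformer they come from learned projections of the token embeddings:

```python
import math

# Scaled dot-product attention for 3 tokens, d_k = 2, in pure Python.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
d_k = 2

def attend(q):
    # score(i, j) = Q_i . K_j / sqrt(d_k)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
    exps = [math.exp(s) for s in scores]
    weights = [e / sum(exps) for e in exps]            # softmax over all keys
    # Output: score-weighted sum of the value vectors
    return [sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))]

out = [attend(q) for q in Q]   # every token attends to every other token
```

Each output row is a convex mix of the V rows, which is why attention outputs always stay inside the span of the values.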
Three flavors
Encoder-only
BERT (2018). Good for classification, search, embeddings.
Decoder-only
GPT, Claude, Llama. Generates one token at a time.
Encoder-Decoder
T5, BART. Read everything in, generate everything out.
Large Language Models (LLMs) // 2018 – now
Take a decoder transformer. Make it huge. Train it on a trillion words by playing "guess the next token." You get GPT.
GPT-1 (2018)
Proved the recipe: pretrain + fine-tune.
BERT (2018)
Bidirectional encoder. Ran inside Google Search.
GPT-2 (2019)
Could write coherent paragraphs.
GPT-3 (2020)
Few-shot learning worked. Everything changed.
ChatGPT (2022)
100M users in 2 months.
GPT-4 (2023)
First model that felt like it was reasoning.
Claude (2023-2026)
Long context, strong coding, constitutional AI.
Llama (2023-2026)
Best open-weight frontier models.
o1 / o3 (2024-25)
Reasoning models. Chain of thought at training time.
DeepSeek-R1 (2025)
Open reasoning at o1 quality.
Multimodal Models & VLMs // 2021 – now
CLIP (OpenAI, 2021) trained on 400M image-caption pairs and learned a shared embedding space for images and text. Today every frontier model is multimodal — paste a screenshot, ask a question.
Diffusion Models Deep Dive → // 2020 – now
Train a model to denoise an image by one small step. At generation time, start with pure noise and run the denoiser ~50 times. What emerges is a coherent image, guided by your text prompt.
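The forward (noising) half of the process can be sketched directly. The linear schedule constants here are illustrative, not from any particular paper:

```python
import math, random

# Forward diffusion in one dimension with a linear beta schedule.
T = 50
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar_t = product of (1 - beta) up to step t: the fraction of the
# original signal that survives after t noising steps.
alpha_bar = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bar.append(prod)

x0 = 1.0                       # a "clean pixel"
random.seed(0)
noise = random.gauss(0.0, 1.0)
t = T - 1
# Closed form for q(x_t | x_0): scale the signal, add scaled Gaussian noise.
x_t = math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1 - alpha_bar[t]) * noise
```

Generation is the learned inverse: start from pure noise and apply the trained denoiser once per step, walking this schedule backward.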
Agents, RAG & Tool Use // 2023 – now
The current frontier isn't bigger models — it's letting models take actions. Read files, call APIs, browse the web, run code.
RAG — Retrieval Augmented Generation
Split docs into chunks → embed them → store in a vector DB → on query, retrieve relevant chunks → put them in the prompt. Almost every "chat with your PDFs" product is this.
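The pipeline above, sketched with a toy bag-of-words "embedding". Real systems use learned embedding models and a vector database, but the shape is the same:

```python
import math
from collections import Counter

docs = [
    "the transformer uses self attention",
    "xgboost wins on tabular data",
    "diffusion models denoise from pure noise",
]

def embed(text):
    # Toy embedding: word-count vector. Stand-in for a learned model.
    return Counter(text.split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

index = [(d, embed(d)) for d in docs]          # "store in a vector DB"

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda de: cosine(q, de[1]), reverse=True)
    return [d for d, _ in ranked[:k]]          # chunks to put in the prompt

top = retrieve("what is self attention")
```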
Tool use / function calling
Give the model a list of functions. It outputs structured JSON asking for a call. Your code runs the function and feeds the result back. Repeat.
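That loop reads like this sketch, with the model stubbed out by a hypothetical `fake_model`. In production that call is an API request returning either a tool call or a final answer; everything else is your code:

```python
import json

def get_weather(city):
    return {"city": city, "temp_c": 21}        # stand-in for a real API

TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    # A real model decides this; here we hard-code one tool call, then stop.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_weather", "args": {"city": "Paris"}}
    return {"answer": "It's 21°C in Paris."}

messages = [{"role": "user", "content": "Weather in Paris?"}]
while True:
    reply = fake_model(messages)
    if "answer" in reply:                      # model is done
        break
    result = TOOLS[reply["tool"]](**reply["args"])   # your code runs the tool
    messages.append({"role": "tool", "content": json.dumps(result)})
```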
Agents
LLM in a loop: decide what to do → execute → observe → repeat until done. Claude Code, Cursor, Devin.
MCP — Model Context Protocol
Anthropic's 2024 open standard for exposing tools/data to any AI client. The "USB-C for LLMs."
Emerging Technologies // the 2025–26 frontier
Where the research edge lives right now. These are the ideas currently reshaping what a "model" even is — sparse routing, learned reasoning, linear-time sequence mixers, on-device inference, and the alignment scaffolding that has to keep up.
Mixture of Experts (MoE) Deep Dive →
Instead of running every parameter on every token, a router sends each token to k out of N "expert" sub-networks. The model can have a trillion parameters but only activate ~30B per token. This is how Mixtral, DeepSeek-V3, and GPT-4-class systems get huge effective capacity at serving-time cost closer to a much smaller dense model. Key problems: load balancing (don't starve experts), router stability, and all-to-all communication.
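The routing step itself is tiny. The scores below are made up for illustration; a real MoE computes them with a small learned linear layer, and the hard part is everything around the routing (balancing, stability, communication):

```python
def route(scores, k=2):
    # Pick the k highest-scoring experts and renormalize their weights.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in top)
    return [(i, scores[i] / total) for i in top]

# 8 experts in the layer, but each token only activates k=2 of them.
token_scores = [0.1, 0.05, 0.3, 0.02, 0.25, 0.08, 0.15, 0.05]
chosen = route(token_scores)   # experts 2 and 4: 2/8 of the layer's compute
```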
Reasoning & Test-Time Compute Deep Dive →
OpenAI's o1 (2024) and DeepSeek-R1 (2025) broke a quiet assumption: that a model's "thinking" happens only during pretraining. Now models are trained — usually via RL on verifiable answers — to generate long chains of thought before answering. Spending more compute at inference time on search, self-critique, and process-reward guidance gives big accuracy jumps on math, code, and science. It's an entirely new axis of scaling.
State Space Models & Mamba Deep Dive →
Transformers are O(n²) in sequence length. State Space Models (S4, then Mamba-1 and Mamba-2) replace attention with a selective linear recurrence that runs in O(n) and scans very long contexts cheaply. They rival Transformers on language and dominate on audio, DNA, and very long-context tasks. Hybrids (Jamba, Zamba, Samba) mix SSM blocks with a few attention layers for the best of both.
Foundation Models Deep Dive →
Pretrain once on a massive, diverse corpus; adapt everywhere. The term was coined at Stanford (2021) to name the shift: GPT, Claude, Gemini, Llama, DINOv2, SAM, Whisper are all bases that downstream apps build on. The key results are scaling laws (Kaplan 2020, Chinchilla 2022) telling you how to trade off parameters vs. tokens vs. compute, and emergent capabilities that only appear past certain scales.
Retrieval-Augmented Generation (RAG) Deep Dive →
The most boringly practical technique on this list, and probably the most widely deployed. Embed documents into a vector space, retrieve the top-k most similar chunks at query time, stuff them in the prompt. 2025 improvements: hybrid dense+sparse retrieval, learned rerankers, HyDE, GraphRAG, agentic multi-hop retrieval, and long-context models that let you skip retrieval for small corpora entirely.
Neuro-Symbolic AI Deep Dive →
The attempt to marry neural perception (fast, fuzzy, learned) with symbolic reasoning (slow, precise, rule-based). Think DeepMind's AlphaGeometry solving IMO problems, or differentiable theorem provers. The bet is that pure scaling will hit a wall on tasks needing verified logical steps, and that hybrid systems — neural front-ends feeding a symbolic solver — will close the gap.
Edge & On-Device AI Deep Dive →
Phones, laptops, and microcontrollers are now running 3B–8B parameter models locally. The enabling tricks: 4-bit and 2-bit quantization (GPTQ, AWQ, bitsandbytes), knowledge distillation into small student models, structured pruning, and speculative decoding. Apple Intelligence, Gemini Nano, Phi-3-mini, and Llama-3.2-1B all live here. Private by construction, zero latency, zero per-token cost.
AI Safety & Alignment Deep Dive →
How do you make a model that's not just capable but aligned with what its users actually want, and honest about what it doesn't know? The toolkit: RLHF (2022), Constitutional AI and DPO (2023), Constitutional Classifiers and deliberative alignment (2024–25), interpretability via sparse autoencoders (Anthropic's 2024 dictionary-learning work), and systematic red-teaming. This is the field that has to keep pace with everything else on this list.
AGI & the Singularity // the endgame Deep Dive →
In 2022 "AGI" was a word you whispered at parties to sound interesting. In 2026 it's a line item in Microsoft's contract with OpenAI, a KPI on Demis Hassabis's performance review, and the subject of Senate hearings. This section is the map.
Everything below this line is a mix of reported fact (labs, funding, products) and expert speculation (timelines, takeoff, impacts). Sources are linked so you can check the receipts. No one actually knows when or if AGI arrives — be suspicious of anyone, including the labs, who says otherwise.
What counts as AGI?
There is no universally agreed definition, which is why headlines feel contradictory. Four influential ones:
OpenAI's charter (2018)
"Highly autonomous systems that outperform humans at most economically valuable work." Explicitly an economic bar, not a cognitive one.
DeepMind's levels (2024)
A 6-level scale from "No AI" → "Superhuman" with a cross-axis for narrow vs general. Their paper (Morris et al.) puts Gemini & GPT-4 at Level 1 General — "emerging AGI."
Metaculus / Karnofsky "transformative AI"
An AI that causes a transition comparable to the agricultural or industrial revolution. Deliberately agnostic about internals.
The Turing Test (obsolete)
Already passed in limited form by GPT-4 class models in controlled studies (Jones & Bergen 2024). Nobody serious uses it as a finish line anymore.
Mapped onto today's systems: frontier LLMs are better than most humans at many discrete tasks (coding contests, competition math, the bar exam) while still failing at things a 5-year-old finds easy (robust physical common sense, long-horizon planning without tools, genuinely novel research). That mismatch is exactly why the "AGI or not?" debate never resolves.
The labs racing to build it
Every serious frontier lab has AGI (or some synonym — "powerful AI", "superintelligence", "transformative AI") as an explicit, stated goal. Below is the 2026 field.
OpenAI
Sam Altman's stated mission is AGI that "benefits all of humanity." 2026 roadmap: GPT-5 (released mid-2025), o-series reasoning models, and the "Stargate" compute build-out announced Jan 2025 — a $500B joint venture with Oracle, SoftBank, and MGX to build US AI infrastructure.
Anthropic
Founded by ex-OpenAI safety staff (Dario & Daniela Amodei). Mission is "powerful AI done safely." Publicly bets that frontier capabilities must be developed at the safety frontier to steer the field. Dario's "Machines of Loving Grace" (Oct 2024) argues capable AI could compress 50–100 years of biomedical progress into 5–10.
Google DeepMind
Demis Hassabis (Nobel 2024 for AlphaFold) has said AGI is a 5–10 year horizon. Full stack: TPUs, Gemini 2.x, AlphaFold, AlphaProof, AlphaGeometry, Project Astra (universal assistant).
xAI
Musk's post-OpenAI lab. Grok series. Memphis "Colossus" cluster went from 0 → 100k H100s in 122 days (2024) and is being expanded to 1M GPUs. Explicitly framed as a race to AGI.
Meta AI (FAIR + GenAI)
Yann LeCun leads FAIR; Alexandr Wang now runs a new "Superintelligence Labs" unit after Meta's $14B investment in Scale AI (June 2025). Llama 4 family is open-weight. LeCun is the loudest AGI skeptic at a frontier lab — his bet is on world-model-based "objective-driven" AI, not pure LLMs.
Safe Superintelligence (SSI)
Ilya Sutskever (ex-OpenAI co-founder + chief scientist) with Daniel Gross and Daniel Levy. One product only: "safe superintelligence." Raised $1B at a $5B valuation in Sep 2024; reportedly raising again at ~$30B in 2025.
DeepSeek
Spun out of quant fund High-Flyer. Shipped DeepSeek-V3 (Dec 2024) and R1 reasoning model (Jan 2025) with reported training cost ~$6M — an order of magnitude below US frontier labs. Set off the "DeepSeek moment" market panic in Jan 2025.
Microsoft AI
Mustafa Suleyman (ex-DeepMind, ex-Inflection) leads MAI. Primary vehicle is still the OpenAI partnership ($13B+ invested), but Microsoft is building its own frontier model capacity. The OpenAI contract has a clause that cuts Microsoft's access the moment OpenAI's board declares AGI achieved — the most-discussed corporate clause in tech.
Mistral AI
Europe's flagship. Co-founded by ex-DeepMind and ex-Meta researchers. Mix of open and commercial models. Not explicitly an "AGI lab" — more of a sovereign-AI play — but in the frontier conversation.
Zhipu / Moonshot / Qwen (Alibaba)
The "Chinese tigers." GLM-4, Kimi, and Qwen 3 are all frontier-adjacent and mostly open-weight. Under US export controls on advanced GPUs, they've leaned into efficiency — which the DeepSeek papers then weaponized.
Google / Microsoft / Amazon — hyperscalers
Not "labs" strictly but they own the compute. In 2025 the big four hyperscalers combined committed over $300B in capex, most of it AI-related. Without them the labs above cannot train.
NVIDIA
The one company that wins either way. Crossed $3T market cap in 2024 on the back of H100/H200/B100/GB200 demand. Also ships its own foundation models (Nemotron, Cosmos world models) to keep the stack sticky.
How do we measure progress?
The benchmarks that used to matter (GLUE, SuperGLUE, MMLU) are saturated — frontier models score above 90% and the remaining errors are often benchmark mistakes, not model mistakes. The 2025–26 yardsticks are harder:
- ARC-AGI-2 (Chollet's puzzles) — abstraction & reasoning on novel tasks. o3 hit 75.7% on ARC-AGI-1 in Dec 2024 (vs. ~10% for GPT-4). ARC-AGI-2 was designed to be harder and frontier scores remain below human baselines.
- Humanity's Last Exam (Scale + CAIS, 2025) — 3000+ questions at the frontier of human expertise across physics, math, law, classical languages. Frontier models currently score in the teens.
- FrontierMath (Epoch AI, 2024) — research-level math by Fields medalists. o3 went from 2% → 25.2% in a single quarter, shocking even the authors.
- SWE-bench Verified — real GitHub issues. Claude 3.7 / 4 class models cleared 70%+ in 2025 — the "AI can fix tickets" threshold most engineering managers watch.
- GPQA Diamond — grad-level physics, chem, bio. Saturating above 85%.
- OSWorld / WebArena / AgentBench — can the model actually use a computer? These are the benchmarks AGI economists care about because they're closest to "most economically valuable work."
- The RE-Bench / METR evaluations — time-horizon tasks. METR's 2025 paper found the length of software tasks frontier models can complete reliably is roughly doubling every 7 months — one of the few "Moore's law"-like trends in the field.
The Singularity — where the term comes from
The word predates ChatGPT by half a century. Five milestones:
- 1958 — John von Neumann (as reported by Stanislaw Ulam) mused about "an essential singularity in the history of the race beyond which human affairs, as we know them, could not continue."
- 1965 — I. J. Good publishes Speculations Concerning the First Ultraintelligent Machine. His famous sentence: "The first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control." This is the intelligence-explosion thesis: once an AI can design better AIs, recursive self-improvement diverges.
- 1993 — Vernor Vinge's essay The Coming Technological Singularity (NASA VISION-21). Vinge fixes the popular image of the term and predicts it "within thirty years" — a deadline that passed in 2023 without cleanly settling the bet either way.
- 2005 — Ray Kurzweil's The Singularity Is Near. Kurzweil predicts human-level AI by 2029 and the singularity by 2045, based on extrapolating Moore's law and his "law of accelerating returns." His 2029 prediction has held up remarkably well.
- 2014 — Nick Bostrom's Superintelligence puts x-risk squarely into mainstream discourse and reframes the question from "when" to "how do we make it go well."
If the speed at which AI improves itself is proportional to its current intelligence I, you get dI/dt ∝ I, which integrates to I(t) = I₀·e^(kt) — exponential. If it's superlinear — dI/dt ∝ I² — you get hyperbolic growth, which reaches infinity in finite time. That finite-time blowup is the mathematical heart of "singularity." Good 1965 assumed superlinear; modern treatments (Davidson, Roodman, Christiano) use much more conservative assumptions and get slower — but still dramatic — curves.
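The two growth laws are easy to compare numerically. A crude Euler integration (the constants are arbitrary) shows the qualitative difference:

```python
def simulate(power, k=1.0, I0=1.0, dt=0.001, t_max=2.0):
    # Integrate dI/dt = k * I^power with explicit Euler steps.
    I, t = I0, 0.0
    while t < t_max:
        I += k * (I ** power) * dt
        t += dt
        if I > 1e9:            # treat this threshold as "blowup"
            return t           # finite-time divergence
    return None                # no blowup within t_max

exp_blowup = simulate(1)   # dI/dt ∝ I  : exponential, finite at t_max
hyp_blowup = simulate(2)   # dI/dt ∝ I² : hyperbolic, diverges near t = 1
```

The linear law reaches only e² ≈ 7.4 by t = 2; the superlinear law (exact solution I = 1/(1 − t)) blows past any threshold just after t = 1.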
Takeoff scenarios
Once you stipulate that an intelligence explosion is physically possible, the question becomes: how fast? Three archetypes dominate the debate:
Fast takeoff ("hard")
Days to months from roughly-human to vastly-superhuman. Single system, single actor. Associated with Yudkowsky / early MIRI. The scariest scenario and, most researchers now believe, the least likely given the compute-bound nature of training.
Slow takeoff ("soft")
Years. Capability grows incrementally, diffuses across labs, gets integrated into the economy. What Holden Karnofsky calls the "most likely" path, and roughly what we've seen 2022–26.
No-takeoff / plateau
Current architectures hit a wall. Scaling hits data limits, reasoning hits verification limits, agents hit reliability limits. Proponents include LeCun and many ML academics. Consistent with the 2024–25 observation that raw pre-training gains are slowing.
Christiano's influential 2018 "Takeoff speeds" post argues the right question isn't fast vs. slow but whether there's a "discontinuity" — a sudden jump larger than the previous jumps. Post-2022, most practitioners say "no discontinuity yet" — GPT-4 → Claude 3.5 → o1 → o3 is a fast slope but not a step-function.
Societal impacts — the short, medium, and long run
Labor & the economy
The first-order prediction is that any task expressible as text, code, or image is now price-deflationary. The empirical work catching up to this is messy but pointed:
- Eloundou, Manning, Mishkin, Rock (OpenAI/UPenn, 2023) — "GPTs are GPTs". Estimates 80% of the US workforce could have ≥10% of tasks affected and 19% could have ≥50% affected. Highest exposure: information-processing and writing-heavy jobs.
- Brynjolfsson, Li, Raymond (NBER 2023) — deployment of an LLM assistant in a call center raised the productivity of the least experienced workers by 34% while barely moving expert productivity. Suggests AI compresses the skill distribution.
- Acemoglu (2024) — dissenting view. Estimates macro GDP effects will be modest (~1% over a decade) because most jobs don't collapse into their "AI-exposed" tasks cleanly.
- Goldman Sachs (2023) — headline number of "300M jobs exposed" globally. Widely cited, widely misread — exposure ≠ replacement.
- IMF (2024) — 40% of global employment is exposed to AI, 60% in advanced economies; developing economies less exposed but also less able to capture productivity gains.
The honest summary: white-collar entry-level work is the first thing to feel it. If you managed junior analysts / copywriters / L1 support / junior devs, your 2026 team looks different from your 2022 team.
Governance & regulation
- EU AI Act — entered into force Aug 2024; risk-tiered with an explicit "general-purpose AI" category plus extra obligations for "systemic" models (the Llama / GPT-4 tier). Most provisions kicking in 2025–26.
- US executive action — Biden's Oct 2023 EO was rescinded by Trump in Jan 2025 and replaced by a pro-competition framing via the AI Action Plan (July 2025). Federal regulation remains sparse; state-level action (California SB 53, Colorado AI Act) is filling the gap.
- UK AI Safety Institute (now AI Security Institute) and the Seoul Declaration (2024) — frontier labs commit to pre-deployment safety testing of the most capable models.
- China — algorithm registration regime (2022) and generative AI interim measures (2023). Mandatory model registration, content labeling.
- Bletchley → Seoul → Paris Summits — the AI Safety Summit series. The Paris summit (Feb 2025) pivoted hard from "safety" to "action" and "opportunity" — signaling a mood shift.
Information & epistemics
Two early effects are measurable. First, the cost of generating plausible text, images, audio, and video has collapsed — and with it the trust floor of any unverified media. Second, search is being restructured around answer engines (ChatGPT Search, Perplexity, Google AI Overviews), which is already changing publisher traffic patterns. Early studies on AI-generated content in social media show mixed polarization effects — not the apocalypse some predicted, but not nothing.
Science
This is where the concrete wins are accumulating. The AlphaFold database now holds 200M+ predicted protein structures. AlphaProof & AlphaGeometry 2 achieved silver-medal performance at IMO 2024. FunSearch (DeepMind, 2023) found new results in the cap set problem. DeepMind's weather model GraphCast beats the ECMWF's operational model on most metrics. These are real, non-speculative. If "AI-for-science" is where AGI pays its rent, the rent is starting to show up.
Timelines & predictions — what the people closest to it say
The quotes below are primary-source-verifiable but age poorly; all have been subject to revision. When in doubt, check the original transcript — frontier-lab CEOs have institutional reasons to make their timelines sound imminent (fundraising) and academics have reasons to sound long (credibility). Read accordingly.
| Source | Prediction | Year given | Notes |
|---|---|---|---|
| Dario Amodei (Anthropic) | "Powerful AI" as soon as 2026–2027 | 2024 | Machines of Loving Grace, Oct 2024. Defines "powerful" as ≥ Nobel-winner at most cognitive tasks. |
| Sam Altman (OpenAI) | "We are now confident we know how to build AGI as we have traditionally understood it." AGI in "a few thousand days." | Jan 2025 | Reflections blog post. Notably vague on definition. |
| Demis Hassabis (DeepMind) | AGI within 5–10 years | 2024 | Repeated at multiple venues; cautious about hype. |
| Elon Musk (xAI) | AI smarter than any individual human by end of 2025; smarter than all humans combined by 2029 | 2024 | Musk's past AI timelines have been consistently too optimistic. |
| Geoffrey Hinton | 5–20 years, nontrivial chance AI takes over | 2023 | Resigned from Google May 2023 to speak freely about risks. Nobel 2024. |
| Yann LeCun (Meta) | Current LLMs will not reach AGI; new architectures needed; decade+ away | 2023–26 | The loudest in-house skeptic. |
| Ray Kurzweil | Human-level AI by 2029, singularity by 2045 | 2005, restated 2024 | Predictions made in The Singularity Is Near; the 2029 date has aged well. |
| Metaculus community | ~50% by 2031 for "weak AGI" (down from 2050 in 2020) | live | Question 5121. Definitions matter — look at the resolution criteria. |
| AI Impacts survey (Grace et al.) | Aggregate 2023 survey of ~2700 AI researchers: 50% chance of "high-level machine intelligence" by 2047 (13 years earlier than the 2022 survey) | Jan 2024 | arXiv:2401.02843. |
| AI 2027 scenario (Kokotajlo, Alexander, et al.) | Detailed fictional timeline in which superhuman AI arrives late 2027 via a recursive self-improvement loop inside an automated AI research lab | Apr 2025 | ai-2027.com. Explicitly a scenario, not a forecast — but widely read inside labs. |
| Epoch AI | Compute needed for transformative AI could be reached by ~2030 at current capex trends; data may bottleneck first | 2024 | epochai.org. Probably the most technically careful public forecaster. |
Existential risk — the "what if it goes wrong" question
The x-risk argument, in three sentences: (1) we don't know how to robustly specify human values in an objective function; (2) a sufficiently capable optimizer pursuing any misspecified objective will by default acquire resources and resist shutdown (Omohundro 2008, "basic AI drives"); (3) if that optimizer is much smarter than us, course-correcting becomes very hard. The classic popular treatment is Bostrom's Superintelligence (2014); the technical-alignment literature traces to Yudkowsky, Soares, Armstrong, Russell.
CAIS statement (May 2023)
"Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." Signed by Hinton, Bengio, Altman, Hassabis, Amodei and ~1000 others. The moment x-risk went mainstream.
Bengio's "International AI Safety Report" (2025)
UK-commissioned, 100+ contributors. Concludes capabilities are advancing faster than our ability to verify safety properties, and that several plausible pathways to catastrophic harm exist.
Anthropic's "Core Views on AI Safety"
Published 2023. Three scenarios: optimistic, pessimistic, pragmatic. Company operates as if pessimistic scenario might be true.
Skeptics
LeCun, Ng, Mitchell, Marcus argue current x-risk framing overweights theoretical scenarios vs. near-term harms (bias, misuse, concentration of power). Their preferred framing: "AI ethics" > "AI safety."
Nobody who tells you they are certain — in either direction — has good epistemics about AGI. The useful stance is probabilistic: what's your P(transformative AI before 2035)? P(it goes well | transformative AI)? Use those to decide what to work on, and then do the object-level work. Panic and dismissal both feel productive and neither is.
For a fuller treatment with interactive takeoff-speed simulations and side-by-side model comparisons, see the three deep-dive pages:
- Deep Dive → AGI — the goal: definitions, benchmark trajectories, the current state of play.
- Deep Dive → The Singularity: Good's 1965 thesis, Kurzweil's curves, takeoff-speed math, an interactive intelligence-explosion simulator.
- Deep Dive → AI & Society: labor, governance, epistemics, x-risk — with sources you can click through.
Frameworks & Tooling // what you'll actually touch
PyTorch
What ~90% of research and production uses.
JAX
Functional, compiler-first. Powers Gemini, AlphaFold.
Hugging Face Transformers
One-line access to thousands of pretrained models.
vLLM
High-throughput LLM serving. Standard for self-hosted.
llama.cpp
Pure C++ inference. Runs on anything.
Ollama
One command to run local models; the "brew install" of LLMs.
Claude Agent SDK / OpenAI Agents SDK
First-party SDKs. The path of least resistance for agents in 2026.
pgvector
Postgres extension. The default vector DB.
Hardware & Inference // the real bottleneck
NVIDIA H100/B200/GB300 dominate training. Google TPUs are the second option. Groq and Cerebras do blazing-fast inference. Apple Silicon's unified memory lets you run 30B+ models on a MacBook.
Quantization — run trained FP16 models in 8/4/2-bit. A 70B model goes from 140GB → 40GB VRAM. Formats: GGUF (llama.cpp), AWQ/GPTQ (vLLM), FP8/NVFP4 (native H100+).
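The arithmetic behind that 140GB → 40GB claim, as a back-of-the-envelope sketch (weights only; KV cache and runtime overhead add more on top):

```python
def weight_gb(params_billion, bits):
    # params * bits-per-param / 8 bits-per-byte, expressed in GB
    return params_billion * 1e9 * bits / 8 / 1e9

fp16 = weight_gb(70, 16)   # 140.0 GB: needs multiple GPUs
q4   = weight_gb(70, 4)    # 35.0 GB: fits one big GPU (the ~40GB figure
                           # above includes quantization format overhead)
```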
Model Rankings (April 2026) // speed + quality
Leaderboards are gamed, benchmarks leak. Treat as a snapshot.
Frontier LLMs — general reasoning quality
| Model | Lab | Open? | Notes |
|---|---|---|---|
| Claude Opus 4.6 | Anthropic | No | SOTA coding & long context. |
| GPT-5 / o4 | OpenAI | No | Reasoning model; strongest math/science. |
| Gemini 3 Ultra | Google | No | 2M+ context, multimodal native. |
| Grok 4 | xAI | Partial | Real-time web, fewer filters. |
| DeepSeek-R2 | DeepSeek | Yes | Open reasoning. |
| Llama 4 Behemoth | Meta | Yes | Best open frontier. |
Speed — tokens / second
| Provider / model | Tier | Speed |
|---|---|---|
| Groq / Cerebras | Llama 70B | 1000+ tok/s |
| SambaNova | Llama 405B | ~400 tok/s |
| Claude Haiku | small | ~200 tok/s |
| Reasoning models | o4, Opus 4.6 | ~60 tok/s |
Quick Glossary // cheat sheet
AI / ML / DL
Artificial Intelligence / Machine Learning / Deep Learning. Concentric circles, each containing the next.
NN / CNN / RNN
Neural Net / Convolutional / Recurrent.
Transformer
The architecture. Uses self-attention.
LLM / VLM / SLM
Large / Vision / Small Language Model.
Token
Word-piece the model processes.
Context window
Max tokens visible at once.
RLHF / DPO
Alignment methods using human preferences.
RAG
Retrieval + LLM.
MoE
Mixture of Experts. Sparse scaling.
CoT
Chain of Thought reasoning.
LoRA / QLoRA
Cheap fine-tuning adapters.
MCP
Model Context Protocol. Tool-use standard.
Further reading & deep dives
- 📘 Perceptron — from biology to algorithm
- 📘 Activation Functions — the nonlinearity between layers
- 📘 Gradient Descent — the engine of learning (now with 3D)
- 📘 Backpropagation — full textbook treatment
- 📘 Convolution — how CNNs see
- 📘 RNNs & LSTMs — sequence models before Transformers
- 📘 Word Embeddings — king − man + woman ≈ queen
- 📘 Self-Attention — transformer math
- 📘 GANs — the forger and the inspector
- 📘 Diffusion Models — forward & reverse processes
- 🔬 Mixture of Experts — sparse scaling
- 🔬 Reasoning & Test-Time Compute — the new axis
- 🔬 State Space Models & Mamba — linear-time sequence mixing
- 🔬 Foundation Models — scaling laws & the Chinchilla recipe
- 🔬 Retrieval-Augmented Generation — dense retrieval & hybrid search
- 🔬 Neuro-Symbolic AI — differentiable logic & AlphaGeometry
- 🔬 Edge & On-Device AI — quantization, distillation, speculative decoding
- 🔬 AI Safety & Alignment — RLHF, DPO, Constitutional AI, SAEs
- 🌌 AGI — the goal, the labs, the benchmarks
- 🌌 The Singularity — intelligence explosion math & takeoff speeds
- 🌌 AI & Society — labor, governance, epistemics, x-risk