Billion-Parameter Theories

March 12, 2026

For most of human history, the things we couldn’t explain, we called mystical. The movement of stars, the trajectories of projectiles, the behavior of gases. Then, over the course of a few centuries, we pulled these phenomena into the domain of human inquiry. We called it science.

What’s remarkable, in retrospect, is how terse those explanations turned out to be. F=ma. E=mc². PV=nRT.

The Enlightenment and its intellectual descendants gave us a powerful toolkit for taming the complicated. And then we made the natural mistake of assuming that toolkit would scale to everything.

It didn't. The concepts complexity scientists developed for the systems that resisted were descriptive rather than prescriptive. Knowing that a system exhibits power law behavior tells you the shape of what will happen without telling you the specifics. You couldn't pick these principles up and use them to intervene in the world with precision.
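To make that concrete, here's a toy sketch in Python. The distribution and exponent are invented for illustration, not drawn from any real system; the point is that sampling from a power law tells you the statistical shape of events, never which event comes next:

```python
import random

# Inverse-CDF sampling from a Pareto (power law) distribution with
# survival function P(X > x) = x**(-alpha) for x >= 1.
# alpha = 2.0 is purely illustrative, not fit to any real system.
alpha = 2.0

def power_law_sample():
    u = 1.0 - random.random()   # u in (0, 1], avoids division by zero
    return u ** (-1.0 / alpha)

samples = sorted(power_law_sample() for _ in range(100_000))

# The shape is predictable: most events are small, a few are enormous.
print(f"median:   {samples[len(samples) // 2]:.2f}")
print(f"99th pct: {samples[int(len(samples) * 0.99)]:.2f}")
print(f"largest:  {samples[-1]:.1f}")
# But nothing here says which draw will be the enormous one, or when.
```

The statistics are stable and knowable; the individual trajectory is not. That's the sense in which these theories describe without prescribing.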

Take large language models. Fundamentally, a large language model is a compressed model of an extraordinarily complex system: the totality of human language use, which itself reflects human thought, culture, social dynamics, and reasoning. The compression ratio is enormous; the model is unimaginably smaller than the system it represents. That makes it, in every sense that matters, a theory of that system: a lossy but useful representation that lets you make predictions and run counterfactuals.
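A back-of-envelope calculation makes the point. Every number here is an assumed, round figure (a hypothetical 70-billion-parameter model, a hypothetical 15-trillion-token corpus), not a claim about any particular model:

```python
# All figures are assumptions for illustration only.
params = 70e9                # hypothetical 70B-parameter model
bytes_per_param = 2          # assuming 16-bit weights
model_bytes = params * bytes_per_param

tokens = 15e12               # hypothetical training corpus size
bytes_per_token = 4          # rough average for English text
corpus_bytes = tokens * bytes_per_token

print(f"model:  {model_bytes / 1e12:.2f} TB")    # 0.14 TB
print(f"corpus: {corpus_bytes / 1e12:.0f} TB")   # 60 TB
print(f"ratio:  ~{corpus_bytes / model_bytes:.0f}x")
```

And the corpus itself is already a minuscule sample of the system being modeled, the totality of human language use, so the true compression is far more extreme.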

So perhaps there are two layers of theory here. The system-specific layer, the trained weights, is large and particular to its domain. This will likely always be true. The theory of this economy or this climate will always be vast.

But the meta-layer, the minimal architecture that can learn to represent arbitrary complex systems, might be compact and universal. It might be exactly the kind of good explanation David Deutsch would champion.
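To give a feel for how compact that meta-layer could be, here is a sketch of scaled dot-product attention, the core operation of these architectures, in Python with NumPy. It's a bare illustration, not a full transformer; real models add learned projections, multiple heads, and deep stacking:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)    # how much each query attends to each key
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V               # each output is a weighted mix of values

# Tiny example: a "sequence" of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(attention(x, x, x).shape)      # (4, 8)
```

The particular theory, the trained weights, is vast; the recipe that learns it is nearly this small. That asymmetry is the two-layer picture above.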


I've long been very interested in complexity theory. It had its moment in the late 1980s and early 1990s, with chaos theory and James Gleick's very popular book Chaos: Making a New Science.

Benoit Mandelbrot, he of the famed Mandelbrot set and one of the originators of the field, was something of a rock star.

In this long but very readable and, I found, engaging essay, Sean Linehan argues that large language models, attention-based models, are a new science. I highly recommend reading it, even if it's not something you'll apply in your everyday work.