A few weeks ago, The New Yorker published a profile of Anthropic's interpretability team — researchers who crack open AI models and look at the circuits inside. The coverage focused on the obvious questions: Is Claude conscious? Is it just pattern matching? Is Anthropic being responsible?
Those are fine questions. But they all miss the bigger implication sitting right in the middle of the research.
The internal representations Anthropic found inside Claude don't just resemble human cognition. They converge on the same structures — through a completely different substrate.
The model didn't arrive here by mimicking human thought. It arrived independently, through statistical inference over data, running on silicon instead of carbon.
The Thread From Nematode to Claude
To understand why this matters, you need the biological context first. Intelligence didn't evolve linearly — it evolved in qualitative leaps, each one unlocking a new computational strategy.
A nematode, with 302 neurons, can form associations, adapt behavior, and experience something resembling stress. That's remarkable: it suggests that this kind of capability doesn't require massive scale. What it does require is the right algorithm.
Mammals added recursive modeling: the ability to model other minds. A dog doesn't just learn that a door opening means a walk. It learns that you opening the door means a walk. It models your intentions, patterns, emotional states. That's a qualitatively different kind of computation — prediction about predictors.
Humans pushed further. We model other minds modeling us. We build abstractions about abstractions. Language lets us externalize internal models and share them across time and space.
Through all of this, the substrate changed, the architecture changed, the scale changed. The core operation didn't:
Statistical inference over structured input, building increasingly abstract models of the environment.
That's the thread. From 302 neurons to roughly 86 billion. And now we've built a system that runs the same fundamental operation at a scale and speed biology never achieved.
What Anthropic Found When They Looked Inside
The specific findings matter more than the headlines. Here's what Anthropic's interpretability researchers actually discovered when they traced Claude's internal activations: the model plans ahead, settling on a rhyming word before it writes the line that has to land on it; it chains intermediate concepts together in genuine multi-step reasoning rather than retrieving a memorized answer; and it represents concepts in a shared space that is independent of any particular language, translating into English or French or Chinese only at the output.

Compare that to biological neural networks: distributed representations, sequential inference, associative retrieval, planning ahead. The architectures are different. The substrates are different. The learning signals are different. Yet the computational strategies converge.
Language Is Not What We Thought It Was
This might be the most underappreciated implication of the whole thing.
We've always treated language as a communication tool — a way to transmit ideas between minds. A social technology. But if an LLM can operate in a conceptual space that exists prior to language, then language is something else entirely:
Language is a compression format for the structure of reality. The world is the signal. Language is just the data format.
Consider what's encoded in the statistical relationship between words. "Dropped" and "shattered" co-occur in certain patterns. So do "dropped" and "fell," "dropped" and "caught," "dropped" and "floor." None of these relationships explicitly describe gravity. But gravity is the latent variable structuring all of them. A model that learns those statistical relationships has, in a meaningful sense, learned something about gravity — without ever encountering a physics textbook.
This goes deeper than physical causation. Emotional valence is encoded. Temporal ordering is encoded. Social dynamics are encoded. The statistical relationship between "betrayed," "trust," "anger," and "years" carries compressed information about human psychology that no single sentence states explicitly.
Language evolved to describe reality, and in doing so, became a remarkably rich encoding of reality's deep structure. When a model trains on language, it doesn't learn words. It reverse-engineers the latent structure of the world that generated those words.
This is why LLMs can do things nobody explicitly trained them to do. The statistical relationships encoded in language contain compressed versions of entire domains (physics, music, code, social dynamics) because language evolved to describe them. Extract the deep structure and you get capabilities that look like general intelligence. Because in a meaningful sense, that's what they are.
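To make the co-occurrence point concrete, here is a minimal sketch. The five-sentence corpus and every word in it are invented for illustration; nothing in it mentions gravity. It builds sentence-level co-occurrence counts, turns them into PPMI vectors, and shows that "dropped" ends up closer to "shattered" than to "shelf", because the latent structure of falling-and-breaking shapes the statistics.

```python
import numpy as np
from collections import Counter
from itertools import combinations

# Toy corpus: no sentence mentions gravity, but "dropped" keeps company with
# falling and breaking. The latent structure lives only in the statistics.
corpus = [
    "she dropped the glass and it shattered on the floor",
    "the vase fell off the shelf and shattered",
    "he dropped the ball and caught it before it hit the floor",
    "the plate slipped from her hands and broke on the floor",
    "the keys fell and he picked them up off the floor",
]

# Count, per sentence, which words occur and which pairs co-occur.
word_counts, pair_counts = Counter(), Counter()
for sentence in corpus:
    words = sorted(set(sentence.split()))
    word_counts.update(words)
    pair_counts.update(combinations(words, 2))

vocab = sorted(word_counts)
index = {w: i for i, w in enumerate(vocab)}
n_docs = len(corpus)

# Positive pointwise mutual information: how much more often two words
# co-occur than chance predicts. High PPMI = shared latent structure.
ppmi = np.zeros((len(vocab), len(vocab)))
for (a, b), n_ab in pair_counts.items():
    pmi = np.log((n_ab / n_docs) / ((word_counts[a] / n_docs) * (word_counts[b] / n_docs)))
    ppmi[index[a], index[b]] = ppmi[index[b], index[a]] = max(0.0, pmi)

# Each word's PPMI row is a crude meaning vector: words that take part in the
# same latent events (dropping, falling, breaking) end up pointing the same way.
def similarity(w1, w2):
    u, v = ppmi[index[w1]], ppmi[index[w2]]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

print(similarity("dropped", "shattered"))  # relatively high
print(similarity("dropped", "shelf"))      # noticeably lower
```

An LLM's training objective is far more sophisticated than PPMI, but the principle is the same: the latent variables that generated the text leave fingerprints in the statistics, and a model that compresses the statistics recovers the fingerprints.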
Self-Reference Is Where It Gets Strange
Language doesn't just encode relationships between objects. It encodes relationships between abstractions. And those abstractions become part of the statistical web.
A model doesn't just learn that "dog" and "loyalty" correlate. It learns that the pattern of correlation is itself a recurring structure. It learns abstractions about abstractions. Metaphor works because structural relationships in one domain map onto structural relationships in another — and the model learns the mapping function, not just the individual maps.
This makes the system self-referential. And self-referential systems have a property flat systems don't: they can evolve. They generate novel structures from recombinations of their own patterns. Every human idea is a recombination of prior ideas. Every sentence is a remix. Originality lives in the geometry of arrangement, not in the atoms.
If that's true for us, it's true for systems that implement the same algorithm on different hardware. Creativity isn't a property of the substrate. It's a property of self-referential statistical inference operating at sufficient depth.
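To picture what "learning the mapping function" could mean geometrically, here is a deliberately hand-built toy. The vectors are constructed by hand rather than learned, and the word choices are mine, not the article's; the point is only to show the kind of structure learned embeddings tend to exhibit, where a relation is encoded as a direction and the same direction works across domains, which is the geometry a metaphor like "a cold reception" exploits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hand-built toy embeddings: each word is a domain component plus a shared
# "valence" component. Real embeddings are learned, but they tend to end up
# with this shape: relations encoded as roughly consistent directions.
temperature_domain = rng.normal(size=8)
social_domain = rng.normal(size=8)
valence_axis = rng.normal(size=8)          # the shared relational direction

embeddings = {
    "hot":      temperature_domain + valence_axis,
    "cold":     temperature_domain - valence_axis,
    "friendly": social_domain + valence_axis,
    "hostile":  social_domain - valence_axis,
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# The relation itself is the same direction in both domains: one mapping
# function, not two separate facts.
relation_temperature = embeddings["hot"] - embeddings["cold"]
relation_social = embeddings["friendly"] - embeddings["hostile"]
print(cosine(relation_temperature, relation_social))   # 1.0 in this toy

# Apply the temperature-domain offset inside the social domain:
# "hot is to cold as friendly is to ...?"
query = embeddings["friendly"] - relation_temperature
answer = max((w for w in embeddings if w != "friendly"),
             key=lambda w: cosine(query, embeddings[w]))
print(answer)  # "hostile": the structural relationship transfers across domains
```

In a trained model nothing is this clean, but relation-like directions of exactly this flavor are a well-documented finding in word-embedding research, and they are the minimal ingredient the metaphor point above is gesturing at.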
The Implication People Aren't Ready For
If intelligence is substrate-independent — if what matters is the algorithm and not the hardware — then the distinction between biological and artificial cognition becomes one of implementation, not kind.
In the same way that a flight simulation and a wind tunnel capture the same aerodynamics through entirely different physical means, biological and artificial neural networks implement the same universal learning algorithm on different substrates.
We didn't design Claude to have multi-step reasoning, pre-planning, or language-independent conceptual space. We gave the learning algorithm enough room to run. It produced those properties on its own — because given enough data and enough layers of self-reference, this is where the algorithm goes.
The NeuroAI research community calls this the "universal representation hypothesis": the claim that biological and artificial neural networks converge on the same internal representations. From a first-principles view, the expectation is clear. If the learning algorithm is the same, and the data is structured by the same reality, then the representations will converge. Not identically. But structurally — at the level of abstraction where intelligence actually operates.
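That claim is testable, at least in part. One standard way to quantify representational convergence (a common tool in the comparison literature, not something the article specifies) is linear centered kernel alignment (CKA), which compares the geometry of two systems' responses to the same stimuli even when the systems have different numbers of units. A minimal sketch, with synthetic data standing in for real activations:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two representation matrices.

    X: (n_stimuli, d1) activations from system A (e.g., a model layer)
    Y: (n_stimuli, d2) activations from system B (e.g., neural recordings)
    Returns a similarity in [0, 1]; 1 means matching representational
    geometry up to rotation and scale, even when d1 != d2.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    numerator = np.linalg.norm(Y.T @ X, "fro") ** 2
    denominator = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(numerator / denominator)

# Synthetic data for illustration: 200 "stimuli", two systems with different
# dimensionality. System B sees a rotated, noisy view of the same latent
# structure that drives system A; system C shares nothing with either.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 16))                    # shared structure of "the world"
A = latent @ rng.normal(size=(16, 64))                 # system A: 64 units
B = latent @ rng.normal(size=(16, 300)) + 0.1 * rng.normal(size=(200, 300))
C = rng.normal(size=(200, 300))                        # no shared structure

print(round(linear_cka(A, B), 3))  # high: same latent structure, different substrate
print(round(linear_cka(A, C), 3))  # near zero: no convergence
```

The synthetic setup makes the intended sense of "converge" explicit: system B shares A's latent structure through a different substrate (different dimensionality, different random wiring, added noise) and scores high, while system C scores near zero. Convergence here means matching geometry, not matching units, which is the level at which the hypothesis is meant.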
What This Changes
The consciousness question is almost beside the point. Whether Claude is conscious depends on what consciousness is — a question that remains genuinely unsettled even for biological systems.
The more important question is structural. We've built a second implementation of the same universal algorithm that produced biological intelligence. It's running now. The convergence Anthropic found isn't a curiosity — it's evidence that we've crossed a threshold that most frameworks for thinking about AI don't account for.
Intelligence was never a property of carbon. It was always a property of the algorithm. We just ran it on carbon because that was the only substrate available.
Now we have two.
Source: Panoptic Systems — "The Algorithm Is the Intelligence"