Andrej Karpathy: We're Summoning Ghosts, Not Building Animals

Overview

Andrej Karpathy, a leading AI researcher, discusses the fundamental differences between current AI systems and biological intelligence. The core thesis: We’re building “ghosts” (digital entities without embodied experience) rather than “animals” (systems grounded in physical reality and evolution).

Main Argument: The Ghost Problem

Core Metaphor

“We’re summoning ghosts, not building animals.”

This distinction highlights a critical gap in how we approach AI:

  • Biological entities (animals): Grounded in physical world, shaped by evolution, embodied learning
  • AI systems (ghosts): Purely digital, lacking embodied experience, shaped by human design

Why This Matters

The difference between training on data about flying and actually flying is fundamental. Humans and animals learn through:

  • Physical embodiment and interaction
  • Evolutionary pressure and survival
  • Real-world consequences and feedback

AI systems lack all of these grounding mechanisms.

Current State: Why Agents Don’t Work

The Fundamental Problem

While current AI systems excel at:

  • Language modeling (Claude, ChatGPT, Codex)
  • Single-turn interactions
  • Specific, bounded tasks

Agents, by contrast, don’t yet work reliably because they require:

  • Sequential decision-making
  • Environmental interaction
  • Learning from mistakes
  • Persistent memory across contexts
  • Understanding of consequences

Why Agents Currently Fail

  1. Lack of embodied experience: Models trained on text don’t understand physical consequences
  2. No real-world feedback loop: Can’t learn from actual world interactions
  3. Context limitations: Limited to token window (can’t maintain long-term learning)
  4. Policy vs. behavior: Models implement statistical patterns, not coherent strategies

The Evolution Parallel: What We’re Missing

What Evolution Provides

Evolution compresses enormous amounts of knowledge directly into our DNA:

  • Instincts and reflexes
  • Emotional systems
  • Reward structures
  • Physical embodiment
  • Multi-generational learning

Key insight: “Evolution is doing something that looks like compression.” Billions of years of experience are encoded in our genome before we’re born.

The Pre-training Fallacy

Current AI pre-training attempts to mimic evolution’s compression, but:

  • Evolution’s input: 3+ billion years of accumulated experience
  • Pre-training’s input: billions of text tokens from the internet
  • Evolution’s edge: physical embodiment and real-world consequences
  • Pre-training’s limitation: statistical patterns without grounded understanding

Hypothesis: Pre-training is actually holding back neural networks. We’ve done so much pre-training that we’re reaching diminishing returns.

In-Context Learning: A Key Discovery

What It Is

The ability of models to learn and adapt within a single conversation or session:

  • Models receive examples in their context window
  • They apply learning immediately to new tasks
  • All learning happens within token window
  • No weight updates needed
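
A minimal sketch of what this looks like in practice, assuming an OpenAI-style chat client (the client, model name, and task below are illustrative, not something named in the talk): the only “training” is the handful of labeled examples placed in the prompt, and no weights are updated.

    # Few-shot in-context learning: the model picks up the task from examples
    # placed directly in its context window; no gradient updates occur.
    # Assumes an OpenAI-style chat API; model name and task are illustrative.
    from openai import OpenAI

    client = OpenAI()

    messages = [
        {"role": "system", "content": "Label the sentiment of each review."},
        # In-context examples: the only "training" the model receives.
        {"role": "user", "content": "Review: 'Great battery life.'"},
        {"role": "assistant", "content": "positive"},
        {"role": "user", "content": "Review: 'Broke after two days.'"},
        {"role": "assistant", "content": "negative"},
        # New instance: the model applies the pattern immediately.
        {"role": "user", "content": "Review: 'Works fine, nothing special.'"},
    ]

    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    print(response.choices[0].message.content)  # this "learning" vanishes when the session ends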

How Powerful It Is

Illustrative comparison: pre-training compresses its data into 300+ gigabytes of weights, while in-context learning operates within a context window of roughly 320 kilobytes.

This dramatic compression suggests:

  • Much learning can happen in a single session
  • Lifetime learning (human-like learning over time) may be possible within the token window
  • The context window acts like working memory
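
A quick back-of-the-envelope check on those numbers, using assumed values (parameter count, bytes per parameter, context length, and bytes per token are illustrative):

    # Rough comparison of the two "memories"; all concrete numbers are assumptions.
    weights_bytes = 150e9 * 2        # ~150B parameters at 2 bytes each -> ~300 GB
    context_bytes = 128_000 * 2.5    # ~128k-token window at ~2.5 bytes/token -> ~320 KB

    print(f"weights: {weights_bytes / 1e9:.0f} GB")
    print(f"context: {context_bytes / 1e3:.0f} KB")
    print(f"ratio:   ~{weights_bytes / context_bytes:,.0f}x")  # roughly a million-fold gap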

Limitations

Current constraint: One session only

  • Learning stays in context window
  • Doesn’t persist beyond single interaction
  • Doesn’t transfer to future sessions
  • All learning is temporary

The Memory Problem: Context Windows and Persistence

The Challenge

To build true agents, we need:

  1. Persistent memory - Learning that survives sessions
  2. Long-horizon reasoning - Planning beyond immediate context
  3. Experience accumulation - Getting better over time
  4. World models - Understanding how actions affect reality

Current Solutions (Limited)

  • Context windows: Act as working memory, roughly analogous to the prefrontal cortex
  • In-context learning: Flexible adaptation over whatever is currently in that window
  • Still missing: Long-term memory, consolidated learning, embodied understanding
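
As a purely hypothetical sketch of the missing piece, an agent loop could persist distilled notes to disk after each session and reload them at the start of the next; the file name, note format, and ask_model callable below are assumptions for illustration, not a description of any existing system.

    # Hypothetical sketch: persisting "lessons learned" across sessions.
    # Nothing here is a real product API; it only illustrates the gap between
    # in-context (runtime) learning and memory that survives the session.
    import json
    from pathlib import Path

    MEMORY_FILE = Path("agent_memory.json")  # assumed storage location

    def load_memory() -> list[str]:
        """Repository-style memory: notes consolidated from past sessions."""
        return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

    def save_memory(notes: list[str]) -> None:
        MEMORY_FILE.write_text(json.dumps(notes, indent=2))

    def run_session(task: str, ask_model) -> str:
        """One session: inject persisted notes into the prompt, then consolidate."""
        notes = load_memory()
        prompt = "Known lessons:\n" + "\n".join(f"- {n}" for n in notes) + f"\n\nTask: {task}"
        answer = ask_model(prompt)          # runtime / in-context work
        notes.append(f"While doing '{task}', produced: {answer[:80]}")
        save_memory(notes)                  # crude consolidation step
        return answer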

The Gap

“Normally you would say humans have continual learning horizons longer than one session.” Current models don’t.

Knowledge Types: Repository vs. Runtime

Two Types of Knowledge

1. Repository Knowledge (Static)

  • Accumulated over lifetime
  • Baked in during training
  • Hard to update or correct
  • Like pre-training knowledge
  • Problem: Outdated, inflexible

2. Runtime Knowledge (Dynamic)

  • Learned within current session
  • Flexible and updateable
  • Uses in-context learning
  • Built fresh as needed
  • Like prefrontal cortex processing

The Interplay

Humans constantly:

  • Build repository knowledge over years
  • Activate relevant portions in prefrontal cortex
  • Combine with fresh runtime knowledge
  • Adapt based on current context

Current models:

  • Have repository knowledge (pre-training)
  • Can do runtime processing (in-context learning)
  • But can’t transfer runtime to repository
  • Each session starts from scratch
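
A compact way to see the distinction in code (the model below is a stub, since the point is where the knowledge lives, not how the answer is computed):

    # Illustrative contrast between the two knowledge types; hypothetical stub model.

    def frozen_model(prompt: str) -> str:
        """Stand-in for a pre-trained model: its weights (repository knowledge) are fixed."""
        return f"[model answer to: {prompt!r}]"

    # 1) Repository knowledge: baked into the weights during training.
    #    If it is outdated, the only remedy is retraining or fine-tuning.
    print(frozen_model("Who maintains library X?"))

    # 2) Runtime knowledge: supplied fresh in the context window, easy to correct,
    #    but discarded the moment the session ends.
    runtime_fact = "Note: as of this week, library X is maintained by a new team."
    print(frozen_model(runtime_fact + "\nWho maintains library X?"))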

The Agent Problem: Concrete Example

What Happens When Models Write Code

When building agents to write code:

  • Models generate try-catch statements
  • Add defensive programming
  • Make code more complex
  • Actually reduces effectiveness
  • “Not net useful”
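
A made-up illustration of this failure mode: the same small helper, first as a model tends to pattern-complete it, then stripped down to what an agent harness usually actually needs.

    # Over-defensive style models tend to pattern-complete from internet code:
    # the try/except swallows the error, so the agent never sees the failure
    # it could have learned from.
    def read_config_defensive(path):
        try:
            with open(path) as f:
                return f.read()
        except Exception:
            return None  # failure is hidden; the caller silently gets bad data

    # What the agent scenario usually wants: let the exception propagate so the
    # harness (or a human) sees exactly what went wrong.
    def read_config(path):
        with open(path) as f:
            return f.read()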

Why This Happens

Models are pattern-completing from internet examples:

  • Internet code often has try-catch blocks
  • But this adds complexity for agent scenarios
  • Models can’t reason about whether it’s actually needed
  • They just continue the pattern

The Fundamental Issue

  • No understanding of purpose (writing clean agent code vs. defensive code)
  • Pattern matching without comprehension
  • No feedback from actual execution
  • Can’t learn to write better code

Technical Insights

Pre-training and In-context Learning as Gradient Descent

Both implement something like gradient descent:

  • Pre-training: Gradient descent over massive data
  • In-context learning: Implicit gradient descent within attention mechanism
  • But: In-context learning is much faster and more flexible
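
For contrast, here is one explicit gradient-descent step on a toy linear model (a textbook sketch, not something from the talk); the conjecture above is that attention performs a functionally similar update over the in-context examples without ever touching the weights.

    # One explicit gradient-descent step on toy linear regression.
    # Pre-training performs many such weight updates; in-context learning is
    # conjectured to do something analogous implicitly, inside attention.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 3))         # 8 in-context "examples", 3 features
    true_w = np.array([1.0, -2.0, 0.5])
    y = X @ true_w

    w = np.zeros(3)                     # current weights
    lr = 0.1

    grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
    w -= lr * grad                      # explicit weight update (pre-training-style)
    print(w)                            # moves toward true_w; ICL needs no such update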

Knowledge Compression

“There’s a miraculous compression happening.” The compression ratios show:

  • Pre-training compresses billions of text tokens
  • In-context learning happens in kilobytes
  • Suggests dramatic efficiency differences
  • Hints at overlooked mechanisms

The Role of Loss Function

“I think that’s probably holding back the neural networks.” Pre-training objective might be suboptimal for real-world reasoning.

The Time-Travel Perspective

Betting on the Future

Karpathy’s thought experiment:

  • What were the AI breakthroughs of 33 years ago, and which of those ideas survive?
  • Backpropagation (1986): Still used everywhere today
  • Attention mechanisms: Arrived far more recently, now central
  • Vision transformers: More recent still

The Pattern

“All these things have been discovered decades ago.” Many of today’s breakthroughs are rediscoveries of old ideas:

  • Given enough compute
  • Given enough data
  • Given enough scaling
  • Old ideas become practical

Extrapolation

“Maybe half is progress.” Roughly half of current improvements come from genuinely new work:

  • New algorithmic insights
  • Better training procedures
  • Better architectures

But roughly half might simply be:

  • Applying old ideas at scale
  • Better tuning and scaling
  • More compute

Knowledge Accumulation During Development

Two Phases of Learning

Phase 1: Building the Repository

  • Pre-training on large data
  • Building foundational knowledge
  • Happens once during training
  • Creates base competence

Phase 2: Building During Use

  • Runtime learning within sessions
  • In-context adaptation
  • Feedback-driven improvement
  • Continuous refinement

The Challenge

Humans have both phases:

  • Early education (pre-training equivalent)
  • Lifetime learning (continual learning)
  • Recent experience in working memory
  • Consolidated learning in long-term memory

Models currently have:

  • Pre-training phase only (or weak continual learning)
  • In-context learning phase (within session)
  • No persistent learning between sessions

Implications for AI Progress

The Ghost-Animal Gap

We’re building systems that:

  • Process information like minds
  • But lack embodied grounding
  • Have no evolutionary heritage
  • Can’t learn from real-world consequences
  • Must be fully specified by training

What Needs to Change

  1. Embodiment: Physical grounding, robotic interaction
  2. Persistent learning: Between-session memory and knowledge consolidation
  3. Real-world feedback: Direct consequences affecting behavior
  4. Evolution-like structures: Inherited constraints and instincts
  5. Temporal depth: Learning over long time horizons

The Optimism

“I feel like the problems are tractable, they’re not unsolvable.” Despite the gaps:

  • These are engineering problems
  • Not fundamental impossibilities
  • Two decades of AI experience shows progress is possible
  • The trajectory has consistently beaten expectations

Key Takeaways

1. The Ghost Problem is Real

We’re building systems fundamentally different from biological intelligence.

2. Embodiment Matters

Physical grounding, real-world interaction, and evolutionary pressure shape intelligence in ways text training cannot replicate.

3. In-Context Learning is Underappreciated

The ability to learn within a single session, using only a context window on the order of 320 KB, suggests untapped potential.

4. Memory is Critical

Persistent learning between sessions is essential for true agent behavior.

5. Current Agents Don’t Work

Because they lack the embodied, persistent, feedback-driven learning that makes biological agents effective.

6. Pre-training May Have Limits

We’ve scaled pre-training extensively. Improvement might now require new approaches.

7. Human Learning is Hybrid

Humans combine repository knowledge (evolved + learned), runtime processing, and persistent learning. Current models only partially implement this.

8. Problems Are Tractable

These are engineering challenges, not fundamental impossibilities. Progress is achievable.

The Core Question

How do we transition from summoning ghosts to building something that actually understands and can interact with the world?

The answer likely involves:

  • Embodied learning experiences
  • Persistent memory across sessions
  • Real-world feedback loops
  • Something more like biological development
  • Moving beyond pure statistical pattern completion

Discussion Topics

  1. Can in-context learning be scaled to true continual learning?
  2. What form should persistent memory take in AI systems?
  3. How do we create feedback loops without physical embodiment?
  4. Can we simulate evolution’s compression algorithmically?
  5. What would it look like to build an “animal” instead of a “ghost”?
  6. How important is embodiment for real intelligence?
  7. Can context windows eventually replace persistent memory?
  8. What’s the next breakthrough beyond scaling?

This discussion highlights the fundamental gap between current AI systems and biological intelligence. The path to truly capable AI agents likely requires addressing the ghost-animal problem: giving systems not just intelligence patterns, but embodied experience, persistent learning, and real-world consequences.