Andrej Karpathy: We're Summoning Ghosts, Not Building Animals
Overview
Andrej Karpathy, a leading AI researcher, discusses the fundamental differences between current AI systems and biological intelligence. The core thesis: We’re building “ghosts” (digital entities without embodied experience) rather than “animals” (systems grounded in physical reality and evolution).
Main Argument: The Ghost Problem
Core Metaphor
“We’re summoning ghosts, not building animals.”
This distinction highlights a critical gap in how we approach AI:
- Biological entities (animals): Grounded in physical world, shaped by evolution, embodied learning
- AI systems (ghosts): Purely digital, lacking embodied experience, shaped by human design
Why This Matters
The difference between training on data about flying and actually flying is fundamental. Humans and animals learn through:
- Physical embodiment and interaction
- Evolutionary pressure and survival
- Real-world consequences and feedback
AI systems lack all of these grounding mechanisms.
Current State: Why Agents Don’t Work
The Fundamental Problem
While current language models (Claude, ChatGPT, Codex) excel at:
- Single-turn interactions
- Specific, bounded tasks
Agents don’t work because they require:
- Sequential decision-making
- Environmental interaction
- Learning from mistakes
- Persistent memory across contexts
- Understanding of consequences
Why Agents Currently Fail
- Lack of embodied experience: Models trained on text don’t understand physical consequences
- No real-world feedback loop: Can’t learn from actual world interactions
- Context limitations: Limited to token window (can’t maintain long-term learning)
- Policy vs. behavior: Models complete statistical patterns rather than execute coherent strategies (the agent loop sketched below shows where these gaps appear)
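To make these gaps concrete, here is a minimal sketch of a generic agent loop. It is an illustration rather than anything from the discussion, and `model` and `env` are hypothetical placeholders for a language-model API and a task environment; the comments mark where the failure modes above show up.

```python
# Minimal generic agent loop (illustrative sketch; `model` and `env` are hypothetical
# placeholders for a language-model API and a task environment).

def run_agent(model, env, max_steps=20):
    context = [env.describe_task()]        # all runtime state lives in the context window
    for _ in range(max_steps):
        action = model.generate(context)   # pattern completion, not a learned policy
        observation, done = env.step(action)
        context.append(observation)        # feedback arrives only as more tokens:
                                           # no weight update, no persistent memory
        if done:
            break
    # When this function returns, everything the agent "learned" is gone.
    # There is no consolidation step that writes experience back into the model.
    return context
```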
The Evolution Parallel: What We’re Missing
What Evolution Provides
Evolution compresses enormous amounts of knowledge directly into our DNA:
- Instincts and reflexes
- Emotional systems
- Reward structures
- Physical embodiment
- Multi-generational learning
Key insight: “Evolution is doing something that looks like compression.” Billions of years of experience are encoded in our genome before we’re born.
The Pre-training Fallacy
Current AI pre-training attempts to mimic evolution’s compression, but:
- Evolution's scale: 3+ billion years of accumulated, survival-tested knowledge
- Pre-training's scale: Billions of text tokens from the internet
- Evolution's edge: Physical embodiment and real-world consequences
- Pre-training's limitation: Statistical patterns without grounded understanding
Hypothesis: Pre-training is actually holding back neural networks. We’ve done so much pre-training that we’re reaching diminishing returns.
In-Context Learning: A Key Discovery
What It Is
The ability of models to learn and adapt within a single conversation or session (a minimal sketch follows this list):
- Models receive examples in their context window
- They apply learning immediately to new tasks
- All learning happens within token window
- No weight updates needed
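A minimal sketch of the idea with a made-up few-shot task: the examples are placed directly in the prompt, and the commented-out `complete()` call is a hypothetical stand-in for any language-model API. No weights are updated at any point.

```python
# Few-shot in-context learning, sketched with a made-up antonym task.
# The "learning" lives entirely in the prompt; no gradient step touches the weights.

examples = [
    ("bright", "dark"),
    ("tall", "short"),
    ("fast", "slow"),
]

prompt = "Give the antonym of each word.\n"
for word, antonym in examples:       # the examples act as the training data
    prompt += f"{word} -> {antonym}\n"
prompt += "heavy -> "                # new instance the model must adapt to

# answer = complete(prompt)          # hypothetical LLM call; the model infers the
#                                    # pattern from the context window alone
```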
How Powerful It Is
A rough comparison from the discussion: the knowledge compressed by pre-training is on the order of 300+ gigabytes, while what a single in-context learning session works with is on the order of 320 kilobytes (a quick ratio check follows the list below).
This dramatic compression suggests:
- Much learning can happen in a single session
- Lifetime learning (human-like learning over time) is possible within tokens
- The context window acts like working memory
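Taking the figures above at face value, the gap is roughly six orders of magnitude; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope ratio using the figures quoted above, taken at face value.
pretraining_bytes = 300 * 1024**3   # ~300 GB attributed to pre-training's compression
in_context_bytes = 320 * 1024       # ~320 KB available to a single session

print(pretraining_bytes / in_context_bytes)   # ~1e6, i.e. about six orders of magnitude
```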
Limitations
Current constraint: One session only
- Learning stays in context window
- Doesn’t persist beyond single interaction
- Doesn’t transfer to future sessions
- All learning is temporary
The Memory Problem: Context Windows and Persistence
The Challenge
To build true agents, we need:
- Persistent memory - Learning that survives sessions
- Long-horizon reasoning - Planning beyond immediate context
- Experience accumulation - Getting better over time
- World models - Understanding how actions affect reality
Current Solutions (Limited)
- Context windows: Act as working memory, loosely analogous to the prefrontal cortex
- In-context learning: Fast, flexible adaptation within that working memory
- But missing: Long-term memory, consolidated learning, embodied understanding (one common workaround is sketched below)
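One workaround in common use today (not something prescribed in the discussion) is to keep an external store of session summaries and re-inject it into the next session's context. A minimal sketch, where the file name and the `summarize` callable are assumptions:

```python
import json
from pathlib import Path

# Minimal external "memory" that survives sessions by living on disk.
# A sketch of a common workaround, not a mechanism from the discussion;
# `summarize` stands in for any summarization step and is an assumption.

MEMORY_FILE = Path("agent_memory.json")

def load_memory() -> list[str]:
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

def save_session(session_transcript: str, summarize) -> None:
    notes = load_memory()
    notes.append(summarize(session_transcript))   # consolidate the session into a note
    MEMORY_FILE.write_text(json.dumps(notes))

def build_prompt(task: str) -> str:
    notes = load_memory()
    # Persistence lives outside the model: past notes return only as tokens,
    # and the weights themselves never change.
    return "\n".join(["Notes from earlier sessions:", *notes, "", "Task:", task])
```

This gives between-session recall, but none of the consolidated, weight-level learning or embodied understanding that the gaps above point to.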
The Gap
“Normally you would say humans have continual learning horizons longer than one session.” Current models don’t.
Knowledge Types: Repository vs. Runtime
Two Types of Knowledge
1. Repository Knowledge (Static)
- Accumulated over lifetime
- Baked in during training
- Hard to update or correct
- Like pre-training knowledge
- Problem: Outdated, inflexible
2. Runtime Knowledge (Dynamic)
- Learned within current session
- Flexible and updateable
- Uses in-context learning
- Built fresh as needed
- Like prefrontal cortex processing
The Interplay
Humans constantly:
- Build repository knowledge over years
- Activate relevant portions in prefrontal cortex
- Combine with fresh runtime knowledge
- Adapt based on current context
Current models:
- Have repository knowledge (pre-training)
- Can do runtime processing (in-context learning)
- But can’t transfer runtime to repository
- Each session starts from scratch
The Agent Problem: Concrete Example
What Happens When Models Write Code
When models write code in agent settings (a concrete contrast is sketched after this list):
- Models sprinkle in try-catch statements
- Add layers of defensive programming
- Make the code more complex than the task requires
- The extra code actually reduces effectiveness
- In Karpathy's words, it is “not net useful”
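An invented illustration of the pattern (not code from the discussion): the lean helper the task actually needs, followed by the kind of defensively padded version that pattern completion tends to produce.

```python
import json

# Invented illustration, not code from the discussion.

# What the task actually needs: a small, readable helper.
def read_config(path):
    with open(path) as f:
        return json.load(f)

# What pattern completion tends to produce: defensive wrapping that swallows errors,
# adds branches nobody asked for, and makes the behavior harder to reason about.
def read_config_defensive(path):
    try:
        if path is None:
            return {}
        with open(path) as f:
            try:
                return json.load(f)
            except Exception:
                return {}
    except Exception:
        return {}
```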
Why This Happens
Models are pattern-completing from internet examples:
- Internet code often has try-catch blocks
- But this adds complexity for agent scenarios
- Models can’t reason about whether it’s actually needed
- They just continue the pattern
The Fundamental Issue
- No understanding of purpose (writing clean agent code vs. defensive code)
- Pattern matching without comprehension
- No feedback from actual execution
- Can’t learn to write better code
Technical Insights
Pre-training and In-context Learning as Gradient Descent
Both implement something like gradient descent:
- Pre-training: Gradient descent over massive data
- In-context learning: Implicit gradient descent inside the attention mechanism (a toy demonstration follows this list)
- But: In-context learning is much faster and more flexible
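A well-known toy result from the research literature points in this direction (it is not a claim made in the discussion): a single linear self-attention layer can reproduce one gradient-descent step on an in-context linear regression problem. A minimal NumPy sketch under that assumption:

```python
import numpy as np

# Toy check: one gradient-descent step on an in-context regression problem matches the
# output of a linear attention sum with suitably chosen keys, values, and query.
# This mirrors a construction from the research literature; it is an illustration only.

rng = np.random.default_rng(0)
d, n = 4, 32
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))          # in-context inputs x_1..x_n
y = X @ w_true                       # in-context targets y_i = w_true . x_i
x_query = rng.normal(size=d)         # query point to predict
lr = 0.1

# (1) One gradient step on L(w) = 0.5 * sum_i (y_i - w.x_i)^2, starting from w = 0:
#     w <- lr * sum_i y_i * x_i
w_one_step = lr * (y[:, None] * X).sum(axis=0)
pred_gd = w_one_step @ x_query

# (2) Linear attention over the context: sum_i value_i * (key_i . query),
#     with keys = x_i, values = lr * y_i, and query = x_query.
pred_attn = sum(lr * y_i * (x_i @ x_query) for x_i, y_i in zip(X, y))

print(np.isclose(pred_gd, pred_attn))   # True: the two predictions coincide
```

The point of the sketch is only that the attention sum and the one-step prediction coincide; real transformers are far richer than this construction.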
Knowledge Compression
“There’s a miraculous compression happening.” The compression ratios show:
- Pre-training compresses billions of text tokens
- In-context learning happens in kilobytes
- Suggests dramatic efficiency differences
- Hints at overlooked mechanisms
The Role of Loss Function
“I think that’s probably holding back the neural networks.” Pre-training objective might be suboptimal for real-world reasoning.
The Time-Travel Perspective
Betting on the Future
Karpathy’s thought experiment:
- If you went back roughly 33 years, which ideas would turn out to matter?
- Backpropagation (1986): Already known then, used everywhere today
- Attention mechanisms: Arrived much later (2014-2017)
- Vision transformers: More recent still (2020)
The Pattern
“All these things have been discovered decades ago.” Many of today’s breakthroughs are rediscoveries of old ideas:
- Given enough compute
- Given enough data
- Given enough scaling
- Old ideas become practical
Extrapolation
“Maybe half is progress.” Roughly half of current improvement comes from genuinely new work:
- New algorithmic insights
- Better training procedures
- Better architectures
But roughly half might simply be old ideas becoming practical:
- Applying long-known ideas at scale
- Better tuning
- More compute and data
Knowledge Accumulation During Development
Two Phases of Learning
Phase 1: Building the Repository
- Pre-training on large data
- Building foundational knowledge
- Happens once during training
- Creates base competence
Phase 2: Building During Use
- Runtime learning within sessions
- In-context adaptation
- Feedback-driven improvement
- Continuous refinement
The Challenge
Humans have both phases:
- Early education (pre-training equivalent)
- Lifetime learning (continual learning)
- Recent experience in working memory
- Consolidated learning in long-term memory
Models currently have:
- Pre-training phase only (or weak continual learning)
- In-context learning phase (within session)
- No persistent learning between sessions
Implications for AI Progress
The Ghost-Animal Gap
We’re building systems that:
- Process information like minds
- But lack embodied grounding
- Have no evolutionary heritage
- Can’t learn from real-world consequences
- Must be fully specified by training
What Needs to Change
- Embodiment: Physical grounding, robotic interaction
- Persistent learning: Between-session memory and knowledge consolidation
- Real-world feedback: Direct consequences affecting behavior
- Evolution-like structures: Inherited constraints and instincts
- Temporal depth: Learning over long time horizons
The Optimism
“I feel like the problems are tractable, they’re not unsolvable.” Despite the gaps:
- These are engineering problems
- Not fundamental impossibilities
- Two decades of AI experience shows progress is possible
- The trajectory has consistently beaten expectations
Key Takeaways
1. The Ghost Problem is Real
We’re building systems fundamentally different from biological intelligence.
2. Embodiment Matters
Physical grounding, real-world interaction, and evolutionary pressure shape intelligence in ways text training cannot replicate.
3. In-Context Learning is Underappreciated
The ability to learn within a single session, in roughly 320 KB of context, suggests untapped potential.
4. Memory is Critical
Persistent learning between sessions is essential for true agent behavior.
5. Current Agents Don’t Work
Because they lack the embodied, persistent, feedback-driven learning that makes biological agents effective.
6. Pre-training May Have Limits
We’ve scaled pre-training extensively. Improvement might now require new approaches.
7. Human Learning is Hybrid
Humans combine repository knowledge (evolved + learned), runtime processing, and persistent learning. Current models only partially implement this.
8. Problems Are Tractable
These are engineering challenges, not fundamental impossibilities. Progress is achievable.
The Core Question
How do we transition from summoning ghosts to building something that actually understands and can interact with the world?
The answer likely involves:
- Embodied learning experiences
- Persistent memory across sessions
- Real-world feedback loops
- Something more like biological development
- Moving beyond pure statistical pattern completion
Discussion Topics
- Can in-context learning be scaled to true continual learning?
- What form should persistent memory take in AI systems?
- How do we create feedback loops without physical embodiment?
- Can we simulate evolution’s compression algorithmically?
- What would it look like to build an “animal” instead of a “ghost”?
- How important is embodiment for real intelligence?
- Can context windows eventually replace persistent memory?
- What’s the next breakthrough beyond scaling?
This discussion highlights the fundamental gap between current AI systems and biological intelligence. The path to truly capable AI agents likely requires addressing the ghost-animal problem: giving systems not just intelligence patterns, but embodied experience, persistent learning, and real-world consequences.