Andrej Karpathy: We're Summoning Ghosts, Not Building Animals

Overview

Andrej Karpathy, a leading AI researcher, discusses the fundamental differences between current AI systems and biological intelligence. The core thesis: We’re building “ghosts” (digital entities without embodied experience) rather than “animals” (systems grounded in physical reality and evolution).

Main Argument: The Ghost Problem

Core Metaphor

“We’re summoning ghosts, not building animals.”

This distinction highlights a critical gap in how we approach AI:

  • Biological entities (animals): Grounded in physical world, shaped by evolution, embodied learning
  • AI systems (ghosts): Purely digital, lacking embodied experience, shaped by human design

Why This Matters

The difference between training on data about flying and actually flying is fundamental. Humans and animals learn through:

  • Physical embodiment and interaction
  • Evolutionary pressure and survival
  • Real-world consequences and feedback

AI systems lack all of these grounding mechanisms.

Current State: Why Agents Don’t Work

The Fundamental Problem

While current AI systems excel at:

  • Language modeling (Claude, ChatGPT, Codex)
  • Single-turn interactions
  • Specific, bounded tasks

Agents, by contrast, don’t yet work reliably because they require:

  • Sequential decision-making
  • Environmental interaction
  • Learning from mistakes
  • Persistent memory across contexts
  • Understanding of consequences

Why Agents Currently Fail

  1. Lack of embodied experience: Models trained on text don’t understand physical consequences
  2. No real-world feedback loop: Can’t learn from actual world interactions
  3. Context limitations: Limited to token window (can’t maintain long-term learning)
  4. Policy vs. behavior: Models implement statistical patterns, not coherent strategies

The Evolution Parallel: What We’re Missing

What Evolution Provides

Evolution compresses enormous amounts of knowledge directly into our DNA:

  • Instincts and reflexes
  • Emotional systems
  • Reward structures
  • Physical embodiment
  • Multi-generational learning

Key insight: “Evolution is doing something that looks like compression.” Billions of years of experience are encoded in our genome before we’re born.

The Pre-training Fallacy

Current AI pre-training attempts to mimic evolution’s compression, but:

  • Evolution’s input: 3+ billion years of accumulated experience
  • Pre-training’s input: billions of text tokens from the internet
  • Evolution’s edge: physical embodiment and real-world consequences
  • Pre-training’s limitation: statistical patterns without grounded understanding

Hypothesis: Pre-training is actually holding back neural networks. We’ve done so much pre-training that we’re reaching diminishing returns.

In-Context Learning: A Key Discovery

What It Is

The ability of models to learn and adapt within a single conversation or session:

  • Models receive examples in their context window
  • They apply learning immediately to new tasks
  • All learning happens within token window
  • No weight updates needed
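
A minimal sketch of what this looks like in practice, assuming an OpenAI-style chat client (the client, model name, and task below are illustrative, not something named in the talk): the only “training” is the handful of labeled examples placed in the prompt, and no weights are updated.

    # Few-shot in-context learning: the model picks up the task from examples
    # placed directly in its context window; no gradient updates occur.
    # Assumes an OpenAI-style chat API; model name and task are illustrative.
    from openai import OpenAI

    client = OpenAI()

    messages = [
        {"role": "system", "content": "Label the sentiment of each review."},
        # In-context examples: the only "training" the model receives.
        {"role": "user", "content": "Review: 'Great battery life.'"},
        {"role": "assistant", "content": "positive"},
        {"role": "user", "content": "Review: 'Broke after two days.'"},
        {"role": "assistant", "content": "negative"},
        # New instance: the model applies the pattern immediately.
        {"role": "user", "content": "Review: 'Works fine, nothing special.'"},
    ]

    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    print(response.choices[0].message.content)  # this "learning" vanishes when the session ends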

How Powerful It Is

Illustrative comparison: pre-training compresses its data into 300+ gigabytes of weights, while in-context learning operates within a context window of roughly 320 kilobytes.

This dramatic compression suggests:

  • Much learning can happen in a single session
  • Lifetime learning (human-like learning over time) may be possible within the token window
  • The context window acts like working memory
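
A quick back-of-the-envelope check on those numbers, using assumed values (parameter count, bytes per parameter, context length, and bytes per token are illustrative):

    # Rough comparison of the two "memories"; all concrete numbers are assumptions.
    weights_bytes = 150e9 * 2        # ~150B parameters at 2 bytes each -> ~300 GB
    context_bytes = 128_000 * 2.5    # ~128k-token window at ~2.5 bytes/token -> ~320 KB

    print(f"weights: {weights_bytes / 1e9:.0f} GB")
    print(f"context: {context_bytes / 1e3:.0f} KB")
    print(f"ratio:   ~{weights_bytes / context_bytes:,.0f}x")  # roughly a million-fold gap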

Limitations

Current constraint: One session only

  • Learning stays in context window
  • Doesn’t persist beyond single interaction
  • Doesn’t transfer to future sessions
  • All learning is temporary

The Memory Problem: Context Windows and Persistence

The Challenge

To build true agents, we need:

  1. Persistent memory - Learning that survives sessions
  2. Long-horizon reasoning - Planning beyond immediate context
  3. Experience accumulation - Getting better over time
  4. World models - Understanding how actions affect reality

Current Solutions (Limited)

  • Context windows: Act as working memory, roughly analogous to the prefrontal cortex
  • In-context learning: Flexible adaptation over whatever is currently in that window
  • Still missing: Long-term memory, consolidated learning, embodied understanding
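
As a purely hypothetical sketch of the missing piece, an agent loop could persist distilled notes to disk after each session and reload them at the start of the next; the file name, note format, and ask_model callable below are assumptions for illustration, not a description of any existing system.

    # Hypothetical sketch: persisting "lessons learned" across sessions.
    # Nothing here is a real product API; it only illustrates the gap between
    # in-context (runtime) learning and memory that survives the session.
    import json
    from pathlib import Path

    MEMORY_FILE = Path("agent_memory.json")  # assumed storage location

    def load_memory() -> list[str]:
        """Repository-style memory: notes consolidated from past sessions."""
        return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

    def save_memory(notes: list[str]) -> None:
        MEMORY_FILE.write_text(json.dumps(notes, indent=2))

    def run_session(task: str, ask_model) -> str:
        """One session: inject persisted notes into the prompt, then consolidate."""
        notes = load_memory()
        prompt = "Known lessons:\n" + "\n".join(f"- {n}" for n in notes) + f"\n\nTask: {task}"
        answer = ask_model(prompt)          # runtime / in-context work
        notes.append(f"While doing '{task}', produced: {answer[:80]}")
        save_memory(notes)                  # crude consolidation step
        return answer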

The Gap

“Normally you would say humans have continual learning horizons longer than one session.” Current models don’t.

Knowledge Types: Repository vs. Runtime

Two Types of Knowledge

1. Repository Knowledge (Static)

  • Accumulated over lifetime
  • Baked in during training
  • Hard to update or correct
  • Like pre-training knowledge
  • Problem: Outdated, inflexible

2. Runtime Knowledge (Dynamic)

  • Learned within current session
  • Flexible and updateable
  • Uses in-context learning
  • Built fresh as needed
  • Like prefrontal cortex processing

The Interplay

Humans constantly:

  • Build repository knowledge over years
  • Activate relevant portions in prefrontal cortex
  • Combine with fresh runtime knowledge
  • Adapt based on current context

Current models:

  • Have repository knowledge (pre-training)
  • Can do runtime processing (in-context learning)
  • But can’t transfer runtime to repository
  • Each session starts from scratch
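
A compact way to see the distinction in code (the model below is a stub, since the point is where the knowledge lives, not how the answer is computed):

    # Illustrative contrast between the two knowledge types; hypothetical stub model.

    def frozen_model(prompt: str) -> str:
        """Stand-in for a pre-trained model: its weights (repository knowledge) are fixed."""
        return f"[model answer to: {prompt!r}]"

    # 1) Repository knowledge: baked into the weights during training.
    #    If it is outdated, the only remedy is retraining or fine-tuning.
    print(frozen_model("Who maintains library X?"))

    # 2) Runtime knowledge: supplied fresh in the context window, easy to correct,
    #    but discarded the moment the session ends.
    runtime_fact = "Note: as of this week, library X is maintained by a new team."
    print(frozen_model(runtime_fact + "\nWho maintains library X?"))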

The Agent Problem: Concrete Example

What Happens When Models Write Code

When building agents to write code:

  • Models generate try-catch statements
  • Add defensive programming
  • Make code more complex
  • Actually reduces effectiveness
  • “Not net useful”
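
A made-up illustration of this failure mode: the same small helper, first as a model tends to pattern-complete it, then stripped down to what an agent harness usually actually needs.

    # Over-defensive style models tend to pattern-complete from internet code:
    # the try/except swallows the error, so the agent never sees the failure
    # it could have learned from.
    def read_config_defensive(path):
        try:
            with open(path) as f:
                return f.read()
        except Exception:
            return None  # failure is hidden; the caller silently gets bad data

    # What the agent scenario usually wants: let the exception propagate so the
    # harness (or a human) sees exactly what went wrong.
    def read_config(path):
        with open(path) as f:
            return f.read()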

Why This Happens

Models are pattern-completing from internet examples:

  • Internet code often has try-catch blocks
  • But this adds complexity for agent scenarios
  • Models can’t reason about whether it’s actually needed
  • They just continue the pattern

The Fundamental Issue

  • No understanding of purpose (writing clean agent code vs. defensive code)
  • Pattern matching without comprehension
  • No feedback from actual execution
  • Can’t learn to write better code

Technical Insights

Pre-training and In-context Learning as Gradient Descent

Both implement something like gradient descent:

  • Pre-training: Gradient descent over massive data
  • In-context learning: Implicit gradient descent within attention mechanism
  • But: In-context learning is much faster and more flexible
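
For contrast, here is one explicit gradient-descent step on a toy linear model (a textbook sketch, not something from the talk); the conjecture above is that attention performs a functionally similar update over the in-context examples without ever touching the weights.

    # One explicit gradient-descent step on toy linear regression.
    # Pre-training performs many such weight updates; in-context learning is
    # conjectured to do something analogous implicitly, inside attention.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 3))         # 8 in-context "examples", 3 features
    true_w = np.array([1.0, -2.0, 0.5])
    y = X @ true_w

    w = np.zeros(3)                     # current weights
    lr = 0.1

    grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
    w -= lr * grad                      # explicit weight update (pre-training-style)
    print(w)                            # moves toward true_w; ICL needs no such update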

Knowledge Compression

“There’s a miraculous compression happening.” The compression ratios show:

  • Pre-training compresses billions of text tokens
  • In-context learning happens in kilobytes
  • Suggests dramatic efficiency differences
  • Hints at overlooked mechanisms

The Role of Loss Function

“I think that’s probably holding back the neural networks.” Pre-training objective might be suboptimal for real-world reasoning.

The Time-Travel Perspective

Betting on the Future

Karpathy’s thought experiment:

  • What were the AI breakthroughs of 33 years ago, and which of those ideas survive?
  • Backpropagation (1986): Still used everywhere today
  • Attention mechanisms: Arrived far more recently, now central
  • Vision transformers: More recent still

The Pattern

“All these things have been discovered decades ago.” Many of today’s breakthroughs are rediscoveries of old ideas:

  • Given enough compute
  • Given enough data
  • Given enough scaling
  • Old ideas become practical

Extrapolation

“Maybe half is progress.” Roughly half of current improvements come from genuinely new work:

  • New algorithmic insights
  • Better training procedures
  • Better architectures

But roughly half might simply be:

  • Applying old ideas at scale
  • Better tuning and scaling
  • More compute

Knowledge Accumulation During Development

Two Phases of Learning

Phase 1: Building the Repository

  • Pre-training on large data
  • Building foundational knowledge
  • Happens once during training
  • Creates base competence

Phase 2: Building During Use

  • Runtime learning within sessions
  • In-context adaptation
  • Feedback-driven improvement
  • Continuous refinement

The Challenge

Humans have both phases:

  • Early education (pre-training equivalent)
  • Lifetime learning (continual learning)
  • Recent experience in working memory
  • Consolidated learning in long-term memory

Models currently have:

  • Pre-training phase only (or weak continual learning)
  • In-context learning phase (within session)
  • No persistent learning between sessions

Implications for AI Progress

The Ghost-Animal Gap

We’re building systems that:

  • Process information like minds
  • But lack embodied grounding
  • Have no evolutionary heritage
  • Can’t learn from real-world consequences
  • Must be fully specified by training

What Needs to Change

  1. Embodiment: Physical grounding, robotic interaction
  2. Persistent learning: Between-session memory and knowledge consolidation
  3. Real-world feedback: Direct consequences affecting behavior
  4. Evolution-like structures: Inherited constraints and instincts
  5. Temporal depth: Learning over long time horizons

The Optimism

“I feel like the problems are tractable, they’re not unsolvable.” Despite the gaps:

  • These are engineering problems
  • Not fundamental impossibilities
  • Two decades of AI experience shows progress is possible
  • The trajectory has consistently beaten expectations

Key Takeaways

1. The Ghost Problem is Real

We’re building systems fundamentally different from biological intelligence.

2. Embodiment Matters

Physical grounding, real-world interaction, and evolutionary pressure shape intelligence in ways text training cannot replicate.

3. In-Context Learning is Underappreciated

The ability to learn within a single session, using only a context window on the order of 320 KB, suggests untapped potential.

4. Memory is Critical

Persistent learning between sessions is essential for true agent behavior.

5. Current Agents Don’t Work

Because they lack the embodied, persistent, feedback-driven learning that makes biological agents effective.

6. Pre-training May Have Limits

We’ve scaled pre-training extensively. Improvement might now require new approaches.

7. Human Learning is Hybrid

Humans combine repository knowledge (evolved + learned), runtime processing, and persistent learning. Current models only partially implement this.

8. Problems Are Tractable

These are engineering challenges, not fundamental impossibilities. Progress is achievable.

The Core Question

How do we transition from summoning ghosts to building something that actually understands and can interact with the world?

The answer likely involves:

  • Embodied learning experiences
  • Persistent memory across sessions
  • Real-world feedback loops
  • Something more like biological development
  • Moving beyond pure statistical pattern completion

Discussion Topics

  1. Can in-context learning be scaled to true continual learning?
  2. What form should persistent memory take in AI systems?
  3. How do we create feedback loops without physical embodiment?
  4. Can we simulate evolution’s compression algorithmically?
  5. What would it look like to build an “animal” instead of a “ghost”?
  6. How important is embodiment for real intelligence?
  7. Can context windows eventually replace persistent memory?
  8. What’s the next breakthrough beyond scaling?

This discussion highlights the fundamental gap between current AI systems and biological intelligence. The path to truly capable AI agents likely requires addressing the ghost-animal problem: giving systems not just intelligence patterns, but embodied experience, persistent learning, and real-world consequences.