LLMs can find zero-day vulnerabilities but give wrong walking directions – Karpathy’s explanation for why

Understanding the Limitations and Strengths of Large Language Models: The Concept of “Jagged Intelligence”

In a recent presentation at Sequoia Ascent 2026, AI researcher Andrej Karpathy introduced the idea of “jagged intelligence” to explain why large language models (LLMs) show striking capabilities in some areas while underperforming in others. Rather than attributing these inconsistencies to randomness or general incompetence, Karpathy offered a framework that explains where the strengths and weaknesses fall.

The Core Formula: Capability as a Function of Verifiability and Attention

Karpathy summarized LLM performance in a single relationship: an LLM’s capability in a given domain is roughly proportional to the product of that domain’s verifiability and the training attention it receives. The two factors are:

Verifiability: How easily correct and incorrect outputs can be told apart.
Training Attention: How much focus and reinforcement the model received on data from that domain during training.
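Read literally, the relationship is multiplicative: a domain needs both a verifiable signal and substantial training attention. The toy sketch below assumes both factors are normalized to the range 0 to 1; the function name `relative_capability` and the example scores are hypothetical illustrations, not quantities Karpathy measured.

```python
def relative_capability(verifiability: float, attention: float) -> float:
    """Toy model of the 'capability ~ verifiability x attention' heuristic.

    Both inputs are hypothetical scores normalized to [0, 1]; the output is
    only a relative ranking signal, not a measured capability.
    """
    for name, value in (("verifiability", verifiability), ("attention", attention)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return verifiability * attention


# Hypothetical scores, chosen only to show the multiplicative effect.
domains = {
    "unit-tested coding": (0.95, 0.9),
    "chess puzzles": (0.9, 0.6),
    "walking directions": (0.1, 0.7),
}

# Print domains from highest to lowest toy capability.
for domain, (v, a) in sorted(domains.items(), key=lambda kv: -relative_capability(*kv[1])):
    print(f"{domain:20s} {relative_capability(v, a):.2f}")
```

Because the factors multiply, heavy training attention cannot compensate for a domain with almost no verifiable signal, which produces exactly the uneven pattern the framework describes.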

In practical terms, the significant improvements seen in domains like coding stem largely from the fact that correctness is immediately verifiable: tests either pass or fail. Reinforcement learning can reward the outputs that pass, which drives rapid progress. As Karpathy notes, “Traditional computers automate what you can specify, whereas LLMs automate what you can verify.”
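To make “tests either pass or fail” concrete, here is a minimal sketch of a binary, test-based reward of the kind a reinforcement learning loop for code could use. The task, the candidate functions, and the name `pass_fail_reward` are hypothetical. This sketch calls the candidates directly; a real system would need to isolate untrusted generated code.

```python
from typing import Callable, Sequence, Tuple

TestCase = Tuple[tuple, object]  # (positional arguments, expected result)


def pass_fail_reward(candidate: Callable, tests: Sequence[TestCase]) -> float:
    """Return 1.0 if the candidate passes every test, else 0.0.

    Any exception raised by the candidate counts as a failure, so the
    signal stays binary and unambiguous.
    """
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            return 0.0
    return 1.0


# Hypothetical task: return the absolute value of an integer.
TESTS = [((3,), 3), ((-4,), 4), ((0,), 0)]


def correct_abs(x: int) -> int:
    return x if x >= 0 else -x


def buggy_abs(x: int) -> int:
    return x  # forgets to negate negative inputs


print(pass_fail_reward(correct_abs, TESTS))  # 1.0
print(pass_fail_reward(buggy_abs, TESTS))    # 0.0
```

The point of the design is that the verdict costs one test run and leaves no room for interpretation, which is what makes coding such a strong domain for this kind of training.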

The Contrast: Car Washes vs. Chess Puzzles

Karpathy uses two contrasting examples to show how the nature of a domain affects the model’s effectiveness:

Car Wash Scenario: Suppose a model gives walking directions to a car wash. When the directions are wrong, users rarely report back that they walked the suggested distance and found no car wash. Such sparse feedback gives the model little or no direct signal about its performance, which limits both its learning and its reliability in this domain.
Chess Puzzles: By contrast, every move in a chess puzzle can be immediately evaluated as right or wrong. This clear, immediate feedback, accumulated over countless puzzles, greatly improves the model’s ability to find the correct moves.

This highlights a critical point: models perform best in areas where the feedback loop is fast, cheap, and unambiguous. In domains lacking such signals, their abilities tend to be more “jagged” and less reliable.
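One way to see the difference between the two examples is to count how many attempts actually produce a correctness signal. The simulation below is a toy under stated assumptions: every chess-puzzle move is checked, while feedback on walking directions arrives only with a small hypothetical probability. The function name `simulate_feedback` and the rates are illustrative, not figures from the talk.

```python
import random


def simulate_feedback(n_attempts: int, verification_rate: float, seed: int = 0) -> int:
    """Return how many of n_attempts produce a correctness signal.

    verification_rate is the hypothetical probability that something
    (a solver, a test, or a user report) tells the model whether it was right.
    """
    rng = random.Random(seed)  # seeded so the toy run is reproducible
    return sum(1 for _ in range(n_attempts) if rng.random() < verification_rate)


N = 10_000
chess = simulate_feedback(N, verification_rate=1.0)        # every move is checked
directions = simulate_feedback(N, verification_rate=0.02)  # hypothetical: few users report back

print(f"chess puzzles:      {chess} / {N} attempts produced a signal")
print(f"walking directions: {directions} / {N} attempts produced a signal")
```

Under these assumptions the chess setting yields a signal on every attempt, while the directions setting yields one on only a small fraction, so far less is available to learn from.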

The Hidden Complexity of Benchmark Scores

A common mistake is to judge a model’s trustworthiness by its overall benchmark score. Karpathy emphasizes that these scores are averages, and averages can hide the uneven, “jagged” distribution of capabilities across tasks.

A model might score 90% on a broad benchmark, suggesting high competence overall, yet still be unreliable on specific, less verifiable tasks. The aggregate score masks the areas where the model is prone to errors or overconfidence.
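A small worked example shows how a 90% headline can coexist with a weak category. The category names, counts, and the 80% acceptance bar below are hypothetical, chosen so the overall score comes out at exactly 90%.

```python
# Hypothetical per-category results: (correct answers, total questions).
results = {
    "coding": (98, 100),
    "math": (96, 100),
    "chess_puzzles": (97, 100),
    "trivia": (95, 100),
    "walking_directions": (64, 100),
}

per_category = {name: correct / total for name, (correct, total) in results.items()}

# The headline number pools all questions together.
overall = sum(c for c, _ in results.values()) / sum(t for _, t in results.values())
weakest = min(per_category, key=per_category.get)

# Flag any category well below the headline (hypothetical acceptance bar).
THRESHOLD = 0.8
flagged = [name for name, acc in per_category.items() if acc < THRESHOLD]

print(f"overall accuracy: {overall:.0%}")
print(f"weakest category: {weakest} at {per_category[weakest]:.0%}")
print(f"below {THRESHOLD:.0%}: {flagged}")
```

Reporting the per-category breakdown and the minimum alongside the average is a cheap way to keep the jagged edges visible.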

Practical Implications: How to Use LLMs Effectively

Instead of relying on a general sense of a model’s trustworthiness, a more useful approach is to ask whether the domain offers:

Fast Feedback: Can errors be quickly identified and corrected?
Cheap Feedback: Is it inexpensive to verify outputs?
Unambiguous Feedback: Is the correctness of outputs immediately clear?

Domains with these characteristics are areas where LLMs can act as capable collaborators and be used with high confidence. In domains lacking fast, cheap, and clear feedback, practitioners should verify outputs independently, regardless of overall benchmark scores.
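The three questions can be turned into a simple checklist. This sketch assumes one reading of the rule, that a domain must satisfy all three criteria before it counts as collaboration territory; the class `FeedbackProfile`, the function `recommended_mode`, and the example assessments are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackProfile:
    """Hypothetical yes/no answers to the three feedback questions."""
    fast: bool         # Can errors be quickly identified and corrected?
    cheap: bool        # Is it inexpensive to verify outputs?
    unambiguous: bool  # Is the correctness of outputs immediately clear?


def recommended_mode(profile: FeedbackProfile) -> str:
    """Map a domain's feedback profile to a usage mode.

    Only domains meeting all three criteria are treated as collaboration
    territory; anything else gets independent verification, whatever the
    model's headline benchmark score.
    """
    if profile.fast and profile.cheap and profile.unambiguous:
        return "collaborate"
    return "verify independently"


# Hypothetical assessments, for illustration only.
unit_tested_code = FeedbackProfile(fast=True, cheap=True, unambiguous=True)
walking_directions = FeedbackProfile(fast=False, cheap=False, unambiguous=False)

print(recommended_mode(unit_tested_code))    # collaborate
print(recommended_mode(walking_directions))  # verify independently
```

A stricter or looser policy (for example, weighting the criteria) is easy to substitute; the all-or-nothing rule is simply the most conservative option.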

Reflective Questions for Practitioners

Have you encountered scenarios where an LLM’s performance exceeded or fell short of expectations based on overall scores? Recognizing these “jagged” capabilities can inform better deployment strategies and risk management.

Conclusion

Karpathy’s concept of “jagged intelligence” offers a valuable lens for understanding the variable performance of LLMs across domains. By focusing on the nature of feedback signals (fast, cheap, and unambiguous), practitioners can better identify where these models excel and where careful verification remains essential. This framework supports more effective and responsible integration of AI into real-world applications, acknowledging both the models’ remarkable strengths and their current limitations.
