Hot take: most of the “AI progress” people feel is from ReAct loops, not the LLMs themselves

Understanding the True Drivers of Progress in AI: Beyond Foundation Models

In recent discussions about artificial intelligence advancements, there’s a prevailing narrative that each new iteration of large language models (LLMs)—such as GPT-4—represents a monumental leap forward. However, a closer examination suggests that much of the perceived progress may be less about the core models themselves and more about the surrounding frameworks and techniques that enhance their capabilities.

The Myth of Sole Model Progress

While it’s undeniable that foundation models have improved over time, attributing recent AI breakthroughs solely to these models is likely an oversimplification. The rapid evolution of AI capabilities often correlates with innovations in how we use these models, not just with improvements in their architectures or training data.

The Power of System-Level Enhancements

Much of the enhanced performance attributed to current AI systems stems from supplementary processes such as:

ReAct (Reason + Act) Loops: Letting a model think, act, observe, and respond over multiple iterations gives it a form of pseudo-agency and makes it markedly more effective on multi-step tasks.
Tool Integration: Allowing models to access external tools—databases, calculators, APIs—extends their capabilities beyond text generation.
Structured Workflows & Memory: Implementing memory management and orchestrated workflows helps maintain context and manage complex tasks.
Retries and Planning: Incorporating mechanisms for retries and multi-step planning improves reliability and depth.

These system-level enhancements essentially transform static models into dynamic agents capable of complex task execution, leading to a dramatically different user experience.
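
To make the mechanisms listed above a little more concrete, here is a minimal sketch of tool access, retries, and bounded memory. Everything in it is an assumption made for illustration: the `TOOLS` registry, the `calculator` and `lookup` tools, and the helper names are hypothetical and do not come from any particular agent framework.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict

# Hypothetical tool registry: each tool maps a string argument to a string result.
TOOLS: Dict[str, Callable[[str], str]] = {
    # Toy calculator that only understands "a+b+c" style sums.
    "calculator": lambda expr: str(sum(float(x) for x in expr.split("+"))),
    # Toy lookup table standing in for a database or API.
    "lookup": lambda key: {"capital_of_france": "Paris"}.get(key, "unknown"),
}


def call_tool_with_retries(name: str, arg: str,
                           max_attempts: int = 3, delay_s: float = 0.0) -> str:
    """Call a tool, retrying on any exception, and report the last error as text."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return TOOLS[name](arg)
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            time.sleep(delay_s)
    return f"error: {last_error!r}"


def make_memory(max_items: int = 4) -> Deque[str]:
    """Bounded memory: once full, the oldest entries are dropped automatically."""
    return deque(maxlen=max_items)


memory = make_memory(max_items=2)
memory.append("user asked for 2+3")
memory.append("calculator returned " + call_tool_with_retries("calculator", "2+3"))
memory.append("user asked for the capital of France")  # evicts the oldest entry
```

Returning the error as text, instead of raising, is a deliberate choice in this sketch: the model can then read the failure as an observation and decide what to try next.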

The Impact of Multi-Step, Context-Aware Interactions

When an LLM moves from generating isolated responses to taking part in an iterative, multi-step loop (think -> act -> observe -> respond), its perceived intelligence and usefulness rise sharply. This shift is arguably more significant than raw improvements in the underlying model alone.
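
The think -> act -> observe -> respond cycle can be sketched in a few lines. This is an illustrative toy under stated assumptions, not a real agent: `scripted_model` is a hypothetical stand-in for an LLM call, and the `Action:` / `Observation:` / `Final:` text protocol is invented here for the example.

```python
from typing import Callable, Dict


def react_loop(model: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               question: str,
               max_steps: int = 5) -> str:
    """Run a think -> act -> observe loop until the model emits a final answer."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model("\n".join(transcript))           # think
        transcript.append(step)
        if step.startswith("Final:"):                 # respond
            return step[len("Final:"):].strip()
        if step.startswith("Action:"):                # act
            name, _, arg = step[len("Action:"):].strip().partition(" ")
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool {name}"
            transcript.append(f"Observation: {observation}")  # observe
    return "no answer within step budget"


def scripted_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: calls a tool once, then answers."""
    if "Observation:" not in prompt:
        return "Action: add 2 3"
    last_observation = prompt.rsplit("Observation:", 1)[1].strip()
    return f"Final: {last_observation}"


# Hypothetical tool: adds whitespace-separated integers.
tools = {"add": lambda arg: str(sum(int(x) for x in arg.split()))}
answer = react_loop(scripted_model, tools, "What is 2 + 3?")
```

The step budget (`max_steps`) matters in practice: without it, a model that never emits a final answer would loop forever.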

Rethinking Progress Curves

A more critical look suggests that the base models themselves may be showing diminishing returns. Rather than continuous breakthroughs, much of the current “jump” in capabilities may come from sophisticated scaffolding (extensions, tools, and orchestrated workflows) that uses existing models more effectively.

A Thought Experiment

Consider this scenario: take an earlier version of GPT-3.5 or GPT-4 and deploy it within a robust, multi-step reasoning framework equipped with tools, retries, and state management. Comparing this setup with today’s state-of-the-art systems would likely reveal a smaller capability gap than the current hype suggests.

Although these older models weren’t explicitly trained with direct tool integration in mind, integrating them into such systems could yield surprising results—potentially matching much of the practical utility we associate with state-of-the-art models today.
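
One way such a comparison could be set up is a small harness that scores any two systems on the same task set. This sketch only shows the shape of the experiment and makes no claim about how it would turn out. The names `accuracy` and `compare` are hypothetical, exact-match scoring is a simplifying assumption, and real systems (an older model inside a scaffold, a newer model on its own) would have to be plugged in as plain functions from prompt to answer.

```python
from typing import Callable, Dict, List, Tuple

Task = Tuple[str, str]          # (prompt, expected answer)
System = Callable[[str], str]   # anything that maps a prompt to an answer


def accuracy(system: System, tasks: List[Task]) -> float:
    """Fraction of tasks where the system's answer exactly matches the expected one."""
    if not tasks:
        return 0.0
    correct = sum(1 for prompt, expected in tasks
                  if system(prompt).strip() == expected)
    return correct / len(tasks)


def compare(systems: Dict[str, System], tasks: List[Task]) -> Dict[str, float]:
    """Score every named system on the same tasks so the results are comparable."""
    return {name: accuracy(system, tasks) for name, system in systems.items()}
```

Holding the task set fixed across systems is the key design choice: it keeps the comparison about the systems themselves, not about differences in what each was asked.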

Implications for AI Development and Perception

This perspective invites a reevaluation of how we perceive progress in AI. It suggests that:

The core models may have plateaued relative to their initial rapid improvements.
Much of the real advancement may come from innovative system architectures, orchestration, and tooling.
Future progress hinges on optimizing the interplay between models and their surrounding frameworks.

Conclusion

While the enthusiasm around large language models is well-founded, it’s crucial to recognize that much of the recent “progress” in AI is amplified by the systems and methodologies built around these models. These enhancements act as a magnifying lens, amplifying the models’ capabilities and enabling more complex, useful interactions.

Discussion Points

Are we overestimating the capabilities of the models themselves? Could strategic improvements in system design and tooling account for a significant portion of perceived AI progress? Reflecting on these questions may help guide more realistic expectations and targeted innovation in the field.

Author’s Note: This perspective aims to foster a nuanced understanding of AI advancements, emphasizing the importance of system engineering alongside model development.
