The market for AI agent toolkits in 2026 is booming. LangGraph, CrewAI, Agno, Claude Agent SDK, Google ADK: the list just keeps growing. Whether you call it an agent framework, an orchestration engine, or an agent harness, they all share a common goal: to make it easy for developers to build agents. They compete on aspects such as the number of tool integrations, guardrails against prompt injection, and time-to-first-demo. The engineering is undoubtedly impressive. But what about optimizing for the end user? After a deep code review of the leading frameworks, I found a gap that none of them address: what happens when the agent faces requests that are vague, incomplete, or just plain confusing?
The prevailing narrative on social media is that AGI is right around the corner, but cracks begin to appear as soon as we consider the type of progress we have made and just how far we have to go to create something useful for most people. Let's examine the evidence. Many researchers base their super-intelligence timelines on the compounding growth in benchmark progress. But benchmarks aren't real life. OK, then what about AI systems from big (and small) labs winning IMO gold medals? Or what about coding agents deployed in production building entire applications in one shot? Surely, these are pretty real, right?
The deep learning revolution was initiated not by intelligent algorithms alone, but with the help of powerful compute and access to data. While models have continued to improve and compute power has continued to advance, progress on data collection has not been so fortunate. Data labeling companies would have you believe that they alone are the solution to all our data problems, provided, of course, that we pay their exorbitant fees. However, relying on vendors sidesteps the core issue, since the underlying cost and complexity of gathering high-quality data remain untouched. As things currently stand, the effort needed to annotate each additional batch of data increases over time as the examples reach the long tail of the distribution. Any company able to figure out a way to transform data annotation from a cost center into an area of innovation gains an enviable competitive advantage. But how?
On the surface, reinforcement learning (RL) seems like a great method for solving dialogue tasks. We can naturally model the problem as a POMDP where the partially observed state represents the user's intent. During each turn, the dialogue agent must decide how to respond. The action space is represented either as a sequence of tokens or, simplified even further, as a single dialogue act. Lastly, task-oriented dialogue offers a natural reward: whether or not the dialogue succeeded. And yet, we don't see (m)any deployed dialogue systems trained with RL. Why is that?
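To make the POMDP framing concrete, here is a minimal toy sketch (all names and numbers are illustrative assumptions, not a real system): the hidden state is the user's intent, the agent's actions are simplified dialogue acts ("ask" or "recommend"), observations are noisy user replies, and the only reward is a sparse terminal signal for task success.

```python
import random

# Hypothetical toy POMDP for task-oriented dialogue.
# Hidden state: the user's intent (a cuisine). Actions: dialogue acts.
# Observations: noisy replies. Reward: +1 at the end iff the task succeeded.

INTENTS = ["italian", "thai", "mexican"]

class ToyDialogueEnv:
    def __init__(self, noise=0.1, seed=None):
        self.noise = noise
        self.rng = random.Random(seed)

    def reset(self):
        self.intent = self.rng.choice(INTENTS)  # hidden from the agent
        self.done = False
        return None  # no observation before the first turn

    def step(self, action):
        """action is a dialogue act: ('ask',) or ('recommend', intent)."""
        assert not self.done
        if action[0] == "ask":
            # The user answers truthfully most of the time.
            if self.rng.random() < self.noise:
                return self.rng.choice(INTENTS), 0.0, False
            return self.intent, 0.0, False  # no reward mid-dialogue
        if action[0] == "recommend":
            self.done = True
            reward = 1.0 if action[1] == self.intent else 0.0
            return None, reward, True  # sparse terminal reward
        raise ValueError(action)

class BeliefAgent:
    """Tracks a simple belief over intents; asks until confident, then commits."""
    def __init__(self, threshold=0.8):
        self.threshold = threshold

    def reset(self):
        self.counts = {i: 0 for i in INTENTS}

    def act(self, obs):
        if obs is not None:
            self.counts[obs] += 1
        total = sum(self.counts.values())
        if total >= 2:
            best = max(self.counts, key=self.counts.get)
            if self.counts[best] / total >= self.threshold:
                return ("recommend", best)
        return ("ask",)

def run_episode(env, agent, max_turns=10):
    obs = env.reset()
    agent.reset()
    for _ in range(max_turns):
        obs, reward, done = env.step(agent.act(obs))
        if done:
            return reward
    return 0.0  # ran out of turns: dialogue failed

env = ToyDialogueEnv(noise=0.1, seed=0)
agent = BeliefAgent()
success = sum(run_episode(env, agent) for _ in range(200)) / 200
print(f"success rate: {success:.2f}")
```

Even this toy exposes the core difficulty hinted at above: the agent only learns whether it did well at the very end of the conversation, so credit assignment across the earlier turns is hard, and the "environment" here is a crude simulation of a user rather than the real thing.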
Conversational AI was all the rage a few years back, when people were shouting from the rooftops that chatbots were going to take over the world. But for all the fanfare and hullabaloo, the trumpeting of a new era has lately given way to a low, dull roar. Depending on who you ask, we either have AGI right around the corner, or all this noise is simply over-hyped technology soon to float away like vaporware of the past. I believe the truth lies somewhere in the middle: there will be a revolution, but it won't happen overnight. Instead, changes will start out incremental as the technology rolls out, and users will slowly adopt new social norms around dealing with virtual assistants. I don't claim to know when this will happen or exactly what it will look like, but certainly there are some clues.