How AI agents are evolving from following instructions to figuring it out

Your dream is to delegate most of this launch strategy to a colleague, but it can’t just be any old assistant. It has to be someone who won’t merely follow instructions but will proactively figure out what needs to be done, create a plan, and deliver comprehensive results as good as (or better than) what you could produce on your own.

You already have access to that colleague, but it’s not a person on your team. It’s not a human, in fact.

In a recent AI Explainer Series podcast conversation, Box CTO Ben Kus and Meena Ganesh, Senior Product Marketing Manager for AI, discussed the major transformation happening around AI agents. They’re evolving from simple question-answerers to autonomous problem-solvers that can tackle complex, multi-step challenges without explicit instructions.

AI agents are evolving from simple question-answerers to autonomous problem-solvers that can tackle complex, multi-step challenges without explicit instructions.

Box CTO, Ben Kus

“I don’t think people realize that there’s actually a fundamental shift going on in the way AI agents work,” Ben says. He’s right — and this shift will redefine how knowledge workers approach their daily tasks.

Key takeaways:

  • AI agents are evolving from simple instruction-followers to autonomous problem-solvers that can create their own plans and deliver complex results without step-by-step guidance
  • The shift from AI tools to AI colleagues means knowledge workers will transition from doing tasks themselves to managing multiple AI agents working in parallel, similar to how they would manage human teams
  • Today’s AI models can now “think” — creating plans, iterating on approaches, checking their work, and adjusting strategies much like experienced professionals would

Three waves of AI assistance in a few short years

To understand where we’re heading with enterprise AI, let’s quickly trace the evolution.

Wave 1: Retrieval engines

Remember your first ChatGPT experience? You asked a direct question, such as “summarize this podcast episode” or “find a good quote from Meena in this transcript,” and got an answer in seconds. These systems excelled at on-demand information retrieval: fast, focused, and transformative for their moment.

Wave 2: Workflow executors

Next came agents that could follow structured processes. Provide them with templates, source materials, and clear steps, and they'd execute reliably. For instance, these agents can analyze multiple podcast episodes to extract three key points, then format the response according to specified guidelines. They're procedural experts — highly capable within defined boundaries.

Wave 3: Strategic reasoners

This is where deep agents emerge. Consider this request: "Create a new episode outline for this podcast series based on everything we've done and our goals going forward." No template. No step-by-step playbook. Just a strategic objective. These systems reason backward from outcomes, synthesize context across multiple sources, and make judgment calls about priorities and approach, much like a strategic partner would.

AI makes lists, just like a human would

For contrast, Meena described how she would tackle this task as a human: “I would first look at all of the past episodes. I would then look at topics that did well and opportunities where we had additional things to talk about. And I would look at analysis and statistics, and start by coming up with additional feature topics.”

Without being given an explicit set of instructions, it figures out what to do to get to the outcome it’s being asked for.

Box CTO, Ben Kus

In other words, Meena’s instinct was to first create a to-do list for the project in order to define the process up front. And that’s exactly what an AI agent does. Without being given an explicit set of instructions, it figures out what to do to get to the outcome it’s being asked for.

Ben says, “One of the biggest differences with agents now is that they're getting very good at making to-do lists, making plans, thinking about it, iterating on it, and giving you something you want. This is a big evolution.”
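The plan-first behavior Ben describes can be sketched in a few lines of Python. Everything below is a hypothetical stand-in: the `llm()` helper, the prompts, and the canned plan are illustrative, not a real model API.

```python
def llm(prompt: str) -> str:
    # Placeholder for a real language-model call; returns canned text
    # so this sketch runs on its own.
    if prompt.startswith("PLAN"):
        return "1. Review past episodes\n2. Find top topics\n3. Draft outline"
    return f"result for: {prompt}"

def run_agent(goal: str) -> list:
    """Make a to-do list first, then work through it step by step."""
    plan = llm(f"PLAN: break this goal into steps: {goal}").splitlines()
    results = []
    for step in plan:
        results.append(llm(f"DO: {step} (goal: {goal})"))
    return results

outputs = run_agent("Create a new episode outline for the podcast")
```

The key point is the first call: the agent asks for a plan before doing any work, then executes each step in turn, which is exactly the to-do-list behavior Ben describes.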

From retrieval engine to strategic reasoner

This evolution represents more than just an incremental improvement in generative AI capability. As Ben says of the latest wave of AI agents, “Although they take longer [than basic genAI apps and agents], they’re relatively quick, and they do very complex work.”

The implications are profound when you consider Meena’s revelation about her daily reality: “There is only so much time in a given day. And the deadlines associated with these different tasks, they’re often non-negotiable.”

With autonomous agents, the calculus changes. Knowledge workers won’t just use AI tools; they’ll manage AI teams. One agent develops your launch strategy while another drafts enablement materials and a third analyzes competitor positioning. As a human, you shift from doing the work to orchestrating it.

The art of managing AI agents

In certain ways, managing AI agents mirrors managing human teams. When Ben asks how Meena would handle an agent that doesn’t deliver perfect results, her answer is telling: “Whatever they come to me with, I’d probably review it. I might give them some pointers — ‘Hey, this part was great, but can you change this?’”

Using agentic AI is not about perfection on the first try. It’s about iteration, feedback, and guidance — skills every manager already possesses. As Ben noted, “Just like with people, sometimes you realize that you gave bad instructions.”

Using agentic AI is not about perfection on the first try. It’s about iteration, feedback, and guidance — skills every manager already possesses.

Box CTO, Ben Kus

The most successful users of these new agents are already adopting this iterative, collaborative mindset. For instance, in software development, programmers are asking AI agents to build entire programs or refactor existing applications without providing detailed step-by-step instructions.

Instead of telling the agent exactly where to go and what to change, developers simply describe the desired outcome and let the agent figure out the implementation. The agent takes full ownership of the task, devising its own plan, writing code, testing it, debugging when it fails, and iterating until it works. 

The outcome isn’t instant. It might take minutes or even hours. But developers are becoming managers of multiple AI agents, reviewing their output and providing guidance while the agents handle the complex execution work in parallel.
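That write-test-debug cycle can be sketched as a simple loop. The `generate_code()` and `run_tests()` helpers here are hypothetical stand-ins for a model call and a real test suite:

```python
def generate_code(task: str, attempt: int) -> str:
    # Placeholder for asking a model to write or revise code.
    return f"code v{attempt} for {task}"

def run_tests(code: str) -> bool:
    # Placeholder for running the project's test suite;
    # here we pretend the third attempt is the one that passes.
    return "v3" in code

def agent_loop(task: str, max_attempts: int = 5):
    """Write code, test it, and iterate until the tests pass."""
    for attempt in range(1, max_attempts + 1):
        code = generate_code(task, attempt)
        if run_tests(code):
            return code  # done: hand back for human review
    return None  # give up and escalate to the human manager

result = agent_loop("refactor the billing module")
```

The human never specifies the steps; they only review the loop’s final output and step in when the agent gives up.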

Why now? The technical leap forward

As Meena asks, “We’ve always talked about AI in the context of being so smart it can do all of these things. So why didn’t they work very well?”

Earlier attempts like BabyAGI and AutoGPT had the right concept but lacked capability. The AI models weren’t yet good enough. They’d get overwhelmed by context, stuck on complex tasks, or produce strange results.

What’s changed? Today’s models can think. 

“The agent starts up front thinking about it a lot. It doesn’t just do what you’re saying,” Ben explains. They create plans, iterate on approaches, check their work, and adjust — much like an experienced professional would.

The practical reality for knowledge workers

Hopefully your wheels are turning. For instance, you might already use AI for small tasks within your marketing department:

  • Writing social media posts based on existing content
  • Putting pitch decks together from supporting materials
  • Generating performance reports and insights

But for something more complex, like creating a thorough, connected, multifaceted launch strategy for a new product, AI might seem out of its depth. This type of project involves a lot of materials, research, stakeholders, and work streams, plus a deep understanding of brand positioning.

Yet, this is exactly the type of complex project autonomous agents can now handle. Give them access to roadmaps, market research, customer feedback, and brand guidelines. Tell them the outcome you need. Let them figure out the steps.

Embracing the shift from simple AI tools to autonomous AI

The transition from simple AI tools to autonomous AI agents requires a mindset shift. Instead of asking “What can AI do for me?” knowledge workers should ask “What would I delegate to a smart, capable team member?”

The answer might surprise you. That launch strategy, competitive analysis, or process documentation you’ve been putting off? An autonomous agent could be working on it right now while you focus on strategic decisions only humans can make.

As organizations navigate this transition, the winners won’t be those who resist this evolution. They’ll be the ones who recognize that managing AI agents is simply the next evolution of knowledge work, where human creativity, judgment, and strategic thinking remain irreplaceable, but the execution can be dramatically amplified.

So what have you always wondered whether an AI agent could do? It can probably do more than you think.

Watch the full AI Explainer episode, “The next gen of AI automation,” to learn more about what it means for an agent to be “autonomous” and what that means for enterprises building intelligent workflows.