The model is rarely the problem

Most companies are getting quick wins from AI: faster writing, smarter search, the everyday lift of a good chatbot. The harder thing, the thing that most companies are still working out, is using AI to change how the business itself runs. 

Box CEO Aaron Levie has been blunt about why: "While AI models have an incredible amount of capability packed into them, there's no shortcut to getting that intelligence applied to a business process in a stable way."

Closing that gap is becoming a job of its own: the Forward Deployed Engineer (FDE). 

To understand the thinking behind Box's new FDE program, and what Levie calls one of the most in-demand roles in tech, we spoke with Alex Leutenegger and Gilbert Ortega-Rivera, two Box AI architects who’ve spent the past year embedded with customers and helped design our FDE program.

Both have watched the same pattern play out. A customer arrives certain the model is the problem: it isn't accurate or smart enough. But usually the model is fine; what's missing is everything around it.

"What I've found over the last couple of years is that AI can do fascinating things very quickly out of the box," says Ortega-Rivera. "But producing very specific results takes real intention, and real experience working with enterprise context, to produce good results."

Ortega-Rivera's background is in information architecture — shaping unstructured content so people, and now agents, can find what they need. He calls this work context engineering, with a deliberately wide definition of context. "It's not just your lengthy prompt," he says. "It's the data pipeline from your secured enterprise content that we help transform to feed into an LLM in the best way possible." Or, as Leutenegger puts it: the goal is to bring AI to your content, rather than make you bring your content to the AI.

That distinction is where most companies get stuck. 

They start with the interface they know. A chat box delivers "significant individual productivity gains," Ortega-Rivera says, "but the limiting factor is the hands on the keyboard": a chat box is usually used by one person, speeding up only that person's work.

His job, as he describes it, is to take a customer "from an interaction with a chatbot to an enterprise set of outcomes."

"As we understand what a customer is trying to accomplish and help them see how their process actually works,” he continues, “there are a ton of opportunities to inject AI into it to scale it — and to break away from that individual chat-box productivity. 

“Some steps belong to an AI agent; others are cheaper and more reliable as deterministic code: an API call instead of asking the agent to work it out every time. Knowing which is which is most of the job.”

For Leutenegger, the urgency is recent and specific. "What's really driving the ‘why now,’” he says, “is that the capabilities of models have vastly increased in the last six months. Take what's becoming widespread in coding and software engineering and apply it to other functions: accounting, finance, audit, copywriting. Today they're often considered limited in scope to coding, but they can expand out so much more with this increase in capability."

The clearest example came up for Ortega-Rivera about eight months ago. A customer wanted to use AI to read medical charts and determine whether a specific insurance event, a motor vehicle accident, was present in the text. It mattered for revenue: how a chart is classified drives downstream billing.

The customer had run sample charts through ChatGPT and landed at roughly a 70% hit rate. They were convinced they had a model problem, and expected Box to tell them whether Claude, Gemini, or ChatGPT was the best model for the job. "What we came to instead was that their prompt needed real work," Ortega-Rivera says. Once the prompt was refined with industry expertise and a clearer picture of the process, the hit rate passed 90%, a significant jump in the revenue that automated assessment could produce.

But the team wasn't satisfied. After some 50 hours of prompt work in a single week, they were sure 99% was reachable, and that they were still missing context. So "We can do this better" became "We can do this better with you." Ortega-Rivera sat down with the people who were processing thousands of charts a day. "One of the things we found," he recalls, "was that our prompt didn't tell the LLM that an accident on an electric scooter should count as a motor vehicle accident. It's those little things; every organization that handles any given business process thinks about it a little differently. So it was important for us to glean that context from the customer to get to that result."

Sometimes the missing context isn't a detail in a prompt; it's data the company never kept. Ortega-Rivera describes a customer that managed touring musicians and wanted help judging whether an incoming offer (this venue, this city, this date) was a good one. The catch was that they only uploaded the deals they accepted. "They lack a ton of perspective on what bad deals actually look like," Ortega-Rivera says. "They only know what good deals look like." Box could enrich every offer that came in, but answering the harder question meant changing the process, not just the technology. "Here's how your process is going to have to change," he tells customers like that. "I'm going to ask you to upload way more context."

The pattern is common: companies keep the records of wins and let everything else evaporate. Every rejected offer, every bad venue, every deal that fell through is context the model actually needs, and exactly the context the company never kept.

Keeping that kind of work from stalling at "good enough" depends on a discipline that Leutenegger insists must come first. "Everything we do is grounded in evals," he says, meaning evaluation frameworks: the tests that measure whether a workflow still returns the right answers. Evals are what let you improve a system safely, and keep improving it as the models underneath it change and prompts and agents multiply.
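A minimal sketch of what such an eval can look like, written in Python around the medical-chart example. Everything in it is an illustrative assumption rather than Box's tooling: the `Case` type, the five invented chart excerpts, and the two keyword classifiers, which are hypothetical stand-ins for two prompt versions that would each call an LLM in practice. The eval scores each version against human labels, lists the misses, and only accepts a change that does not score worse than the current version.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Case:
    text: str       # chart excerpt (invented example data)
    expected: bool  # human label: is a motor vehicle accident present?


# Hypothetical hand-labeled eval set; a real one would hold many charts.
CASES: List[Case] = [
    Case("Patient injured in motor vehicle collision on highway.", True),
    Case("Rear-ended in car crash, neck pain.", True),
    Case("Fell from electric scooter after collision with curb.", True),
    Case("Slipped on ice outside home.", False),
    Case("Routine cardiac follow-up, no trauma.", False),
]

Classifier = Callable[[str], bool]


def hit_rate(classify: Classifier, cases: Sequence[Case]) -> float:
    """Fraction of cases where the classifier agrees with the human label."""
    if not cases:
        raise ValueError("eval set is empty")
    hits = sum(classify(c.text) == c.expected for c in cases)  # True counts as 1
    return hits / len(cases)


def failures(classify: Classifier, cases: Sequence[Case]) -> List[Case]:
    """Cases the classifier gets wrong: where to look for missing context."""
    return [c for c in cases if classify(c.text) != c.expected]


def passes_gate(candidate: Classifier, baseline: Classifier, cases: Sequence[Case]) -> bool:
    """Accept a new prompt or model only if it scores no lower than the current one."""
    return hit_rate(candidate, cases) >= hit_rate(baseline, cases)


# Hypothetical stand-ins for two prompt versions; each would call an LLM in practice.
def classify_v1(text: str) -> bool:
    t = text.lower()
    return "motor vehicle" in t or "car crash" in t


def classify_v2(text: str) -> bool:
    # v2 adds a customer-specific rule: e-scooter accidents count as motor vehicle accidents.
    return classify_v1(text) or "scooter" in text.lower()


if __name__ == "__main__":
    for name, fn in [("v1", classify_v1), ("v2", classify_v2)]:
        print(f"{name}: hit rate {hit_rate(fn, CASES):.2f}")
    print("v1 misses:", [c.text for c in failures(classify_v1, CASES)])
    print("v2 passes gate:", passes_gate(classify_v2, classify_v1, CASES))
```

Run as a script, this reports a hit rate of 0.80 for v1 and 1.00 for v2, with the scooter chart as v1's only miss. The list of failures matters as much as the score: it points to the context the prompt is missing, and the gate keeps a later prompt or model swap from quietly undoing that fix.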

It's also why the work takes a particular kind of person. You need someone, Ortega-Rivera says, who is "very good at business-process transformation in the traditional sense, and very comfortable playing with LLMs to figure out what's actually possible." Few companies have that combination on staff, and the technology is moving too quickly for most to build it from scratch. Forward deployed engineers are meant to fill that gap: they are the people who turn what AI can now do into work that runs inside a real business.

The message throughout the conversation is consistent: the hard part of enterprise AI is rarely the model; it's the context around it, the process, and the judgment about which parts of the work belong to an agent and which don't.

By Levie's reckoning, that work isn't going anywhere. "This is a job that is going to be around," he has written, "as long as AI keeps changing rapidly, which it inevitably will."

Learn more about Box's Forward Deployed Engineer program.