Today at OpenAI DevDay, OpenAI launched AgentKit, a new suite of tools designed to accelerate the development of sophisticated AI agents. AgentKit includes Agent Builder, a low-code platform for creating agentic workflows, and Evals, a tool for rigorously testing and evaluating agent performance. At Box, we believe the future of work is agentic, and we are excited to announce our support for AgentKit, enabling developers and enterprises to build and reliably deploy powerful, content-aware AI agents grounded in their own secure business data.
Building secure, context-aware AI agents with Agent Builder and the Box MCP server
An AI agent’s true value is unlocked when it can securely work with an enterprise’s internal, proprietary data. OpenAI’s Agent Builder helps users stitch together agentic workflows from conversational interfaces and AI models, a task that previously required significant programming and manual setup. For these workflows to be truly transformative, however, they need to be grounded in your content. That is where the Box MCP server comes in.
Connecting Agent Builder to the Box MCP server gives it a powerful toolkit for content-centric automation. Using plain-English instructions, you can give an agent the ability to extract structured and freeform data, answer questions across multiple documents, and even create new files directly within Box. This level of interoperability lets you build sophisticated agents that can, for example, analyze a financial report in a Box folder, combine its findings with data from another business system, and save a summary back to that folder. The entire process runs without your sensitive content ever leaving the security of the Box environment, turning agentic AI from a theoretical concept into a practical tool for your business.
Ensuring reliability with AgentKit’s Evals
In the enterprise, accuracy is non-negotiable. Deploying agents requires certainty that they will perform reliably and correctly. This is why OpenAI’s introduction of Evals is so significant: it provides an easy-to-use framework for evaluating agent performance, curating and managing evaluation datasets, and iterating on agent prompts. These practices are also core to our own Box AI Enterprise Eval methodology.

Our evaluation team has already put Evals to the test and seen its impact. The tool allowed us to automate our prompt optimization process, eliminating the manual engineering effort previously spent documenting issues and tweaking prompts. With Evals, we can run multiple evaluations in the background, saving at least a full day of effort per run and letting us quickly identify the highest-performing model-and-prompt combination for a given task. This kind of iterative, flexible testing is precisely what enterprises need to deploy AI agents with confidence.
Unlocking the agentic enterprise
The combination of OpenAI’s AgentKit and Box addresses the core challenges that have limited the adoption of agentic AI in the enterprise. Organizations no longer need to choose between powerful AI and the security of their own data. This partnership reinforces Box’s position at the cutting edge of enterprise AI by providing the secure, intelligent content layer that is a key ingredient for building the next generation of real-world agentic applications.
Start building today
The era of the enterprise AI agent is here. With OpenAI’s AgentKit and the Box platform, you have both the tools to build and the foundation of trust you need to innovate. Learn more about OpenAI’s AgentKit here.