How Box built its AI agent with LangGraph

When we set out to transition Box AI into a multi-step agentic system, the team faced a fundamental architectural question: build a custom agent execution engine from scratch, or build on top of a proven framework? The answer was LangGraph — and the decision has shaped every layer of the Box Agent platform.

This article is a deep dive into how the platform uses LangGraph as its core execution engine, how agent graphs are defined and deployed, and what it looks like in practice to run production-grade AI agents on top of LangGraph at enterprise scale.

Key takeaways:

Box transitioned its AI agent to a multi-step agentic system by choosing LangGraph as its core execution engine over building a custom solution from scratch
The platform is structured into five distinct layers consisting of entry points, an intelligence service, an Agent Orchestrator, an LLM gateway, and model providers
Agents are defined using Box's agent definition language and compiled into runnable deep agents at runtime to allow behavior updates without code deployments

The Box Agent meets the growing challenge of enterprise content

Why LangGraph

When Box set out to build its next-generation Box Agent platform, LangGraph was the clear choice. It solves the core limitation of earlier frameworks — linearity — with a graph-based execution model that supports branching, parallel execution, typed shared state, built-in checkpointing, and real-time event streaming.

Critically, LangGraph’s architecture maps directly to Box’s most powerful agent type: the Deep Agent Node, which combines planning middleware, dynamic tool use, and subagent orchestration. In this model, the LLM doesn’t just react to individual tool results. It forms multi-step plans, sequences tool calls across Box’s full Tool Registry, and delegates to specialized subagents via LangGraph’s composable Subgraph Node pattern.

This progression — from simple tool-calling nodes to full deep agents capable of autonomous planning and delegation — is only possible because LangGraph treats the LLM as a first-class participant in control flow, not just a text generator. The team accepted a strong framework coupling as a deliberate trade-off for capabilities that would otherwise require significant custom infrastructure.

Architecture overview

The system is organized into five layers:

Entry points: Surfaces through which users and systems interact with the agent
Intelligence service: The API and routing layer for internal and external services
Agent Orchestrator: The LangGraph-based agentic execution engine
LLM gateway: Provider abstraction, quota management, and routing
Model providers: The underlying LLMs

Agents are defined using Box’s Agent Definition Language (ADL) and compiled into runnable Deep Agents at runtime, making it possible to ship new behaviors, prompt strategies, and model configurations without code deployments.

Entry points

The system accepts requests from three surfaces:

AI Center AX: The primary Box UI experience for end users
AI APIs: Programmatic access for developers and integrations
Box MCP: Model context protocol interface for agent-to-agent and external system interoperability

All three surfaces funnel into the intelligence service, providing a unified entry path regardless of how the request originates.

Intelligence service

The intelligence service exposes the AI APIs through which both internal Box services and external integrations interact with the agent platform. It acts as the system’s front door — receiving incoming requests, normalizing them, and routing them into the Agent Orchestrator. This separation keeps entry-point concerns — authentication, surface-specific formatting, rate limiting — cleanly decoupled from agent execution logic.

Agent Orchestrator — The LangGraph executor

The Agent Orchestrator is the LangGraph-based agentic execution service that enables running AI agents at scale. It’s responsible for fetching agent configurations at runtime, compiling them into runnable Deep Agents, and managing the full lifecycle of agent execution.

Agent definition language (ADL) and AI Studio

The Agents Platform is multi-tenant by design. Multiple teams at Box use it to build their own agentic product experiences. Teams use ADL to declare an agent’s tools, instructions, model configuration, and guardrails; the Agent Orchestrator compiles these into a runnable Deep Agent at runtime. This democratizes agent creation across Box, allowing product teams to ship agent behaviors without owning infrastructure.

{
  "id": "my_agent",
  "title": "Human-readable Title",
  "releaseState": "PREVIEW",
  "components": { /* Reusable type definitions */ },
  "state": { /* Shared memory variables */ },
  "graph": { /* Execution logic: nodes and edges */ },
  "stream": { /* UI integration and event transformers */ },
  "inputs": [ /* properties from the state that are required */ ],
  "outputs": [ /* properties that the client can retrieve after execution */ ],
  "config": [ /* properites for configuring an agent */ ]
}

For end users, AI Studio provides a configuration layer on top of ADL. Users can customize Box Agent by providing custom instructions, specifying knowledge sources, and selecting the model the agent uses. These configurations are fetched and applied by the Agent Orchestrator at runtime, enabling live modifications to agent behavior without redeployment.

The Box Agent and LangGraph's Deep Agents framework

The Box Agent is built on LangGraph's Deep Agents framework. Deep Agents gives us a pre-built planner-executor loop: The agent dynamically constructs an execution plan based on the user’s request, leverages the full set of tools available to it, and iterates through complex, multi-step tasks until it determines it has arrived at an appropriate response (rather than following a fixed execution path). Sub-agent delegation, todo-list planning, and a virtual filesystem for intermediate artifacts come built in.

The Deep Agent itself is expressed as a node within a broader LangGraph StateGraph. That outer StateGraph is where Box wires in everything that surrounds the agent loop — guardrails, middleware hooks, pre- and post-processing steps, and any custom routing logic the Agent Orchestrator needs. The Deep Agent node handles the planning and execution loop; the surrounding graph handles everything else.

That layering matters. We didn't have to author the planner-executor topology ourselves, but because the Deep Agent lives inside a StateGraph, we still get the full power of the LangGraph runtime: checkpointing, interrupts, the send API for parallel fan-out, and hierarchical subgraph composition. This is what lets the Box Agent handle both simple retrieval tasks and deeply nested, multi-agent workflows within the same execution model.

Global Agent and sub-agents

The Global Agent is the Deep Agents orchestrator. It receives a normalized request, classifies intent, and produces an execution plan. When the task warrants, it spawns sub-agents dynamically to handle pieces of the work in parallel. Rather than routing to a fixed roster of specialized sub-agents, the Global Agent decides at runtime what sub-agents to create, what tools and instructions to give them, and how to fan work out across them.

This is where Deep Agents’ delegation model pays off. The Global Agent can spin up a sub-agent scoped to analyze a single document, another scoped to searching across a folder, and another scoped to composing a summary — all within the same execution, all with isolated context windows, and all reporting back through the middleware layer. Sub-agents themselves can call tools in a loop and communicate with the parent agent or peer sub-agents as needed.

Because sub-agents are spawned dynamically rather than predefined, the system scales naturally to tasks Box’s product teams haven’t explicitly designed for. ADL defines the Global Agent’s tools, instructions, and overall behavior. The sub-agent topology emerges at runtime from the model’s plan.

What Deep Agents gives us:

Dynamic execution planning and on-demand sub-agent spawning
A built-in planning loop that iterates until the task is complete
Context isolation per sub-agent so parent context stays clean
Intent classification that shapes the execution plan

What the underlying LangGraph runtime gives us:

Parallel fan-out via the Send API for concurrent sub-agent execution
Inter-agent communication between parent and sub-agents through middleware
Interrupt-based human-in-the-loop for pause/resume flows without losing state
Hierarchical subgraph composition with proper state isolation between nested graphs
Checkpointing via a custom SessionCheckpointer built on LangGraph's checkpoint interface, enabling mid-execution fault recovery and session resumption

Foundation layer

Tools: LangChain’s tool interface gives agents access to 75+ capabilities, including BM25 keyword search, vector search, file operations, structured Q&A over spreadsheets, and citation generation. Because sub-agents spawned by the Global Agent are invoked through the same tool-call interface as primitive tools, the Global Agent’s invocation surface stays uniform whether it’s running a BM25 search or delegating to a freshly spawned sub-agent.

Middleware: Rather than modifying agent prompts or logic directly, Box uses middleware that intercepts LangChain model calls and tool calls to inject cross-cutting behavior: prompt caching to reduce cost and latency on multi-turn conversations, citation generation that runs post-response using embedding-based matching, and context management that summarizes conversation history beyond 170K tokens to prevent context overflow. Middleware also serves as the communication channel between sub-agents and the parent agent.

Guardrails: A dedicated guardrails layer enforces safety, compliance, and quality constraints on agent inputs and outputs, isolated from core agent logic.

Checkpointing: Persistence and resumability via LangGraph

One of the most operationally critical features of Box’s agent platform is resumability — the ability to recover an interrupted agent session and continue from where it left off. This is powered entirely by LangGraph’s checkpointing mechanism.

As a LangGraph graph executes, it periodically generates checkpoints that capture:

The full agent state (accumulated data, tool inputs/outputs)
Timestamps and version IDs
The current position in the graph

Box persists these checkpoints as floating AppData (Box’s internal metadata storage). When a session needs to be resumed — whether due to a failure, a timeout, or a human-in-the-loop pause — Agent Orchestrator reloads the LangGraph graph from the latest checkpoint and continues execution seamlessly.

This is also the foundation for human-in-the-loop workflows. The Interrupt Node type pauses graph execution at a defined point, writes a checkpoint, and waits for user input. When the user responds, the graph resumes from the checkpoint with the new input injected into state.

Streaming: From LangGraph events to real-time UI

Box’s agent platform streams execution updates to users in real time as the graph runs. This is handled by the stream section of the agent definition, which defines transformers — mappings from LangGraph node events to UI components.

Two transformer types are supported:

Input transformers: Map user events (e.g., a new message) to state variables
Output transformers: Map node events (e.g., a completed tool call) to UI components like ThinkingDetail or AdditionalProcessData

Transformer logic is executed via jq (not Python) to avoid security risks from arbitrary code execution. LangGraph’s event emission model makes this straightforward. Every node emits typed events as it executes, and the transformer layer maps those events to the appropriate UI updates via Server Sent Events.

Security: User-scoped execution

A critical design constraint for Box’s agent platform is that agents execute under the user’s identity, not a privileged agent identity. This is enforced at the platform level, not the LLM level.

When a user starts an agent session:

The user's JWT is passed to Agent Orchestrator
Every tool call made by the agent uses this JWT
Box's standard permission model applies — agents can only access what the user can access
If the user lacks permissions for a file or folder, the tool call fails with a permissions error

Even if an LLM generates a tool call requesting access to a resource the user doesn’t have permission to access, the platform enforces the boundary. The agent cannot escalate privileges, impersonate other users, or bypass access controls. LangGraph’s tool invocation model makes this clean — tool calls are intercepted by Agent Orchestrator, which injects the user’s JWT before dispatching.

LLM gateway

All model calls from the Agent Orchestrator flow through a centralized LLM gateway before reaching any provider. The gateway integrates with multiple LLM providers and handles quota management, cost attribution, load balancing, and token-based rate limiting — abstracting away the complexity of managing multiple provider integrations behind a single proxy interface.

There’s a nice payoff to this design: Because the LLM Gateway acts as a proxy that presents a uniform interface to the Agent Orchestrator, Box can use LangGraph’s native provider libraries directly without modification. Provider-specific concerns are handled at the gateway, not inside the agent logic.

Model providers

Box runs the same agent configurations across multiple LLM providers with per-provider settings like temperature, prompt caching behavior, and reasoning effort levels managed through LangChain’s model abstraction layer. This makes it straightforward to route specific task types to the most cost-effective or highest-quality model available without changing agent logic.

Conclusion

Building a production-grade AI agent system for the enterprise requires more than connecting a model to a few tools. It requires a layered architecture that can handle dynamic task planning, parallel execution, fault recovery, cross-cutting concerns like safety and cost, and live configuration changes — all without sacrificing reliability or developer velocity.

LangGraph gave Box the foundation to do that without starting from scratch. LangChain’s model and tool abstractions kept provider and capability integrations clean. LangGraph’s StateGraph runtime unlocked the stateful, interrupt-aware, parallelizable execution model the system needed. And Deep Agents provided the planner-executor loop at the center of it all, so Box could focus on enterprise-specific concerns — ADL, the LLM Gateway, middleware, AI Studio — rather than rebuilding core agent infrastructure.

The result is a system that scales from a simple file lookup to a deeply nested, multi-agent workflow across dozens of documents, all within a single, coherent execution model. That's the architecture powering Box AI today.

Want to build something like this? If you're an enterprise developer looking to build intelligent document workflows,explore Box's developer platform.