Upload and download binary with MCP: How Box solved the last mile of agentic file editing

In our previous post, we outlined practical approaches to delivering unstructured content over the Model Context Protocol (MCP), from choosing the right representation to using signed URLs. We detailed the architectural trade-offs, security considerations, and governance patterns required to make AI agents truly useful in production enterprise environments.

Today, we want to look closer at one of the most complex technical hurdles we tackled: enabling AI agents to reliably upload and download binary files without ever placing that raw binary data into the LLM context window. Here’s how we built it, the architecture behind it, and the enterprise guardrails we established along the way.

Key takeaways:

The Base64 Bottleneck: The traditional approach of encoding binary files into base64 strings and passing them through the LLM context window causes severe context bloat, payload failures, high latency, and data corruption risks
Box bypasses the LLM context window entirely by using temporary, single-use signed URLs, which let AI agents transfer binary files directly between Box storage and local execution environments using standard curl commands
The architecture ensures security and compliance through short-lived URL expirations, native OAuth session binding for complete user attribution, and strict admin opt-in controls

AI agents are increasingly capable of advanced reasoning, planning, and executing multi-step workflows. However, a subtle gap continues to undermine their full potential: Agents work comfortably with text-native formats like Markdown, while human workflows still center on complex binary documents — PDFs, PowerPoints, Word files, and spreadsheets.

If an agent can only parse a text representation of a file but lacks the capability to write back to the original binary, its enterprise utility is fundamentally limited. You can ask it to summarize a presentation, but you can’t ask it to update one. Without bidirectional editing capabilities, an agent behaves less like an autonomous coworker and more like an advanced search interface.

Our objective was clear: Enable seamless, agentic file editing within Box that functions reliably and securely, without binary data ever touching the LLM context.

Why the “obvious” approach breaks in production

The naive solution to this problem is to base64 encode the binary data and pass it directly through the LLM context window. A typical transaction using this pattern follows these steps:

AI client calls download: The AI client triggers the download tool via the MCP server
Server encodes binary: The MCP server fetches the target file, base64 encodes the binary, and returns it as a string payload within the MCP response
Agent edits string: The AI client receives the base64 string in its context window, decodes it, applies the necessary edits, and re-encodes the content back to base64
AI client calls upload: The AI client invokes the upload tool, passing the modified base64 string back to the server
Server decodes and saves: The MCP server decodes the base64 string and uploads the restored binary to the final storage destination

While this approach works on paper, it breaks down under production conditions for several reasons:

Context bloat: Binary tokens crowd out useful context and degrade model attention. A base64-encoded file consumes a massive chunk of the context window, leaving little room for the actual prompt instructions and reasoning tokens.
Payload limits: Many MCP clients cap tool-call payload sizes. Because base64 emits four characters for every three input bytes, it inflates file sizes by roughly one-third (e.g., a 75 KB file expands to ~100 KB of encoded text), so larger enterprise files simply fail to transmit.
Latency: LLM inference over long base64 strings is slow, because the model must read and regenerate the encoded content token by token. What should be a near-instantaneous file operation turns into a multi-minute wait, severely dragging down system responsiveness.
Corruption risk: Non-deterministic LLM inference can subtly alter characters within the base64 data during the round-trip, silently corrupting the underlying binary data structure.
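
The size and corruption points above are easy to check with a few lines of Python. This is a minimal sketch using only the standard library: the 75 KB payload is random bytes standing in for a real file, and the single swapped character is a stand-in for the kind of drift an LLM round-trip could introduce.

```python
import base64
import hashlib
import os

# Stand-in for a 75 KB binary file (random bytes, not a real document).
original = os.urandom(75 * 1024)

# Base64 maps every 3 bytes to 4 ASCII characters, so size grows by ~33%.
encoded = base64.b64encode(original)
inflation = len(encoded) / len(original)

# Simulate a one-character change somewhere in the middle of the string.
pos = len(encoded) // 2
swapped = b"A" if encoded[pos:pos + 1] != b"A" else b"B"
tampered = encoded[:pos] + swapped + encoded[pos + 1:]

# The tampered string still decodes without error, but the bytes differ.
decoded = base64.b64decode(tampered)
same_length = len(decoded) == len(original)
same_content = hashlib.sha256(decoded).digest() == hashlib.sha256(original).digest()

print(f"original: {len(original)} bytes, encoded: {len(encoded)} chars")
print(f"inflation factor: {inflation:.3f}")
print(f"decodes cleanly: {same_length}, content intact: {same_content}")
```

The decode step raises no error after the swap, which is why this failure mode is silent: nothing in the pipeline signals that the restored file no longer matches the original.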

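When benchmarked against our approach using the Claude Desktop App, the difference was stark: the base64 approach uploaded a corrupted file at 175 KB and failed entirely at 20 MB, whereas the signed URL approach completed the transfer in about 30 seconds.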

The traditional base64 approach doesn’t just underperform. It actively fails when exposed to real-world enterprise file sizes.

The architecture: Signed URL tools on the MCP server

To bridge this gap, we implemented Signed URL tools directly on the Box MCP server. Instead of routing massive binary strings through the LLM context, the agent utilizes a temporary, secured URL to transfer files directly between Box and the local execution environment — completely bypassing the model context window.

The core execution flow operates through four clear phases:

URL request: The agent calls a dedicated tool requesting a signed upload or download URL
Token generation: The MCP server generates a temporary, single-use URL dynamically bound to the user’s active OAuth session
Direct transfer: The agent executes a standard curl command to transfer the binary file data directly between storage and the local environment, ensuring no binary data enters the LLM context window
Audit and attribution: Each transfer is logged, maintaining complete visibility and attributing actions to both the specific user and the application that generated the credential
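
From the agent’s side, the flow above reduces to one tool call followed by one curl invocation. The sketch below covers only the second half: building the curl argument lists once a signed URL is in hand. The example URL, the helper function names, and the use of a plain HTTP PUT for upload are assumptions made for illustration, not documented Box behavior.

```python
import shlex
from pathlib import Path


def build_download_command(signed_url: str, destination: Path) -> list[str]:
    """Return a curl argument list that saves the file at signed_url to disk."""
    return [
        "curl",
        "--fail",          # exit non-zero on HTTP errors (e.g. an expired URL)
        "--silent",
        "--show-error",
        "--location",      # follow redirects if the server issues any
        "--output", str(destination),
        signed_url,
    ]


def build_upload_command(signed_url: str, source: Path) -> list[str]:
    """Return a curl argument list that sends a local file to signed_url.

    ASSUMPTION: the upload endpoint accepts a plain HTTP PUT of the file body.
    The real endpoint may expect a different method or a multipart form.
    """
    return [
        "curl",
        "--fail",
        "--silent",
        "--show-error",
        "--upload-file", str(source),   # curl sends this with PUT over HTTP
        signed_url,
    ]


# Hypothetical URL standing in for the value a download-URL tool would return.
download_url = "https://dl.example.invalid/signed/abc123?expires=1700000000"
cmd = build_download_command(download_url, Path("report.pdf"))

# Only this short command line passes through the model; the file bytes do not.
print(shlex.join(cmd))
```

An agent runtime could execute the resulting list with subprocess.run(cmd, check=True). Passing the arguments as a list rather than a shell string avoids quoting problems with the query parameters in the signed URL.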

Designing for enterprise governance

When translating this architecture to enterprise environments, we anchored our design decisions around security, visibility, and control:

Single-use, short-lived URLs: The generated URLs expire quickly and can only be resolved once, minimizing the window in which an intercepted URL could be abused.
OAuth session binding: Comprehensive attribution is preserved natively. All downloads and uploads are hard-tied to the authenticated user and the application, maintaining the strict audit trails required by enterprise compliance frameworks.
Admin opt-in control: This capability is turned off by default. Enterprises decide when and where to enable it, giving IT and security teams granular control before a broad rollout. Admins can activate the feature by navigating to Admin Console → Integrations → Box MCP Server → Files and Folders → Custom Configuration and enabling get_upload_url and get_download_url.
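
The three guardrails above compose naturally in code. What follows is a minimal in-memory sketch, not Box’s implementation: the class name, field names, five-minute lifetime, and host are all hypothetical, and a production service would rely on durable storage and real OAuth session validation.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SignedUrlIssuer:
    """Toy issuer illustrating opt-in, expiry, single use, and attribution."""
    enabled_tools: set[str] = field(default_factory=set)   # nothing enabled by default
    ttl_seconds: int = 300                                  # hypothetical lifetime
    _grants: dict[str, dict] = field(default_factory=dict)
    audit_log: list[dict] = field(default_factory=list)

    def issue(self, tool: str, user_id: str, app_id: str, file_id: str) -> str:
        # Admin opt-in: refuse unless the tool was explicitly enabled.
        if tool not in self.enabled_tools:
            raise PermissionError(f"{tool} is not enabled by the admin")
        token = secrets.token_urlsafe(32)   # unguessable, URL-safe token
        self._grants[token] = {
            "user_id": user_id, "app_id": app_id, "file_id": file_id,
            "tool": tool, "expires_at": time.time() + self.ttl_seconds,
        }
        return f"https://files.example.invalid/t/{token}"   # hypothetical host

    def redeem(self, token: str, now: Optional[float] = None) -> dict:
        # Single use: pop removes the grant, so a second redeem fails.
        grant = self._grants.pop(token, None)
        if grant is None:
            raise PermissionError("unknown or already-used URL")
        current = time.time() if now is None else now
        if current > grant["expires_at"]:
            raise PermissionError("URL has expired")
        # Attribution: record which user and application made the transfer.
        self.audit_log.append(
            {k: grant[k] for k in ("user_id", "app_id", "file_id", "tool")}
        )
        return grant


issuer = SignedUrlIssuer(enabled_tools={"get_download_url"})
url = issuer.issue("get_download_url", "user-1", "app-1", "file-42")
token = url.rsplit("/", 1)[-1]
issuer.redeem(token)   # succeeds and writes one audit entry
# A second issuer.redeem(token) would raise PermissionError.
```

Because get_upload_url is not in enabled_tools here, a call to issuer.issue("get_upload_url", ...) raises PermissionError, mirroring the off-by-default posture described above.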

As a result, agents can seamlessly generate a PDF report and upload it to Box, pull down a financial spreadsheet to append a summary page, or convert documents to alternative formats and save them back as a new version — all reliably, at true enterprise scale.

Key takeaways for MCP developers

If you’re building or deploying MCP servers to handle enterprise file content, our engineering experience surfaces a few critical guardrails:

Binary data belongs in storage, not in context: Use text, Markdown, or clean structured representations for read-only use cases. Switch to signed URLs or programmatic execution paths when an agent needs to edit, modify, or create files.
Balance functionality with security: The optimal architecture isn’t the most permissive one; it’s the one that equips agents with the exact capabilities they need while operating within enterprise-grade guardrails.
Performance at scale is non-negotiable: The base64 approach uploaded a corrupted file at 175 KB and entirely failed at 20 MB, whereas Box’s signed URL design handles the transfer in ~30 seconds. This performance gap represents the difference between an isolated proof-of-concept and a resilient production system.
Admin control is a core feature: Shipping advanced capabilities with strict admin opt-in configurations ensures governance. Enterprises must have the visibility to stage rollouts and enforce continuous data governance over autonomous agent activities.

A note on domain allowlisting

Depending on your organization's specific architecture, certain MCP clients require explicit domain allowlisting for upload and download URLs to function correctly. The required target domains will vary depending on your specific Box Zone deployment. For an exhaustive list, please visit our developer resource on New Box Zones Domains. Alternatively, you can configure wildcard rules within your environment to capture all necessary endpoints:

upload.*.box.com
*.boxcloud.com
*.box.com
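
To sanity-check which hosts the wildcard rules above would cover, a quick sketch using shell-style matching from the Python standard library can help. Real allowlist engines differ in how they interpret wildcards (for example, whether * may span multiple DNS labels), so treat this as an approximation; the hostnames tested are illustrative examples, not an authoritative list of Box endpoints.

```python
from fnmatch import fnmatchcase

# The three wildcard rules listed above.
ALLOWED_PATTERNS = ["upload.*.box.com", "*.boxcloud.com", "*.box.com"]


def is_allowed(host: str) -> bool:
    """Return True if host matches any allowlist pattern (shell-style globbing)."""
    # Hostnames are case-insensitive and may carry a trailing dot.
    normalized = host.lower().rstrip(".")
    return any(fnmatchcase(normalized, pattern) for pattern in ALLOWED_PATTERNS)


# Illustrative hostnames only.
for candidate in ["upload.app.box.com", "dl.boxcloud.com", "box.com.evil.example"]:
    print(candidate, is_allowed(candidate))
```

Note that fnmatch lets * match across dots, so *.box.com here also covers multi-label subdomains. A stricter proxy may require separate rules for deeper subdomains.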

Bring agentic editing to your workflows

The historical gap between the static files agents interact with and the live files enterprise knowledge workers use daily is closing. With signed URL tools integrated into the Box MCP server, AI agents can finally replicate standard workflows safely: open a file, edit its content, and save it back securely at scale.

See it in action: Try prompting your agent with complex file workflows, such as: “Download the Acme financial overview PDF, add a summary to the first page, and re-upload as a new version.”

To explore the architecture further and enable the Box MCP server for your enterprise, visit our documentation.