Building a Multi-Agent System with the Claude Agent SDK and Box

AI agents are great at reasoning and tool use. But to do useful work, they need access to the content your work depends on: documents, spreadsheets, receipts, tax forms. These are the things that live in cloud storage, not in a prompt.

This post walks through a lightweight app where a team of Claude agents uses the Box CLI to work with real files in Box. The agents search for documents, extract structured data, create a summary spreadsheet, upload it, and share the result. All from a single high-level prompt.

The full source is available at box-community/tax-with-box.

The problem

Suppose you have a Box folder full of tax documents (W-2s, 1099s, receipts). The goal is to produce a summary spreadsheet, upload it back to Box, and share it with a tax preparer.

That takes several steps: explore the folder, figure out what's there, extract the key fields from each document, build the spreadsheet, upload it, and create a share link. The Claude Agent SDK fits here because it lets you build agents that plan, use tools, and break a goal into smaller tasks.

How the agents are defined

The entire agent configuration lives in a single file. Each agent is a description, a set of tools, and a system prompt built from markdown skill files:

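// src/config/agents.ts
import { readFileSync } from "node:fs";
import { join } from "node:path";

// PROJECT_ROOT is defined elsewhere in the project (not shown here).

// Load a skill's SKILL.md from .claude/skills/<skillDir>/ as a string.
function readSkill(skillDir: string): string {
  const skillPath = join(PROJECT_ROOT, ".claude", "skills", skillDir, "SKILL.md");
  return readFileSync(skillPath, "utf-8");
}

const boxCliSkill = readSkill("box-cli");
const explorerSkill = readSkill("explorer");
const extractorSkill = readSkill("extractor");
const xlsxSkill = readSkill("xlsx");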

Each skill is a markdown file, much like a runbook you'd hand to a coworker. The Box CLI skill teaches the agent how to search, upload, download, and manage sharing through npx @box/cli commands. The agent-specific skill tells it what to do with those capabilities.

An agent definition looks like this:

explorer: {
  description: "Discovers and classifies tax documents in Box using the Box CLI.",
  model: "sonnet",
  tools: ["Bash", "Read"],
  prompt: `${preamble}
${boxCliSkill}
${explorerSkill}`,
},

Every agent follows the same pattern. The Extractor agent pulls structured data from documents. The Excel agent builds spreadsheets. The Collaboration Manager handles sharing and permissions. Each one is a different combination of tools and skills:

"excel-agent": {
  description: "Creates spreadsheets (Tax Data Summary, P&L) from structured data.",
  model: "sonnet",
  tools: ["Bash", "Read", "Write"],
  prompt: `${preamble}
${boxCliSkill}
${xlsxSkill}`,
},

None of these agents know about each other. They're workers waiting for a task. The Plan Agent is the one that decides which agents to run, and when.
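The "different combination of tools and skills" idea can be captured in a small helper. The sketch below is only an illustration, not code from the repository: `AgentSpec`, `composePrompt`, and `defineAgent` are hypothetical names, `AgentSpec` is a local stand-in rather than the SDK's own type, and the skill strings are placeholders for the real SKILL.md contents.

```typescript
// Hypothetical local stand-in for the shape used in agents.ts.
interface AgentSpec {
  description: string;
  model: string;
  tools: string[];
  prompt: string;
}

// Join the shared preamble and the skill texts, skipping empty parts.
function composePrompt(preamble: string, skills: string[]): string {
  return [preamble]
    .concat(skills)
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .join("\n\n");
}

// Build one agent from a description, a tool list, and a list of skills.
function defineAgent(
  description: string,
  tools: string[],
  preamble: string,
  skills: string[],
): AgentSpec {
  return {
    description,
    model: "sonnet",
    tools: tools.slice(),
    prompt: composePrompt(preamble, skills),
  };
}

// Placeholder strings stand in for the real SKILL.md contents.
const demoAgent = defineAgent(
  "Discovers and classifies tax documents in Box using the Box CLI.",
  ["Bash", "Read"],
  "You are a specialist agent.",
  ["# Box CLI skill", "# Explorer skill"],
);
console.log(demoAgent.prompt);
```

Adding a new specialist under this sketch would mean writing one more skill file and one more `defineAgent` call.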

The Plan Agent

The Plan Agent sits above the others and handles orchestration. Given a high-level goal, it breaks the work into discrete tasks, each assigned to a specialist.

It's intentionally limited; it can read files and run commands to understand what's in Box, but it can't write files or delegate directly:

const isPlan = !agentKey || agentKey === "plan";
const allowedTools = isPlan
  ? ["Bash", "Read", "AskUserQuestion"]
  : [...ALLOWED_TOOLS, "AskUserQuestion"];

To create tasks, the Plan Agent writes structured text with markers:

[TASK:Explorer] Find and classify all tax documents in Box folder 123 [/TASK]
[TASK:Extractor] Extract W-2 and 1099 data from the identified documents [DEPENDS:#1] [/TASK]
[TASK:Excel Agent] Build a tax summary spreadsheet and upload to Box [DEPENDS:#2] [/TASK]

The streaming layer picks up these markers and turns them into task cards in the UI:

// Scan the Plan Agent's streamed text for [TASK:...] markers.
// Strip them from the visible output and emit task_created events instead.

planTextBuffer += event.delta.text;
const { cleanText, tasks, updates } =
  extractAndReplaceTaskMarkers(planTextBuffer);
if (tasks.length > 0) {
  for (const task of tasks) {
    yield {
      type: "task_created",
      agent: task.agent,
      prompt: task.prompt,
      summary: task.prompt.slice(0, 80),
      ...(task.dependsOn ? { dependsOn: task.dependsOn } : {}),
    };
  }
}

Task cards appear in real time as the plan takes shape, and the user stays in control of what runs and when.
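The post does not show the body of extractAndReplaceTaskMarkers, so here is one way such a parser could work. This is a minimal sketch under stated assumptions: `parseTaskMarkers` and `ParsedTask` are hypothetical names, only complete `[TASK:...] ... [/TASK]` markers are handled, multiple dependencies are assumed to be comma-separated, and the `updates` value from the original snippet is not modeled.

```typescript
// Hypothetical shape for a parsed task; not taken from the repository.
interface ParsedTask {
  agent: string;
  prompt: string;
  dependsOn?: number[];
}

function parseTaskMarkers(buffer: string): { cleanText: string; tasks: ParsedTask[] } {
  const tasks: ParsedTask[] = [];
  // [TASK:<agent>] <body> [/TASK]; [\s\S] lets the body span several lines.
  const markerPattern = /\[TASK:([^\]]+)\]([\s\S]*?)\[\/TASK\]/g;

  const cleanText = buffer.replace(
    markerPattern,
    (_match: string, agent: string, body: string) => {
      const dependsOn: number[] = [];
      // Pull [DEPENDS:#1] (assumed: or [DEPENDS:#1,#2]) out of the body.
      const prompt = body
        .replace(/\[DEPENDS:([^\]]*)\]/g, (_m: string, refs: string) => {
          refs.split(",").forEach((ref) => {
            const id = parseInt(ref.trim().replace(/^#/, ""), 10);
            if (!isNaN(id)) dependsOn.push(id);
          });
          return "";
        })
        .trim();

      const task: ParsedTask = { agent: agent.trim(), prompt };
      if (dependsOn.length > 0) task.dependsOn = dependsOn;
      tasks.push(task);
      return ""; // strip the marker from the visible text
    },
  );

  return { cleanText, tasks };
}

const samplePlan =
  "Here is the plan.\n" +
  "[TASK:Explorer] Find and classify all tax documents in Box folder 123 [/TASK]\n" +
  "[TASK:Extractor] Extract W-2 and 1099 data from the identified documents [DEPENDS:#1] [/TASK]\n" +
  "Done.";
const parsedPlan = parseTaskMarkers(samplePlan);
console.log(parsedPlan.tasks);
```

Because the buffer grows as text streams in, a real implementation would also need to avoid emitting the same task twice and to hold back a half-received marker; this sketch leaves both out.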

Running the agents

The entry point to the Claude Agent SDK is a single call to query(). Pass it a prompt, a model, the allowed tools, and (for specialist agents) the agent definitions from agents.ts. The SDK handles the conversation loop, tool execution, and streaming:

const sdkStream = query({
  prompt: message,
  options: {
    model: AGENT_MODEL,
    env: {
      ...process.env,
      ...(boxCredentials
        ? { BOX_DEVELOPER_TOKEN: await getBoxAccessToken(boxCredentials) }
        : {}),
    },
    systemPrompt: buildSystemPrompt(workDir, agentKey),
    allowedTools,
    ...(agentDefinitions ? { agents: agentDefinitions } : {}),
    maxTurns: 50,
    permissionMode: "default",
    includePartialMessages: true,
    canUseTool,
  },
});

The BOX_DEVELOPER_TOKEN gets injected into the environment so every npx @box/cli command the agent runs is authenticated against Box. No SDK wiring or token refresh logic in the agent — the CLI handles that.
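With includePartialMessages enabled, the stream carries incremental text alongside other message types. The sketch below shows one way to pick the text deltas out of such a stream. The `StreamMessage` shape is a deliberately loose assumption made for illustration (the SDK's real message types have more fields and variants), and `textDeltaOf` and `collectText` are hypothetical helper names.

```typescript
// Assumed, simplified message shape; not the SDK's actual type.
interface StreamMessage {
  type: string;
  event?: {
    type: string;
    delta?: { type: string; text?: string };
  };
}

// Return the text carried by a partial message, or "" for anything else.
function textDeltaOf(message: StreamMessage): string {
  if (message.type !== "stream_event" || !message.event) return "";
  const event = message.event;
  if (event.type !== "content_block_delta" || !event.delta) return "";
  if (event.delta.type !== "text_delta") return "";
  return event.delta.text || "";
}

// Concatenate every text delta in order, ignoring non-text messages.
function collectText(messages: StreamMessage[]): string {
  return messages.map(textDeltaOf).join("");
}

// Hand-written sample messages standing in for a real stream.
const sampleMessages: StreamMessage[] = [
  { type: "system" },
  { type: "stream_event", event: { type: "content_block_delta", delta: { type: "text_delta", text: "Plan " } } },
  { type: "stream_event", event: { type: "content_block_delta", delta: { type: "input_json_delta" } } },
  { type: "stream_event", event: { type: "content_block_delta", delta: { type: "text_delta", text: "ready" } } },
  { type: "stream_event", event: { type: "message_stop" } },
];
console.log(collectText(sampleMessages));
```

In the app, a check like `textDeltaOf` would sit inside the loop that iterates over the stream returned by query(), appending each delta to a buffer such as planTextBuffer.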

What happens end to end

Given the prompt "Create a tax summary spreadsheet from my tax documents in Box, upload it, and share with my tax preparer":

Plan Agent breaks the goal into tasks: explore, extract, build spreadsheet, share.

Explorer agent scans the Box folder and classifies the documents (W-2s, 1099-NECs, receipts).

Extractor agent pulls structured fields from each document, such as employer names, wages, and federal tax withheld.

Excel Agent builds the summary spreadsheet, uploads it to Box, and creates a shared link.

Each agent runs Box CLI commands under the hood (files:search, files:upload, files:shared-links:create) and streams results back through the UI.
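The order of these steps follows from the [DEPENDS:#n] markers in the plan. As a rough sketch of how a task board could decide which tasks are ready to run, the code below assumes that `#n` refers to the 1-based position of a task in the plan. `BoardTask`, `TaskStatus`, and `runnableTasks` are hypothetical names, not part of the repository.

```typescript
type TaskStatus = "pending" | "running" | "done";

// Hypothetical task-board entry; ids are assumed to match the #n references.
interface BoardTask {
  id: number;
  agent: string;
  status: TaskStatus;
  dependsOn: number[];
}

// A task is runnable when it is pending and all of its dependencies are done.
function runnableTasks(board: BoardTask[]): BoardTask[] {
  const doneIds = board
    .filter((task) => task.status === "done")
    .map((task) => task.id);
  return board.filter(
    (task) =>
      task.status === "pending" &&
      task.dependsOn.every((dep) => doneIds.indexOf(dep) !== -1),
  );
}

// Example board: exploring has finished, so only extraction is unblocked.
const demoBoard: BoardTask[] = [
  { id: 1, agent: "Explorer", status: "done", dependsOn: [] },
  { id: 2, agent: "Extractor", status: "pending", dependsOn: [1] },
  { id: 3, agent: "Excel Agent", status: "pending", dependsOn: [2] },
];
console.log(runnableTasks(demoBoard).map((task) => task.agent));
```

A board built this way can surface the unblocked tasks while still leaving the decision to start them with the user.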

Wrapping up

The whole system comes down to two things:

Each agent is simple. A system prompt, a set of tools, and access to Box through the CLI. No custom abstractions.

The Plan Agent handles orchestration. It turns a goal into a task board. The user decides what runs and when.

To try it out, the source is at box-community/tax-with-box. Sign up for a free Box developer account to get started.