Building Multi-Agent workflows with OpenAI’s new SDK and Box

Today, OpenAI introduced the new Agents SDK, a lightweight yet powerful framework designed to help developers build multi-agent workflows with ease. This new SDK simplifies the process of orchestrating LLM-powered agents, each equipped with specific instructions, tools, guardrails, and handoffs, enabling seamless execution of complex tasks.

Core concepts of OpenAI’s Agents SDK

At its foundation, the Agents SDK introduces several key capabilities:

  • Agents: Configurable LLMs equipped with instructions, tools, and execution constraints.
  • Handoffs: A mechanism that enables agents to transfer control to other agents for specialized tasks.
  • Guardrails: A way to validate inputs and outputs, ensuring adherence to constraints and guidelines.
  • Tracing: Built-in execution tracing to capture, debug, and optimize agent workflows.
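To make these concepts concrete, here is a toy model in plain Python. It does not use the Agents SDK and is not how the SDK works internally: in the real SDK the LLM decides when to hand off, whereas this sketch uses a simple predicate. `ToyAgent`, `input_guardrail`, and `run` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent: a name, instructions, and handoff targets."""
    name: str
    instructions: str
    # Predicate standing in for the LLM's decision to handle a request itself.
    can_handle: Callable[[str], bool]
    handoffs: list["ToyAgent"] = field(default_factory=list)


def input_guardrail(request: str) -> None:
    """Guardrail: reject empty input before any agent sees it."""
    if not request.strip():
        raise ValueError("empty request")


def _resolve(agent: ToyAgent, request: str, trace: list[str]) -> Optional[ToyAgent]:
    """Depth-first search through handoffs, recording each agent visited."""
    trace.append(agent.name)  # tracing: remember who looked at the request
    if agent.can_handle(request):
        return agent
    for target in agent.handoffs:
        found = _resolve(target, request, trace)
        if found is not None:
            return found
    return None


def run(agent: ToyAgent, request: str, trace: list[str]) -> Optional[ToyAgent]:
    """Validate the input once, then find the agent that should handle it."""
    input_guardrail(request)
    return _resolve(agent, request, trace)


box = ToyAgent("box", "Find documents in Box.", lambda r: "report" in r)
web = ToyAgent("web", "Search the web.", lambda r: "news" in r)
triage = ToyAgent("triage", "Route requests.", lambda r: False, handoffs=[box, web])

trace: list[str] = []
handler = run(triage, "latest news on earnings", trace)
print(handler.name if handler else None, trace)
```

The triage agent never answers itself; it only passes the request along, which is the same separation of concerns that handoffs give you in a real multi-agent workflow.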

Using OpenAI agents with Box

To explore the capabilities of this new SDK, we created a demo that integrates OpenAI Agents with Box, allowing an agent to locate and query documents stored in Box. The goal was to give the agent a set of tools to reach content stored in Box while using OpenAI’s powerful new WebSearchTool and the LLM to analyze and generate insights from that content.

If you would like to tinker with this demo, the source code is available in our community GitHub repo.

Use Case: Analyzing Q4 financial research

To test our implementation, we uploaded Q4 earnings reports from various tech companies into Box. Our OpenAI-powered agent was then tasked with:

  1. Locating specific reports based on user queries.
  2. Extracting relevant information from each document.
  3. Generating an analysis report comparing financial performance across companies.
  4. Combining Box and WebSearch to get deeper insights into specific topics based on this analysis.

The agent approach allowed us to separate concerns effectively: one agent specializes in Box and handles document retrieval and data extraction. From there, we hand the context back to OpenAI for analysis.

Key benefits of this integration

  • Automation of document search: The agent can efficiently locate reports stored in Box using metadata and natural language queries.
  • Seamless extraction of structured data: With Box’s AI capabilities and OpenAI’s processing power, key financial metrics can be automatically extracted.
  • Multi-step reasoning with handoffs: Agents collaborate, passing tasks to the right specialized agent at each step.
  • WebSearchTool(): With a single import and a single line of code, you can combine live internet data with your proprietary enterprise data to get even broader insights.

How this was done

The core of this sample is very straightforward. We define an agent and the tools available to it:

from agents import Agent, WebSearchTool

# The Box tools listed below are defined separately with @function_tool.
box_agent = Agent(
    name="Box Agent",
    instructions="""
    You are a very helpful agent. You are a financial expert.
    You have access to a number of tools from Box that allow you
    to search for files in Box either holistically or by set criteria.
    You can also ask Box AI to answer questions about the files or you
    can retrieve the text from the files. Your goal is to help the user
    find the information they need.
    """,
    tools=[
        file_search,
        ask_box,
        get_text_from_file,
        box_search_folder_by_name,
        box_list_folder_content_by_folder_id,
        WebSearchTool(),
    ],
)

A simple user input to interact with the agent:

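from agents import Runner, TResponseInputItem
from openai.types.responses import ResponseContentPartDoneEvent, ResponseTextDeltaEvent


async def main():
    user_msg = input("How can I help you today:\n")
    agent = box_agent
    # Conversation history, starting with the first user message.
    inputs: list[TResponseInputItem] = [{"content": user_msg, "role": "user"}]
    while True:
        result = Runner.run_streamed(
            agent,
            input=inputs,
        )
        # Consume the event stream; the run is complete once it is exhausted.
        async for event in result.stream_events():
            if isinstance(event, ResponseTextDeltaEvent):
                print(event.delta, end="", flush=True)
            elif isinstance(event, ResponseContentPartDoneEvent):
                print("\n")

        answer = strip_markdown(result.final_output)
        # str.replace returns a new string, so keep the result.
        answer = answer.replace("\n\n", "\n")
        print(f"{answer}\n")

        # Carry the full conversation into the next turn.
        inputs = result.to_input_list()
        print()

        user_msg = input("Follow up:\n")

        inputs.append({"content": user_msg, "role": "user"})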
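The loop above keeps context between turns by feeding the whole conversation back in on every run. Here is a minimal sketch of that pattern with no SDK or network involved. `fake_run` is a hypothetical stand-in for a single agent run, and the message shape is simplified to the `role`/`content` pairs used above.

```python
from typing import TypedDict


class Message(TypedDict):
    role: str
    content: str


def fake_run(inputs: list[Message]) -> list[Message]:
    """Hypothetical stand-in for one agent run: returns the history plus a reply."""
    last_user = inputs[-1]["content"]
    reply: Message = {"role": "assistant", "content": f"(answer to: {last_user})"}
    # Build a new list so the caller's history is not mutated.
    return inputs + [reply]


def converse(user_turns: list[str]) -> list[Message]:
    """Replay a list of user messages, carrying the history across turns."""
    history: list[Message] = []
    for turn in user_turns:
        history.append({"role": "user", "content": turn})
        history = fake_run(history)
    return history


for message in converse(["Find the Q4 reports", "Compare their revenue"]):
    print(f"{message['role']}: {message['content']}")
```

Because each run receives the full history, a follow-up such as "Compare their revenue" can refer to documents found in an earlier turn.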

We implement the different tools for the agent to access Box. For example, we can have a tool that lets the OpenAI agent ask Box AI about a file:

from agents import function_tool


@function_tool
async def ask_box(file_id: str, prompt: str) -> str:
    """
    Ask Box AI about a file in Box.

    Args:
        file_id (str): The ID of the file to ask about.
        prompt (str): The prompt to ask the AI.

    Returns:
        str: The answer from Box AI.
    """
    ai_agent = box_ai_agent_ask()
    response = box_file_ai_ask(
        BoxAuth().get_client(), file_id, prompt=prompt, ai_agent=ai_agent
    )

    return response
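The type hints and docstring on a tool matter because the SDK uses the function's signature and docstring to describe the tool to the model. The sketch below shows the general idea using only the standard library. It is not the SDK's implementation, and `describe_tool` and `ask_box_example` are hypothetical names used for illustration.

```python
import inspect
from typing import Any, Callable


def describe_tool(func: Callable[..., Any]) -> dict:
    """Build a simple tool description from a function's signature and docstring."""
    parameters = {}
    for name, param in inspect.signature(func).parameters.items():
        if param.annotation is inspect.Parameter.empty:
            parameters[name] = "any"  # no type hint given
        else:
            # Use the type's name when it has one, else its string form.
            parameters[name] = getattr(param.annotation, "__name__", str(param.annotation))
    doc = inspect.getdoc(func) or ""
    summary = doc.splitlines()[0] if doc else ""
    return {"name": func.__name__, "description": summary, "parameters": parameters}


async def ask_box_example(file_id: str, prompt: str) -> str:
    """Ask Box AI about a file in Box."""
    return ""


print(describe_tool(ask_box_example))
```

A clear one-line summary and accurate argument descriptions give the model a better chance of choosing the right tool and passing sensible arguments.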

The last level is the actual interaction with the Box API:

from typing import Optional


def box_file_ai_ask(
    client: BoxClient,
    file_id: str,
    prompt: str,
    ai_agent: Optional[AiAgentAsk] = None,
) -> str:
    # Ask a question about a single file.
    mode = CreateAiAskMode.SINGLE_ITEM_QA
    ai_item = AiItemBase(id=file_id, type=AiItemBaseTypeField.FILE)
    response = client.ai.create_ai_ask(
        mode=mode, prompt=prompt, items=[ai_item], ai_agent=ai_agent
    )
    return response.answer
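Because the client is passed in as an argument, this layer can be exercised without calling Box at all. The sketch below is one way to do that, under stated assumptions: `FakeAiManager` and `ask_file` are hypothetical, and plain strings and dicts stand in for the Box SDK types, so this mirrors the shape of the function above rather than reproducing it.

```python
from types import SimpleNamespace
from typing import Any, Optional


class FakeAiManager:
    """Hypothetical stand-in for client.ai that records calls instead of hitting Box."""

    def __init__(self) -> None:
        self.calls: list[dict] = []

    def create_ai_ask(self, mode: str, prompt: str, items: list[dict],
                      ai_agent: Optional[Any] = None) -> SimpleNamespace:
        self.calls.append({"mode": mode, "prompt": prompt, "items": items})
        # Return an object with an `answer` attribute, like the real response.
        return SimpleNamespace(answer=f"stub answer for {items[0]['id']}")


def ask_file(client: Any, file_id: str, prompt: str,
             ai_agent: Optional[Any] = None) -> str:
    """Same shape as box_file_ai_ask, with plain values in place of Box SDK types."""
    item = {"id": file_id, "type": "file"}
    response = client.ai.create_ai_ask(
        mode="single_item_qa", prompt=prompt, items=[item], ai_agent=ai_agent
    )
    return response.answer


fake_client = SimpleNamespace(ai=FakeAiManager())
print(ask_file(fake_client, "123", "What was Q4 revenue?"))
```

Keeping the API call in its own thin function, separate from the tool wrapper, is what makes this kind of substitution straightforward.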

Implemented tools

  • file_search — Search for files in Box using queries, file extensions, and other parameters
  • get_text_from_file — Read the text content of a specific Box file
  • ask_box — Ask Box AI questions about a file’s content
  • box_search_folder_by_name — Locate a folder in Box by its name
  • box_list_folder_content_by_folder_id — List the contents of a folder
  • WebSearchTool() — Get relevant insights from the web to supplement your proprietary data.

What’s next?

Just like that, we created an agent using the new OpenAI SDK to access content in Box.

The application of these concepts really shines when you bring more context sources into the fold. Imagine combining more agents to access your own internal systems of record, reading the latest from an internal website, and accessing Salesforce and other SaaS systems.

Are you interested in exploring how OpenAI’s Agents SDK can be used in your workflow? Let us know how you plan to leverage multi-agent workflows in your applications!