Structuring content to fuel context-driven use cases

The AI-first era is accelerating. Fast. Businesses need to make sense of content, which is a daunting task with 90% of all company data unstructured. 

A different narrative is unfolding for leading-edge AI adopters. Frontrunners are turning content chaos into fuel, structuring millions of files with AI as they drive automation in every department. At the heart of the shift is metadata, which gives context to content.

In a set of invoices, metadata might include client names, invoice numbers, total amount due, and other key details. In a legal contract, metadata could be risky clauses and other non-standard nuggets of information. “There’s a lot of inside knowledge in content like documents, images, agreements, and datasheets,” says Kelash Kumar, Vice President, Product Management - Workflows at Box. “But in the past, content has been very hard to work with.”

That’s changing. With AI agents now in the workplace, organizations can accurately extract this type of data at scale from their unstructured content. Rather than relying on the rigid taxonomies of the past, they can quickly process key information via AI agents.

Kumar goes on to say, “With the emergence of agentic AI and LLMs, now you can understand the meaning of text within documents and the meaning of images — deeply.” 

Key takeaways:  

  • AI-enhanced metadata extraction pulls structured data from unstructured data to drive automation and workflows 
  • Extracted metadata connects repositories and systems of record, enabling operational actions like pushing contract data to Salesforce 
  • Testing and iteration help ensure accuracy, and teams can supplement their approach by testing against control templates 
  • Custom extract agents and agentic workflow tools let teams handle varied document types and scale use cases across the enterprise 
  • Standardizing metadata reduces manual effort, with customer wins like $200K+ saved and 83% faster onboarding 

From PDF to process: Content workflows reimagined

Box AI Extract Agents convert unstructured files such as documents, PDFs, and images into structured, machine-readable intelligence — the bridge enterprise organizations need to gain automation, visibility, and insight from their content at scale.

The Box Standard Extract Agent is designed for high-volume, structured, or semi-structured documents where speed and flexibility are paramount. It can perform quick metadata extraction across a wide range of file types.

The Box Enhanced Extract Agent is ideal for complex, unstructured, or lengthy documents (think multi-party contracts, clinical trial reports, or regulatory filings). It uses advanced reasoning techniques, including chain-of-thought prompting, to ensure high accuracy and reliability.

These two Box Extract Agents help enterprise organizations unlock the promise of AI-powered business process automation, enable teams to make smarter business decisions, and streamline content discovery with markedly faster search.

Extract Agents in action

Here’s an example of a practical application of Box AI Extract Agents applied to a customer contract within the enterprise sales process.

  1. Box AI Extract Agents pull structured data like deal sizes, renewal dates, contract types, and terms and conditions from long, complex sales contracts, and accurately convert that information into metadata.
  2. Sales teams can then surface all sales contracts that are currently in progress (or that need to be renewed) using views and dashboards within Box Apps that are powered by the metadata extracted by Box AI Extract Agents.
  3. Plus, teams can leverage Box AI Extract Agent APIs to push that metadata to third-party apps like Salesforce.

Without the need for manual data entry or oversight, the process is automated, seamless, and accurate.   
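To make step 2 concrete, here is a minimal sketch in plain Python of how extracted contract metadata could drive a "renewals due" view. It does not call any Box or Salesforce API. The `ContractMetadata` fields, the `renewals_due` helper, the 90-day window, and the sample values are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ContractMetadata:
    """Hypothetical shape of the fields extracted from one sales contract."""
    file_name: str
    deal_size: float
    renewal_date: date
    contract_type: str


def renewals_due(contracts, today, window_days=90):
    """Return contracts renewing within the window, soonest first."""
    cutoff = today + timedelta(days=window_days)
    due = [c for c in contracts if today <= c.renewal_date <= cutoff]
    return sorted(due, key=lambda c: c.renewal_date)


# Made-up sample records standing in for extracted metadata.
contracts = [
    ContractMetadata("acme_msa.pdf", 120_000.0, date(2025, 3, 1), "MSA"),
    ContractMetadata("globex_sow.pdf", 45_000.0, date(2025, 9, 15), "SOW"),
    ContractMetadata("initech_msa.pdf", 80_000.0, date(2025, 2, 10), "MSA"),
]

due = renewals_due(contracts, today=date(2025, 1, 15))
for c in due:
    print(c.file_name, c.renewal_date.isoformat(), c.deal_size)
```

Once the contract details exist as typed fields rather than text inside a PDF, a dashboard filter or a sync to a system of record becomes an ordinary query over those fields.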

This example, small in scope but massive in implication, shows how metadata connects content with systems of record. A contract, once a static PDF that had to be manually reviewed and synthesized, is now transformed into a catalyst for workflow automation and accurate forecasts. Proper governance ensures that Box AI Extract Agents (and the extracted metadata) adhere to the same strict controls and permissions policies that determine access to content across Box. 

Now, imagine all of the files across an organization, from invoices to bills of lading, and the impact automated data extraction will have on that content.

How we use Box Extract Agents at Box

Our own Box employees use Box AI Extract Agents to save time and speed up work. The Box Consulting team, for one, has saved hundreds of manager hours, thousands of consultant hours, and roughly $200,000 annually by standardizing its SOW production with extraction templates and workflows.

Box Consulting saved roughly $200,000 annually using Box AI.

Plenty of Box customers describe similar shifts. Fundwell, a financial technology platform that connects business owners with various funding solutions, reported an 83% reduction in client onboarding time after using Box to automate extraction for intake documents.

At Insperity, a Texas-based provider of HR solutions for small-to-medium businesses, moving from manual records to metadata-driven workflows has allowed the company to grow without proportional increases in support. Managing Director for Data Privacy and Technology Compliance John Rhoades said: “One of the things that Box does beautifully is the democratization of the tools for the existing record center that we have. That has to be maintained by our IT product teams, and they have more of a developer and configuration background. A lot of the work now in Box can be done by the records team.”

Gopal Vangala, SAP Technology Lead at Love’s Travel Stops & Country Stores, said standardizing metadata saves his teams roughly 20 hours a week otherwise spent on search and retrieval.

What these examples all demonstrate is a fundamental shift. Across industries, documents come in dozens of formats and cover decades of history, and the information locked inside them can have a real effect on a business’s bottom line. AI is key to unlocking the ability to accurately extract structured data from content at scale and apply it as metadata. Metadata allows us to tap into the value of our unstructured data, making it accessible for business process automation and content discovery.

When security and agility are no longer at odds

Technical feats with metadata are just part of the work involved in harnessing the value of context. Businesses also need trust and governance, and neither should come at the cost of agility. For that, experts turn to testing and iteration.

Rhoades described an approach where teams ground-truth a sample of documents, then compare extraction outputs to a control template: “You take a subset of documents, say, 5,000 to 7,500 documents. Run it through and see what the results are, see what the confidence is, and then ask: Do I change models? Do I change my metadata template? Is there something inherently flawed with this?”

Geoff Moore of Valmark Financial Group added that verifying metadata extraction accuracy is key as new AI models emerge. In fact, the Valmark team tests new models against its own trusted metadata template.

“The analysts would put in the known values, and then I have a separate metadata template for whenever a new model comes out,” Moore said. “I can quickly just test it, extract it, compare it to the control template, and make sure it’s good.” 
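The control-template check that Rhoades and Moore describe can be sketched as a field-by-field comparison between known values and a model's output. The following is an illustrative outline, not Box's or Valmark's actual tooling. The function names, field names, and sample values are assumptions, and the normalization rule (trim and lowercase) is one simple choice among many.

```python
def normalize(value):
    """Trim and lowercase so trivial formatting differences don't count as errors."""
    return str(value).strip().lower() if value is not None else None


def field_accuracy(control, extracted):
    """Compare extracted values to known-good values, field by field.

    Both arguments map a document ID to a dict of field name -> value.
    Returns a dict of field name -> fraction of control documents that matched.
    """
    matches = {}
    totals = {}
    for doc_id, known in control.items():
        got = extracted.get(doc_id, {})
        for field, expected in known.items():
            totals[field] = totals.get(field, 0) + 1
            if normalize(got.get(field)) == normalize(expected):
                matches[field] = matches.get(field, 0) + 1
    return {f: matches.get(f, 0) / totals[f] for f in totals}


# Made-up control values entered by analysts, and a new model's output.
control = {
    "policy_001": {"insured_name": "Jane Doe", "face_amount": "500000"},
    "policy_002": {"insured_name": "John Roe", "face_amount": "250000"},
}
extracted = {
    "policy_001": {"insured_name": "jane doe ", "face_amount": "500000"},
    "policy_002": {"insured_name": "John Roe", "face_amount": "25000"},
}

accuracy = field_accuracy(control, extracted)
for field, score in accuracy.items():
    print(f"{field}: {score:.0%}")
```

A per-field score like this points to where the problem lies: a single weak field suggests adjusting the metadata template, while low scores across the board suggest trying a different model.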

Metadata extraction turns content into fields and tags that systems can act on.

Model choice matters in agentic AI

Along with testing, nothing beats the immense flexibility of an infrastructure that supports different models and specialized agents.

One financial data extraction agent, for example, might be geared toward numeric tables. Another, tuned for long tabular forms, might handle insurance illustrations.

In the case of Valmark, the value of multiple AI models became apparent as teams worked to add metadata to life insurance contracts. As analysts fine-tuned output, their selected model struggled a bit with multi-page tables and would “give up,” requiring the team to intervene and keep the process going. Then, one weekend, something unexpected happened: Moore reran this analysis, and AI performance dramatically improved.

“It just got better,” he said. “I thought, ‘Well, that’s really weird. What just happened?’”

After further analysis, Moore realized what had happened. At first, using the Structured Endpoint to extract data from life insurance policies that were scanned in as images, Valmark struggled to get great results. But when Box upgraded the LLM for the Enhanced Extract Agent to Gemini 2.5 Pro, the results automatically improved, and the agent could now process the multi-page contracts.

This scenario highlights one of the biggest benefits of using Box for Intelligent Content Management and Box AI Extract Agents in particular: Box is continuously evaluating different models for Box AI and uses a model‑agnostic approach so customers can benefit from the latest LLM capabilities and choose the best model for their extraction needs.

Context is currency

A formula for success in leveraging context has crystallized across industries: Structure your content. Connect it to business systems. Watch the results roll in. 

Owens Corning CIO Annie Baymiller spoke to the possibilities: “We have a chance, as we look at these capabilities, to rewire the company for the future, for growth, and for productivity.”

At the core of this reimagination, and what's making it all possible: a fresh lineup of AI innovations designed to transform how businesses work. The fundamental building blocks for a new era of enterprise technology, these solutions will enable smarter decisions, more efficient workflows, and a more productive workforce. They include:

  • Box Automate, an agentic workflow automation solution for orchestrating business processes with an intuitive drag-and-drop builder
  • Box Shield Pro, a security suite building on Box’s flagship content protection solution
  • Box Apps enhancements to help teams make faster, smarter decisions
  • Box Studio enhancements empowering teams to add knowledge to custom agents

And coming soon: Box Extract will power intelligent data extraction at scale with customizable, AI-powered agents. Box Extract takes the power of the Box AI Extract Agents that customers are using today for data extraction and incorporates advanced OCR, confidence scores, configurable fields and field properties, metadata policies, and more. Users will be able to build custom Box Extract agents and assign those agents to specific folders to automatically trigger extraction at scale for any files uploaded to those specified folders. Stay tuned!

In the meantime, read more about what other Box customers are doing with Intelligent Content Management.