While everyone's racing to implement AI, most organizations are discovering an uncomfortable truth: Their AI is only as good as their worst data.
“ROT within the enterprise represents 40% to 50% of all information,” says Deloitte Digital Content Advantage Practice Leader and Principal Mike Carlino. By ROT, he means redundant, obsolete, and trivial data.
To understand why enterprises end up with so much ROT, think about the way information sharing tends to happen within the average organization. Carlino describes a typical scenario: “If you want to send a document to a whole bunch of people, you could do it a couple ways. One is to create a link to the source document, which is the way you’re supposed to do it. But quite often, that doesn’t happen. Instead, you create an attachment, and it goes out to maybe ten people, a hundred people, a thousand people.”
And then it gets forwarded. And forwarded again. Soon, you have dozens of versions floating around, and your AI has no idea which one to trust, so it uses them all. Even if you’re using a cloud-sharing solution instead of email, without easy link-sharing, people tend to download files and then upload them again with changes. This creates a lot of conflicting data.
Beyond document redundancy and version-control issues, every organization has a lot of outdated unstructured data in its archives — expired drafts, confusing chat logs, meeting notes that lack context, non-business content — all of which Carlino points to as ROT, and all of which can ruin your AI efforts.
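The redundant part of ROT is the easiest to find automatically. As a minimal sketch, assuming documents are available as raw bytes keyed by path (the sample paths and contents below are hypothetical), identical copies can be grouped by content hash:

```python
import hashlib
from collections import defaultdict


def find_duplicate_groups(documents: dict[str, bytes]) -> list[list[str]]:
    """Group document paths whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for path, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(path)
    # Only digests shared by more than one path indicate redundant copies.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


# Hypothetical sample data: two forwarded attachments and one edited re-upload.
documents = {
    "mail/alice/q3_plan.docx": b"Q3 plan, version 1",
    "mail/bob/q3_plan.docx": b"Q3 plan, version 1",
    "shared/q3_plan_final.docx": b"Q3 plan, version 2",
}
duplicate_groups = find_duplicate_groups(documents)
```

Exact hashing only catches identical copies. The edited re-upload in the sample is not flagged, which is why conflicting versions still need a governance decision about which one is authoritative.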
This phenomenon didn’t start with AI. In fact, it’s a computer science principle many learned but later forgot: Garbage in, garbage out. Or as folks used to call it, GIGO.
The harsh reality: Nearly half of your company's data is garbage, and that garbage may be poisoning your AI initiatives. When you’re trying to scale an AI pilot, GIGO becomes a real issue.
Shiny objects don’t scale with garbage data
For the past two years, enterprises have been chasing what Carlino calls “shiny objects”: impressive AI demos that work perfectly in controlled environments. But when these proofs of concept hit the real world of messy enterprise data, they fail to thrive.
A team builds an AI pilot using a carefully curated dataset. It works beautifully. Leadership gets excited. Then they try to scale it across the enterprise, and suddenly the AI is reading outdated contract versions, pulling salary information it shouldn’t access, and making decisions based on documents that should have been deleted years ago.
“We've spent the last two years creating those shiny objects, saying, ‘Wow, isn't this cool?’” says Carlino of the general enterprise approach to AI. “But when you do it in an isolated way, you can do a lot of damage. If you take a proof of concept leveraging a data set that’s really small, then try to scale it up to production, you’re suddenly relying on data which is literally garbage.”
Sifting through the garbage to find the gold
To solve the problem of content ROT, organizations should focus on establishing a strong foundation of data readiness and governance. The sifting process begins with understanding the content as it exists across the enterprise and implementing strict governance policies to determine who should have access to what data.
In most organizations, the process looks something like this:
- Start with the right content foundation: As a basic step, your content should be held on a centralized platform that’s governed, secure, and compliant, with granular permissions handling so AI can never touch content it’s not supposed to
- Implement a foundational governance layer to determine which versions of a document are authoritative and ensure that AI agents don’t inadvertently surface obsolete data or sensitive personal information like salaries
- Create a common framework of definitions and categories so that structured data (like databases, spreadsheets, and metadata) and unstructured documents (like contracts, emails, and reports) can be understood and connected by the same systems, turning information that was previously invisible or siloed into intelligence that can actually be acted on
- Build integrations to the right content using secure connectors and APIs rather than embedding data directly into the AI — so governance policies travel with the content
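The governance steps above can be illustrated with a small filter that runs before any content reaches an AI system. This is a sketch under stated assumptions, not a description of any particular platform: the `Document` fields, the idea that the highest version number is the authoritative one, and the group names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    family: str                     # hypothetical: ties versions of one document together
    version: int
    allowed_groups: frozenset[str]  # hypothetical: groups permitted to read it
    obsolete: bool = False


def retrievable_documents(docs, user_groups):
    """Return the latest non-obsolete version per family that the user may read."""
    latest = {}
    for doc in docs:
        if doc.obsolete:
            continue  # obsolete content is never surfaced
        current = latest.get(doc.family)
        if current is None or doc.version > current.version:
            latest[doc.family] = doc
    # Permissions are checked after picking the authoritative version, so a
    # user who cannot read the latest version does not get an older one instead.
    return [d for d in latest.values() if d.allowed_groups & user_groups]


docs = [
    Document("c1-v1", "contract-001", 1, frozenset({"legal"})),
    Document("c1-v2", "contract-001", 2, frozenset({"legal"})),
    Document("sal-v1", "salaries-2024", 1, frozenset({"hr"})),
    Document("memo-v1", "memo-2015", 1, frozenset({"legal", "hr"}), obsolete=True),
]
visible_ids = [d.doc_id for d in retrievable_documents(docs, {"legal"})]
```

For a user in the hypothetical `legal` group, only the second contract version is returned. The outdated contract, the salary file, and the obsolete memo never reach the AI.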
By sifting through the 90% of enterprise information that’s unstructured and applying cohesive governance to all content, enterprise organizations can create what Carlino describes as “a common ontology that both structured and unstructured can talk to, that becomes another layer of cleaning up the foundation.”
From here, businesses can transition from isolated proofs of concept to reliable, transformational production environments.
The surprising winners in the AI race
Some industries have an easy advantage when it comes to applying AI to their enterprise content because they inherently work with very controlled data. In financial services, for instance, AI is already transforming document-heavy processes like loan origination and claims processing based on inbound data that tends to have a particular format.
For insurance companies, claims policies follow a set format. For loan origination, applications are consistent, and you know what types of documentation to expect. Carlino confirms, “If you know what doc types to expect, using the technology, you can say, ‘based on these doc types, I'm going to be looking for this information.’”
Banking, insurance, and adjacent firms can automatically extract and ingest information into large-scale systems. This structured approach to data extraction makes management and automation predictable, leading to substantial efficiency gains across regulated sectors.
These businesses have a clear understanding of the information they receive from customers and clients, and because they know exactly what document types to expect, they can use technology to:
- Identify specific document types and the information contained within them
- Extract data automatically using specialized tools
- Ingest the information into large systems without manual intervention
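The identify, extract, and ingest steps above can be sketched in a few lines. This is an illustration only: the document types, field labels, and regular expressions are hypothetical, and real systems typically rely on specialized extraction tools rather than hand-written patterns.

```python
import re

# Hypothetical document types and the fields expected in each.
EXPECTED_FIELDS = {
    "loan_application": {
        "applicant": re.compile(r"Applicant:\s*(.+)"),
        "amount": re.compile(r"Loan Amount:\s*\$?([\d,]+)"),
    },
    "insurance_claim": {
        "policy_number": re.compile(r"Policy No\.:\s*(\S+)"),
        "claim_date": re.compile(r"Date of Loss:\s*(\d{4}-\d{2}-\d{2})"),
    },
}


def identify_doc_type(text):
    """Identify the document type from a marker phrase (assumed layout)."""
    if "Loan Amount:" in text:
        return "loan_application"
    if "Policy No.:" in text:
        return "insurance_claim"
    return None


def extract_fields(text):
    """Extract the fields expected for the identified document type."""
    doc_type = identify_doc_type(text)
    if doc_type is None:
        return None  # unknown types are left for manual review
    record = {"doc_type": doc_type}
    for name, pattern in EXPECTED_FIELDS[doc_type].items():
        match = pattern.search(text)
        record[name] = match.group(1).strip() if match else None
    return record


sample = "Applicant: Jane Doe\nLoan Amount: $25,000\n"
record = extract_fields(sample)  # ready to ingest into a downstream system
```

Because the expected fields are keyed by document type, adding a new predictable document flow means adding one entry to the mapping rather than changing the extraction logic.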
Beyond financial services, other companies are discovering how to apply AI to predictable data. In manufacturing, companies are using AI to predict equipment maintenance and extend asset life by combining structured data (like sensor readings) with unstructured documentation (like maintenance reports and technical manuals).
The key to AI success, for any company: Start with controlled, predictable data flows.
The path forward for AI across industries
Addressing content ROT is not a one-time snapshot but an ongoing commitment to understanding the living, breathing lifecycle of documents. The enterprises that succeed with AI will be the ones that first do the unglamorous work of cleaning up their data, implementing proper governance, and focusing on specific, measurable outcomes.
To hear more from Mike Carlino, watch the Box Partner Podcast episode “AI-ready content: Governance, integration, outcomes,” featuring Deloitte, or browse more Box Partner Series episodes.

