90% of your data is unstructured — and it’s full of untapped value

It’s no secret that AI is the hot topic these days. It can extract real value from content: pages-long customer contracts, in-depth employee handbooks, sales presentations, and splashy product launch videos. But before that value can be extracted, you have to take a step back and look at content, or “unstructured data,” from a strategic viewpoint. Building on exclusive research from the Box-sponsored IDC white paper, Untapped Value: What Every Executive Needs to Know About Unstructured Data, let’s look at what it takes to unlock the vault of content.

What is unstructured data?

Unstructured data includes all of the files your teams work with on a daily basis, and it comes in many forms — documents, PDFs, videos, images, audio clips, and more. This kind of information, otherwise known as content, can’t be loaded into a database or put into rows and columns (it wouldn’t make sense), which means it is more challenging to organize and analyze.

Structured data, on the other hand, is “tabular,” and can easily be put into a database via columns and rows or flat files. Think transactional information, sensor data, customer lists, financials, and employee records — which can be labeled, tagged, filtered, and sorted.
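To make the distinction concrete, here is a minimal sketch in Python. The customer records and the contract sentence are invented purely for illustration; they do not come from the IDC research.

```python
# Structured data: every record shares the same named fields,
# so it can be filtered and sorted directly. (Hypothetical sample records.)
customers = [
    {"name": "Acme Corp", "region": "EMEA", "annual_spend": 120_000},
    {"name": "Globex", "region": "AMER", "annual_spend": 340_000},
    {"name": "Initech", "region": "EMEA", "annual_spend": 75_000},
]

# Filter to one region, then sort by spend, highest first.
emea_by_spend = sorted(
    (c for c in customers if c["region"] == "EMEA"),
    key=lambda c: c["annual_spend"],
    reverse=True,
)

# Unstructured data: free text with no fields to filter on.
# (Hypothetical contract sentence.)
contract_text = (
    "This Agreement renews automatically for successive one-year terms "
    "unless either party gives ninety days written notice."
)

# There is no "renewal_terms" column to query; the text itself has to be
# read or analyzed. A keyword check is the crudest possible version of that.
mentions_auto_renewal = "renews automatically" in contract_text.lower()

print([c["name"] for c in emea_by_spend])
print(mentions_auto_renewal)
```

The first half works because the fields are known in advance. The second half only finds the exact phrase it was told to look for, which is why analyzing content at scale calls for more capable tools.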

What you may not know is that most company data fits squarely into the first category: 90% of data is unstructured.* And according to research from IDC, organizations globally are predicted to generate over 73,000 exabytes of unstructured data in 2023 alone.* To put this in perspective, if you started a video call 237,823 years ago and ended it today, you would have recorded a single exabyte.
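As a rough sanity check on that comparison, the arithmetic can be sketched in a few lines of Python. The assumptions here are ours, not IDC's: a decimal exabyte (10^18 bytes) and a 365.25-day year.

```python
# Back-of-the-envelope check of the "one exabyte" video-call comparison.
# Assumptions (not from the IDC paper): decimal exabyte, 365.25-day year.
EXABYTE_BYTES = 10**18
YEARS = 237_823
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def implied_bitrate_mbps(total_bytes: int, years: float) -> float:
    """Average bitrate in megabits per second needed to record
    total_bytes over the given number of years."""
    seconds = years * SECONDS_PER_YEAR
    return total_bytes * 8 / seconds / 1_000_000


rate = implied_bitrate_mbps(EXABYTE_BYTES, YEARS)
print(f"Implied call bitrate: {rate:.2f} Mbit/s")
```

Under those assumptions the call works out to roughly 1 Mbit/s, which is in the range of an ordinary video call, so the comparison holds up.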

Your unstructured data is business-critical at every level of the company, from boardroom strategy to daily operations. Introduce AI to unstructured data, and you have the key to a treasure trove of meaningful insights.

The emergence of “big data” required many organizations to develop a fairly well-defined strategy for creating and managing structured data, so why not have the same for unstructured data?

You can’t have an AI strategy without a content strategy

The “content problem” is not new, but it’s not getting the attention it deserves. Now that GenAI has arrived, the pressure to get all that unstructured data under control is suddenly a lot higher. IDC found that only 26% of companies are using mostly automated methods to analyze their content (with people only handling exceptions).*

IDC Unstructured Data Report

GenAI holds the potential to analyze and synthesize massive amounts of unstructured data at scale, making it possible to not only tap into a goldmine of information, but refine, shape, and polish it for the first time. Whether it’s identifying risky clauses in a contract or drafting a press release, GenAI can transform how (and how quickly) businesses get work done. In fact, through their research, IDC identified the top AI use cases: creating new content faster (33%), automating idea generation (31%), better chatbots for customer interaction (30%), and recommending related, relevant content (30%).*

Of course, there are challenges on the way to using GenAI effectively. IDC found that 49% of organizations worry about releasing proprietary content into LLMs, and almost half (47%) are unclear about intellectual property rights around content used to train those LLMs.* In addition, 41% of organizations are concerned about managing employee perceptions about automating existing jobs.* Here at Box, we understand those challenges and are committed to being transparent about our approach to AI, as you’ll see in our AI principles.

While IDC found that 84% of businesses are already using or exploring AI,* one thing remains clear: given that LLMs are trained on unstructured data, IT leaders can only leverage the power of AI once they have a strategy to manage and secure their data on a single platform. For this reason, companies with a centralized approach to content are at an immediate advantage when it comes to harnessing the power of intelligence.

The pitfalls of underinvesting in unstructured data

Despite the eye-watering amount and inherent value of unstructured data (and the promise that GenAI holds), IDC research found that only 44% of organizations can justify spend on unstructured data.* Even though 90% of data is unstructured, a disproportionately small share of IT spend goes toward managing it.* IDC findings show that 40% of tech spend is allocated toward unstructured data, whereas 60% is spent on structured data.* This inverted ratio, with more money spent on the much smaller share of data, means there is massive unrealized value and pooled risk at a time when every dollar counts and security threats are increasing rapidly. In other words, underinvestment in unstructured data will end up costing businesses more in the long run.
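To see how lopsided that split is, here is a small illustrative calculation in Python. It assumes that structured data makes up the remaining 10% of all data and that both spend percentages refer to the same budget; the function name is ours.

```python
# Shares reported in the text: 90% of data is unstructured and receives
# 40% of tech spend; structured data receives 60%.
# Assumption: structured data is the remaining 10% of all data.
UNSTRUCTURED_DATA_PCT = 90
STRUCTURED_DATA_PCT = 100 - UNSTRUCTURED_DATA_PCT
UNSTRUCTURED_SPEND_PCT = 40
STRUCTURED_SPEND_PCT = 60


def spend_per_data_point(spend_pct: float, data_pct: float) -> float:
    """Percentage points of spend per percentage point of data."""
    return spend_pct / data_pct


unstructured = spend_per_data_point(UNSTRUCTURED_SPEND_PCT, UNSTRUCTURED_DATA_PCT)
structured = spend_per_data_point(STRUCTURED_SPEND_PCT, STRUCTURED_DATA_PCT)

# How many times more spend a unit of structured data attracts.
ratio = structured / unstructured
print(f"Structured data receives {ratio:.1f}x the spend per unit of data")
```

On these assumptions, a given share of structured data attracts about 13.5 times the spend of the same share of unstructured data. This is a rough proportion, not a figure from the IDC paper.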

Failing to centralize content is risky and inefficient

In the absence of a way to centrally manage unstructured data, content gets spread across systems, which has led to content silos and sprawl for 50% of organizations, according to IDC.* In fact, IDC research shows that the average employee uses 37 tools for daily work, and 70% of those are used for unstructured data.* Amidst all this complexity (and, commonly, tech bloat), teams spend precious time searching for the information they need — or even worse, replicating content such as slide decks, project plans, and operating procedures. IDC found that 22% of this type of content is replicated because people either can’t find it or don’t know it exists.* Think about that for a second: 22% of content being created is unnecessary. And once it is created, 41% of companies surveyed said that less than half of it is ever reused.*

Centralizing content is the key. For this to work, however, the platform must let people keep using familiar (sometimes beloved) applications while keeping the output of those applications safe in one place. Centralizing content greatly reduces the complexity most organizations find themselves mired in. Here’s a more inspiring IDC finding:

IDC Unstructured Data Report

But productivity loss is only one side of the coin. Siloed and sprawled content leaves businesses susceptible to massive security and compliance risk because when unstructured data is decentralized, it’s nearly impossible to protect the content that’s at the heart of business-critical workflows. Over half (51%) of the businesses surveyed by IDC reported non-compliance with data regulations in the past 12 months,* exposing them to financial, reputational, and legal risk.

The reality is that content is filled with confidential, sensitive, and proprietary information, and neglecting to manage and protect it is a risk companies can’t afford to take, especially those in highly regulated industries like financial services, life sciences, and the public sector. And if you happen to be thinking, “Well, all the really confidential and valuable data we have is locked down in a structured format,” consider this: structured data is frequently downloaded for use in an unstructured format.

Businesses can’t afford to wait

It’s clear that centralizing unstructured data should be a top priority for IT leaders everywhere. IDC’s research backs this up, as more than half of respondents noted that implementing a unified, governed, secure, accessible unstructured data platform would have a positive impact on key metrics like cost and innovation (92%), as well as security (80%).* These are important points given that IDC estimates a security breach would cost nearly $4.5M in 2023.*

Failing to manage your unstructured data results in fragmented content, application sprawl, lost productivity, and, most of all, real business risk. Organizations simply can’t afford to wait: those that prioritize, manage, and secure unstructured data will gain a distinct advantage, and those that don’t will be left behind.

If you want to learn more, I invite you to read the full Box-sponsored IDC white paper: Untapped Value: What Every Executive Needs to Know About Unstructured Data.

All IDC data in this blog post comes from the following white paper:

*Source: IDC White Paper, Sponsored by Box Inc., “Untapped Value: What Every Executive Needs to Know About Unstructured Data,” Doc. US51128223, August 2023