Agentic AI solves content classification in an accelerating world

Modern organizations generate enormous volumes of unstructured content every day: product specs, customer records, marketing drafts, and countless other forms.

This velocity and variety make consistent, accurate classification a significant challenge. Organizations must rely either on time-intensive manual processes or on automated solutions that can struggle with content lacking specific identifiers. Without reliable classification, an organization cannot find and protect the sensitive content it has not identified, which raises the risk of data leakage and mishandling.

We have seen what falling behind on content security can lead to: data breaches have been responsible for everything from the inadvertent sharing of patient medical records to the leak of entire blockbuster movie scripts well ahead of release. The human element consistently plays a major role in these incidents, showing up in ~60% of breaches, and classification labels (along with the security policies they carry) represent the largest check on these threats.

But these aren’t new issues for organizations, and teams have found ways to mitigate and overcome the challenges of managing content, at least so far.

Stay on top of sensitive content with deterministic automated classification

At Box, we have Automated Classification as a part of Box Shield, a solution that uses machine learning to detect specifically defined identifiers, such as credit card numbers or social security numbers, and applies classification labels accordingly. This deterministic approach can accurately and quickly apply labels to sensitive content that fits the specific parameters, and those labels carry access policies, watermarking policies, and retention/disposition policies. Automated Classification streamlines the entire content security and governance lifecycle, helping organizations manage enormous amounts of content and minimize the risk of data leaks. In the past year alone, Box has helped customers classify over 7 billion files.
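To make the idea of identifier-based classification concrete, here is a minimal sketch in Python. It is not how Automated Classification is implemented; it stands in simple regular expressions and a Luhn checksum for the detection step, and the patterns, the function names, and the "Confidential" label are all assumptions chosen for illustration.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; real detectors are more thorough.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits, optional separators
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")      # US social security number layout


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_deterministic(text: str) -> Optional[str]:
    """Return a label when a known identifier is found, otherwise None."""
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            return "Confidential"  # assumed label name
    if SSN_RE.search(text):
        return "Confidential"
    return None


print(classify_deterministic("Card on file: 4111 1111 1111 1111"))
print(classify_deterministic("Notes from the strategy offsite: new markets next year."))
```

The second call returns None: a strategy memo with no recognizable identifier passes through a rule like this without a label.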

But what about the content that doesn’t contain those predetermined identifiers? There is plenty of content that could be seen as sensitive but lacks a clear, predictable identifier to look for, or that might have a more nuanced level of sensitivity. Content like memos and meeting notes might carry key strategic information yet never trigger a deterministic solution, and this grey area of sensitivity represents a huge swath of content. The risk associated with this hard-to-classify content is significant, and it isn’t going away.

Box Shield Pro AI Classification Agent brings context-driven classification

In December, Box will release our new Agent-powered content security solution in Box Shield Pro, with one key component being our AI Classification Agent. This new capability enhances our customers’ ability to automatically classify content using generative AI, and vastly expands the amount of content that can be intelligently assessed and assigned a classification label.

AI Classification Agent helps answer questions like ‘What’s this document about?’ and ‘How should it be handled?’, and then uses that insight to apply more deliberate, nuanced classifications (something really difficult to do with traditional methods). This Agent looks at the MVPs of the content (its Meaning, Value, and Purpose), determines how sensitive the content is and who should have access to it, then assigns the defined Box Shield classification label that best fits the content.

To ensure the classifications are accurate and fit the organization’s definitions of sensitivity, admins are able to write a definition in plain language for each classification label. For example, they might say that Confidential content should include “any substantial description of future financial strategy”, or have Internal apply whenever “content is related to employee training”. Admins can be as detailed or simple as they like, and can easily test the definitions within the platform on a small subset of files, enabling them to tinker with and refine their definitions.
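As a rough illustration of how plain-language definitions could steer a generative model, the sketch below assembles label definitions into a single prompt. This is an assumption about the general technique, not a description of how AI Classification Agent works internally; the function name, the prompt wording, and the dictionary layout are hypothetical, and no model is actually called.

```python
from typing import Dict

# Label definitions taken from the examples above; the structure is hypothetical.
LABEL_DEFINITIONS: Dict[str, str] = {
    "Confidential": "any substantial description of future financial strategy",
    "Internal": "content is related to employee training",
}


def build_classification_prompt(definitions: Dict[str, str], document_text: str) -> str:
    """Combine admin-written label definitions and a document into one prompt."""
    lines = [
        "Assign exactly one classification label to the document below.",
        "Labels and their definitions:",
    ]
    for label, definition in definitions.items():
        lines.append(f"- {label}: {definition}")
    # Asking for a reason mirrors the idea of an explanation shown to users.
    lines.append("Reply with the label name and a one-sentence reason.")
    lines.append("Document:")
    lines.append(document_text)
    return "\n".join(lines)


sample = "Draft onboarding guide for new support hires."
print(build_classification_prompt(LABEL_DEFINITIONS, sample))
```

Keeping the definitions in a simple mapping like this makes it easy to edit one definition and rerun the same handful of sample files, which matches the test-and-refine loop described above.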

Once implemented, these labels will apply automatically throughout the organization, in the same manner as Automated Classification. We will also include a summary of why the model determined that the chosen label was correct for a piece of content, easily viewable within the preview screen for that content. This explanation serves a few purposes:

Helping admins further fine-tune their AI classification prompts
Giving users a better understanding of the reasoning behind classification labels
Supporting Box’s commitment to the principle of transparency in AI

Building on a secure foundation

Classification is the foundation of modern content security, since without accurate labels carrying enforceable controls, content risks being exposed or leaked inappropriately. By building on Box’s existing capabilities with the new AI Classification Agent, we are ensuring that the challenge of classifying mountains of content is never a blocker for our customers when it comes to securing their most critical content.

AI Classification Agent is releasing as part of Box Shield Pro, launching early this December. If you have any questions or would like to see a demo, please reach out, or join the community discussion on our Security and Governance forum.