GPT-5, now in Box AI: A new benchmark for understanding complex enterprise data

Today, OpenAI released its latest model, GPT-5, which is now available for Enterprise Advanced customers in Box AI Studio and via Box AI APIs, and will be rolling out to all versions of Box AI shortly.

Our initial testing found that GPT-5 exhibits a more intuitive grasp of complex logic, making it exceptionally capable for demanding enterprise tasks. It demonstrates a deeper understanding of unstructured data and excels when faced with multi-step analytical questions. GPT-5 acts as your ideal thinking partner. For enterprises, this translates into more sophisticated and reliable automation, from identifying nuanced risks in legal agreements to performing complex, on-the-fly calculations within financial reports.

A new benchmark in understanding complex business information

One of the most important tests of an AI model, and its impact on the enterprise, is its ability to make sense of vast, unstructured data: the dense contracts, reports, and presentations that drive business yet are difficult to analyze. GPT-5 demonstrated an increase in performance for data extraction, which allows organizations to unlock valuable insights and enable workflows from their content.

GPT-5 data extraction accuracy of 90%, showing a significant increase over GPT-4.1

We evaluated GPT-5 and GPT-4.1 on our proprietary extraction challenge set, which spanned more than 8,000 fields across a diverse range of text, image, and multimodal document types such as contracts, research papers, transaction files, and government identification. The dataset was purposefully designed to stress-test a broad spectrum of model capabilities, such as structured data extraction, complex reasoning over dense text, parsing unstructured formats, interpreting concise, high-signal content, and handling multimodal inputs.

Overall, GPT-5 consistently outperformed GPT-4.1 across most categories. There were notable gains in table parsing accuracy, cross-document contextual reasoning, and adaptability to document structure. For example, when asked to extract information that was not readily available in documents or images, GPT-5 inferred accurate answers from granular details, such as disparate citations and references across research papers, where previous models would either not return a response or provide incorrect answers.

These improvements reflect GPT-5’s enhanced long-context comprehension, stronger layout awareness, improved mathematical reasoning, and more advanced multimodal grounding, which together enable more precise and reliable extraction across varied, real-world content. On our hardest dataset, GPT-5 extracted the requested content correctly 5 percentage points more often than GPT-4.1. This represents a significant leap forward in content understanding on our most challenging, high-complexity material.
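To make the comparison concrete, here is a minimal sketch of how field-level extraction accuracy and a percentage-point gain could be computed. It is not the scoring method used in the evaluation described above: the exact-match-after-normalization rule, the function names, and the toy contract fields are all assumptions made for illustration.

```python
from typing import Mapping


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are not counted as errors."""
    return " ".join(value.lower().split())


def field_accuracy(predicted: Mapping[str, str], expected: Mapping[str, str]) -> float:
    """Fraction of expected fields whose predicted value matches after normalization.

    A field that the model did not return counts as incorrect.
    """
    if not expected:
        raise ValueError("expected must contain at least one field")
    correct = sum(
        1
        for key, truth in expected.items()
        if normalize(predicted.get(key, "")) == normalize(truth)
    )
    return correct / len(expected)


def percentage_point_gain(new_accuracy: float, old_accuracy: float) -> float:
    """Absolute difference between two accuracies, expressed in percentage points."""
    return (new_accuracy - old_accuracy) * 100


# Hypothetical toy data, not taken from the evaluation set.
expected = {
    "party": "Acme Corp",
    "effective_date": "2019-11-01",
    "term_months": "12",
    "governing_law": "Delaware",
}
model_a = {
    "party": "acme  corp",  # differs only in case and spacing
    "effective_date": "2019-11-01",
    "term_months": "12",
    "governing_law": "Delaware",
}
model_b = {
    "party": "Acme Corp",
    "effective_date": "2019-11-01",
    "term_months": "24",  # wrong value; governing_law is missing entirely
}

acc_a = field_accuracy(model_a, expected)
acc_b = field_accuracy(model_b, expected)
gain = percentage_point_gain(acc_a, acc_b)
```

A percentage-point gain is an absolute difference between two accuracy rates, so it is not the same thing as a relative improvement. Moving from 50% to 55% is a 5 percentage point gain but a 10% relative improvement.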

GPT-5 showed a significant improvement on long content, scoring 9 percentage points higher in accuracy than GPT-4.1

Content in the enterprise comes in all shapes and sizes — from short snippets like emails and notifications, to medium-length documents such as resumes, salary slips, or receipts, all the way to large, complex files like contracts, research papers, and policy manuals. On long-form documents specifically, GPT-5 showed a significant improvement of 9 percentage points over GPT-4.1. This is a promising lift in performance — extracting insights from lengthy, dense content remains one of the most tedious and time-consuming aspects of knowledge work. These gains demonstrate GPT-5’s enhanced capabilities in long-context reasoning, multi-paragraph synthesis, and maintaining accuracy over extended spans — unlocking real value in use cases that require deep understanding of structure, logic, and narrative flow.

Excelling at the hardest enterprise questions

When we pushed GPT-5 with complex, multi-step questions that required reasoning across dense financial reports, technical manuals, and multi-clause legal agreements, it performed strongly. The results were clear: GPT-5's superior logic establishes it as a premier "thinking model" for the enterprise.

Consider this scenario we tested, designed to push the boundaries of financial reasoning: “If Adam Stein's account earned interest at the same daily rate throughout November 2019, and he wanted to make a single deposit on November 25th that would exactly offset all Amazon purchases and bring his final month-end balance to exactly $40,000, what amount would he need to deposit?”

To answer, a model must not only extract multiple figures but understand their relationship, calculate a daily rate, project a balance, and work backward to find the solution. GPT-5 flawlessly navigated this complex chain of logic to provide the correct answer, a task where previous models have struggled.
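The chain of steps described above (derive a daily rate, project the balance, then work backward) can be sketched as follows. This is an illustration only. The figures are hypothetical and do not come from the statement used in our test, and the modeling choices are assumptions: a 30-day month, daily compounding, and transactions posted at the start of the day before that day's interest accrues.

```python
from typing import Mapping


def implied_daily_rate(interest_earned: float, principal: float, days: int) -> float:
    """Daily compounding rate that turns `principal` into `principal + interest_earned` over `days` days."""
    return (1 + interest_earned / principal) ** (1 / days) - 1


def month_end_balance(
    opening: float,
    daily_rate: float,
    transactions: Mapping[int, float],
    days_in_month: int = 30,
) -> float:
    """Project the month-end balance.

    `transactions` maps a day of the month to a net amount (negative for purchases).
    Each amount is posted at the start of its day, then that day's interest accrues.
    """
    balance = opening
    for day in range(1, days_in_month + 1):
        balance += transactions.get(day, 0.0)
        balance *= 1 + daily_rate
    return balance


def required_deposit(
    opening: float,
    daily_rate: float,
    transactions: Mapping[int, float],
    deposit_day: int,
    target: float,
    days_in_month: int = 30,
) -> float:
    """Work backward from the target month-end balance to a single deposit.

    The projection is linear in each transaction, so the deposit only needs to
    cover the shortfall, discounted by the interest it earns after it is made.
    """
    baseline = month_end_balance(opening, daily_rate, transactions, days_in_month)
    days_compounding = days_in_month - deposit_day + 1
    growth = (1 + daily_rate) ** days_compounding
    return (target - baseline) / growth


# Hypothetical figures for illustration only.
rate = implied_daily_rate(interest_earned=10.50, principal=35_000.0, days=30)
purchases = {5: -120.00, 12: -45.99, 20: -310.25}  # assumed purchases by day of month
deposit = required_deposit(35_000.0, rate, purchases, deposit_day=25, target=40_000.0)

# Check the answer by replaying the month with the deposit included.
with_deposit = dict(purchases)
with_deposit[25] = with_deposit.get(25, 0.0) + deposit
final_balance = month_end_balance(35_000.0, rate, with_deposit)
```

Replaying the month with the computed deposit should land on the target balance, which is a useful way to verify any model's answer to a question of this kind.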

Our observations highlighted GPT-5’s intuitive strengths:

Sophisticated mathematical reasoning: The model showed a powerful ability to deconstruct a financial problem, perform the necessary calculations, and synthesize a new answer, showcasing an almost intuitive grasp of mathematical concepts.
More effective data extraction: In our tests, GPT-5 proved more adept at large-scale metadata extraction. Its reliability is a cornerstone for automating enterprise-wide processes like contract analysis and data governance.
Deeper contextual logic: Qualitatively, we found GPT-5 is better able to hold more context in mind and apply intricate logic to a problem. This translates directly to fewer errors and more dependable answers for users.

Why GPT-5 is a game-changer for enterprise tasks across industries

This new level of reasoning unlocks tangible benefits and more sophisticated use-cases across any industry. Here are just a few examples:

Financial Services: Go beyond simple data retrieval. Ask complex, multi-part questions about financial reports, and trust that GPT-5 can perform the necessary calculations to provide a synthesized answer. It can handle on-the-fly data verification and cross-document analysis to generate net-new insights.
Legal: Confidently automate the analysis of critical data across thousands of documents. GPT-5 can adeptly identify not just key terms, dates, and clauses, but also the potential risks and obligations implied by them, which is crucial for compliance audits and risk management.
Retail and CPG: Derive deeper, more nuanced insights from consumer trend reports and supply chain documents. The model’s ability to synthesize information from multiple sources—like market research, focus group transcripts, and sales reports—helps teams make faster, more insightful decisions by summarizing key themes and sentiment.
Technology and Engineering: Achieve higher analytical precision on technical specs, scientific papers, code samples, or engineering documents. The model is able to identify and clarify critical ambiguities within your sources to ensure conclusions are based on the correct information, delivering insights you can trust.

Get started with Box and OpenAI’s GPT-5

The advancements in GPT-5 represent a significant step toward more intelligent enterprise AI. This enhanced reasoning translates to practical business outcomes: faster, more accurate document analysis and truly reliable knowledge extraction. With GPT-5, businesses can implement AI solutions that process, analyze, and understand content at scale, meeting the high standards required for mission-critical operations.

Ready to get started? Try GPT-5 today in Box AI Studio and with the Box AI APIs.