First look: Gemini 2.5 Pro and Box AI

Today, we’re examining how Google’s latest model, Gemini 2.5 Pro, now available in Box AI Studio, performs against real-world enterprise challenges. Our latest Box AI Enterprise Eval specifically tested Gemini’s reasoning capabilities and how they apply to data extraction tasks, a major sticking point for organizations today.

Enhanced reasoning delivers higher accuracy

Modern enterprises rely heavily on data embedded within a wide range of documents. Intelligently extracting this crucial data with high accuracy goes beyond simple pattern recognition. It requires sophisticated reasoning to understand context, connect information across sections of a document, and deduce meaning logically. Mastering this capability unlocks significant value and drives efficiency in critical business processes.

This is where Gemini 2.5 Pro demonstrates a marked improvement. It performs significantly better on tasks that require deeper analysis rather than just identifying the first potential answer. Consider the complexity of deducing specific dates within a contract, where the correct answer might depend on interpreting clauses spread across multiple sections. Gemini 2.5 Pro shows an enhanced ability to navigate this complexity, taking the necessary steps to reason through the content and arrive at a more accurate result.

First look: the Box AI evaluation of Gemini 2.5 Pro

Gemini 2.5 Pro excels in complex logic and reasoning

In our internal testing, we saw some of Gemini 2.5 Pro's advancement firsthand. Gemini 2.5 Pro scored 3 percentage points higher in overall accuracy than its predecessor, Gemini 2.0 Pro. Crucially, the performance gain is even more pronounced in areas requiring complex logic embedded within the document:

Complex clause interpretation: Many of the fields where we saw Gemini 2.5 Pro’s gains represent specific, often standardized, legal or business clauses (e.g., Governing Law, Anti-Assignment, Audit Rights, Most Favored Nation, Exclusivity, Source Code Escrow). Accurately extracting information about these requires understanding the specific language, structure, and implications of these clauses within the document, going beyond simple keyword searches.
Temporal reasoning & calculation: Several categories where Gemini 2.5 Pro shines involve dates or time periods (Effective Date, Expiration Date, Warranty Duration). Extracting these accurately often isn't just finding a single date string; it might require understanding context (e.g., "effective upon signing," "30 days after completion"), calculating end dates based on start dates and durations, or resolving relative date references.
Identifying specific conditions, rights, or obligations: Fields like Audit Rights, Third Party Beneficiary, Exclusivity, Warranty Duration, and Anti-Assignment often detail specific conditions, rights granted to parties (or non-parties), or particular restrictions and obligations. Identifying these accurately requires understanding the nuanced language defining these terms within the document.
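
To make the temporal reasoning point concrete, here is a minimal Python sketch of the arithmetic that sits behind a phrase like "effective 30 days after completion, for a term of 12 months." It is an illustration only, not part of the Box AI evaluation or the Box AI APIs. The helper names (add_months, resolve_end_date) and the sample dates are hypothetical, and the sketch assumes the term ends on the same day of the month it started, clamped to the month's last day. Real contracts may define this differently (for example, ending the day before the anniversary).

```python
import calendar
from datetime import date, timedelta


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month is shorter than the start day (e.g. Jan 31 + 1 month),
    the result is clamped to the last day of the target month.
    """
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def resolve_end_date(anchor: date, offset_days: int = 0, term_months: int = 0) -> date:
    """Resolve a relative date reference, then apply a term in months.

    `anchor` is the event the clause refers to (hypothetical example:
    a completion date), `offset_days` is the relative offset from it,
    and `term_months` is the stated duration.
    """
    effective = anchor + timedelta(days=offset_days)
    return add_months(effective, term_months)


# Hypothetical clause: "effective 30 days after completion, for 12 months"
completion = date(2024, 3, 1)
effective_date = completion + timedelta(days=30)
expiration_date = resolve_end_date(completion, offset_days=30, term_months=12)

print(effective_date)   # 2024-03-31
print(expiration_date)  # 2025-03-31
```

The arithmetic itself is simple. The hard part, and the part the evaluation is concerned with, is recovering the anchor event, the offset, and the term from clauses that may sit in different sections of the document.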

Accuracy is non-negotiable

For enterprise AI applications, mistakes simply aren't an option. Ensuring compliance in finance, managing patient data correctly in healthcare, and protecting intellectual property in life sciences all depend on accurate information. Because the stakes are so high in these areas, data errors can lead to serious problems. The level of accuracy from Gemini 2.5 Pro, driven by its enhanced reasoning, can be the deciding factor in whether AI can be confidently deployed for critical workplace use cases. It unlocks the potential for AI to reliably tackle more sophisticated, high-value enterprise workflows.

Get started today with Gemini 2.5 Pro, now available in Box AI Studio and with Box AI APIs.