Today, Google released Gemini 3.1 Pro in preview, and we're excited to share the results of our latest model evaluation comparing it against its predecessor, Gemini 3 Pro.
Gemini 3.1 Pro delivers a 6 percentage point improvement in overall accuracy, rising from 61% to 67% across challenging reasoning tasks. For enterprises that rely on AI to analyze documents, draft reports, and extract insights, this upgrade translates to more reliable outputs and fewer errors where it matters most. Gemini 3.1 Pro shines on tasks that require deep analysis of document contents, complex calculations, and nuanced interpretation.
Where the model shines

The Box AI Enterprise Eval directly evaluates a model's ability to respond correctly, in a single attempt, based on supplied documents. We used a comprehensive dataset of complex question-answering tasks that require heavy reasoning over document contents, the kind of real-world challenges that enterprises face every day. To ensure precision in our assessment, we scored these tasks across many trials against weighted rubric items.
On a subset of the data, broken down by industry:
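The exact scoring formula isn't published here. As a rough illustration only, this minimal Python sketch shows one common way to score against weighted rubric items across repeated trials. It assumes that each trial earns the fraction of total rubric weight it satisfies and that the task score is the mean over trials. The names `RubricItem`, `trial_score`, and `task_score`, and the rubric itself, are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RubricItem:
    """One hypothetical rubric criterion and its weight."""
    description: str
    weight: float


def trial_score(items: list[RubricItem], satisfied: list[bool]) -> float:
    """Fraction of total rubric weight earned in one trial (0.0 to 1.0)."""
    if len(items) != len(satisfied):
        raise ValueError("one pass/fail flag is needed per rubric item")
    total = sum(item.weight for item in items)
    earned = sum(item.weight for item, ok in zip(items, satisfied) if ok)
    return earned / total


def task_score(items: list[RubricItem], trials: list[list[bool]]) -> float:
    """Average the weighted trial scores over repeated single-attempt trials."""
    return mean(trial_score(items, t) for t in trials)


# Hypothetical rubric and grading results, for illustration only.
rubric = [
    RubricItem("Cites the correct figure from the document", 3.0),
    RubricItem("Shows the calculation", 1.0),
    RubricItem("States the conclusion clearly", 1.0),
]
trials = [
    [True, True, False],   # earns 4 of 5 weight
    [True, False, True],   # earns 4 of 5 weight
    [False, True, True],   # earns 2 of 5 weight
]
print(round(task_score(rubric, trials), 3))
```

Weighting lets a critical criterion (here, citing the right figure) count for more than stylistic ones, and averaging over trials smooths out run-to-run variation in model outputs.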
- Healthcare & Life Sciences saw a remarkable 20 percentage point gain (47% → 67%)
- Legal improved by 17 percentage points (57% → 74%)
- Technology gained 8 percentage points (49% → 57%)
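Percentage points and percent are easy to conflate. This minimal Python sketch uses the accuracy figures reported above to compute both the absolute gain in percentage points and the relative change over the starting accuracy. The helper names are hypothetical.

```python
def pp_change(before: float, after: float) -> float:
    """Absolute change in accuracy, in percentage points."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Relative change, as a percent of the starting accuracy."""
    return (after - before) / before * 100


# Accuracy (%) for Gemini 3 Pro -> Gemini 3.1 Pro, as reported in this post.
results = {
    "Overall": (61, 67),
    "Healthcare & Life Sciences": (47, 67),
    "Legal": (57, 74),
    "Technology": (49, 57),
}

for name, (before, after) in results.items():
    print(
        f"{name}: +{pp_change(before, after)} pp, "
        f"+{relative_change(before, after):.1f}% relative"
    )
```

For example, Healthcare & Life Sciences moving from 47% to 67% is a 20 percentage point gain, which works out to roughly a 42.6% relative improvement over the starting accuracy.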
Real-world wins

Healthcare data analysis
Gemini 3.1 Pro achieved a 20 percentage point gain (47% → 67%) in this sector, driven by superior arithmetic precision.
During a neonatal clinical data analysis task, the model handled complex statistical noise that tripped up previous versions. It correctly calculated relative percentage differences (RPD) for hematological parameters (e.g., pinpointing a 0.72% difference for lymphocytes) and accurately computed standard deviations for patient groups.
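For readers less familiar with these statistics, here is a minimal Python sketch of the two calculations mentioned above. It assumes the common definition of RPD as the absolute difference divided by the mean of the two values, times 100. The exact RPD formula used in the task isn't specified, and the lymphocyte values below are hypothetical, not the task's data. The sketch uses `statistics.stdev`, the sample standard deviation; `statistics.pstdev` would be the choice if a population standard deviation is intended.

```python
from statistics import mean, stdev


def relative_percent_difference(a: float, b: float) -> float:
    """RPD = |a - b| / ((a + b) / 2) * 100 (difference relative to the pair's mean).

    This is one common definition; others divide by a reference value instead.
    """
    avg = (a + b) / 2
    if avg == 0:
        raise ValueError("RPD is undefined when the two values average to zero")
    return abs(a - b) / avg * 100


# Hypothetical lymphocyte counts (x10^9/L) for two patient groups.
# Illustrative values only; not data from the evaluation.
group_a = [5.1, 5.6, 4.9, 5.3]
group_b = [5.2, 5.4, 5.0, 5.5]

mean_a, mean_b = mean(group_a), mean(group_b)
print(f"RPD of group means: {relative_percent_difference(mean_a, mean_b):.2f}%")
print(f"Sample SD, group A: {stdev(group_a):.3f}")
print(f"Sample SD, group B: {stdev(group_b):.3f}")
```

Checking figures like these by hand is exactly the manual verification burden that more reliable model arithmetic helps reduce.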
For researchers and clinicians, this means Box AI can now be a more reliable partner in drafting reports from raw clinical data, reducing the manual burden of verifying basic statistical claims.
Legal report drafting
In our evaluation, Gemini 3.1 Pro outperformed Gemini 3 Pro on Legal use cases, improving accuracy by 17 percentage points (57% → 74%).
In a complex due diligence task involving privacy rights and property construction, the model's improved logic was on full display. The task required drafting a legal memorandum assessing whether a specific building modification constituted an unlawful privacy violation. While the previous model incorrectly flagged a violation, Gemini 3.1 Pro correctly applied a "directionality test," reasoning that because the neighbor's own wall modifications created the visual access, no liability existed.
This level of nuance — understanding why a fact matters, not just that it exists — is critical for legal teams that use Box to automate contract review and memo drafting.
Beyond specific industries, Gemini 3.1 Pro showed broad improvements across our core use cases:
- Report drafting from data: Achieved the highest accuracy of any category at 72% (up from 67%).
- Data analysis: Improved to 65% (up from 57%).
- Expert review: Rose to 58% (up from 55%).
Now available in Box AI Studio
Gemini 3.1 Pro is now available in Box AI Studio as a Beta model, giving you the power to build custom AI agents with this more capable model. Whether you're analyzing healthcare data, drafting legal documents, or building workflows for any industry, you can now leverage these improvements directly in your enterprise content workflows.




