Gemini 2.5 Flash Delivers Enhanced Document Q&A and Extraction with Box AI

Last week, Google launched Gemini 2.5 Flash, a model designed for rapid processing and enhanced reasoning. Today, we’re evaluating its performance on critical enterprise tasks, shifting our focus to its capabilities in document question-answering, a vital function for unlocking insights from enterprise content.

Our Box AI Enterprise Eval shows that Gemini 2.5 Flash answers queries over both single and multiple documents more accurately than its predecessor, and that it performs strongly on complex data extraction. It is a powerful and efficient option that improves substantially on Gemini 2.0 Flash, particularly in comprehending and synthesizing information for Q&A.

Gemini 2.5 Flash's effectiveness on demanding enterprise tasks

Accurately answering questions based on dense business documents requires more than finding keywords; it demands strong comprehension, synthesis, and reasoning. As part of the all-new Box AI Enterprise Eval, we tested Gemini 2.5 Flash on four task sets: Single Doc Q&A, Multi-Doc Q&A, Data Extraction on the most complex components of the CUAD data set, and the full Data Extraction data set.*

Here’s what we found:

Significant Gains in Document Q&A: The most notable improvements for Gemini 2.5 Flash appear in question-answering tasks. Compared to Gemini 2.0 Flash, the new model demonstrates enhanced comprehension and ability to surface accurate answers.
- On our single-document Q&A evaluation, Gemini 2.5 Flash achieved an Answer Recall Score of 80.06, marking a significant 3.6-point improvement over Gemini 2.0 Flash (76.43).
- In multi-document Q&A scenarios, which require synthesizing information across multiple sources, Gemini 2.5 Flash scored 78.79, representing a 2.5-point bump compared to Gemini 2.0 Flash (76.3).
- These gains highlight Gemini 2.5 Flash’s improved capacity for understanding user queries and retrieving relevant, accurate information from complex content repositories within Box.
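
The point gains quoted above follow directly from the reported Answer Recall Scores. As a quick check, the short Python sketch below recomputes them. The dictionary keys and the function name are illustrative only and are not part of any Box tooling.

```python
# Answer Recall Scores as reported in this post (higher is better).
# Key names are illustrative labels, not official identifiers.
RECALL_SCORES = {
    "single_doc_qa": {"gemini_2_0_flash": 76.43, "gemini_2_5_flash": 80.06},
    "multi_doc_qa": {"gemini_2_0_flash": 76.3, "gemini_2_5_flash": 78.79},
}


def point_gain(task: str) -> float:
    """Absolute gain of 2.5 Flash over 2.0 Flash, in recall points."""
    scores = RECALL_SCORES[task]
    return round(scores["gemini_2_5_flash"] - scores["gemini_2_0_flash"], 2)


if __name__ == "__main__":
    for task in RECALL_SCORES:
        print(f"{task}: +{point_gain(task):.2f} points")
```

The single-document difference is 3.63 points and the multi-document difference is 2.49 points, which round to the 3.6 and 2.5 figures cited in the bullets.
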
Strong Supporting Extraction Performance: While Q&A shows the most marked improvement, 2.5 Flash also maintains high quality on complex data extraction tasks, which often rely on similar reasoning capabilities. On the CUAD subset of hard extraction tasks, which we have used in our previous model evals, Gemini 2.5 Flash achieved 82% correctness, slightly ahead of Gemini 2.0 Flash (81%).
Advanced Reasoning Underpins Performance: 2.5 Flash's ability to handle complex extraction fields requiring multi-step reasoning (like calculating dates or interpreting specific legal clauses) likely contributes to its improved performance in understanding and answering nuanced questions based on the same documents.

These results indicate that Gemini 2.5 Flash offers a significant upgrade, particularly for Q&A use cases, while maintaining high standards for complex data extraction, making it a versatile asset for interacting with enterprise content in Box AI.

Why Gemini 2.5 Flash is effective for enterprise tasks

The strong performance of Gemini 2.5 Flash, particularly its advancements in document Q&A, translates into tangible benefits for organizations using Box AI:

More Accurate Answers: The measurable improvement in Q&A recall means users get more reliable and comprehensive answers when querying their documents, reducing the need for manual fact-checking.
Deeper Insights: Enhanced ability to synthesize information across multiple documents allows users to quickly grasp key themes, trends, or connections previously hidden within large volumes of content.
Speed and Efficiency: As a Flash model, it delivers these improved Q&A capabilities with lower latency compared to larger models, accelerating research and decision-making.
Cost-Effectiveness: Optimized for speed and efficiency, it offers a potentially lower cost per query, making sophisticated Q&A scalable across the enterprise.
Reliable Quality for Q&A and Data Extraction: Its proven ability to accurately answer questions and effectively extract complex metadata ensures reliability for compliance, research, risk management, and data governance use cases.

Choosing the Right Gemini Model for Your Needs

The Gemini family offers flexibility within Box AI:

Gemini 2.5 Pro: Remains the top choice for maximum reasoning power on the most complex tasks, evidenced by its leading extraction scores (82% correctness on hard CUAD) and strong Q&A performance (80.32 single-doc recall).
Gemini 2.5 Flash: Provides an excellent balance, closely behind 2.5 Pro on hard extraction (80% correctness) while showing clear improvements over 2.0 Flash in single-doc (80.06 recall) and multi-doc (78.79 recall) Q&A. Its tunable "hybrid reasoning" adds flexibility. Ideal for demanding Q&A and extraction tasks needing speed and quality.
Gemini 2.0 Flash: An established workhorse for speed and efficiency. While effective for simpler tasks, it scores lower on Q&A recall (76.43 single-doc, 76.3 multi-doc) and hard extraction (77% correct) compared to 2.5 Flash in our latest evaluations.

Evaluate your specific needs for Q&A complexity, required accuracy, speed, and budget to select the best Gemini model.

Get started today

Gemini 2.5 Flash is a compelling option for enterprises, offering notable improvements in document Q&A alongside strong extraction capabilities, speed, and efficiency. Its gains over 2.0 Flash, roughly 3.6 points in single-doc Q&A recall and 2.5 points in multi-doc Q&A recall, demonstrate its readiness for demanding, real-world content challenges. Unlock the potential of fast, high-quality AI for your enterprise content. Gemini 2.5 Flash is available today in Box AI Studio and via the Box AI APIs.
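
For readers who want to try document Q&A through the Box AI APIs, here is a minimal Python sketch that only builds a request body and does not send it. It rests on several assumptions that should be checked against the Box API reference: the endpoint URL, the `mode` values, and the `ai_agent` override fields reflect our understanding of the Box AI "ask" endpoint, and `MODEL_ID` is a hypothetical placeholder, not a confirmed model identifier. The helper name `build_ask_request` is also invented for illustration.

```python
import json

# Assumed endpoint for Box AI Q&A; verify against the Box API reference.
BOX_AI_ASK_URL = "https://api.box.com/2.0/ai/ask"

# Hypothetical placeholder: replace with the model identifier that
# Box AI documents for Gemini 2.5 Flash.
MODEL_ID = "<gemini-2.5-flash-model-id>"


def build_ask_request(file_ids, prompt, model_id=MODEL_ID):
    """Build a JSON-serializable body for a single- or multi-doc question.

    Field names ("mode", "items", "ai_agent") are assumptions based on
    the Box AI ask endpoint and may differ from the current API.
    """
    if not file_ids:
        raise ValueError("at least one file ID is required")
    multi = len(file_ids) > 1
    # Assumed override key: one for single-doc, one for multi-doc Q&A.
    agent_key = "basic_text_multi" if multi else "basic_text"
    return {
        "mode": "multiple_item_qa" if multi else "single_item_qa",
        "prompt": prompt,
        "items": [{"type": "file", "id": str(fid)} for fid in file_ids],
        "ai_agent": {"type": "ai_agent_ask", agent_key: {"model": model_id}},
    }


if __name__ == "__main__":
    body = build_ask_request(["12345"], "What is the contract renewal date?")
    print(f"POST {BOX_AI_ASK_URL}")
    print(json.dumps(body, indent=2))
```

Sending the request would additionally require an authenticated HTTP client with a valid Box access token, which is left out here on purpose.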