Introducing Spout: A Fast and Scalable Way to Read and Write XLSX Files

As the world becomes more and more data-oriented, access to the right information is critical. Most of the tools we use to gather data are good at it, but they're not designed to process it. That's why specialized software—like Microsoft Excel or LibreOffice—is so popular. No matter where your data comes from, these tools make it easy to extract the important information from large data sets and visualize it.

Box is no exception. The Box Admin Console enables Box admins to manage a wide array of processes, workflows, content, and teams, powering productivity for entire organizations around the world. One of the key elements of this tool is reporting, which provides a comprehensive dashboard of critical information happening within a team or organization. CSV is the standard format for these reports because it is a simple, widely supported, and scalable format. But exporting this data is not always easy, and a tool many of our admins use to read these files is not compatible with CSV file encoding and instead defaults to ASCII. The result is that all non-ASCII characters are corrupted.

For instance, this is how a Japanese user might see such a report:

We wanted to provide Box admins with the best experience in exporting their important data, so we tried several different things to fix this issue. We couldn't find a suitable solution that didn't result in a degraded user experience. That's when we decided to stop using CSV and migrate our reports from another format: XLSX.

XLSX was the best choice for us for several reasons: it is widely supported, has better specifications than its ancestor (XLS only supported 65,536 rows per sheet, XLSX supports over one million,) and is based on Office Open XML (OOXML), a well documented but complex format. This is where Spout enters the picture.

Spout is the PHP library we created to read and write XLSX—as well as CSV—files. It mitigates the complexity of working with OOXML, regardless of the dataset’s size, with an easy-to-use API. With Spout, for example, this is how an XLSX file can be read:

$reader = ReaderFactory::create(Type::XLSX);
$reader->open($filePath);

while ($reader->hasNextSheet()) {
   $reader->nextSheet();

    while ($reader->hasNextRow()) {
        $row = $reader->nextRow();
        // do stuff
    }
}

$reader->close();

The main challenge in moving from from CSV to XLSX is scale. While it's easy to scale with CSV files, it's harder to read or write XLSX files containing millions of rows. Unlike other libraries, Spout has been optimized for scale. It is able to read and write any file, quickly and with very little memory consumption.

Today, we are open sourcing Spout to hopefully help other developers solve the problems we faced.

The library is available through Composer—it just requires “box/spout”. You can find more details on the Github project page.

Tags: php, php-library, api, xlsx, csv