Box Drive: How we made all of Box available on the desktop
Box has not only enabled countless businesses to keep their content secure and available everywhere, but has also enabled secure collaboration across corporate borders at a massive scale, where users can create, co-edit, and share content. For many users, the desktop is their preferred place to work and where work gets done. So while we’re pretty big fans of the cloud at Box, we recognize that not everyone loves working from their browser. That's why we built Box Drive: to bring the power of cloud content management from Box to the desktop.
Box Sync to Box Drive - A Historical Perspective
To understand how Box Drive improves the user experience and expands what we can build on the desktop, it's important to first understand what makes traditional syncing difficult, and how Box Drive avoids many of those difficulties.
Why is it Hard to Create a Syncing Product?
The engineers at Box could probably write a book on the numerous race conditions that can arise from seemingly simple actions such as deleting or renaming a file! The main challenge any sync product faces is that it has to reconcile differences between the local and remote actors, while having little direct control over any of them.
Challenges in Designing a Typical Sync Product
Here are some of the more obvious challenges sync products typically face on the local side of the architecture alone, namely detecting local changes and applying remote changes locally:
- Detecting local changes is hard, especially when change events from the operating system are difficult or unreliable to obtain. Sync products must periodically scan folders to build a snapshot of the folder structure (the file tree), then compare the previous snapshot with the current one to detect changes.
- These scans are costly, especially when the file tree is massive. Storing the snapshot consumes memory and opens the door to inconsistencies between the actual file tree in the user's folder and the snapshot. Scanning is also time-intensive and delays propagating a local change to Box, since the scan must finish first.
Enterprises can have lots of files and folders (a.k.a. a file tree)
- Applying remote changes locally is tricky, as sync products must have a way to distinguish between changes made by the user and changes made by the sync application itself. Consider, for example, a file created remotely (say your colleague created a file in a shared folder). A naive sync implementation might faithfully apply this creation in the sync folder, then detect the new file as a local change event that needs to be propagated back to Box. Sync products must, therefore, have a mechanism to detect that a change was made by the sync product itself (and not, say, a word processor!)
A simplified version of how local changes are detected.
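To make the two local-side problems above concrete, here is a minimal sketch of a naive scan-based syncer. It is not Box Sync's implementation: the names `scan`, `diff`, and `NaiveSyncer` are hypothetical, and file modification times stand in for whatever a real product would compare. It rescans the whole folder on every check and suppresses "echoes" of its own writes by remembering which changes it expects to see.

```python
import os
from typing import Dict, List, Set, Tuple

Snapshot = Dict[str, float]   # relative path -> last modification time
Change = Tuple[str, str]      # (kind, relative path)


def scan(root: str) -> Snapshot:
    """Walk the entire folder: the cost grows with the size of the file tree."""
    snapshot: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            snapshot[os.path.relpath(full, root)] = os.stat(full).st_mtime
    return snapshot


def diff(old: Snapshot, new: Snapshot) -> List[Change]:
    """Compare two snapshots and report what changed, sorted by path."""
    changes = [("created", p) for p in new.keys() - old.keys()]
    changes += [("deleted", p) for p in old.keys() - new.keys()]
    changes += [("modified", p) for p in new.keys() & old.keys() if new[p] != old[p]]
    return sorted(changes, key=lambda c: (c[1], c[0]))


class NaiveSyncer:
    """Hypothetical scan-based syncer with echo suppression."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.last = scan(root)              # snapshot kept in memory between scans
        self.expected: Set[Change] = set()  # changes the syncer made itself

    def apply_remote_create(self, rel_path: str, data: bytes) -> None:
        rel_path = os.path.normpath(rel_path)
        # Record the change *before* writing so the next scan can recognize it.
        self.expected.add(("created", rel_path))
        full = os.path.join(self.root, rel_path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def detect_local_changes(self) -> List[Change]:
        current = scan(self.root)
        changes = diff(self.last, current)
        self.last = current
        # Drop echoes of our own writes; whatever is left came from the user.
        user_changes = [c for c in changes if c not in self.expected]
        self.expected.difference_update(changes)
        return user_changes
```

Even this bookkeeping is racy: if the user edits `shared/report.txt` before the next scan, the edit is indistinguishable from the expected creation and is silently swallowed, which is exactly the kind of race condition described above.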
The Desktop team has put in enormous effort (and used quite a few whiteboards) to design an engine that brings Box to the desktop while handling edge cases that customers have hit, as well as ones that have yet to be seen! Beyond these edge cases, Box Sync had other limitations: users had to mark items as "Synced", only those items appeared in the Sync folder, and every synced item was fully downloaded locally, taking up valuable disk space.
Knowing about these limitations and understanding the complex needs of our enterprise customers, we at Box have always wanted to do something better. What if there was a way to somehow access all the files on a user's Box account without taking up much disk space locally? What if there was a way to only bring the content to the user on-demand? What if there was a way to get ordered, reliable change events as soon as they happen?
Virtual Filesystems and the Origins of Box Drive
A virtual filesystem (VFS) is an abstraction layer on top of the actual filesystem implementations (such as your OS's default filesystem) that lets client applications interact with all of them in a uniform way, through the same APIs. In other words, applications such as Finder or Microsoft Office use the standard filesystem APIs exposed by the VFS layer, which then routes each call to the appropriate filesystem implementation. Filesystems therefore have great flexibility in how data is accessed and presented, as long as they implement the standard VFS interface. While most filesystems read from and write to a disk, the VFS layer also allows filesystems that get their data from some other source! Here are some interesting filesystems that have been developed over the years:
- SSHFS: Presents files and folders in a user-specified directory just like a local 'Downloads' folder, except that these items are on a remote host.
- Gmail FS: Folders are named after the subjects of your emails, and the files in each folder are that email's attachments.
- Loopback FS: Mirrors some other user-specified directory, similar to creating a shortcut to another folder.
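The routing role of the VFS can be sketched in a few lines of Python. This is a user-space toy, not how an OS kernel implements it: `FileSystem`, `MemoryFS`, `LoopbackFS`, and `VFS` are hypothetical names, and real VFS layers expose many more operations than `list_dir` and `read_file`. The key idea is that every implementation satisfies one interface, and the VFS picks an implementation by the longest mount point that prefixes the requested path.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class FileSystem(ABC):
    """The uniform interface every filesystem implementation must provide."""

    @abstractmethod
    def list_dir(self, path: str) -> List[str]: ...

    @abstractmethod
    def read_file(self, path: str) -> bytes: ...


class MemoryFS(FileSystem):
    """Stand-in for a disk-backed filesystem: absolute path -> file bytes."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files

    def list_dir(self, path: str) -> List[str]:
        prefix = path.rstrip("/") + "/"
        return sorted({p[len(prefix):].split("/")[0] for p in self.files if p.startswith(prefix)})

    def read_file(self, path: str) -> bytes:
        return self.files[path]


class LoopbackFS(FileSystem):
    """Mirrors another directory by forwarding every call back through the VFS."""

    def __init__(self, vfs: "VFS", target: str) -> None:
        self.vfs, self.target = vfs, target.rstrip("/")

    def list_dir(self, path: str) -> List[str]:
        return self.vfs.list_dir(self.target + path)

    def read_file(self, path: str) -> bytes:
        return self.vfs.read_file(self.target + path)


class VFS:
    """Routes each call to the filesystem mounted at the longest matching prefix."""

    def __init__(self) -> None:
        self.mounts: Dict[str, FileSystem] = {}

    def mount(self, mount_point: str, fs: FileSystem) -> None:
        self.mounts[mount_point.rstrip("/") or "/"] = fs

    def _resolve(self, path: str) -> Tuple[FileSystem, str]:
        candidates = [m for m in self.mounts
                      if m == "/" or path == m or path.startswith(m + "/")]
        if not candidates:
            raise FileNotFoundError(path)
        best = max(candidates, key=len)
        rest = path if best == "/" else path[len(best):]
        # Hand the implementation a path relative to its own mount point.
        return self.mounts[best], "/" + rest.lstrip("/")

    def list_dir(self, path: str) -> List[str]:
        fs, rel = self._resolve(path)
        return fs.list_dir(rel)

    def read_file(self, path: str) -> bytes:
        fs, rel = self._resolve(path)
        return fs.read_file(rel)
```

For example, mounting `LoopbackFS(vfs, "/home/docs")` at `/mirror` makes `vfs.list_dir("/mirror")` return the contents of `/home/docs`, much like the Loopback FS described above; the application making the call never needs to know which implementation answered.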
Box Drive and the Benefits of Creating our own Filesystem
Recall that in addition to syncing changes between local and remote, the Sync Engine must also deal with the numerous challenges of interacting with the local side, such as getting a reliable stream of events and applying remote changes locally without unwanted side effects. Writing our own filesystem eases some of the Sync Engine's burdens so that it can focus on its primary responsibility: reconciling differences between local and remote content.
Creating a filesystem ourselves means that:
- We can get a timely, reliably ordered stream of events from the operating system, because we implement the filesystem calls ourselves. Frequent folder scans are no longer necessary, and local changes can be propagated to Box much faster.
- In addition to generating live, ordered events, implementing a filesystem lets us maintain the file tree data structures ourselves and access this data in richer ways than the standard filesystem interface allows.
Implementing a filesystem (left) means that Box Drive (right) now gets filesystem events reliably.
Remember when we said that traditional sync products need a mechanism to avoid detecting changes that the sync product itself made? Because the Sync Engine can communicate with the filesystem directly, distinguishing user-initiated changes from remote changes becomes trivial: only actions that come through the public VFS interface generate change events.
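The following Python sketch illustrates both ideas under simplifying assumptions. `EventedFS` is a hypothetical, in-memory stand-in for a filesystem implementation that owns its file tree: every operation arriving through the public interface mutates the tree and appends a sequence-numbered event under a lock, so events come out in the same order the changes happened. A separate `engine_apply_create` method (also hypothetical) models a private channel for the Sync Engine that changes the tree without emitting an event. The actual Box Drive interfaces are not described here.

```python
import itertools
import threading
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ChangeEvent:
    seq: int            # strictly increasing: consumers see a total order
    kind: str           # "create", "write", "rename", or "delete"
    path: str
    new_path: str = ""  # only used by "rename"


class EventedFS:
    """Hypothetical filesystem implementation that owns its file tree."""

    def __init__(self) -> None:
        self.tree: Dict[str, bytes] = {}
        self._events: List[ChangeEvent] = []
        self._seq = itertools.count(1)
        self._lock = threading.Lock()  # mutation + event append form one atomic step

    def _emit(self, kind: str, path: str, new_path: str = "") -> None:
        self._events.append(ChangeEvent(next(self._seq), kind, path, new_path))

    # --- Public operations: reached through the VFS on behalf of applications ---
    def create(self, path: str, data: bytes = b"") -> None:
        with self._lock:
            self.tree[path] = data
            self._emit("create", path)

    def write(self, path: str, data: bytes) -> None:
        with self._lock:
            self.tree[path] = data
            self._emit("write", path)

    def rename(self, old: str, new: str) -> None:
        with self._lock:
            self.tree[new] = self.tree.pop(old)
            self._emit("rename", old, new)

    def delete(self, path: str) -> None:
        with self._lock:
            del self.tree[path]
            self._emit("delete", path)

    # --- Private channel for the Sync Engine: changes the tree, emits nothing ---
    def engine_apply_create(self, path: str, data: bytes) -> None:
        with self._lock:
            self.tree[path] = data

    def drain_events(self) -> List[ChangeEvent]:
        """Hand all pending events to the Sync Engine, oldest first."""
        with self._lock:
            events, self._events = self._events, []
        return events
```

No scan or snapshot comparison is involved: the event list is the change log, and the engine's own writes never appear in it.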
- We can fit "petabytes" of data in this filesystem (think INT_MAX), as the content doesn't actually have to be stored locally all the time! In fact, we use an LRU cache for file data so that the data the user cares about appears to have always been present locally.
- We can download file content or list the items in a folder on demand. Because we implement the filesystem functions ourselves, we can even make the application that initiated a filesystem call wait while Box Drive seamlessly makes the necessary API calls on the fly.
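Below is a minimal Python sketch of an on-demand content cache with LRU eviction. The class name, the byte-size budget, and the `fetch` callback (standing in for a download through the Box API) are assumptions for illustration. A read that hits the cache returns immediately; a miss calls `fetch` synchronously, so the application that issued the read simply waits until the content arrives.

```python
from collections import OrderedDict
from typing import Callable


class OnDemandContentCache:
    """Hypothetical LRU cache of file contents, bounded by total bytes."""

    def __init__(self, capacity_bytes: int, fetch: Callable[[str], bytes]) -> None:
        self.capacity = capacity_bytes
        self.fetch = fetch  # assumption: downloads a file's content by ID
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()  # oldest first
        self.size = 0

    def read(self, file_id: str) -> bytes:
        if file_id in self.entries:
            self.entries.move_to_end(file_id)  # mark as most recently used
            return self.entries[file_id]
        # Cache miss: the caller (e.g. an application's open/read) blocks here.
        data = self.fetch(file_id)
        self.entries[file_id] = data
        self.size += len(data)
        # Evict least recently used entries until we fit (always keep the new one).
        while self.size > self.capacity and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.size -= len(evicted)
        return data
```

Eviction here is purely by recency and total size; a production cache would likely also need to pin files that are open or have local changes not yet uploaded.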
Now that we have introduced the idea of writing a filesystem and some of its key benefits, we can describe how Box Drive works in more detail.
Box Drive Architecture
The high-level architecture is quite simple; we need:
- A way to easily write a filesystem without having to think about kernel code all the time.
- A Syncing Engine to reconcile differences between local and remote.
Part I: Developing a Filesystem
Designing a filesystem the traditional way is hard. Kernel code has fewer protections, and a single bug can crash the whole system or corrupt the disk. Instead, the Box Drive team chose a design that splits the filesystem into two parts: one that lives in kernel space and delegates filesystem functions to an application running in user space, where most normal applications live. The kernel component is intended to be simple and extremely stable, while the more complex logic lives in the user-space application, which can be run and updated like any normal application.
The diagram below highlights the difference between opening a regular folder on the user's computer and opening the Box Drive folder:
The VFS abstraction provides a uniform API to interact with the files and folders and routes the filesystem calls to the appropriate filesystem implementation, based on the path. The BoxFS kernel extension registers itself to handle filesystem calls to the Box folder and delegates the filesystem operations to the Box Drive user-space application through an IPC mechanism.
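The split between a thin kernel component and a user-space application can be illustrated with a small Python simulation. The post does not describe the IPC mechanism BoxFS uses, so this sketch assumes a socket pair carrying one JSON message per line; `KernelExtensionStub`, `user_space_daemon`, and `connect` are hypothetical names. The stub only forwards each call and blocks for the reply, while all the logic lives in user-space handlers, where a failing handler produces an error reply instead of a system crash.

```python
import json
import socket
import threading
from typing import Any, Callable, Dict, Tuple

Handlers = Dict[str, Callable[..., Any]]


def user_space_daemon(conn: socket.socket, handlers: Handlers) -> None:
    """User-space side: all filesystem logic lives here, outside the kernel."""
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:  # one JSON request per line, until the peer closes
            request = json.loads(line)
            try:
                result = handlers[request["op"]](*request["args"])
                reply = {"id": request["id"], "ok": True, "result": result}
            except Exception as exc:  # a handler bug becomes an error reply
                reply = {"id": request["id"], "ok": False, "error": repr(exc)}
            conn.sendall((json.dumps(reply) + "\n").encode("utf-8"))


class KernelExtensionStub:
    """Stand-in for the thin kernel side: forward each call, block for the reply."""

    def __init__(self, conn: socket.socket) -> None:
        self.conn = conn
        self.reader = conn.makefile("r", encoding="utf-8")
        self.next_id = 0

    def call(self, op: str, *args: Any) -> Any:
        self.next_id += 1
        request = {"id": self.next_id, "op": op, "args": list(args)}
        self.conn.sendall((json.dumps(request) + "\n").encode("utf-8"))
        reply = json.loads(self.reader.readline())  # the calling app waits here
        if not reply["ok"]:
            raise OSError(reply["error"])
        return reply["result"]

    def close(self) -> None:
        self.reader.close()
        self.conn.close()


def connect(handlers: Handlers) -> Tuple[KernelExtensionStub, threading.Thread]:
    """Start the user-space side on a thread and return the kernel-side stub."""
    kernel_end, user_end = socket.socketpair()
    worker = threading.Thread(target=user_space_daemon, args=(user_end, handlers), daemon=True)
    worker.start()
    return KernelExtensionStub(kernel_end), worker
```

Because the kernel-side piece only serializes requests and waits, the user-space handlers can be changed or updated without touching it, which mirrors the stability argument above.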
Part II: A Syncing Engine
We reused much of the code originally written for Box Sync, with a few crucial modifications:
- The folder scanning/change detection logic was reworked to accept change events directly from the filesystem.
- To execute remote changes locally, the Sync Engine now communicates directly with the local filesystem, which:
  - Allows the Sync Engine to trivially distinguish between changes caused by the user and the remote changes applied by the Sync Engine itself.
  - Lets the Sync Engine apply remote changes locally while preventing conflicting local changes.
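A minimal Python sketch of these modifications follows. `SyncEngineSketch`, its methods, and the choice to refuse (rather than delay) a conflicting local write are assumptions for illustration, not Box Drive's design. Local changes arrive as events on a queue fed by the filesystem instead of being found by scanning, and while a remote change is being applied to a path, local writes to that same path are refused. Because the engine updates the tree directly, its own changes never enter the event queue.

```python
import queue
import threading
from typing import Callable, Dict, List, Set, Tuple


class SyncEngineSketch:
    """Hypothetical engine fed by filesystem events instead of folder scans."""

    def __init__(self) -> None:
        self.tree: Dict[str, bytes] = {}
        self.local_events: "queue.Queue[Tuple[str, str]]" = queue.Queue()
        self.uploaded: List[Tuple[str, str]] = []  # stand-in for Box API calls
        self._applying: Set[str] = set()           # paths with a remote change in flight
        self._lock = threading.Lock()

    def local_write(self, path: str, data: bytes) -> None:
        """Called by the filesystem when an application writes a file."""
        with self._lock:
            if path in self._applying:
                # Conflicting local change: refuse it (blocking would also work).
                raise BlockingIOError(f"{path} is being updated from Box")
            self.tree[path] = data
            self.local_events.put(("write", path))  # pushed directly, no scan needed

    def apply_remote(self, path: str, download: Callable[[], bytes]) -> None:
        """Apply a remote change; it never enters local_events, so no echo."""
        with self._lock:
            self._applying.add(path)
        try:
            data = download()  # may be slow; local writes to `path` are refused meanwhile
            with self._lock:
                self.tree[path] = data
        finally:
            with self._lock:
                self._applying.discard(path)

    def process_local_events(self) -> None:
        """Drain queued local changes in order and 'upload' them."""
        while True:
            try:
                self.uploaded.append(self.local_events.get_nowait())
            except queue.Empty:
                return
```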
While the high-level architecture sounds straightforward, the devil is in the details: given the complexity of writing a filesystem and adapting the Syncing Engine to a completely new paradigm, the engineering effort was monumental.
Looking Ahead
Although an increasing number of enterprises have embraced the cloud to store and manage their content, many users still interact with Box from the desktop; after all, that is where a large number of applications still live. Box Drive and the filesystem technology it is built on open up many futuristic and innovative possibilities that can significantly change how work gets done on the desktop. When it's possible to load content on demand, "store" what looks like an infinite amount of data, and show an image upside-down every 10th time you open it (just kidding!), the possibilities are truly endless.
If you haven't already, give Box Drive a try!
Interested in working with us?
With millions of people using Box every day, Box Engineering is always tackling interesting and complex problems. Our engineers develop next-generation technologies that help businesses manage their content better than ever before, transforming how people work. Help the Box Engineering Team build the next big thing. Visit Box Careers to learn more.