The Technology Behind Box Notes

It has been an exciting journey getting Box Notes, our content creation service, rolled out to millions of users. Yesterday, we continued on that journey by introducing Box Notes for mobile. Box Notes is a collaborative editing service built on the Box platform. Enterprises use it for business use cases such as collaboratively taking notes in meetings, creating status updates, writing product designs, creating marketing plans and capturing customer interactions.

There were a handful of challenges we needed to address while building the Box Notes service, such as allowing simultaneous editing by multiple users, resolving merge conflicts automatically and showing users where other users' cursors were in the document. Further complexity stemmed from merging these concurrent changes on both server and browser and updating all collaborators in real time. This need for real-time data synchronization created latency challenges that are exacerbated at scale.

In addition to the specific characteristics of a collaborative editing service, there were requirements related to service reliability. The service needed to be highly scalable, supporting a large number of concurrent edits on a large number of documents. All services provided by Box are built to be scalable and fault tolerant, which meant taking additional precautions to ensure high availability and no data loss from day one. We needed to build a resilient architecture that could handle all of these demands.

Architecture

We engineered the Node.js and HBase-based service shown in the diagram to enable note-taking for mobile and webapp users. This architecture consists of several layers. First is the DNS layer, which routes users to a specific datacenter. Underneath our DNS lie A10 hardware load balancers that route requests between Box and the Box Notes service. When an incoming request is identified as Notes-related, it is routed to one of the Box Notes servers via Nginx load balancers.
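As a rough illustration of this routing layer, an Nginx configuration along these lines could proxy Notes-related traffic to a pool of Box Notes servers. This is a minimal sketch; the hostnames, ports and paths are hypothetical, not Box's actual configuration:

```nginx
# Hypothetical sketch: route Notes-related requests to a pool of
# Box Notes servers; everything else is handled elsewhere.
upstream box_notes {
    server notes1.internal:9000;
    server notes2.internal:9000;
}

server {
    listen 443 ssl;

    # Only Notes-related paths are proxied to the Notes pool.
    location /notes/ {
        proxy_pass http://box_notes;
    }
}
```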
We added HAProxy to Nginx to perform active server health checks. Since each physical Notes server has multiple cores and Node.js is single-threaded, we obtained better utilization of these servers by running multiple Node.js instances per Notes server. This topic is discussed at length in a previous blog post.

[Diagram: Box Notes architecture]

Node.js for Operational Transformation

We found Node.js interesting because our use case required deploying a change-merge algorithm called Operational Transformation (OT) on both server and browser. Operational Transformation has been well understood in academia for several years and has a few implementations, one of which is in the Etherpad Lite library. We extensively modified Etherpad Lite to use it in the Box Notes service. Hosting this core JavaScript library with Node.js let us keep the complex algorithm consistent and modular, so the same code could be reused on server and browser.

Managing Node.js Clusters Using Helix and Supervisor

To power this architecture at scale, we implemented a horizontal set of Node.js instances. We learned that user requests cannot simply be round-robined across these instances, because merge conflicts are resolved in memory: all changes to the same note must be routed to the same Node.js instance. We use an Apache open source cluster management framework called Helix to balance the need to serve thousands of notes edited in real time, potentially by many users, against the need to route all changes for a note to the same Node.js instance. One of the advantages of Helix is automatic reallocation of resources when cluster membership changes. For Box Notes, Helix enables what we refer to as region management, where each region consists of a set of notes. Helix balances the load on each Node.js instance by automatically distributing regions across the cluster. User requests are routed to instances depending on the note being requested and the region it is in.
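To make the transformation step concrete, here is a minimal sketch of the core idea behind OT: when two users insert concurrently, one operation's position is shifted so both sides converge. This is a toy single-character-insert model with illustrative field names, not Etherpad Lite's actual changeset format:

```javascript
// Toy OT transform for concurrent inserts (hypothetical op shape:
// { pos, text, site }, where `site` breaks ties deterministically).
function transformInsert(opA, opB) {
  // If opB inserted at or before opA's position, shift opA to the right
  // by the length of opB's inserted text so both replicas converge.
  if (opB.pos < opA.pos || (opB.pos === opA.pos && opB.site < opA.site)) {
    return { ...opA, pos: opA.pos + opB.text.length };
  }
  return opA;
}
```

Applying each user's operation transformed against the other's yields the same document on both sides, which is what lets the server and browser merge concurrent edits without locking.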
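The region-based routing described above can be sketched as a deterministic mapping from note to region, with Helix owning the region-to-instance assignment. The hash, region count and assignment shape here are illustrative assumptions, not Box's actual implementation:

```javascript
const NUM_REGIONS = 64; // hypothetical region count

// Deterministic hash: the same note always lands in the same region,
// so all of its edits reach the same Node.js instance.
function regionForNote(noteId) {
  let h = 0;
  for (const ch of String(noteId)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % NUM_REGIONS;
}

// regionToInstance stands in for the assignment Helix maintains and
// redistributes (via ZooKeeper) as instances join or leave the cluster.
function instanceForNote(noteId, regionToInstance) {
  return regionToInstance[regionForNote(noteId)];
}
```

When membership changes, only the region-to-instance map moves; the note-to-region mapping stays fixed, which keeps rebalancing cheap.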
In the case of an instance removal or cluster expansion, Helix redistributes notes across all active instances, ensuring the load always stays balanced across all Node.js instances. Internally, Helix relies on an Apache open source component called ZooKeeper, which keeps each Node.js instance updated about its region ownership and the state of peer instances in the cluster.

We deployed a process control system called Supervisor to manage the uptime of each Node.js instance. Supervisor monitors each Node.js instance and automatically restarts crashed instances. Since Supervisor manages each instance as a sub-process, restarts are instantaneous. The combination of Node.js with Helix and Supervisor thus provides an architecture designed for failure and isolation. Nicholas Zakas's earlier post provides more information about our use of Supervisor.

Persisting Changes to the Backend Store

Box Notes is a stateful service, and as with any stateful service, we needed to ensure that the changes cached in memory were written to a persistent store. Synchronous writes to a datastore would have provided immediate write guarantees but added latency when providing real-time updates to clients editing the same note. We needed a datastore that provided very low-latency asynchronous writes with confirmed guarantees of each write.

The load pattern on the datastore for a service like Box Notes differs from that of a standard CRUD service. Due to its collaborative editing nature, the service experiences frequent spikes in usage; our data graphs show peaks and valleys of load. Considering these write patterns, we realized we needed a non-relational datastore. A relational datastore always conforms to ACID requirements, adding latency that under heavy load could become unacceptable for a collaborative content editing service. Going with a relational store would also have meant designing expensive sharding strategies upfront to keep up with the large amounts of data in our write-intensive service.
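The write path described above — buffer edits in memory, flush them asynchronously in batches, and acknowledge each edit only once the store confirms it — can be sketched roughly as follows. The class name, batching policy and `flushFn` hook are illustrative assumptions, not Box's actual persistence code:

```javascript
// Sketch: buffer note revisions in memory and flush them to the backing
// store (e.g. HBase) in periodic batches. Each append resolves only
// after the store confirms the batch containing it.
class RevisionBuffer {
  constructor(flushFn, intervalMs = 100) {
    this.pending = [];
    this.flushFn = flushFn; // async fn that persists an array of revisions
    this.timer = setInterval(() => this.flush(), intervalMs);
  }

  append(rev) {
    // The returned promise is the "confirmed write" guarantee.
    return new Promise((resolve, reject) => {
      this.pending.push({ rev, resolve, reject });
    });
  }

  async flush() {
    if (this.pending.length === 0) return;
    const batch = this.pending.splice(0); // take everything buffered so far
    try {
      await this.flushFn(batch.map((e) => e.rev));
      batch.forEach((e) => e.resolve());
    } catch (err) {
      batch.forEach((e) => e.reject(err));
    }
  }

  close() {
    clearInterval(this.timer);
    return this.flush();
  }
}
```

Batching amortizes datastore round-trips across many concurrent edits, which is what keeps write latency off the real-time update path.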
HBase emerged as our choice of non-relational datastore. It is a distributed, scalable store modeled after Google's BigTable, popular in the Hadoop community for its ability to provide real-time read/write access to large amounts of data. HBase is layered on top of HDFS, a distributed file system that provides the scale required for a write-heavy service. The built-in replication of HDFS provides failover resiliency without the need to set up a master-slave architecture: the entire failover process happens transparently as data is replicated across multiple data nodes, and the data nodes also provide horizontal scale as needed. Among the various non-relational stores available today, we selected HBase because our operations team has expertise running it at scale. HBase already supports various other features of Box, including Policies and Automation, the Events API and our monitoring systems.

While HBase satisfied all requirements around the write-intensive nature of Box Notes, we had to consider additional data-related requirements. Box Notes had to adhere to the Box permissions model for files and folders. To integrate with this permissions model, which resides in Box's MySQL database, and to provide a secondary backup for Box Notes, we created a service that uploads snapshots of notes to Box at short intervals. This combination of HBase and Box's datastore enabled us to integrate with and leverage Box's feature set while satisfying the data needs specific to a collaborative editing service.

Metrics, Monitoring, and Analytics

We integrated Box Notes with Splunk to analyze logs generated by the service and to monitor exceptions and failures so we can take immediate corrective action. The Splunk dashboards provide visibility into client disconnects, Node.js instance crashes, runtime exceptions, and other information that helps ensure high serviceability for Box Notes users at all times.
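Dashboards like these work best when the service emits structured log lines that can be indexed by field. As a small sketch, assuming JSON log lines and field names of our own choosing (not Box's actual logging schema):

```javascript
// Sketch: emit one JSON object per log line so a log indexer such as
// Splunk can search and aggregate by field. Field names are illustrative.
function logEvent(event, fields = {}) {
  const line = JSON.stringify({
    ts: new Date().toISOString(),
    service: "box-notes", // hypothetical service tag
    event,                // e.g. "client_disconnect", "instance_crash"
    ...fields,
  });
  console.log(line);
  return line;
}
```

One JSON object per line keeps every event machine-parseable without a custom extraction config on the indexing side.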
Additionally, these application-specific logs are streamed to our Hadoop infrastructure using Kafka. Our Hadoop setup enables us to run Hive queries and build Tableau reports to obtain business metrics around Box Notes usage patterns. The Kafka stream also feeds application logs into OpenTSDB, a time-series database we use to generate graphs for performance monitoring. OpenTSDB additionally stores system-level metrics, such as CPU and memory usage per server, collected using tcollector. Alerts are set for certain thresholds on CPU and memory usage as well as application throughput and latency.

Conclusion

There was a lot involved in getting a new technology stack ready for use by millions of users. Node.js, with its simple single-threaded architecture and no built-in assumptions about uptime, helped us build a stronger high-availability system through external tools for process management, cluster management and monitoring. HBase simplified the architecture of the backend store through HDFS, removing the need for sharding or a master-slave setup. That has given us a high level of confidence in our architecture.

A system as involved as Box Notes cannot be fully explained in a single blog post. Our engineering team will be publishing more details in the coming weeks, covering the use of Solr for search on notes content, the editor technologies serving collaborative editing on web and mobile clients, and other insights from our experience building Box Notes. Stay tuned!

As our roadmap continues to evolve beyond this architecture, we are always interested in working with senior engineers who like to tackle new challenges. Check out our various openings, including ones for content creation (Box Notes) and Node.js, here.