Today, at BoxWorks, we introduced Box Workflow, the backbone of Box’s new focus on industry solutions. Box Workflow is rules-based, intelligent, and open, and it uses machine learning to automatically classify, surface, and recommend content stored on Box. Our proprietary machine-learning approach, outlined in this post, will enhance these industry solutions by automatically identifying and classifying content.
To achieve this goal, we build a fully connected document-similarity graph. Documents are the nodes of the graph, and the edge between each pair of documents is weighted by a similarity score computed from unstructured (text), structured (metadata), and semi-structured (usage log) features. We also derive latent features, such as topics and concepts, from the documents themselves and use them when computing similarity.
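The post does not specify the similarity measures or how the feature families are combined. As a minimal sketch, assuming TF-IDF cosine similarity for text, Jaccard overlap for metadata tags and for the sets of users who accessed each document, and a hypothetical fixed weighting, a fully connected graph could be built as follows. The `Document` fields, `WEIGHTS`, and all helper names are illustrative only, and latent topic features are omitted for brevity.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Document:
    doc_id: str
    text: str                                   # unstructured: document body
    tags: set = field(default_factory=set)      # structured: metadata values (hypothetical)
    viewers: set = field(default_factory=set)   # semi-structured: users seen in usage logs (hypothetical)


def tfidf_vectors(docs):
    """Sparse TF-IDF vector (term -> weight) per document, using a smoothed IDF."""
    tokenized = [doc.text.lower().split() for doc in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical weights for combining the three feature families (they sum to 1).
WEIGHTS = {"text": 0.5, "metadata": 0.3, "usage": 0.2}


def build_similarity_graph(docs, weights=WEIGHTS):
    """Fully connected graph: one weighted edge for every pair of documents."""
    vectors = tfidf_vectors(docs)
    graph = {}
    for i, j in combinations(range(len(docs)), 2):
        a, b = docs[i], docs[j]
        graph[(a.doc_id, b.doc_id)] = (
            weights["text"] * cosine(vectors[i], vectors[j])
            + weights["metadata"] * jaccard(a.tags, b.tags)
            + weights["usage"] * jaccard(a.viewers, b.viewers)
        )
    return graph


docs = [
    Document("nda-1", "mutual nondisclosure agreement confidential", {"legal"}, {"ann", "bob"}),
    Document("nda-2", "nondisclosure agreement confidential terms", {"legal"}, {"bob"}),
    Document("promo", "spring sale banner campaign", {"marketing"}, {"cy"}),
]
graph = build_similarity_graph(docs)
for pair, weight in graph.items():
    print(pair, round(weight, 3))
```

Because every pair of documents gets an edge, the graph grows quadratically with the number of documents; a larger deployment would likely need some form of sparsification (for example, keeping only each document's nearest neighbors), which this sketch does not attempt.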
One of the biggest challenges for any classification solution is the need for a well-labeled dataset up front, and that problem is often worse for enterprise solutions, which naturally involve smaller, independent datasets. We are therefore taking an unsupervised approach, which does not require a large labeled dataset, to build the initial similarity graph. To make the models more accurate, we also use interactive machine learning: user feedback is collected and incorporated into the similarity graph, so the results improve each time feedback is given.
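The post does not say exactly how feedback is folded into the graph. One standard technique that fits this setup is graph-based label propagation: a few documents labeled by users act as fixed seeds, and their labels spread to other documents in proportion to edge similarity. The sketch below assumes that technique; `propagate_labels`, the label names, and the document IDs are hypothetical, and this is not necessarily how Box Workflow works.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=20):
    """Spread user-confirmed labels across a weighted similarity graph.

    edges: {(doc_a, doc_b): similarity}, treated as undirected.
    seeds: {doc_id: label} gathered from user feedback; these stay fixed.
    Returns {doc_id: {label: score}}, with scores normalized per document.
    """
    neighbors = defaultdict(dict)
    for (a, b), w in edges.items():
        neighbors[a][b] = w
        neighbors[b][a] = w
    labels = sorted(set(seeds.values()))
    nodes = set(neighbors) | set(seeds)

    # Seeds start as one-hot vectors; every other document starts with no evidence.
    scores = {n: {lab: 0.0 for lab in labels} for n in nodes}
    for doc, label in seeds.items():
        scores[doc] = {lab: float(lab == label) for lab in labels}

    for _ in range(iterations):
        updated = {}
        for node in nodes:
            if node in seeds:
                updated[node] = scores[node]  # clamp: user feedback is never overwritten
                continue
            # Similarity-weighted sum of the neighbors' current label scores.
            totals = {
                lab: sum(w * scores[nbr][lab] for nbr, w in neighbors[node].items())
                for lab in labels
            }
            mass = sum(totals.values())
            updated[node] = {lab: s / mass for lab, s in totals.items()} if mass else totals
        scores = updated
    return scores


edges = {
    ("contract-a", "contract-b"): 0.9,
    ("contract-b", "contract-c"): 0.8,
    ("contract-c", "flyer-x"): 0.1,
    ("flyer-x", "flyer-y"): 0.9,
}
# A user labels one document of each kind; the rest are inferred from the graph.
feedback = {"contract-a": "confidential", "flyer-y": "marketing"}
result = propagate_labels(edges, feedback)
scores_c = result["contract-c"]
print(max(scores_c, key=scores_c.get))  # prints: confidential
```

Each new piece of feedback adds or corrects a seed, and rerunning the propagation updates the inferred labels for the rest of the graph. Only a handful of labels are needed, which is consistent with starting from an unsupervised graph rather than a large labeled dataset.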
Classification of documents is only the beginning. The combination of intelligence and automation within Box Workflow enables targeted solutions for vertical industries. The legal and healthcare industries, for example, have clear security use cases: we can identify confidential content automatically and take the appropriate action.
In the media and retail industries, marketing the right content to the right person is critical. Our machine-learning model could recommend content based on which consumers are likely to be most interested in it. Combining human input with machine learning will make it easier to match consumers with content.
Making machine learning an integral part of Box Workflow will help bring the right information to the right user at the right time. Of course, a machine-learning and intelligence solution this complex can’t be fully explained in a single blog post, and we look forward to sharing more about Box Workflow and intelligence in the coming weeks and months. Stay tuned!