Developer Tutorial - Image Recognition & Box Platform

I have always been fascinated with the power of machine learning to recognize patterns in data. One practical application of machine learning is the ability to classify images at scale. In this blog post, I'll show you how I built an iOS app that lets a user take a photo, automatically recognize what is in that photo, and then enable the user to search across these photos based on keywords. 

This same workflow could have applications across industries: 
  • Interior Design - We could use this workflow to keep a portfolio of past projects with all of the nuances automatically added as searchable metadata.
  • Insurance  - We could use this workflow to quickly catalog a person's physical assets.
  • Photojournalism - We could use this system to reduce the need for an editor to manually identify keywords for each new photo added to a stock image database.

System Design

Mobile App - The mobile app is the core of the user experience.  This will be where the user uploads photos and sets metadata for every photo.

Application Server - The server is responsible for using JWT to authenticate as a Box Platform App User and handing the mobile app a properly scoped access token.

Image Recognition API - For the image recognition piece, I chose Clarifai's API after a quick browse of their documentation. I also found that their API worked really well with down-scaled images, which is extremely important for bandwidth considerations.

Box Platform - This is where we will securely store and access the photos. By leveraging Box Platform, we will be able to easily store and view photos without them ever touching our server. We will also be using Box Platform to index and search through the uploaded photos based on metadata.

Part 1: Application Server
I built a simple Node.js application server using Cloud9.  Not only is it a PaaS offering but it also has an integrated IDE, which makes coding and then running the server on a publicly-accessible endpoint extremely easy.  
This server has one API route which accepts a Box app user id and returns a JSON object containing the user-scoped access token, Box app user id, and a user name.  Very simple. 
The first step is to authenticate the application using App Auth.  We then store that authentication object for use when requesting a user-scoped token.  

You can learn more about how Box uses JWT authentication here as well as how to set up your own credentials here.
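
For orientation, here is a minimal sketch of the claim set that a JWT assertion for Box carries, assuming the standard Box JWT claims (iss, sub, box_sub_type, aud, jti, exp). The helper name, the 45-second lifetime, and the placeholder IDs are my own assumptions for illustration. Signing the assertion with your app's private key is left to the SDK and is not shown.

```javascript
const nodeCrypto = require('crypto');

// Hypothetical helper: builds the claims for a Box JWT assertion.
// subType is 'enterprise' for app auth or 'user' for an App User.
function buildBoxJwtClaims(clientId, subjectId, subType, nowSeconds) {
  return {
    iss: clientId,                              // the app's client ID
    sub: subjectId,                             // enterprise ID or App User ID
    box_sub_type: subType,
    aud: 'https://api.box.com/oauth2/token',
    jti: nodeCrypto.randomBytes(16).toString('hex'), // unique ID for this assertion
    exp: nowSeconds + 45,                       // assumed lifetime, kept under a minute
  };
}

// Placeholder IDs, not real credentials.
const exampleClaims = buildBoxJwtClaims(
  'CLIENT_ID', 'APP_USER_ID', 'user', Math.floor(Date.now() / 1000)
);
console.log(exampleClaims.box_sub_type);
```

The same claim set covers both steps described above: 'enterprise' when the application authenticates itself, and 'user' when it requests a user-scoped token.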
Once I have initialized the SDK, I am going to initialize an Admin API client.  This is a client which is scoped to the enterprise and can perform CRUD operations on users. 

The only other piece to this server application is the single route, which the iOS app will call from the MainViewController in order to retrieve a user-scoped access token. I am using the Admin API client to get the user's information and then grabbing the access token via the getTokensJWTGrant method. I then construct a simple JSON response containing the name, id, and access token.
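
As a rough sketch of the last step, the response could be assembled as follows. This is not the sample server's actual code: the function name and the exact property names (name, id, accessToken) are assumptions, and the `user` and `tokenInfo` arguments stand in for whatever the Box SDK calls return.

```javascript
// Hypothetical helper: combines user details and a user-scoped token
// into the JSON payload that the iOS app expects.
function buildTokenResponse(user, tokenInfo) {
  if (!user || !tokenInfo || !tokenInfo.accessToken) {
    throw new Error('Missing user or access token');
  }
  return {
    name: user.name,
    id: user.id,
    accessToken: tokenInfo.accessToken,
  };
}

// Example with placeholder values.
const exampleResponse = buildTokenResponse(
  { name: 'Test User', id: '12345' },
  { accessToken: 'PLACEHOLDER_TOKEN' }
);
console.log(JSON.stringify(exampleResponse));
```

Keeping this step in a small pure function makes it easy to test without calling Box at all.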

Since this server is meant to be a prototype, I included a way to generate a Box App User. Once you have the server running, head to http://localhost:3000/signup, enter a name or email address, and press the submit button. The next page will give you the new App User's ID, which is also logged to the console. Save this ID, as you will need to enter it into the iOS app before running it.
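
For reference, creating an App User through the Box REST API comes down to a POST to the users endpoint with is_platform_access_only set to true. The sketch below only builds that request description. The helper name is hypothetical, and the sample server may well do this through the Node SDK instead.

```javascript
// Hypothetical helper: describes the REST request that creates a Box App User.
function buildAppUserRequest(name) {
  const trimmed = String(name).trim();
  if (trimmed.length === 0) {
    throw new Error('A name is required');
  }
  return {
    method: 'POST',
    url: 'https://api.box.com/2.0/users',
    // is_platform_access_only marks the new user as an App User.
    body: { name: trimmed, is_platform_access_only: true },
  };
}

const exampleSignup = buildAppUserRequest('  Test User ');
console.log(exampleSignup.body.name);
```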
That's really all there is to how the server interacts with the iOS client. You can download the source code here. Check out the README for instructions on how to run it.

Part 2: iOS Application
The iOS client application is where all of the magic happens in this app. One of the shortcuts I took when creating this proof of concept was to hard code a Box App User ID into the call to the server. The Box Platform API allows you to authenticate as a user without going through the traditional 3-legged OAuth model. In your own application, you would grab a user-scoped token (or create a Box App User) after the user has successfully authenticated with your own identity management system. What it comes down to is how you want to map an internal user to a Box App User. In this app, I am assuming that this mapping has already happened and returned the App User's ID so that it can be sent to the example server I provided.
The first thing you will want to do is update the 'config.plist' file to include the following:
  • Your Clarifai API credentials. Sign up for a free account at Clarifai.
  • The URL of the sample server, e.g. http://localhost:3000/
  • A Box App User ID. You can generate one by heading to '/signup' on the sample server in a browser.

You should be able to run the app and have all the pieces work, but first let's walk through some of the code that makes the app work.  

The first screen in the app is the MainViewController. In the 'viewDidLoad' method we're calling the example server and getting an access token. We're then storing it in NSUserDefaults. In a production application, I would recommend storing this token in the Keychain.
On the UI side, once the server returns a response, the Box App User's name and id will show up. There are two buttons on this screen. One goes to the SearchViewController, the other goes to the UploadViewController. Pretty straightforward.
When using the Box iOS SDK for App Users, you will need to initialize the Box client a little differently than if you were to use traditional OAuth. You will need to be sure your class conforms to the 'BOXAPIAccessTokenDelegate' protocol. You then implement the 'fetchAccessTokenWithCompletion' method to tell the SDK how it should go about obtaining an access token. In this case, we're simply pointing it to NSUserDefaults.
The next thing we're doing in this controller is initializing the Box and Clarifai clients.  See below:

The code for bringing up the camera roll or device camera should be self-explanatory, so let's move on to what happens once the user has selected an image (from either the camera roll or the camera). In the didFinishPickingMediaWithInfo method, I am encoding the UIImage as a JPEG and then passing it to the recognizeImage method so that it can be sent to Clarifai for recognition.

Notice the method scaleDownImage? Clarifai's documentation recommends against sending a full-resolution image because of bandwidth considerations. It goes on to say that a full-resolution image would have little impact on the accuracy of the results. Once Clarifai's response comes back, I am simply parsing it and lumping all of the tags into one large NSString. These will eventually be part of the metadata set for an image.
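
The two ideas in this step, shrinking the image while keeping its aspect ratio and flattening the tags into one string, can be sketched independently of UIKit. The snippet below is written in JavaScript for consistency with the server code and is not the app's Objective-C implementation. The function names, the 1024-pixel limit, and the comma separator are all assumptions.

```javascript
// Compute target dimensions so the longer side is at most maxSide,
// preserving the aspect ratio. Images that already fit are left alone.
function scaledSize(width, height, maxSide) {
  const longest = Math.max(width, height);
  if (longest <= maxSide) {
    return { width, height };
  }
  const ratio = maxSide / longest;
  return {
    width: Math.round(width * ratio),
    height: Math.round(height * ratio),
  };
}

// Flatten a list of tag strings into a single comma-separated string,
// dropping empty entries.
function joinTags(tags) {
  return tags
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0)
    .join(', ');
}

console.log(scaledSize(4000, 3000, 1024)); // assumed limit of 1024 pixels
console.log(joinTags(['sofa', ' lamp ', '']));
```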

Uploading to Box is extremely simple as you can see from the code below. Since I didn't want to force the user of this app to enter a file name each time a photo is uploaded, I am just constructing a name using a timestamp.
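
The app's own upload code is not reproduced here. As a rough illustration of the timestamp naming idea only, a name could be built like this. The "photo-" prefix, the UTC timestamp layout, and the function name are assumptions, and the sketch is in JavaScript rather than the app's Objective-C.

```javascript
// Hypothetical helper: builds a file name such as photo-YYYYMMDD-HHMMSS.jpg
// from a Date, using UTC so the result does not depend on the local time zone.
function timestampFileName(date) {
  const pad = (n) => String(n).padStart(2, '0');
  const stamp =
    String(date.getUTCFullYear()) +
    pad(date.getUTCMonth() + 1) +
    pad(date.getUTCDate()) +
    '-' +
    pad(date.getUTCHours()) +
    pad(date.getUTCMinutes()) +
    pad(date.getUTCSeconds());
  return `photo-${stamp}.jpg`;
}

console.log(timestampFileName(new Date()));
```

A second-level timestamp is enough for a prototype, but two uploads within the same second to the same folder would collide on name, so a production app would likely add a random suffix.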

We need to tell Box which folder to upload the photo to. In the example project, I am using "0", which is the root directory. Once the upload succeeds, part of the response object will be the file id. This is important because this is how we will be attaching the metadata to the file in the next step.
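
At the REST level, the folder is passed as the parent id in the upload's attributes JSON, and the new file's id comes back in the first entry of the response. The sketch below assumes that REST shape. The iOS SDK wraps both steps, and the helper names are hypothetical.

```javascript
const ROOT_FOLDER_ID = '0'; // "0" is the root folder

// Hypothetical helper: builds the "attributes" JSON for a Box file upload.
function buildUploadAttributes(fileName, parentFolderId = ROOT_FOLDER_ID) {
  return JSON.stringify({ name: fileName, parent: { id: parentFolderId } });
}

// Hypothetical helper: pulls the new file's id out of an upload response
// shaped like { entries: [ { type: 'file', id: '...' } ] }.
function fileIdFromUploadResponse(response) {
  const entries = (response && response.entries) || [];
  if (entries.length === 0) {
    throw new Error('Upload response contained no entries');
  }
  return entries[0].id;
}

console.log(buildUploadAttributes('photo.jpg'));
```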

To add metadata to a file in Box, you will need a Metadata template.  The README included with the iOS project gives the steps for doing this.  But in a nutshell, you want to log into your Box enterprise as an admin and head to the Metadata section.  The template you create will need to look like the image below.

Given that the template exists in your enterprise, the code that attaches the metadata from Clarifai to your photo is shown below.
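
The app's Objective-C code for this step is not reproduced here. To show what the operation amounts to at the REST level, the following sketch builds the request that attaches an enterprise-scoped metadata instance to a file. The template key "imageRecognition" and the field name "keywords" are placeholders of my own, so substitute whatever your template actually defines.

```javascript
// Hypothetical helper: describes the REST request that attaches a metadata
// instance (enterprise scope) to a file. `fields` must match the template.
function buildMetadataRequest(fileId, templateKey, fields) {
  const file = encodeURIComponent(fileId);
  const template = encodeURIComponent(templateKey);
  return {
    method: 'POST',
    url: `https://api.box.com/2.0/files/${file}/metadata/enterprise/${template}`,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(fields),
  };
}

// Placeholder file id, template key, and field name.
const exampleMetadata = buildMetadataRequest('12345', 'imageRecognition', {
  keywords: 'sofa, lamp',
});
console.log(exampleMetadata.url);
```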

If this operation is successful, we present a UIAlertController showing the metadata. The upload portion is now complete and metadata has been attached to the photo. The next step is to use the Box iOS SDK to search through files based on that metadata.

Notice that the search view controller also conforms to the BOXAPIAccessTokenDelegate protocol. Once the Box client is initialized, we take the text entered in the UITextField and create a BOXSearchRequest. In this case, I am limiting the search results to files with the "jpg" and "png" extensions.
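
For comparison, the equivalent REST call is a GET against the search endpoint with the query text and a comma-separated file_extensions filter. The sketch below only builds that URL. The helper name is hypothetical, and it is not how BOXSearchRequest is implemented internally.

```javascript
// Hypothetical helper: builds a Box search URL restricted to files
// with the given extensions.
function buildSearchUrl(query, extensions) {
  const url = new URL('https://api.box.com/2.0/search');
  url.searchParams.set('query', query);
  url.searchParams.set('type', 'file');
  url.searchParams.set('file_extensions', extensions.join(','));
  return url;
}

const exampleSearch = buildSearchUrl('sofa', ['jpg', 'png']);
console.log(exampleSearch.toString());
```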

From here, if there are results, they will be sent to SearchResultsTableViewController. Pressing any of the table view cells will bring up a preview of the photo and a bar button which will display metadata. To handle displaying more than 120 file types, I am using the Box iOS Preview SDK.

The SearchResultsTableViewController is your run-of-the-mill UITableViewController. If you have used one before, the code should be very straightforward. The only thing that I would like to call out is how easy it is to use the Box Preview SDK.
Like some of the other view controllers in this app, this one conforms to the BOXAPIAccessTokenDelegate protocol. Since we will be calling the Box Preview SDK, it also conforms to the BOXFilePreviewControllerDelegate protocol. If you just want to push the preview controller without any customization, you can skip the preview delegate. I am adopting it because I am overriding one of the bar buttons on the preview controller.
When a user clicks on one of the search results' table view cells, we are going to instantiate a BOXFilePreviewController as shown below.

In order to alter the functionality of the right bar button, I am implementing the willChangeToRightBarButtonItems delegate method. I am simply replacing the default button with a new one called "Metadata" that will call the showMetadata selector as shown below.

Making This App Production Ready
If this were a production application, there are a number of changes I would make, including:
  • Secure storage of API credentials in the keychain rather than NSUserDefaults
  • True user authentication (not hard coding a Box App User ID in the app)
  • Robust error handling
  • Ability to edit the photo's keywords in the Box metadata
  • Integration with IBM's Watson API for text and face recognition
  • Handling image metadata creation on the server. More specifically, the user would upload the photo to Box and my server would monitor the Box event stream for new images. The server would then send new images to Clarifai and set the metadata on them.

Getting Started with Box Platform

This tutorial highlighted the power of Box Platform, which provides enterprise-grade security, a granular permissions model, and rich preview capabilities for 120 file types. If you want to test out Box Platform in your application, click here to create a free developer account.
If you have any questions about this tutorial, please feel free to ask in the Developer forum within Box Community.