Automate paperwork and never manually copy data from a PDF again with Box and Docparser

This guest post comes from Giuliano Iacobelli, CEO of Stamplay, and originally appeared on the Stamplay blog. The use case of extracting content from forms is one that we see all the time at Box, and we're pleased with how easy Stamplay has made the connection between Box and our partners at Docparser. Built on legacy ECM technologies, this type of functionality would have been cumbersome and expensive. With Box and our partner cloud services, you can digitize key workflows for your business in minutes.

Businesses waste almost an entire work day (6.8 hours) a week on paperwork. Activities like filling out forms, copying data, searching for documents, or chasing signatures all over the company are an invisible cost that slows down business growth, hurts productivity and wastes precious resources.

Back-office processes should be smarter and require almost no manual work. Here we show how to leverage Box and Docparser to get your time back.

In this example we’re going to automate the management of a PDF form. The same process can be applied to any other form, whether it’s a tax form, an invoice, a purchase order or a custom form used at your company.

Parsing forms with Docparser

Docparser is a PDF parser that automatically extracts the data you are looking for and offers an easy-to-use visual interface. For each type of document used at your company, you create a dedicated parser that tells Docparser what you want to extract.

Creating the parser

The first step is to sign up for an account on Docparser and create your first parser. You’ll be prompted to choose between some presets: choose the generic Filled Form, give it a name and provide a couple of sample PDFs.

Creating the parsing rules

For each field you want to extract from your form, you have to create a parsing rule. The process is very simple:

  1. Pick a parsing rule (for forms like the one used here, the Text Field position rule works most of the time)
  2. Highlight the area where you expect the text you want to extract to be written
  3. Give it a name and go to the next one

For this example let’s extract just two values from our form and let’s name them “field1” and “field2”.

Configuring a Metadata template on Box

Box has a feature called Metadata that allows users to define and store custom data associated with their files in Box. This custom data can serve many different use cases, such as searching for a file by custom attributes that we defined.

Note: this will work on Box accounts at the Business+ level or higher, since it relies on direct download capability and metadata templates.

Before we can apply Metadata to a file, we need to create a Metadata template. This can be done from the Admin panel of your Box account.

In order to accommodate the data that will be extracted by Docparser, the template needs to match the parsing rules created previously on Docparser.

The Metadata template should reflect the fields that you’re extracting from your form with Docparser (so far, field1 and field2).
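As a rough illustration of what "matching" means here, the sketch below builds a dictionary shaped like a Box metadata template definition and checks that its field keys line up with the Docparser rule names. The template key `formData`, the display names and the helper names are assumptions made for this example. In this walkthrough the template is created from the Admin panel, not in code.

```python
import json

# Names of the parsing rules created on Docparser in this walkthrough.
PARSING_RULES = ["field1", "field2"]


def build_template(template_key, display_name, rule_names):
    """Return a dict shaped like a Box metadata template definition.

    One string field is created per parsing rule, using the rule name as key.
    """
    return {
        "scope": "enterprise",
        "templateKey": template_key,  # hypothetical key chosen for this sketch
        "displayName": display_name,
        "fields": [
            {"type": "string", "key": name, "displayName": name}
            for name in rule_names
        ],
    }


def missing_fields(template, rule_names):
    """List the parsing rules that have no matching field in the template."""
    keys = {field["key"] for field in template["fields"]}
    return [name for name in rule_names if name not in keys]


template = build_template("formData", "Form data", PARSING_RULES)
print(json.dumps(template, indent=2))

# A rule added on Docparser but not on the template shows up as missing.
print(missing_fields(template, PARSING_RULES + ["field3"]))
```

A check like `missing_fields` is a cheap way to catch a parsing rule that was added on Docparser without a corresponding field on the Box side.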

Automate all the things

This process automation can be built with two Flows. The first one grabs files from Box and passes them to Docparser; the second one receives the extracted data from Docparser and applies it as Metadata to the file on Box.
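The division of labor between the two Flows can be sketched in plain Python. This is only an illustration of the data flow, not how Stamplay implements it: the function names, the `send_to_parser` and `apply_metadata` callables, and the shape of the dictionaries (including the `box_file_id` key) are all assumptions made for this example.

```python
def flow_one(uploaded_file, send_to_parser):
    """Flow 1: a file lands in the Box folder and is handed to Docparser."""
    # Pass the Box file id along so the parsed result can be matched back later.
    return send_to_parser(uploaded_file["name"], uploaded_file["id"])


def flow_two(parsed, apply_metadata, fields=("field1", "field2")):
    """Flow 2: parsed data arrives from Docparser and becomes Box Metadata."""
    metadata = {name: parsed.get(name, "") for name in fields}
    apply_metadata(parsed["box_file_id"], metadata)
    return metadata


# Stand-ins for the real services, so the sketch runs on its own.
sent = []
applied = {}


def fake_send(name, file_id):
    sent.append((name, file_id))
    return file_id


def fake_apply(file_id, metadata):
    applied[file_id] = metadata


flow_one({"id": "123", "name": "form.pdf"}, fake_send)
flow_two({"box_file_id": "123", "field1": "Jane", "field2": "Doe"}, fake_apply)
print(applied)
```

The key point is that the second step needs some way to know which Box file the parsed data belongs to, which is why the file id travels with the document in this sketch.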

This project is available as a Blueprint, an easy-to-customize pre-built template that helps you get started more quickly. Open this link in a new tab to initialize it on your Stamplay account.

You’ll be prompted to pick a name for your project and then a wizard will start. After that Stamplay will prompt you to:

  • Provide your Docparser API key, which can be found in the API Credentials section
  • Connect your Box account

Once you have connected the two services, click on Next.

Now it’s time to fill in the blanks of the workflow. You’ll first be asked to select the folder where you’re going to upload your forms.

In the second and third steps of this configuration, pick the parser you created earlier on Docparser from the dropdown.

Lastly, we need to pick the Metadata template and type a sample of it (still reflecting our structure, which currently has field1 and field2).

Now let’s see if everything is running properly!

Configuration Checkpoint

What we want to do now is verify that documents are passed to Docparser properly when we upload them to Box.

Let’s upload a copy of the PDF form you used to create the parser to the Box folder that you just connected to the Blueprint. If the setup was successful, you’ll see two entries in the realtime history of your Flows, which is shown in the project Dashboard.

If so, you can now check the file on Box and see the data extracted from the PDF available as Metadata.

Customizations

To customize this template to fit your own form, you have to:

  • Create a parsing rule for each field you want to extract from your form
  • Create a matching field on the Metadata template to host the data extracted by Docparser
  • Edit one of the Flows on Stamplay to forward the data accordingly

For the last step the process is quite easy. Run the flow at least once by uploading a new document to Box with the new parsing rules configured on Docparser.

Open your project on Stamplay, open the second flow and click on the second step. The wide text area contains the Metadata that we’re going to pass to Box. Edit it so that it contains a new line for the new parsing rule (e.g. field3), then grab the values from Docparser using the button on the right.

In the text area you will end up with something like this:

{
  "field1": "FIELD1_DATA_FROM_DOCPARSER_STEP",
  "field2": "FIELD2_DATA_FROM_DOCPARSER_STEP",
  "field3": "FIELD3_DATA_FROM_DOCPARSER_STEP"
}
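If it helps to see the same structure built programmatically, here is a minimal sketch that assembles that JSON from a dictionary of parsed values. The `parsed` dictionary, its sample values and the `build_metadata` helper are made up for this example; in the Flow, Stamplay fills the values in from the Docparser step.

```python
import json


def build_metadata(parsed, field_names):
    """Pick the named fields out of the parsed data, in a fixed order."""
    # Missing fields become empty strings so that every key is always present.
    return {name: str(parsed.get(name, "")) for name in field_names}


# Hypothetical parsed result; keys mirror the parsing rule names.
parsed = {
    "field1": "Jane",
    "field2": "Doe",
    "field3": "2017-05-01",
    "extra": "ignored",
}

metadata = build_metadata(parsed, ["field1", "field2", "field3"])
print(json.dumps(metadata, indent=2))
```

Only the fields named in the list end up in the result, so anything else Docparser returns is left out of the Metadata passed to Box.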

Conclusion

At Stamplay we make it easy for people to automate processes and create high-value integrations by tying together APIs. If you need help connecting your apps, or have an API that you want to make easy to connect with, tweet us at @stamplay or drop us an email at support@stamplay.com. To learn more about Box Platform, and how to use our Cloud Content Management APIs in your applications, visit us at https://developer.box.com.