Dev and prod parity

If you are like me, you've worked in places where your development environment and your production environment have notable differences. I've certainly seen bugs on my local dev machine that don't happen in prod. More often than I'd like to admit, I've said, "I'm sure it is just something messed up in my dev setup, so I'll ignore it." You've probably also had times when your teammate's system worked while yours didn't (or vice versa), even though you were both on the same commit of the same branch of the same repo.

Even setting aside differences in the macro environment (environment variables, system configuration, test fixtures that someone set up and then forgot to tear down, different versions of Node.js or Java, etc.), your local dev setup may have subtle but no less relevant differences from prod. You may be building on a Mac or Windows system but deploying onto Linux. This means that packages with native code may differ subtly in behavior. In principle, Node.js itself may behave slightly differently on different platforms.

These differences are a kind of "grit" that makes for an unsatisfying development experience. Not fully trusting your environment means you burn time on things that aren't relevant to your project's goals. As the Twelve-Factor App methodology also points out, keeping dev and prod as similar as possible is really valuable for maximizing your throughput.

When we started building our framework for BFF microservices, we found ourselves asking: Since we are starting afresh, can we avoid building in this kind of grief? The answer, in the end, was "yes, we can". We put together a working setup that eliminates this whole class of problems while maintaining a very dynamic development workflow. This post will tell you about what we did.

The center of our solution is, of course, containers: in our case, Docker containers (along with Kubernetes to coordinate them). As is well known, the beauty of a container is that it provides an absolutely uniform environment for running code. This environment is trivial to "throw away" and recreate. You can make a change to your environment for some test purpose, and then with a couple of commands tear it down and recreate it so it is exactly the same as before you started. You always have access to a clean state.

If our goal is to eliminate the differences between the development and prod environments, containers are a crucial piece, since we use the same image in dev, staging and prod. The ease of locking down every detail of the environment is a big win. It eliminates huge classes of those "It works in prod, but not dev" and "Jo has no problems, but I can't get the same code to run here" problems.

Yet, by itself, this solution still has a gaping hole. While it is easy to start up a container, building an image can take several seconds (at least!). This would seem to be a huge impediment to using containers for development. After all, it presents a workflow like:

Change code
Build a new image
Start up the container
Test it
Discover bug
Repeat
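
As a rough sketch, one pass through this loop might look like the shell function below. The image tag and the `npm test` command are placeholders chosen for illustration, not our actual configuration.

```shell
#!/bin/sh
# Sketch of one pass through the build-and-test loop.
# IMAGE and the test command are hypothetical placeholders.
IMAGE="${IMAGE:-my-service:dev}"

rebuild_and_test() {
  # Rebuild the image from the Dockerfile in the current directory.
  docker build -t "$IMAGE" . || return 1
  # Start a throwaway container from that image and run the tests in it.
  docker run --rm "$IMAGE" npm test
}
```

Calling `rebuild_and_test` after every edit is what makes the loop slow: the build step runs each time, even for a one-line change.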

These six steps could take minutes. Contrast this to working on the native development machine, where turnaround for making a code change can be seconds. How did we bridge this gap?

Our solution was to use docker-machine-nfs to share our code repository from our laptop over NFS, so that it can be mounted into the Docker container.

This unusual hybrid approach buys us the best of both worlds:

Our code actually runs in the same environment as production (any native code is Linux native code; environment variables, etc., are all the same).
The whole environment can be reset with a few commands (stop the container, start a new one).
The service code lives on the development machine, where one's favorite command line and graphical development tools are available to edit code.
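
Once the repository is shared with the Docker host, starting a container against the live source tree might look like the sketch below. The image tag, the in-container path `/usr/src/app`, and the `npm start` command are assumptions for illustration. It uses plain `docker run` for brevity; the same idea applies when an orchestrator starts the container.

```shell
#!/bin/sh
# Sketch: run the usual image, but with the local checkout mounted over
# the code directory. IMAGE and APP_DIR are hypothetical placeholders.
IMAGE="${IMAGE:-my-service:dev}"
APP_DIR="${APP_DIR:-/usr/src/app}"

run_with_local_code() {
  # -v mounts the current directory (the repo) at APP_DIR in the container;
  # -w makes that directory the working directory for the command.
  docker run --rm \
    -v "$(pwd):$APP_DIR" \
    -w "$APP_DIR" \
    "$IMAGE" npm start
}
```

Run `run_with_local_code` from the root of the repository. Edits made on the laptop then show up inside the container without rebuilding the image.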

We were excited when we got this working. We seemed to have a highly productive, prod-like development environment. What's not to love?

Well, after we used it for a while, we found one thing which was very, very much not to love. Updating files on the laptop did not immediately trigger things like the automatic re-running of our tests. Sometimes we would have to wait as long as a minute for a change to be reflected in the container.

A little more investigation revealed that the default NFS mount options were not tuned for this particular use case. Fortunately, they are configurable. So, with a little alteration of the attribute cache timeout (actimeo), which controls how many seconds the NFS client caches file metadata before rechecking it ...

docker-machine-nfs default --shared-folder=... --mount-opts="actimeo=1"

...this problem was eliminated. Now we can run our server with nodemon, make a change to a route handler, and the server will restart with the new code, and tests will be automatically re-run.
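
For reference, a watch-and-restart command along these lines is sketched below. The entry point `server.js` and the watched directory `src` are placeholders. The `--legacy-watch` flag, which makes nodemon poll for changes instead of relying on file-system events, is an assumption here: change notifications often do not cross network mounts, so polling is frequently needed in this kind of setup, but your environment may not require it.

```shell
#!/bin/sh
# Sketch: restart the server whenever a watched file changes.
# ENTRY and WATCH_DIR are hypothetical placeholders.
ENTRY="${ENTRY:-server.js}"
WATCH_DIR="${WATCH_DIR:-src}"

start_dev_server() {
  # --legacy-watch switches nodemon to polling, which tends to be more
  # reliable on network file systems (assumption: needed in this setup).
  # --watch limits the watched files to the source directory.
  npx nodemon --legacy-watch --watch "$WATCH_DIR" "$ENTRY"
}
```

Run `start_dev_server` inside the container, from the mounted repository directory.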

We have been using this setup for several months now and find it has delivered on the goals we had. We think it would be useful for you to try out, too.

There are, of course, some caveats:

Getting a local Kubernetes + Docker + Node.js environment set up and configured can take a couple of hours.
There are genuinely more "moving pieces" in this setup, versus just doing Node.js development on a laptop. We have a tool to help with this, "diagnose", which we will talk about in a future blog entry.
We strive to be as close to production as possible, but in two cases we have drifted from absolute fidelity:
We need things like the gcc compiler on the Docker image in order to install all the necessary Node.js packages.
We need to give the container more memory than we do in production so it can handle running multiple processes (test and server) at the same time.
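
As an illustration of the memory point, the limit on a development deployment could be raised with a command like the one sketched below. The deployment name and the `1Gi` value are made up for this example and are not our real settings.

```shell
#!/bin/sh
# Sketch: raise the memory limit on a development deployment.
# DEPLOYMENT and DEV_MEMORY are hypothetical values.
DEPLOYMENT="${DEPLOYMENT:-my-service}"
DEV_MEMORY="${DEV_MEMORY:-1Gi}"

raise_dev_memory() {
  # Sets the memory limit for the containers in the deployment.
  kubectl set resources deployment "$DEPLOYMENT" --limits="memory=$DEV_MEMORY"
}
```

Keeping the value in a variable makes it easy to see, and to review, exactly where dev differs from prod.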

These have proven to be comparatively minor caveats, and are far outweighed by the uniformity of development experience that this brings us.