Testing and testability are a big part of Sync 4, Box's new desktop syncing application. We committed to the company and ourselves to always maintain excellent engineering practices. In that light, we spent a lot of time thinking about testing and testability. We wanted (and continue to want) quality code coverage and effective testing patterns. One aspect of testing continued to bite us... expressiveness. We found ourselves struggling to write the sorts of tests we wanted, tests that covered all the cases we wanted.
Allow me to explain: imagine a set of classes that represent geometric shapes, each inheriting from a base class Shape.
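For instance, a minimal sketch (the Square and Circle classes here are hypothetical, invented just to ground the examples that follow):

```python
import math


class Shape(object):
    """Base class for the hypothetical shapes used in the examples below."""

    def area(self):
        raise NotImplementedError


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2
```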
We had two choices for our tests:
- Write a bunch of tests, each for a different test case
- Write a single test which manually iterates over the bunch of test cases
The former solution is not expressive, not DRY, not maintainable, and tedious, all of which encourages taking shortcuts or making copy/paste bugs. The latter solution violates a tenet of unit testing: a test should test "one thing". These solutions would look like:
# lots of duplicated code in each test (remember... this is a simple example)
def test_square_area(self):
    self.assertEqual(Square(3).area(), 9)

def test_degenerate_square_area(self):
    self.assertEqual(Square(0).area(), 0)
# A single test that actually tests many different code paths
def test_areas(self):
    for shape, expected_area in [(Square(3), 9), (Square(0), 0), (Circle(1), math.pi)]:
        self.assertEqual(shape.area(), expected_area)
Argh! Neither option is very nice. We needed something better.
Let me introduce you to genty, our solution to this problem. Here's how we use it to solve the problem:
def test_area(self, shape, expected_area):
In genty we hooked on to the idea of data providers (data sets) for tests.
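The mechanics of that idea can be sketched in pure stdlib Python. To be clear, this is not genty's actual implementation, just a toy illustration of how a class decorator can expand annotated data sets into individual test methods:

```python
import unittest


def with_datasets(*datasets):
    # Toy stand-in for genty_dataset: stash the data sets on the method.
    def annotate(method):
        method.datasets = datasets
        return method
    return annotate


def expand_datasets(cls):
    # Toy stand-in for the genty class decorator: replace each annotated
    # method with one generated test method per data set.
    for name, method in list(vars(cls).items()):
        datasets = getattr(method, 'datasets', None)
        if datasets is None:
            continue
        delattr(cls, name)
        for i, args in enumerate(datasets):
            def generated(self, _method=method, _args=args):
                _method(self, *_args)
            setattr(cls, '{0}_{1}'.format(name, i), generated)
    return cls


@expand_datasets
class AreaTest(unittest.TestCase):

    @with_datasets((3, 9), (0, 0))
    def test_square_area(self, side, expected_area):
        self.assertEqual(side * side, expected_area)
```

genty's real decorators handle much more (named data sets, readable generated test names, and so on), but this expansion step captures the core idea.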
We looked for existing solutions and didn't find any that met our needs. Nose, one of the popular test frameworks/runners for Python, has a feature called test generators, but it doesn't work in unittest.TestCase subclasses... oops, all of our tests inherit from unittest.TestCase. We also wanted a style that was more data-driven and more powerful. Tying our solution to nose didn't seem wise either, since there are other fine test runners like py.test. So we wanted a solution independent of any test runner.
Contrived examples, like the geometric shapes above, don't convey the true usefulness of genty, but some real numbers should. The central codebase of Sync 4 has ~2700 test methods as of Feb 2014. These test methods cover all our unit and B-Y tests*. Almost 40% of these test methods use data sets. Imagine that... in 40% of our tests we've found it useful to run against multiple data sets. Some tests use two or three data sets; some have hundreds.
This evolution has really changed the way I think about tests. I used to think that tests shouldn't test code that doesn't exist. If a method under test didn't branch on some attribute, say the "type" of a geometric shape, I'd ask: why bother testing that code with various shapes? But now I think otherwise. Now I'd write:
@genty_dataset(<iterator over all shapes>)
def test_something_that_does_not_care_about_shape_type(self, shape):
The awesome thing about such tests is that they help your future self. If the code that previously didn't care about shape type suddenly does:
... # previously never overridden
# This is new code, just added, overriding inherited behavior.
# Circle is the first class to override this method.
You've got some built-in protection. Some built-in code coverage. Some tests that could fail, pointing you to what else needs updating based on the new code. Freakin' awesome stuff! I've made changes in production code and had tests fail that pointed me exactly to what else I needed to change. Among other things (another post?), this is born of our use of data sets (a la genty). Did I mention this is freakin' awesome?
The README file on GitHub does a nice job showing the features of genty, as there are many more than what's hinted at here. We're developing genty on GitHub. Fork it. Submit enhancements. Also check out our open source page.
If you want to start using genty right now, we're on PyPI, so simply pip install genty
Genty is pronounced "gen-tee" and stands for test generator. It also generates awesome!
*B-Y tests: B-Y is our fun term for tests that aren't quite complete end-to-end (A-Z) tests; there's a bit of mocking at the 'A' and 'Z' edges. Hence they are only B-Y :)