Introducing genty - data sets for Python unit tests

Testing and testability are a big part of Sync 4, Box's new desktop syncing application. We committed to the company and to ourselves to always maintain excellent engineering practices, and in that light we spent a lot of time thinking about testing and testability. We wanted (and continue to want) quality code coverage and effective testing patterns. But one aspect of testing continued to bite us: expressiveness. We found ourselves struggling to write the sort of tests we wanted, tests that covered all the cases we cared about.

The Problem

Allow me to explain: imagine a set of classes that represent geometric shapes, each inheriting from a base class Shape.
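As a rough sketch, such classes might look something like the following. The constructors and the area() method are assumptions made for this example, chosen only so that the tests further down have something to call; the real classes could look quite different.

```python
import math


class Shape(object):
    """Base class; concrete shapes are expected to override area()."""

    def area(self):
        raise NotImplementedError


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    # A square is modeled here as a rectangle with equal sides.
    def __init__(self, side):
        Rectangle.__init__(self, side, side)


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2
```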

We had two choices for testing them:

  1. Write a bunch of tests, each for a different test case
  2. Write a single test which manually iterates over the bunch of test cases

The former solution is not expressive, not DRY, not maintainable, and tedious, all of which encourage taking shortcuts or introducing copy/paste bugs. The latter solution violates a tenet of unit testing: a test should test "one thing". These solutions would look like:

[code]
# lots of duplicated code in each test (remember... this is a simple example)
import math
import unittest

class TestShapes(unittest.TestCase):
    def test_square_area(self):
        self.assertEqual(Square(3).area(), 9)

    def test_rectangle_area(self):
        self.assertEqual(Rectangle(3, 5).area(), 15)

    def test_funky_rectangle_area(self):
        self.assertEqual(Rectangle(3, 0).area(), 0)

    def test_circle_area(self):
        self.assertEqual(Circle(2).area(), 4*math.pi)

    ...
[/code]

and

[code]
# A single test that actually tests many different code paths
import math
import unittest

class TestShapes(unittest.TestCase):
    def test_area(self):
        for shape, expected_area in [
            (Square(3), 9),
            (Rectangle(3, 5), 15),
            (Rectangle(3, 0), 0),
            (Circle(2), 4*math.pi),
            # ...
        ]:
            self.assertEqual(shape.area(), expected_area)
[/code]

Argh! Neither option is very nice. We needed something better.

 

The Solution

Let me introduce you to genty, our solution to this problem. Here's how we use it to solve the problem:

[code]
import math
import unittest

from genty import genty, genty_dataset

@genty
class TestShapes(unittest.TestCase):
    @genty_dataset(
        (Square(3), 9),
        (Rectangle(3, 5), 15),
        (Rectangle(3, 0), 0),
        (Circle(2), 4*math.pi),
    )
    def test_area(self, shape, expected_area):
        self.assertEqual(expected_area, shape.area())
[/code]

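With genty we latched on to the idea of data providers (data sets) for tests: each tuple passed to @genty_dataset becomes the arguments for a separate run of the test method.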

We looked for existing solutions and didn't find any that met our needs. Nose, one of the popular test frameworks/runners for Python, has a feature called test generators, but it doesn't work in unittest.TestCase subclasses... oops, all of our tests inherit from unittest.TestCase. We also wanted a style that was more data driven and more powerful. Tying our solution to nose didn't seem wise either, as there are other fine test runners like py.test. So we wanted a solution independent of any test runner.
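To give a feel for how a runner-independent approach can work, here is a minimal sketch built on nothing but the standard unittest module. It is not genty's actual implementation; the names dataset, expand_datasets, _bind and TestMultiply are all made up for this illustration. The idea is that a method decorator records the data sets, and a class decorator replaces the method with one ordinary test method per data set, so any runner that understands unittest.TestCase sees plain tests.

```python
import unittest


def dataset(*cases):
    """Record argument tuples on a test method (hypothetical helper)."""
    def decorator(method):
        method.cases = cases
        return method
    return decorator


def _bind(method, args):
    # A separate function so each generated test closes over its own args.
    def test(self):
        return method(self, *args)
    return test


def expand_datasets(cls):
    """Class decorator: swap each annotated method for one test per case."""
    for name, method in list(vars(cls).items()):
        cases = getattr(method, "cases", None)
        if cases is None:
            continue
        # Remove the parameterized original; it cannot run without arguments.
        delattr(cls, name)
        for index, args in enumerate(cases):
            setattr(cls, "%s_%d" % (name, index), _bind(method, args))
    return cls


@expand_datasets
class TestMultiply(unittest.TestCase):
    @dataset((3, 3, 9), (3, 5, 15), (3, 0, 0))
    def test_product(self, a, b, expected):
        self.assertEqual(expected, a * b)
```

Because the generated methods are regular attributes of a TestCase subclass, each data set passes or fails on its own, which is the "one test, one thing" property the loop-based version gives up.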

Fake examples, like the geometric shapes example above, don't express the true usefulness of genty. But some real numbers should. The central codebase of Sync 4 has ~2700 test methods as of Feb 2014. These test methods cover all our unit and B-Y tests*. Almost 40% of these test methods use data sets. Imagine that... in 40% of our tests we've found it useful to pass multiple data sets. Some tests use 2 or 3 data sets; some have hundreds.

This evolution has really changed the way I think about tests. I used to think that tests shouldn't test code that doesn't exist. If a method under test didn't branch on some attribute, say the "type" of a geometric shape, I'd say why bother testing that code with various shapes. But now I think otherwise. Now I'd write:

[code]
@genty
class TestShapes(unittest.TestCase):
    @genty_dataset(<iterator over all shapes>)
    def test_something_that_does_not_care_about_shape_type(self, shape):
        ...
        production_code(shape)
        ...
[/code]

The awesome thing about such tests is that they help your future self. If the code that previously didn't care about shape type suddenly does:

[code]
class Shape(object):
    def some_method(self):
        ...  # previously never overridden

class Circle(Shape):
    def some_method(self):
        # This is new code, just added, overriding inherited behavior.
        # Circle is the first class to override this method.
        ...

def production_code(shape):
    ...
    shape.some_method()
    ...
[/code]

You've got some built-in protection. Some built-in code coverage. Some tests that could fail, pointing you to what else needs updating based on the new code. Freakin' awesome stuff! I've made changes in production code and had tests fail that point me exactly to what else I need to change. Among other things (another post?) this is born of our use of data sets (a la genty). Did I mention this is freakin' awesome?

The README file on GitHub does a nice job of showing the features of genty, as there are many more than what's hinted at here. We're developing genty on GitHub. Fork it. Submit enhancements. Also check out our open source page.

If you want to start using genty right now, we're on PyPI, so simply run pip install genty

Genty is pronounced "gen-tee" and stands for test generator. It also generates awesome!

Enjoy.

*B-Y tests: B-Y is our fun term for tests that aren't quite complete end-to-end (A-Z) tests; there's a bit of mocking at the 'A' and 'Z' edges, hence they are only B-Y :)