This summer I interned at Quansight Labs, where I focused on integrating Hypothesis into SymPy's test suite. The initial pull request that added Hypothesis to the test infrastructure was simple. The main challenges came afterwards, when questions arose about the utility of Hypothesis and how it should be used.

There are many ways to test your software: unit testing, regression testing, diff testing, and mutation testing are a few that come to mind. This blog post is for anyone interested in understanding the value of property based testing in their software projects. The post is split into three parts:

What Is Property Based Testing?
What Is Hypothesis?
Integrating Hypothesis Into SymPy

If you wish to follow the examples in this blog post, you will need the latest version of SymPy (on master), and you will need to install Hypothesis via:


$ pip install hypothesis

What Is Property Based Testing?

Property based testing (PBT) is a technique where instead of testing individual test cases, you specify properties you wish to hold true for a range of inputs. These properties are then tested against automatically generated test data. PBT uses logical properties over generated test data to facilitate broad, robust testing that can expose edge cases not easily found via traditional test cases.

PBT shines when testing generic functions and libraries working across a wide range of possible inputs, where manually enumerating test cases is difficult but the behavior in question is easily testable. Examples include:

Mathematical operations
String formatting
Database migration
Compression algorithms
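To make the idea concrete before introducing any library, here is a minimal sketch of a hand-rolled property check for the compression case. It uses only the standard library; the function name `check_roundtrip` and the parameters `trials` and `seed` are assumptions made for illustration. The property is that decompressing compressed data gives back the original bytes.

```python
import random
import zlib


def check_roundtrip(trials=200, seed=0):
    """Check the round-trip property on randomly generated byte strings."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    for _ in range(trials):
        size = rng.randint(0, 64)
        data = bytes(rng.randrange(256) for _ in range(size))
        # Property: compressing and then decompressing is the identity.
        assert zlib.decompress(zlib.compress(data)) == data
    return trials


check_roundtrip()
```

A loop like this only samples uniformly random inputs and reports whatever failing case it happens to hit, with no attempt to simplify it.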

Let's take a simple, concrete example. Say we want to ensure that after multiplying two polynomials together, the degree of the resulting polynomial is the sum of the degrees of the two polynomials. We could write a test case for this:


from sympy import Poly
from sympy.abc import x

def test_degree():
    # One hand-picked pair of polynomials
    f = Poly(x**2 + 1, x)
    g = Poly(x**3 + 1, x)
    h = f*g
    assert h.degree() == f.degree() + g.degree()

Notice that we are limited in how many combinations of f and g we can test by hand. It would be better if we could state the property once and have a library automatically generate interesting test cases and run them for us. We would no longer need to think up input/output pairs ourselves, and passing tests would give us more trust in the implementation.

What Is Hypothesis?

Hypothesis is a Python library for creating unit tests which are simpler to write and more powerful when run, finding edge cases in your code you wouldn’t have thought to look for. It is stable, powerful and easy to add to any existing test suite.

Now, let's test out the property above using Hypothesis:


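from hypothesis import given, strategies as st

# polys() is a custom strategy for SymPy polynomials, not part of Hypothesis.
# It must be defined or imported before this test can run.
@given(f=polys(), g=polys())
def test_degree(f, g):
    h = f * g
    assert h.degree() == f.degree() + g.degree()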

Note that polys() here is a custom strategy built for SymPy that generates random polynomials with integer coefficients. It is not a built-in Hypothesis strategy.
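For readers who want to run the test, here is one possible way to write such a strategy. This is a sketch under stated assumptions, not necessarily the strategy SymPy ships: the coefficient bounds are arbitrary, and `deadline=None` is only there to avoid timing-related failures on slow machines.

```python
from hypothesis import given, settings, strategies as st
from sympy import Poly
from sympy.abc import x


def polys():
    """Hypothetical strategy: univariate polynomials in x over the integers."""
    # A list of integers is read as coefficients, highest degree first.
    coeffs = st.lists(st.integers(min_value=-100, max_value=100), min_size=1)
    return st.builds(Poly, coeffs, st.just(x), domain=st.just("ZZ"))


@settings(deadline=None)  # assumption: disable the per-example time limit
@given(f=polys(), g=polys())
def test_degree(f, g):
    h = f * g
    assert h.degree() == f.degree() + g.degree()


test_degree()  # a @given test can be called directly, with no arguments
```

The strategy can also produce the zero polynomial. SymPy reports its degree as negative infinity, so the property still holds in that case.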

How Hypothesis Works

You describe the inputs you expect by passing strategies to the @given decorator, and Hypothesis automatically generates examples from them. It then runs your test on these examples and, if any of them fail, reports a minimal failing input.

Hypothesis comes up with interesting inputs using a combination of smart generation, guiding metrics, and feedback loops. For example, built-in strategies such as floats() have default behaviors tuned for commonly useful values. While many inputs are random, Hypothesis also deliberately tries values that commonly cause errors (like 0 or NaN). In that sense, it is adversarial in the way it chooses inputs to test against your function.

Hypothesis comes with built-in strategy functions for common Python data types. For example, st.integers() generates integers, and Hypothesis also gives you access to floats(), booleans(), fractions(), dictionaries(), and many more.
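As a quick illustration of the built-in strategies, the sketch below uses a few of them on simple properties of Python's own types. The test names are made up for this example.

```python
from hypothesis import given, strategies as st


@given(st.integers(), st.integers())
def test_addition_commutes(a, b):
    assert a + b == b + a


# NaN and infinities are excluded here so the comparison is meaningful.
@given(st.floats(allow_nan=False, allow_infinity=False))
def test_abs_is_nonnegative(v):
    assert abs(v) >= 0


@given(st.fractions())
def test_fraction_minus_itself_is_zero(q):
    assert q - q == 0


@given(st.dictionaries(st.text(), st.booleans()))
def test_dict_rebuild_roundtrip(d):
    # Rebuilding a dict from its items gives an equal dict.
    assert dict(d.items()) == d


for test in (test_addition_commutes, test_abs_is_nonnegative,
             test_fraction_minus_itself_is_zero, test_dict_rebuild_roundtrip):
    test()
```

Strategies compose: dictionaries() takes one strategy for keys and another for values, and the same pattern applies to lists() and tuples().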

The full documentation for Hypothesis can be found here, and for a nice and robust introduction check out this video from PyCon 2019.

Overall, Hypothesis is great at finding bugs, and writing tests as high-level properties helps keep your code consistent.

What Hypothesis Is NOT

It's tricky to use Hypothesis to test black boxes, machine learning systems, simulations of complicated systems, or code with lots of state (e.g., code that depends on a database or talks to a network). Hypothesis works best when the relationship between inputs and outputs can be easily modelled.
Hypothesis is not just a bug finder; it also helps protect against future bugs. Passing property tests make latent bugs less likely, which increases trust in the current implementation of whatever function is being tested. Hypothesis may also reveal questionable design patterns.

Integrating Hypothesis Into SymPy

SymPy is an ideal library for property based testing, so integration was painless.

What Has Changed in SymPy?

Hypothesis is now a required testing dependency of SymPy. Property based tests can be created in a test_hypothesis.py file in the appropriate test directories (more details in the new contributor documentation). An example testing file using Hypothesis can be found in ntheory/tests.
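To give a feel for what such a file might contain, here is a sketch of a number-theoretic property test. It is an illustration written for this post rather than a copy of the file in ntheory/tests; the test name, the input bounds, and the `deadline=None` setting are all assumptions.

```python
import math

from hypothesis import given, settings, strategies as st
from sympy import divisors


@settings(deadline=None)  # assumption: disable the per-example time limit
@given(n=st.integers(min_value=1, max_value=10**6))
def test_divisor_count_parity(n):
    # A positive integer has an odd number of divisors
    # exactly when it is a perfect square.
    is_square = math.isqrt(n) ** 2 == n
    assert (len(divisors(n)) % 2 == 1) == is_square


test_divisor_count_parity()
```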

Utilizing Hypothesis in SymPy

Hypothesis was able to find various bugs and code design flaws. Below I will highlight two:

The resultant function returned incorrect answers. While this bug ended up not needing to be resolved, it showed the value of having Hypothesis check consistency between implementations of the same function.
There were various issues with the lowest common multiple (LCM) function (see issue #25624, PR #25636, and PR #25517). The biggest was deciding when the LCM should use the integer implementation rather than the polynomial implementation (when the defined polynomial is essentially an integer).
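The idea of checking consistency between implementations can be sketched with a toy case. The example below is not SymPy code and is unrelated to the resultant or LCM bugs: it compares a hand-written Euclidean algorithm (the hypothetical `euclid_gcd`) against `math.gcd` from the standard library.

```python
import math

from hypothesis import given, strategies as st


def euclid_gcd(a, b):
    """Hand-written Euclidean algorithm for non-negative integers."""
    while b:
        a, b = b, a % b
    return a


@given(st.integers(min_value=0), st.integers(min_value=0))
def test_gcd_implementations_agree(a, b):
    # Two independent implementations should agree on every input.
    assert euclid_gcd(a, b) == math.gcd(a, b)


test_gcd_implementations_agree()
```

A test of this shape needs no known correct outputs, only two implementations that are expected to agree, which is why it suits functions with several code paths.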

Acknowledgements

Thank you to my mentors Aaron and Matthew for guidance during this project. Added thanks to Melissa and the general internship program for their support.
