Declarative Tests

Testing is an important but unfortunately boring part of software development. Here is a short recollection of how we ended up with declarative tests at work, what they are, and how it has worked out for us.

Like probably most other software development teams, we “love” manual product testing at work. Especially before a release, it is not uncommon for us to spend one to two weeks almost exclusively on testing. For us this means repetitive, monotonous work that mostly comes down to click-orgies in the product UI. What’s not to like about it?

Thankfully, at some point in 2018 we were given the time to look into automated tests - though only for nightly tests at first. Some other team in the company already had nightly tests running for a related product, so that was a natural first step for us as well. To avoid eating into precious development time, we were asked to use their approach as-is. After all, the sales department had already sold features that would take us the next few months to implement¹.

Unfortunately, it wasn’t the saving grace yet. To understand why, let’s have a look at the following setup. It consists of two files: the test file

import paramiko

import server2 as config
from service import send_http_request_to_server
from custom_asserts import custom_assert_eq

def test_foo_service():
    with paramiko.client.SSHClient() as ssh:
        ssh.connect(config.SERVER_ADDRESS)

        for i in range(config.NUMBER_OF_ITERATIONS):
            print('test foo service iteration {}'.format(i))

            obj = ... # several lines of setup code
            send_http_request_to_server(obj, config.SERVER_ADDRESS)

            stdin, stdout, stderr = ssh.exec_command('cat /var/log/some_file')

            custom_assert_eq(stdout.read(), some_reference, 'could not verify foo service')

and the configuration file (server2.py)

NUMBER_OF_ITERATIONS = 9
SERVER_ADDRESS = '192.168.15.72'

This is obviously a contrived example. However, it isn’t too far from an actual test setup that we had at work a while back.

A few things can be said about this code. Among other things, it uses a library dependency directly without encapsulation, and it has server specifics hard-coded in the test specification. But neither of these issues is the topic of this post.

As far as we were concerned, the need to have the same boilerplate code in every single test file was the main issue. The object construction is generally different for individual tests, but a lot of them will look quite similar. The canonical way of fixing this would be by extracting all of the boilerplate into a helper function, for example:

def construct_some_input_object():
    obj = ...  # several lines of setup code
    return obj

def test_foo_service():
    run_test_iterations(
        iterations = config.NUMBER_OF_ITERATIONS,       # not ideal, but...
        server = config.SERVER_ADDRESS,                 # ... this is still worse
        obj_generator = construct_some_input_object,
        remote_command = 'cat /var/log/some_file',
        reference = some_reference,
        error_msg = 'could not verify foo service')
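
For completeness, here is roughly what the run_test_iterations helper could look like - essentially the boilerplate from the first test file with the hard-coded parts turned into parameters. This sketch is illustrative, not our actual code:

import paramiko

from service import send_http_request_to_server
from custom_asserts import custom_assert_eq

def run_test_iterations(iterations, server, obj_generator,
                        remote_command, reference, error_msg):
    # One SSH connection shared by all iterations of the test.
    with paramiko.client.SSHClient() as ssh:
        ssh.connect(server)

        for i in range(iterations):
            print('test iteration {}'.format(i))

            # Each iteration gets a freshly constructed input object.
            obj = obj_generator()
            send_http_request_to_server(obj, server)

            # Compare the server-side log against the expected content.
            stdin, stdout, stderr = ssh.exec_command(remote_command)
            custom_assert_eq(stdout.read(), reference, error_msg)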

However, some key people in our team were opposed to having the object construction in Python (for product-specific reasons that I won’t go into here). While this meant that we wouldn’t have any automated tests for some more time, it gave us the opportunity to rethink from the ground up how we wanted to write our tests.

Which brings us to the main topic of this article. While we were discussing requirements for our tests, I happened to be revamping my home network setup and experimenting with configuration management (“infrastructure as code”). Most of the tools in that space seem to use some kind of declarative approach to specifying the desired setup. The main idea of declarative programming is to specify what needs to be done as opposed to how to achieve it. Well-known examples of declarative languages/frameworks are CSS, QML and Ansible. Inspired by these examples, I proposed to experiment with a more declarative style of specifying our tests as well.

However, there was nothing quite like it when it came to testing frameworks - at least as far as we could find. The Cucumber framework goes in that direction, but at the time it didn’t have any Python support (we still wanted to re-use some auxiliary code from the other team for the testing backend, and Python is our language of choice for tools). It also didn’t quite fit our specific requirements. But if you haven’t encountered the Cucumber framework before, I’d suggest reading up on it a little bit, or even giving it a try!

So after some more brainstorming on the side, a cleanup sprint came up and I was given two weeks to implement a proof of concept as part of it. After approximately one week I had ported an existing nightly test and rewritten some core aspects of the existing Python code, using pytest for backend tasks like test selection, reporting and debugging (pytest’s fixtures and pdb integration are awesome!). The test was specified as JSON files - some service descriptions were kept as separate files for readability and reusability - and it had some extra features to boot. Not only could it now easily be run against different systems by specifying the IP addresses on the command line, it also automatically cleaned up all resources and could use virtualized products (our products are hardware-based) in a Docker setup, spinning up additional resources as required by the test.
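
To give an idea of how little machinery the command-line part needs: pytest lets a conftest.py register custom options and expose them to tests via a fixture. A minimal sketch of that idea follows - the option and fixture names are made up for illustration, not our actual interface:

# conftest.py
import pytest

def pytest_addoption(parser):
    # Allow selecting the system under test on the command line,
    # e.g.: pytest --server-address 192.168.15.72
    parser.addoption('--server-address', action='store', default='127.0.0.1',
                     help='IP address of the system under test')

@pytest.fixture
def server_address(request):
    # Tests simply take server_address as an argument to use the value.
    return request.config.getoption('--server-address')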

On top of that, the actual test specification got a lot more structured. Unfortunately, our test specification is heavily product-specific, so it wouldn’t make sense to show it here. However, here is a generalized example to give you an idea of what it looks like:

{
    "requirements": {
        "available resources": [
            ...
        ],
        "has to run on hardware": false
    },
    "api version": "3.14",
    "setup": {
        ...
    },
    "expectations": {
        "status": {
            ...
        },
        "warnings": [
            ...
        ]
    }
}

The decision to use a JSON-based format was made for ease of implementation, since our API is JSON-based. For certain aspects of the test, the test framework reads the resource description from a specified JSON file and sends it more or less as-is to the API endpoint (another reason to keep these descriptions in separate files).
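
The forwarding itself is simple enough to sketch in a few lines. Assuming a REST-style endpoint and the requests library (the URL and function name are made up for illustration), it boils down to something like this:

import json

import requests

def apply_resource_description(server_address, resource_file):
    # The resource description lives in its own JSON file...
    with open(resource_file) as f:
        resource = json.load(f)

    # ...and is sent more or less unchanged to the API endpoint.
    response = requests.post(
        'https://{}/api/resources'.format(server_address), json=resource)
    response.raise_for_status()
    return response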


As a side note: JSON is a pretty simple format and lacks features like comments (you could emulate them with {"comment": "foo bar"}, but that is quite clumsy). If you want a more powerful format than JSON, you could turn to something like YAML instead. The pytest documentation even provides an example of how you could specify tests in YAML files. From my point of view, YAML has some drawbacks, though. First, it is not part of Python’s standard library. That is usually not a big deal, but it is one extra step you have to take care of when setting up and maintaining your testing pipeline. More importantly, YAML provides ample opportunity for you (or your colleagues) to shoot yourself in the foot. There is the topic of safe versus unsafe loading. And then there are features like references and loops to really confuse your co-workers.
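
For reference, the collection mechanism that the pytest documentation demonstrates for YAML works just as well for JSON: a conftest.py hooks into test collection and turns each specification file into a test item. Here is a rough sketch along those lines, assuming pytest 7 or newer; the file naming convention and the checks in runtest are made up, and a real backend would instead apply the setup and verify the expectations against the system under test:

# conftest.py
import json

import pytest

def pytest_collect_file(parent, file_path):
    # Treat every test_*.json file as a declarative test specification.
    if file_path.suffix == '.json' and file_path.name.startswith('test_'):
        return JsonTestFile.from_parent(parent, path=file_path)

class JsonTestFile(pytest.File):
    def collect(self):
        spec = json.loads(self.path.read_text())
        yield JsonTestItem.from_parent(self, name=self.path.stem, spec=spec)

class JsonTestItem(pytest.Item):
    def __init__(self, *, spec, **kwargs):
        super().__init__(**kwargs)
        self.spec = spec

    def runtest(self):
        # Stand-in for the actual backend: only check that the mandatory
        # sections from the example above are present.
        for section in ('requirements', 'setup', 'expectations'):
            assert section in self.spec, 'missing section: {}'.format(section)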


Around half a year after implementing the proof of concept, we had started to implement some additional tests using this declarative style. At the time I wrote the following:

At work we have started to experiment with declarative tests. The jury is still out regarding what the entire team thinks. However, those who have worked with it so far seem to like it a lot better than our previous Python-only tests.

Fast forward to 2021 and we’re still using the same framework. So far it has been holding up very well. In fact, we have even extended it with a range of additional features like templating and various product-specific options.

To my own surprise, the backend code for the tests has required very little maintenance. The tests themselves obviously require some maintenance as the products evolve. But after using this style of writing tests for more than two years now, we haven’t encountered any major drawbacks. The biggest gripe so far is still the lack of “proper” comments in the test specifications themselves (we do have some "comment": "..." elements in a few test files). On the plus side - one could argue - the lack of proper comments encourages short, concise tests that don’t need any additional explanation.

And finally, as hinted at in the introduction, we are also using these tests for release test automation. There are still quite a few things to test manually, as not everything is easy to automate. However, the automation allows us to cover a broader range of features in the same limited amount of time. After all, our sales department still sells features to customers long before we have a chance to implement them.

  1. Our customers are generally aware of the fact that some of the features they paid for do not exist yet.