Automated testing has become standard practice in the software industry over the past decades. While it has many benefits, it can also do more harm than good. To understand this risk, we need to consider the following aspects:

  1. The main point of automated testing is to increase productivity by preventing old bugs from creeping back in.
  2. Test code needs to be treated with the same care as production code. Low-quality test code makes the tests difficult to change and thereby reduces the productivity gain.
  3. Writing tests is an investment which should pay off in the future. This only happens if the tests save us time later on. If we never modify the code again after writing tests for it, the investment will not pay off. As a result, it can be more economical to manually test code which we won’t change in the future (sadly, this is hard to predict correctly). However, it can also be the right decision to write tests for such code if it is easier to test automatically than manually.
  4. Automated tests need to be maintained. Tests can break even if the production code itself is unchanged. The more maintenance is required, the harder it is to break even in terms of productivity.
  5. Similar to manual testing, creating automated tests doesn’t increase quality. See this article on the subject for more details.

As we can see, automated tests are all about productivity. Ideally, we want tests which are low maintenance, quick to execute, easy to modify and have good defect localization (a test has good defect localization if it is easy to tell why it failed). These ideal tests should cover code which we modify often. This tells us that we should strongly favor unit tests, as they fulfill all of these criteria. It is hard to go wrong with writing unit tests: they are so low maintenance that investing time in their creation almost always pays off. Also, as tests only pay for themselves when we modify the code they cover, we should never add tests to legacy code just to increase our code coverage! The proper time to add tests to legacy code is when we want to modify said legacy code. If the legacy code is never changed, we don’t need to cover it.
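To make this concrete, here is a sketch of such an ideal unit test, using JUnit 5 (`DiscountCalculator` is a hypothetical class invented for the example): it needs no external infrastructure, runs in milliseconds, and a failure points directly at the broken behavior.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class DiscountCalculatorTest {

    // Hypothetical class under test: pure logic, no database, no network.
    static class DiscountCalculator {
        double discountedPrice(double price) {
            return price >= 100.0 ? price * 0.9 : price; // 10% off at or above the threshold
        }
    }

    @Test
    void appliesTenPercentDiscountAboveThreshold() {
        DiscountCalculator calculator = new DiscountCalculator();

        // A failure here can only mean one thing: the discount logic is broken.
        assertEquals(90.0, calculator.discountedPrice(100.0), 0.001);
    }
}
```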

Unit tests are great, but sometimes we feel the need to add integration tests into the mix which exercise larger parts of our code. This is where things get dangerous. Integration tests pull more parts of the system into the test context, including messy parts like the database, which makes them slow and brittle. To make things worse, they also have poor defect localization, as there are many reasons why such a test might fail. As a result, it is quite common for integration tests to reduce productivity because so much maintenance is required. The use of integration tests often follows a cycle:

  1. We decide to add more integration tests.
  2. More and more integration tests are written. As a result, running all automated tests takes longer and there are more test failures.
  3. We decide that all these integration tests just take too long and fail too often. Hence, we move them into a separate pipeline which is only executed at certain points (e.g., once per day).
  4. As the tests no longer block merging, failures are ignored.
  5. We decide to delete the integration tests as they fail too often and we are unwilling to invest more in them.

After some time has passed, the cycle restarts. As we can see, it is very easy to waste a lot of time with integration tests. But how do we avoid that?

First, before creating a new integration test, we need to make sure that we know what we want to achieve. If we think about this carefully, we might come to the conclusion that a unit test is enough, which neatly avoids the problems related to integration tests. For example, it is rarely beneficial to write an integration test just to check that certain components are wired together correctly: this can be tested manually, and as the wiring will not change often, it is questionable whether the integration test would pay off. However, integration tests are required for things we cannot test on a unit level, e.g., complicated database queries.

Once we’re absolutely sure that we need an integration test, we need to decide whether it should be critical, that is, whether it should block merging if it fails. Failures in critical integration tests are very disruptive and hence very urgent to fix. This leads to additional work when we cannot fix the test immediately because we’re busy with other tasks: in that case, we have to disable the test to unblock ourselves and re-enable it once the fix is done. Marking the test as non-critical avoids this extra work, but it makes the fix less urgent, which increases the risk that the test never gets fixed. We also need to keep in mind that integration tests can break because of changes in a module owned by another team. Depending on how busy the other team is, it might take them a while to fix the issue, and until then we have to disable the test or mark it as non-critical to unblock ourselves.
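One way to implement this distinction is to tag tests and let the pipeline decide what blocks merging. Here is a minimal sketch using JUnit 5 tags (the tag names and the test class are illustrative): the CI setup would run tests tagged `critical` in the merge-blocking pipeline and everything else in a separate, non-blocking one.

```java
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

class OrderPersistenceIT {

    @Test
    @Tag("critical") // failures block merging and must be fixed (or the test disabled) immediately
    void savesOrderWithAllLineItems() {
        // ... exercise the real repository against a test database
    }

    @Test
    @Tag("non-critical") // failures are reported but do not block merging
    void notifiesBillingModuleAboutNewOrders() {
        // ... crosses into a module owned by another team
    }
}
```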

Next, we need to make sure that our integration test has reliable test data management. As an integration test needs certain data in place and manipulates it during the test run, we have to ensure that every run starts with the expected data. It is not enough to clean up the changed data at the end of the test, as a failure during execution might prevent the cleanup from running. We need a more robust solution here, for example, a database container with our expected data that we spin up at test startup and simply discard once the test is done. This is quite robust, but it also requires maintenance, as we need to update the container data whenever our test requirements change.
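One possible implementation of this idea uses Testcontainers with JUnit 5. In the following sketch, the image tag and the seed script `testdata/customers.sql` are illustrative; the container starts with exactly the data the test expects and is thrown away afterwards, so no cleanup code is needed:

```java
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

@Testcontainers
class CustomerQueryIT {

    // A fresh PostgreSQL instance for this test class, seeded with the expected data.
    // The container is discarded after the run, so a failed test cannot leave dirty state behind.
    @Container
    private static final PostgreSQLContainer<?> postgres =
            new PostgreSQLContainer<>("postgres:16")
                    .withInitScript("testdata/customers.sql"); // illustrative seed script on the classpath

    @Test
    void findsCustomersWithOpenInvoices() {
        String jdbcUrl = postgres.getJdbcUrl(); // point the code under test at the container
        // ... run the complicated query against the known seed data and assert on the result
    }
}
```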

Once the test data management is in place, we also need to decide which parts of the system our integration test should touch. It might make sense to replace parts of the system with test stubs if we aren’t interested in verifying them; this makes our test faster and more robust. For example, let’s assume that certain classes get called when we insert a certain record into our database. Some of these classes are in our module while others belong to a different module. We don’t need to use real versions of the classes we don’t want to verify in the test.
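As a sketch, assume a hypothetical `AuditNotifier` interface that belongs to the other module. Instead of calling the real implementation (which might talk to a remote service), the test wires in a no-op version:

```java
// Hypothetical interface owned by the other module.
interface AuditNotifier {
    void recordInserted(String recordId);
}

// No-op stub: we are not verifying the other module, we only keep it out of the test.
class NoOpAuditNotifier implements AuditNotifier {
    @Override
    public void recordInserted(String recordId) {
        // Intentionally empty; the real implementation would notify a remote audit service.
    }
}
```

The test then constructs the system under test with `NoOpAuditNotifier` instead of the real class, which removes one more reason for the test to be slow or to fail.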

Finally, we need to think about defect localization. There are many reasons why an integration test can go wrong. To make diagnosing the root cause easier, it helps to add guard assertions to the test setup phase so that we know immediately when something is wrong. It can also help to add debug log statements to the production code and to enable these logs when running integration tests.
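A guard assertion can be as simple as a check in the setup method that fails fast with a clear message, as in this sketch (`countRows` is a placeholder for whatever data access the test already has):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

class ReportGenerationIT {

    @BeforeEach
    void verifySeedData() {
        // Guard assertion: if the seed data is wrong, fail fast with an explicit message
        // instead of letting the actual test fail later for a confusing reason.
        assertEquals(3, countRows("customers"),
                "Unexpected seed data: the tests in this class expect exactly 3 customers");
    }

    @Test
    void generatesMonthlyReport() {
        // ... the actual integration test goes here
    }

    private int countRows(String table) {
        // Illustrative placeholder; a real test would query the test database here.
        return 3;
    }
}
```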

Conclusion

The purpose of automated tests is to increase productivity. However, automated tests (usually integration tests) can actually decrease productivity if they require too much maintenance. To prevent this, we need to minimize the number of integration tests we write and take great care to build high-quality tests if we decide to add them.

If you liked this blog post, please share it with somebody. You can also follow me on Twitter/X.