One question I often hear from software automation teams is "When a test in your suite fails, should you halt the test run at the first sign of failure, or let the remaining tests run?" It's a common question, and testers tend to take one side or the other, with very little middle ground.

It's not as clear-cut as picking a side and sticking with it. Whenever I get this question, my answer is always "Well, it depends on your situation." That answer may sound like a cop-out, but there are benefits and risks to both sides.

Once you weigh those pros and cons and ask yourself and your team a few questions about your current situation, the answer becomes much clearer.

Advantages and disadvantages of running all tests, even when one fails

If you run the remainder of your test suite after a failure, you'll have to wait longer for the suite to complete even though you already know a test has failed. That extra run time lengthens the feedback loop between the test run and your team. A longer feedback loop delays the point at which testers and developers can start debugging the failure, fixing it, and validating the fix, which slows down your workflow.

Also, depending on your automation workflow, letting the entire test suite run after a failed test can create bottlenecks. For instance, your team might not receive a notification of the failed test run until it completes. Or your continuous integration system might be configured to execute one test run at a time, which causes a traffic jam of sorts for subsequent test runs. In cases like these, the waiting around can kill the team's productivity.

On the other hand, letting the entire test suite run after a failure has its benefits. The main advantage is that you'll be able to have a clear picture of the overall health of your automated tests. You'll be able to check if there was a solitary failure, or if there was a recent change that is causing cascading failures on multiple tests.

Running the entire test suite can also uncover other issues that might not surface when running only a subset of the tests. For example, performance issues, occasional network failures, or data corruption can pop up when executing multiple test cases, signaling a deeper problem in the application.

Advantages and disadvantages of halting a test run on the first sign of failure

When you stop running tests the first time one fails, you can end up with misleading information about the test suite as a whole. You'll lose the bigger picture of the health of your automated tests since you won't know if the failure was isolated or if other test cases would also fail. You'd have to fix that test and re-run the test suite to determine if more tests need fixing.

Additionally, some forms of testing, such as integration and system testing, are prone to unexpected flakiness - caching, infrastructure problems, and third-party systems failing are among the common reasons. By stopping your test run as soon as something fails, you might not get enough information back to determine if there's a temporary problem or a legitimate failure. It can turn into something of a guessing game.

Having the tests stop immediately on failure has a few good advantages, however. Where running the entire suite can extend the feedback loop for the team, halting the test run significantly shortens that loop. You'll be immediately notified of an issue and can jump into resolving the problem as soon as possible. Any recent changes will be fresher in the team's mind, leading to a quicker turnaround for fixing the failure. Having faster fixes will speed up the process of making sure your test suite goes from red to green.

So, how do you handle your test suite when a test fails?

As mentioned previously, the answer depends on your team and your current workflow. Here are a few rules that I have followed successfully to keep the right balance between having a short feedback loop and having a good grasp of your automated test suite's health.

Run the entire test suite when:

An external system executes the test run

If your tests are triggered on a continuous integration pipeline or run on external servers, it's usually best to let the tests run entirely to get a picture of the system's health. These external services also tend to be more useful when they see the entire test suite run. For instance, you'll be able to keep a history of test runs and receive other kinds of notifications that are useful to the team.

The tests don't take a lot of time to run

Some tests take very little time to run, such as unit tests. If these tests only take a minute or less to run, there's very little to gain by halting execution on the first sign of failure.

Your team doesn't rely on speedy outcomes for the test run

In some cases, knowing the outcome of a test run isn't urgent. A good example is a long-running test suite that's triggered in the middle of the night. In this scenario, you don't need to have an immediate notification if there's an issue since you'll have the results by the time you begin working the next business day.

Halt the test run when:

You're executing long-running tests locally

When you're developing tests locally, having the test run stop when something fails will save you tons of time in fixing any issues your automated tests are having. While the tests are running, you can continue working on something else instead of babysitting the test run. A quick halt will help you fix any issues you may have introduced in the test suite.

The development workflow depends on having a short feedback loop

Another reason to stop tests from running after failure is to avoid delaying the development workflow for the rest of the team. These days, many organizations run a continuous delivery practice, where their code gets deployed after it's committed and goes through a series of robust tests. By halting the tests as soon as something fails, the development team will be able to fix issues quickly and have their work deployed as soon as possible.

Each path is not exclusive - use either where necessary

As you may have noticed, the choice between running all your tests and stopping at the first sign of danger is not an exclusive one. You can use both routes if your workflow benefits from it. The testers and software developers can set up their local workflow to halt tests, and the DevOps team can set up external systems to run the entire test suite. You're not forced to stick to one route.
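
One way to use both routes from a single code base is to pick the behavior based on where the tests are running. The sketch below is only an illustration. It assumes your CI system sets a CI environment variable to a truthy value, which many hosted services do, but you should check what your own pipeline exposes. The function name is hypothetical.

```python
import os


def should_halt_on_first_failure(env=None):
    """Return True for local runs and False when running on a CI server.

    Assumption: the CI system sets a CI environment variable to a truthy
    value; adjust the check for whatever your pipeline actually exposes.
    """
    env = os.environ if env is None else env
    running_on_ci = env.get("CI", "").strip().lower() in {"1", "true", "yes"}
    return not running_on_ci


print(should_halt_on_first_failure({}))              # simulated local machine
print(should_halt_on_first_failure({"CI": "true"}))  # simulated CI server
```

The returned value can then feed whatever fail-fast switch your test runner offers, for example the failfast argument of Python's unittest.TextTestRunner.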

You can also take advantage of your current automation tools to alleviate some of the drawbacks of each approach. For instance, you can set up your pipeline to immediately notify the team on failure while continuing to run the test suite, or to execute more than one test run at a time.

Get your team together to discuss the current workflow, and you'll eventually get a clear picture of where to apply these tactics. Your team will enjoy an optimized workflow and development process, and it will make for a healthier product.