Years ago, I found myself sitting in a small conference room at the office of the startup I worked at as a software engineer. Along with me were the product's tech lead and QA engineering lead, and we scheduled this meeting to discuss an emerging issue happening on the development side of things.
"Our developers have been complaining about the test suite", said the tech lead after we started our talk. "They're not happy with how long it's taking our continuous integration system to build between their commits and have begun doing things to bypass the consistent test runs, which don't help your efforts." I knew this pain all too well since I experienced it myself during my daily work.
The QA engineering lead was understandably not pleased hearing about this. "We're doing what we've been tasked with—implementing automated tests to reduce the time it takes the team to complete each sprint. We've added a ton of tests, and it's helped the testing team a lot. I don't know what we can do to help development besides stopping."
I sympathized with the QA team as well. I was in this meeting because a few months earlier, I was the one who began the initiative to improve our test automation, primarily because we had no strategy whatsoever. It was a large project, and each sprint took the team over a week to thoroughly test, given the number of bugs that constantly crept up during active development. I integrated with the QA team for a few weeks to get them started, setting up the frameworks and training team members without automation experience.
A few months into our work, regression testing took less than two days to complete, and the rate of reported bugs in production dropped by over 50 percent. Everyone was pleased with the results. But as time rolled on and the test suite expanded, we reached the point where the tests were becoming a burden. We needed a solution that would keep providing value to the team and yield a better overall product.
The good and the bad of automated testing
Any tester or developer working with a robust automated test suite can tell you how it provides the peace of mind to deliver high-quality products. The higher your application's test coverage, the more confident the team becomes in shipping quickly. However, it's not all sunshine and rainbows. Having all of those tests can come at the expense of slower development and testing times.
A test suite containing thousands of tests can create bottlenecks in the software development process. Each new test adds more time to a test run. By itself, the time added by a single automated test is negligible. But what seems like a trivial increment adds up over time, and before you know it, your entire team faces a sluggish test suite that they absolutely hate to use.
Slow tests are among the most common complaints from teams implementing test automation and probably the primary contributor to abandoned test suites. I've seen it happen over and over again across different organizations. Only a handful of automated tests exist at the start of a new project, and continuous integration builds run in mere seconds. But as the application expands in scope and functionality, it requires additional and more complex test cases, leading to longer test run times.
It's not just the number of tests that creates this issue. You'll inevitably run into flaky tests that require pointless re-runs of the test suite. Your application may introduce new external services that lengthen the execution times of your existing end-to-end tests. A few unit tests here, an end-to-end test there, some seconds to communicate across networks. Everything piles up until the builds take so long that your test suite begins having a negative effect on the project.
How can you prevent these problems from creeping up, whether due to added tests or changes in architecture? Your team can spend time optimizing tests or eliminating useless ones. That's a good practice to follow regularly, but I've found it's a lot of effort for a relatively low return on investment. A more efficient approach is to run only the tests you need, when you need them, and the best way to do that is by tagging your tests.
What are tags in automated tests?
In test automation, a tag is a piece of extra metadata that you can attach to an individual test case or a group of tests. For the purposes of this article, I'll call them tags, since many test frameworks use this terminology (like Cucumber and Robot Framework). However, this kind of metadata goes by different names depending on your testing tools. For example, pytest calls them markers.
These tags allow you to specify additional information for your tests, which you can use to enhance your test runs. For instance, some testing tools include this metadata with the generated test run report, which any report consumer can use. Some teams use tags to add details related to the application under test for informational purposes, such as the module or section of the app the test covers. However, the most frequent use case for tagging automated tests is to identify tests or test groups to filter them during specific test runs.
Using this strategy, some tests include an identifier that allows the testing tool to execute only the tests containing that piece of information. Almost all modern testing tools and frameworks have built-in support for running a subset of tests, either with tags as described in this section or with another identifier like the test description. We can use this functionality to our advantage to limit the scope of our test runs and shrink the time they take to complete.
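To make the idea concrete, here's a minimal, framework-free sketch of tag-based filtering in Python. Real tools (pytest markers, Cucumber and Robot Framework tags) provide this out of the box; the `tag` decorator, the registry, and the test names below are invented purely for illustration.

```python
# A minimal sketch of tag-based test filtering, independent of any framework.
# The decorator, registry, and test names are made up for this example.

REGISTRY = []  # (test function, tags) pairs, collected as tests are defined

def tag(*labels):
    """Attach one or more tags to a test function and register it."""
    def decorator(func):
        REGISTRY.append((func, set(labels)))
        return func
    return decorator

@tag("smoke")
def test_login():
    assert 1 + 1 == 2

@tag("smoke", "sanity")
def test_checkout():
    assert "cart".upper() == "CART"

@tag("regression")
def test_report_export():
    assert len("report") == 6

def run(only=None):
    """Run every registered test, or only those carrying the given tag."""
    ran = []
    for func, labels in REGISTRY:
        if only is None or only in labels:
            func()
            ran.append(func.__name__)
    return ran

print(run(only="smoke"))  # only the two smoke-tagged tests run
print(run())              # the full suite runs
```

The important part is the filter in `run`: the tool skips everything outside the requested subset, which is exactly what a CI build does when told to run only smoke tests.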
How to start tagging your tests for more efficient test runs
The best strategy for tagging automated tests depends on your organization's current test suite. Ideally, you'll begin planning your tags before building any tests. When designing new features in an application, development and QA need to determine the core functions or sections of high business value that must always work for the project to succeed. This information can help the teams uncover which tests need to run as frequently as possible to ensure the basics of the application under test work as expected. These tests would be your smoke tests.
In the real world, though, it's likely a team already has an extensive test suite that needs tagging. In this case, you can make a list of the existing tests and determine which ones can serve as your automated smoke tests. A simple way of accomplishing this is to use your test library or framework to list all the existing automated tests. Most libraries and frameworks let you format the output of a test run to print out individual names and descriptions for each test case.
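As a sketch of what that listing step can look like, here's one way to enumerate test names without running them, using Python's standard `unittest` module. Most frameworks have an equivalent dry-run mode (pytest's `--collect-only` flag, for instance); the `CheckoutTests` class and its test names are made up for the example.

```python
# Enumerate test names without executing them, using the standard library.
# CheckoutTests is a hypothetical test case for illustration.

import unittest

class CheckoutTests(unittest.TestCase):
    def test_add_item_to_cart(self):
        self.assertTrue(True)

    def test_apply_discount_code(self):
        self.assertTrue(True)

def list_tests(suite):
    """Flatten a (possibly nested) unittest suite into a list of test ids."""
    names = []
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            names.extend(list_tests(item))
        else:
            names.append(item.id())
    return names

suite = unittest.TestLoader().loadTestsFromTestCase(CheckoutTests)
for name in list_tests(suite):
    print(name)
```

Dumping the list to a file gives the team a checklist to review together when deciding which tests deserve a smoke or sanity tag.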
Once you've compiled a list of smoke tests for the application, you can continue segmenting your tests further. For example, you can make a list of sanity tests to verify additional functionality beyond your smoke tests or regression tests to ensure new code doesn't break existing functionality before deployment. Depending on your situation, it's not always necessary to go beyond smoke testing. But it's still a good practice to classify your automated tests early since it becomes more tedious as the test suite grows.
With a list of segmented test cases in hand, the team can begin putting the information to use in the automated test suite. If you're starting to build your tests, include the appropriate tags as you write your test cases. For existing test suites, this list will help you slice up your current test scenarios. From there, you can set up your development workflow to run only a subset of tests at different moments—adding scripts to help developers run a group of tests locally, updating your continuous integration system to execute only smoke tests after each commit, and so on.
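For example, a continuous integration step that runs only smoke-tagged tests on each commit might look something like this hypothetical GitHub Actions-style workflow. The job layout and the `pytest -m smoke` marker filter are assumptions for illustration, not a prescription for your stack:

```yaml
# Hypothetical workflow: run only smoke-tagged tests on every push,
# leaving the full suite for merges or a scheduled nightly build.
name: smoke-tests
on: push
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: pytest -m smoke  # only tests marked with the "smoke" marker
```

The same filter flag works locally, so a script wrapping it gives developers a fast pre-push check without waiting on CI.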
How our team solved our test automation woes
I started this article with a story about how the test automation strategy I helped implement at a startup was causing some issues during development. We didn't want to abandon these tests since they yielded clear, tangible benefits. But at the same time, the team was already experiencing some pain, and it wouldn't get better as the test suite evolved along with the product. We had to change our automated testing methods.
The development team worked using a pull request strategy, where everyone worked on separate Git branches and created pull requests to prepare for code review. Our CI service ran all of the product's tests frequently—when making a pull request, when merging the pull request, when deploying to our staging environment, and so on. At this time, the test suite contained a few thousand automated scenarios across unit, integration, and end-to-end tests. It reached a point where it took over thirty minutes to complete the entire build successfully.
Thirty minutes might not seem like a long time, especially compared to large enterprise projects where a test suite might take hours to complete. But for smaller, agile teams, it completely breaks the development flow, since we required tests to pass before a pull request could be approved for merging. The leading source of stress and frustration was when a developer created a new pull request and shifted their headspace to a new task, only to have their flow interrupted by a failed build. It's a major productivity killer.
To help solve this problem, our initial plan of attack was to begin using tags for limiting each test run. The QA engineering lead and I spent a few days classifying our tests into three distinct buckets: smoke tests, sanity tests, and everything else. We tagged each smoke and sanity test in the automated test suite and adjusted our services to run different kinds of builds through the development process.
When a developer created a pull request, we only ran our set of smoke tests instead of the entire test suite. This group of tests consisted of all unit tests, some integration tests, and a single end-to-end test. These tests validated the critical functionality of the application, and our CI service ran these builds in about three minutes. We chose to run all unit tests since they took about one minute for over a thousand tests. The single end-to-end test we included ensured the application didn't have any issues starting up with everything in place.
After a developer approved the pull request and merged the code into the main branch, the CI service ran our list of sanity tests. These included our smoke tests plus additional integration and end-to-end tests, and took 6-7 minutes to run in total. Since the sanity tests covered about 75 percent of the application, they gave us a better look at how the newly merged code interacted with the rest of the application.
Our team still did thorough manual and exploratory testing before deploying to production. We had a staging environment where we deployed everyone's work for the current sprint, tested it for a few days, and shipped it out to our customers if everything was okay. Since updates to staging didn't occur often—once every two weeks, usually—we decided to run the entire test suite when deploying to staging. That gave the testing team more time to perform manual testing on new features that didn't have automated tests yet and to uncover bugs our test suite didn't capture through exploratory sessions.
As a final step, we took advantage of our continuous integration service to run the entire suite every night when most of the team was asleep. The CI service would alert us of the results on Slack, which let us know in the morning if there were any issues. Since we didn't run our entire test suite frequently with our new strategy (only when deploying to staging), we still wanted to take full advantage of the robust set of end-to-end tests the QA team built and continued to build. These tests took almost 25 minutes to run, so running them in the middle of the night spared the development team from long wait times.
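A nightly full-suite run like this can be scheduled directly in most CI services. As a hedged sketch, a GitHub Actions-style workflow with a cron trigger might look like the following; the schedule, step details, and notification step are assumptions for illustration:

```yaml
# Hypothetical nightly job: run the entire suite on a cron schedule
# while the team is asleep, with no marker filter applied.
name: nightly-full-suite
on:
  schedule:
    - cron: "0 3 * * *"  # every night at 03:00 UTC
jobs:
  full-suite:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: pytest  # no tag filter: run everything, however long it takes
```

Pairing a scheduled job like this with a chat notification on failure means the team reviews overnight results first thing in the morning instead of blocking anyone's workday.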
Once we tagged our tests and put them to use, the results were positive in the long term. Developers no longer had to wait around or get interrupted by a long CI build, so they could quickly move on to their subsequent tasks. It helped everyone finish sprints quicker and with less stress towards the end of each sprint. The testing team was also able to continue building the automated test suite without worrying that the tests would get slower over time. In the end, it made for a better automated test suite and a higher quality product.
Pitfalls to watch for when tagging tests
I've seen these strategies help many teams get more out of their automated tests. However, it's not a "one and done" kind of strategy. Almost any software development and testing work requires constant and consistent attention to keep things running smoothly as the application grows, and using tags as a strategy is no different.
One of the downsides of splitting test runs by tags is that the test suite can still have many failures when you run a larger subset of tests or the entire suite. When we decided to implement tagging as our first attempt to rein in test run times, I was hesitant because it felt like we were deferring the responsibility of broken tests to others later. But in reality, our first line of defense—the smoke tests—caught the most egregious bugs quickly, and later test runs didn't fail as often as I expected. The gains in productivity far outweighed the risk of the test suite failing later.
An issue for teams implementing tags is that they often go through the process of tagging once and never touch the tags again. This leads to inefficient test runs as the application under test evolves. For instance, smoke tests can change over time as new business requirements alter the importance of certain areas of the application, or sanity tests may need to expand beyond their current scope. Teams must keep iterating on their automated tests, including their tagging strategy, to make the most of their work.
Finally, keep in mind that tagging is not a silver bullet. In my story, I only mentioned tagging when solving my team's issues with the automated test suite, but in reality, we did more than just add tags. The QA team eliminated plenty of tests that were no longer useful. The development team helped speed up some inefficient tests either by improving the application's performance or fixing the code in the tests. Tagging was one part of the overall plan of action, but it had the most immediate results for us.
As your software projects grow, it's natural that the number of tests expands along with the new functionality. While a healthy automated test suite is crucial to the overall quality of a product, test automation can slow your team down significantly if you're not careful. One strategy to help reduce this friction is to split up your automated tests by tagging them and running a subset of the entire test suite at different points of the software development process.
Plan the different kinds of tests you want to run throughout your team's workflow, either with an existing automated test suite or ahead of time if you have no tests. With these tags, use your testing tools to filter the tests at any given point to avoid running the entire test suite constantly. Some examples are running smoke tests when creating a new branch in your source code repository and executing lengthy end-to-end tests automatically in the middle of the night. These steps reduce the time needed for developers to wait for tests to pass before proceeding to the next task.
Tagging your tests isn't a silver bullet that will solve all your problems with slow test runs. But when done right, tagging strikes an excellent balance between providing fast feedback loops and the confidence that there isn't a severe defect after each change to the codebase. The process takes some time upfront but can save you far more time and headaches in the future. Every test automation tool out there has some way to label tests so teams can filter them per test run, so take advantage of it to get the most out of your tests.
Does your team tag their automated tests? If so, how does their strategy help them make better use of their test runs? Share your tips in the comments section below!