It’s launch day for your app, and the engineering team is putting the finishing touches on that shiny new feature they’ve been working on for weeks. But in the flurry of the inevitable last-minute tweaks and bug fixes, their output slows to a crawl, and it starts to look like the launch will have to be put on hold. It seems like the developers and testers have run out of steam, but that’s not the case. The reason they’ve slowed down is that the CI test suite has been taking longer and longer to run, and they can’t do much until the build turns green.

What happened? Did the automated tests suddenly get slow from one day to the next? In my experience, that’s rarely the case. Test suites almost never slow down all of a sudden; they get slower and slower as the team works on the project. A new test case, when run by itself, can take just a few milliseconds, but that short amount of time piles up when you add dozens or hundreds of test cases to the project. Before you know it, a test suite that took less than five minutes to run is now taking over 20 minutes, and no one knows how it got to that point.

I’ve run into this problem over and over throughout my career, and it’s one of my main reasons for building TestNod, a service that collects automated test run data over time and uncovers the patterns that are easy to miss. One of the most common issues in automated test suites is this kind of gradual slowdown, where execution times creep up little by little. Most development and QA teams don’t pay much attention to a test suite that takes a second or two longer than it did on the previous commit, so the slower times become the norm. It never feels like an issue until it’s too late.

How Automated Test Suites Slow Over Time

Of course, no one wakes up in the morning and thinks, “You know what? I’m going to make the tests slower today.” Gradual slowdowns in test runs are barely noticeable, and on the occasions when someone does notice, the change looks harmless. For most organizations, there are a thousand higher-priority tasks to address than figuring out why the CI system is taking a few seconds longer than it did last week. Teams rarely carve out the time to proactively keep things from getting worse, usually because no one realizes what’s going on.

Decay in automated test suites, whether it shows up as slower runs or less reliable results, always comes from many small decisions made over time. Developers and testers go about their day writing code for new features, adding test cases for existing functionality, or increasing coverage. Your CI system runs the automated tests and tells you whether things still look good or whether expected behavior has changed. It also records how long the tests took, but it won’t tell you, “Hey, your tests have slowly gotten slower over the past 20 runs.” That feedback loop is missing from most CI setups.

There's also the fact that we, as humans, are excellent at adapting to gradual change. If your CI build took five minutes to run and suddenly spiked to over ten minutes in a day or two, you'd take notice immediately. But if the run took five minutes last week and takes five and a half now, it's highly likely you won't care about that seemingly insignificant bump. The week after, it's another 30 seconds added to the run time that you won't pay attention to. Over time, those half-minute increments add up, and that's how most teams end up with test suites that run twice as long as they did just a few months prior, wondering where things went wrong.

This problem has existed as long as automated testing has, but it's gotten noticeably worse over the past year. AI coding tools let teams churn out test cases faster than ever, and most of those tests get committed without anyone checking whether they're efficient.

Metrics That Actually Matter

If you want to determine whether your automated tests have a speed problem, you need to look at the right numbers. Most teams only pay attention to the total build time in their CI dashboard, or to how long the process takes from commit to test result. These are good indicators of the development workflow experience, but they don't tell you where the problem is.

A typical CI test run does a lot more than run your tests. I typically work with Ruby on Rails projects, where the workflow has to install the current version of Ruby, install dependencies, and set up the database before it runs the test suite. Any one of those steps could be taking the lion’s share of the total cycle time. One of the first places to look is the difference between wall time and test time.

Wall time is the actual real-world time that elapses in a given process, while test time is how long the tests spend running. Distinguishing between these two measurements will help you know where the team’s focus should lie when it comes to speed improvements. For example, if it takes a developer 12 minutes from the moment they make a commit to the time they get the CI build results back (wall time) and the automated tests take 3 minutes to execute (test time), the problem isn’t the test suite. But if the test times are growing at a pace faster than the wall time, you should pay closer attention to the test suite.
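As a back-of-the-napkin sketch, you can turn that comparison into a quick check. The function name and the 50% threshold here are my own illustration, not a standard:

```python
def focus_area(wall_time, test_time, threshold=0.5):
    """Decide where speed work should go: if tests are only a small
    slice of wall time, the bottleneck is elsewhere in the pipeline."""
    share = test_time / wall_time
    return "test suite" if share >= threshold else "pipeline overhead"

# The example above: 12-minute wall time, 3-minute test time
focus = focus_area(wall_time=12 * 60, test_time=3 * 60)
# tests are only 25% of wall time, so the focus is "pipeline overhead"
```

Pick whatever threshold matches your team's tolerance; the point is to make the wall-time-versus-test-time decision explicit instead of guessing.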

Another area worth looking at is how long each individual test takes, specifically the median and p95 (95th percentile) values. The median value tells you what a typical test in the suite takes, which is good to know but can also obscure pain points. With the p95, you can see the outliers in your automated tests. In most projects I’ve worked on, there’s always a small percentage of the tests that account for the bulk of the total execution time. Improving these test cases is usually the best use of your time when speeding things up.
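Here’s a minimal sketch of computing those two numbers from a list of per-test durations with Python’s standard library; the sample durations are made up for illustration:

```python
import statistics

def duration_summary(durations):
    """Summarize per-test run times: the median shows the typical test,
    while p95 exposes the slow outliers the median hides."""
    median = statistics.median(durations)
    # quantiles with n=100 returns the 1st..99th percentile cut points,
    # so index 94 is the 95th percentile
    p95 = statistics.quantiles(durations, n=100)[94]
    return median, p95

# Hypothetical suite: mostly fast tests plus a handful of slow ones
durations = [0.05] * 90 + [0.4, 0.6, 0.8, 1.2, 2.5] * 2
median, p95 = duration_summary(durations)
```

For this sample, the median stays at 0.05 seconds while the p95 lands well above it, which is exactly the gap that tells you a few outliers are carrying most of the run time.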

Given the rise of AI-generated test suites, one of the main metrics I’ve been tracking lately is the relationship between the number of test cases and the run time. If your test count went up 20% in the past month but the total execution time went up 50%, it’s a clear sign that the newer tests aren’t as efficient as they should be (which is common in AI-heavy workflows).
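That comparison is simple enough to automate. A hedged sketch, using a made-up tolerance factor and the hypothetical 20%/50% numbers from above:

```python
def growth_mismatch(old_count, new_count, old_time, new_time, tolerance=1.2):
    """Flag when total run time grows disproportionately faster than the
    number of tests -- a hint that the newer tests are inefficient."""
    count_growth = new_count / old_count
    time_growth = new_time / old_time
    return time_growth > count_growth * tolerance

# 20% more tests but 50% more run time, as in the example above
flagged = growth_mismatch(old_count=500, new_count=600,
                          old_time=300.0, new_time=450.0)
# time grew 1.5x against 1.2x test growth, so this run gets flagged
```

The tolerance factor keeps the check from firing on normal noise; tune it to your suite's variance.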

Regardless of the numbers you keep track of, what matters most is the trend over time. A CI test run only offers a single snapshot in time, and a single measurement on its own is almost meaningless because there are plenty of variables in play, from the CI service to bugs in the commit triggering the test run. What you need to pay close attention to is how things have changed in the past few days, weeks, or months. Seeing that your total time increased by 30% or your p95 has gotten significantly worse in the past three weeks tells you more than any single test run can.
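One simple way to quantify that trend, rather than eyeballing single runs, is to compare the median of the most recent runs against the median of the window before them. A sketch with an invented run-time history:

```python
import statistics

def slowdown_percent(run_times, window=10):
    """Compare the median of the most recent runs against the median of
    the window before them; one run is too noisy to judge on its own."""
    if len(run_times) < 2 * window:
        return 0.0  # not enough history to call a trend
    baseline = statistics.median(run_times[-2 * window:-window])
    recent = statistics.median(run_times[-window:])
    return (recent - baseline) / baseline * 100

# Hypothetical total run times (minutes) drifting upward over 20 builds
history = [5.0, 5.1, 5.0, 5.2, 5.1, 5.3, 5.2, 5.4, 5.3, 5.5,
           6.2, 6.4, 6.3, 6.5, 6.6, 6.4, 6.7, 6.5, 6.8, 6.6]
pct = slowdown_percent(history)  # roughly a 25% slowdown
```

Using medians of windows, rather than single runs, smooths out the run-to-run noise from CI hardware and flaky dependencies.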

What You Can Do About It

Deciding what “slow” is for you

Before jumping into the codebase to refactor your way to shorter run times, I’d recommend taking a step back to figure out what slow means for you and your team. One of the reasons automated test suites become painful to run is that no one ever decided when things counted as slow. As mentioned earlier, execution slows over time, and we get used to the additional time until one day we realize the tests are hurting instead of helping. No one drew the proverbial line in the sand.

Deciding what “too slow” means for you and your team is ideally done well before hitting that threshold. Doing it early gives everyone accountability for keeping tests running as quickly as deemed acceptable. For example, say your end-to-end suite has gone from running in 10 minutes to 13 minutes over time. It’s good enough, but everyone feels a bit dragged down by waiting those extra three minutes. If the team decided beforehand that anything over 15 minutes is slow, developers and testers will know to reduce, or at the very least maintain, their run times so they never reach that point.

However, if we’re being honest, even after declaring the slowest acceptable run times for your tests, we’ll likely forget them as we go on with our daily work. That’s why the threshold also needs to become visible, and the best way to do that is to build it into the test suite or CI system. Depending on your library or framework, you can add configuration to fail a test run if it goes over a specified time limit.

For instance, Playwright lets you configure timeouts globally or per test, and in Python the pytest-timeout plugin extends pytest so you can configure how long a test may run before it’s terminated and failed. These settings are typically used to make sure your tests don’t hang a CI process, but you can also use them to keep test runs fast and avoid being caught off guard when the suite slows down.
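If your framework has no such setting, you can approximate the idea yourself. Here’s a framework-agnostic Python sketch; unlike pytest-timeout, it won’t interrupt a test that hangs, it only fails a test after the fact when it exceeds its budget:

```python
import functools
import time

def time_budget(seconds):
    """Fail a test outright when it exceeds its time budget, so slow
    tests surface as failures instead of quietly dragging the suite."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed = time.perf_counter() - start
            assert elapsed <= seconds, (
                f"{fn.__name__} took {elapsed:.2f}s, "
                f"over its {seconds}s budget")
            return result
        return wrapper
    return decorator

@time_budget(0.5)
def test_lookup_is_fast():
    time.sleep(0.01)  # stand-in for the real test body

test_lookup_is_fast()  # passes: well under the 0.5-second budget
```

The budget numbers are hypothetical; the value of the pattern is that a slow test now fails loudly instead of silently adding to the total run time.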

Taking care of the outliers

In most automated test suites I’ve worked on, a few tests take a disproportionate share of the total run time. Most testing libraries and frameworks have built-in ways to profile and display the slowest tests in your suite. If you haven’t done this in your applications, take a moment now to learn how with your current testing tools and check the results.

I can almost guarantee that the top 10 slowest test cases in your automated test suite are taking much more of the test time than you imagined. I did this with the unit tests of a client project I’m working on currently, and out of nearly 900 unit tests, the 10 slowest tests took 10.6% of the total time. That means roughly 1% of the entire test suite is consuming over 10% of the time developers and testers have to wait around for the results. Your results will vary, but I’m certain the ratio is similar to or worse than this.
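The arithmetic behind that kind of measurement is simple. Here’s a sketch with invented durations, not the client project’s real numbers:

```python
def top_n_share(durations, n=10):
    """Fraction of total run time consumed by the n slowest tests."""
    slowest = sorted(durations, reverse=True)[:n]
    return sum(slowest) / sum(durations)

# Hypothetical suite: 890 quick unit tests plus 10 heavyweight ones
durations = [0.02] * 890 + [1.0] * 10
share = top_n_share(durations)  # about 36% of total time in 10 tests
```

Feed it the per-test durations from your framework's profiler output and you'll know in seconds whether your top 10 are worth attacking first.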

Whenever I help teams with automated test suite performance, I like to begin with those 10 slowest test cases, even if there are thousands in the suite. The reason for addressing this subset first is that they almost always point to the same issues. Slow tests commonly do too much I/O, like creating dozens or hundreds of database records or calling external services, or they burden the test with unnecessary setup work. There are also many cases where a test that could be a smaller, quicker unit or API test is running as a bulky integration or end-to-end test.

Often, addressing these issues in the slowest tests also improves other tests that weren’t in the initial batch of ten. Once those top 10 are handled, run the profiler again and check the next batch of slow tests. They’ll likely share the same issues as the first batch, and you’ll begin identifying what’s causing performance problems in your suite. Do this enough times and the patterns causing frequent slowness start to disappear, including in new test cases.

Beware of AI-generated tests

Thanks to the explosion of LLMs for writing code, I’ve seen a radical increase in the number of automated test cases added to codebases. Developers who would barely write a single unit test are now opening pull requests with complete coverage of the new feature they built or the bug they fixed. More tests are a good thing, right? With AI-generated code, I’d argue it’s the opposite.

Of course, having test coverage for critical parts of the system is essential to keep the application running smoothly. But piling on automated tests for their own sake is never a good idea, for reasons ranging from increased complexity in test setup to more fragility as the system evolves. Unfortunately, AI tools like Claude Code and Gemini often don’t have enough context to know what to test, so they err on the side of writing too much code, and we end up with bloated test suites if we don’t pay attention to the output.

Besides the issue of having too many automated tests, AI-generated tests tend to focus on making tests pass while entirely ignoring performance. As an example, another project I worked on recently suddenly saw a 2x to 4x increase in test run times. I didn’t notice the slowdown until I worked on a feature a few days later and watched the test suite take over five minutes to run locally when it previously took less than two. I traced the offending commit to an AI-generated test case from another developer. The test switched the database cleaning strategy (from transactions to table truncation) because the AI couldn’t “figure out” how to make the test pass otherwise. That single-line configuration change caused the massive slowdown across the whole suite. Fixing the test so it no longer needed the strategy switch brought run times back to normal.

I’m not advocating against using AI for test generation; it can be a multiplier for most developers and testers. The problem is that most people accept the output blindly, and that’s led to test suites full of unnecessarily inefficient test cases. Until AI tooling gets better at this, the only solution is to pay close attention to whatever your preferred LLM generates before committing it, whether you’re the one generating the tests or double-checking someone else’s pull request during code review. The few extra minutes it takes to review AI output will save you much more time down the road.

Treat symptoms, but know why you're doing it

There’s a handful of techniques most teams go for when their test suites start dragging and they want to cut wall time. The go-to strategies are running tests in parallel across multiple workers, caching more dependencies and state, running subsets of tests, or bumping up the size of their CI machines. These approaches are widely used because they work and deliver real improvements you can ship quickly.

The problem is that most teams reach for these first, before asking why the tests are slow in the first place. None of these actions make your test suite itself faster, even though the result is a shorter run. The same bad practices stay scattered throughout the codebase, and there's only so much parallelization or CI scaling you can do before you hit a roadblock.

The order in which you apply these techniques matters. If 30 of your tests each spin up a full database and hit three external APIs, parallelizing them means you're now spinning up 30 databases concurrently and hitting those APIs 30 times at once. Now you've traded a slow test suite for a flaky one that eats up your CI runner's resources. Running subsets of tests has a similar trap. It works until a change breaks something the subset didn't cover, and no one notices until it hits production. The underlying flaw didn't disappear. It's still lurking under the covers.

Fix the tests first, then apply parallelization, caching, and other strategies to what's left. Working in that order will make each of these techniques compound. If you begin in reverse order, they’ll just paper over problems until the papering stops working. If you find yourself adding more parallelization or spending too much time figuring out which subset of tests to run effectively, it's a sign to stop and address the real issue instead of slapping another stopgap method on top.

How TestNod Helps You Stay Ahead of Slowdowns

All of this advice assumes someone is keeping track of the automated test suite over time. But if most teams are like the ones I’ve worked with over my career, the reality is that almost no one is paying attention. Tracking trends across weeks or months means either building internal tooling or manually pulling data from your CI provider. Most teams never get around to it, which is how we end up with 20-minute test suites in the first place.

That's exactly what I built TestNod to do. TestNod collects JUnit XML files from your CI runs and tracks the data for every test case across your build history. These files typically contain a time attribute that records exactly how long a test case or test suite took to run. Instead of looking at a single test run in isolation, TestNod shows you how your suite's total execution time is trending over weeks and months. When your suite begins to slow down, TestNod alerts you, so you see it long before it becomes a pain in your day-to-day workflow.

The data for tracking all of this information already exists in the JUnit XML files that most testing frameworks produce and most CI systems can store. TestNod makes that data visible and actionable without requiring you to build or maintain any custom tooling. Setup takes a few minutes on the most popular CI services, including GitHub Actions, CircleCI, and GitLab, and it works with any testing library or framework that outputs JUnit XML.

TestNod is not open to the public yet, but if you want to be one of the first to try it out, sign up for early access at https://testnod.com/.