When working with automated tests, you may have come across an XML file format called JUnit. Almost every modern test automation framework can produce JUnit XML files, and many continuous integration services use them for displaying additional information about our test runs. Even if you haven’t used them in your existing automated test suites, there's a good chance you can use them to get more visibility into your test results.

For the past few months I’ve been building a tool that ingests JUnit XML files generated by automated test suites in CI systems, allowing teams to collect test data over time and uncover patterns that are easily overlooked. For example, you might not realize that your test suite has slowed down significantly over the past 90 days or uncover that a specific test scenario has been failing more than you thought. JUnit XML files are a perfect choice to collect this information because most of our existing tools have this treasure trove of data available without needing to change a thing.

As I built my service, I was surprised to see how much variation exists in how each testing tool generates JUnit XML files. Even though the format name is the same, the output is wildly different. Some tools wrap everything in a <testsuites> root element, while others don’t use that element at all. Sometimes the test execution time is included, and sometimes it’s nowhere to be found. It was frustrating to see each library I tested use different elements and attributes, since it left plenty of cases to account for.

There was no rhyme or reason as to why JUnit XML files generated by a Ruby test framework were unlike the files a Python test framework spits out. The discrepancies between the tools made me dig into why every framework or library seemingly did whatever it wanted. The answer is simple and makes a lot of sense: there has never been a JUnit XML file specification in the first place. That's a bit odd considering how widely used these XML files are, despite their differences in output.

Where Did the JUnit XML File Format Come From?

Due to the name, my assumption was that the JUnit XML file format came from the JUnit Java testing framework, so that’s where I began searching for answers about why there’s no consistency with these XML files. To my shock, it turns out the format doesn’t come from this project at all. In fact, JUnit doesn’t even produce XML reports itself; it relies on external build tools and test runners to generate them.

The “JUnit” name for the file format came from the Apache Ant library, which isn’t even a testing tool. Apache Ant is a tool for automating build processes, primarily used for building Java applications, and part of its functionality includes running tests for those applications. In the early 2000s, the project created a class called XMLJUnitResultFormatter as a way to save test results to disk so other systems could read them. The “JUnit” portion of the name comes from Apache Ant using JUnit to run tests as a task. XML was chosen since the format is easy for other tools to parse, and it soon became the standard for processing this information.

Why JUnit Files Are So Different Despite the Name

Despite the XMLJUnitResultFormatter files gaining traction, there’s no official spec for their format. This led other organizations and individuals to try to formalize how to validate these files by creating XML Schema Definitions, or XSDs, to define the structure of the XML files produced by the JUnit task in Apache Ant. Several schemas exist, with three being the most referenced: the Windy Road schema, the Jenkins xUnit plugin schema, and the Maven Surefire schema.

I read through each of these schemas (yes, I’m the person who actually enjoys reading these), and on the surface it seems like they all agree on the same format. Looking a little deeper, however, you begin to notice where each definition puts its own spin on things. For example, the Windy Road schema requires that the classname attribute be set for <testcase> elements, while the other schemas treat it as optional. Another discrepancy is the <testsuites> root element, which the Windy Road and Jenkins schemas define, but the Maven schema does not.

There are plenty of other places where the schemas vary in their definitions. The differences between these main schemas, plus all the other schemas that might be used, are the reason different tools generate different output. Since there is no “one schema to rule them all” for JUnit XML, the authors of tools that produce these files chose whichever reference felt most authoritative to them or simply copied the structure of an existing JUnit XML file from another framework.

Although it seems like the creators of the Windy Road, Jenkins xUnit plugin, and the Maven Surefire schemas couldn’t settle on their definitions, there’s enough common ground between them. If you look at different JUnit XML files, the XML structure is similar:

  • The file begins with a top-level element, either a single <testsuites> element or multiple <testsuite> elements.
  • Each <testsuite> contains <testcase> elements.
  • Each <testcase> can have child elements for <failure>, <error>, or <skipped>, depending on the test case result (or none if the test passed).
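To make that shared shape concrete, here’s a small hypothetical JUnit XML file (the suite and test names are made up for illustration) that follows the structure above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
  <testsuite name="LoginTests" tests="3" failures="1" errors="0" time="0.42">
    <!-- A passing test: no child elements -->
    <testcase name="logs_in_with_valid_credentials" classname="LoginTests" time="0.21"/>
    <!-- A failing test: a <failure> child with details -->
    <testcase name="rejects_invalid_password" classname="LoginTests" time="0.12">
      <failure message="expected redirect to /login">AssertionError: wrong redirect</failure>
    </testcase>
    <!-- A skipped test -->
    <testcase name="locks_account_after_failures" classname="LoginTests" time="0.09">
      <skipped/>
    </testcase>
  </testsuite>
</testsuites>
```

Keep in mind the <testsuites> wrapper shown here is exactly the part some tools omit, starting the document at <testsuite> instead.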

All three have similar definitions for required attributes on <testsuite> for providing a name (name), the number of tests in the test suite (tests), and the number of failures or errors that occurred in the test run for the suite (failures and errors, respectively). The <testsuite> element also has a few common attributes that differ only in whether they’re required. For example, the timestamp attribute for the test suite is defined in all three, but only Windy Road requires it.

There are other elements that appear on all three schemas, such as the <system-out> and <system-err> elements that contain the standard output and error streams, which can have important details of a given test run. However, it’s worth noting that they can appear under different elements. For instance, the Maven Surefire and Jenkins xUnit plugin schemas mention they can be placed under individual <testcase> elements, while the Windy Road schema does not. So even when each XSD agrees on which elements and attributes to use, we need to understand where they can appear.
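As an illustration of those placement differences (with hypothetical content), the same kind of captured output can appear in either position depending on the tool:

```xml
<testsuite name="CartTests" tests="1" failures="0" errors="0">
  <testcase name="adds_item_to_cart" classname="CartTests" time="0.05">
    <!-- Surefire and the Jenkins schema allow output per test case... -->
    <system-out>cart total recalculated</system-out>
  </testcase>
  <!-- ...while other tools only attach it at the suite level. -->
  <system-out>suite-wide captured stdout</system-out>
</testsuite>
```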

JUnit XML Files Created By Different Tools

To gain a more in-depth understanding of the differences and similarities of the JUnit XML files that the service I’m building can process, I decided to get my hands dirty and see how different testing tools actually produce these files. This isn’t an exhaustive list, but it’s a good mix of different programming languages to get a clearer picture of how each one varies in their output.

Maven Surefire (Java)

Maven Surefire is a plugin for the Apache Maven build tool, responsible for executing unit tests using either JUnit or TestNG. It generates separate JUnit XML files per test class, instead of a single XML file as most other frameworks do. As you might expect, the plugin follows its own schema, starting with a single <testsuite> element as its root. Other differences include a <properties> element listing every Java system property used by Maven, and some attributes that vary according to the test runner.

pytest (Python)

pytest is one of the most popular Python testing frameworks, with built-in support for creating JUnit XML files. The output for this framework wraps everything in a root <testsuites> element and a single <testsuite> child element, where all the <testcase> elements live. pytest follows the Jenkins xUnit plugin schema but adds non-standard attributes like file and line to <testcase> elements, something that’s not defined in any of the common XSDs mentioned.

RSpec (Ruby)

RSpec is one of the main testing tools for the Ruby programming language. It doesn’t have JUnit XML file reporting out of the box, but it can easily be added with the RSpec JUnit Formatter gem. The gem looks like it uses the Windy Road schema as a reference, starting the XML files with a single <testsuite> element. Because of how Ruby code is organized, it sets the class names for the <testcase> elements using a stripped-down variant of the file name where the test code lives, which differs from most other testing tools.

Jest (JavaScript)

There are plenty of JavaScript test tools out there, with Jest being the most used for unit testing. Like Ruby’s RSpec, it doesn’t have JUnit XML reporting by default. The jest-junit reporter adds this functionality, and one of its more interesting aspects is how it lets developers and testers easily configure the output of any attribute. As for its structure, it seems to closely follow the Jenkins schema, setting <testsuites> as its root element with multiple <testsuite> elements as children. At the same time, it has some uncommon usage, such as placing <properties> elements inside <testcase>, which is not in any of the main schemas.

Go

The Go programming language has excellent testing support as part of its standard library, but like many of the other tools mentioned so far, it also needs third-party support for generating JUnit XML files. The most common one I’ve seen used is go-junit-report. The library’s README mentions that one of its primary uses is in Jenkins, and the output seems to follow the similarly named schema, with a root <testsuites> element, proper support for <failure>, <error>, and <skipped> under <testcase> elements, and so on.

PHPUnit (PHP)

PHPUnit is the dominant testing framework for PHP and has built-in JUnit XML reporting support through a command-line option or a configuration file setting. Its output uses <testsuites> as the root element with nested <testsuite> elements that mirror the test class hierarchy. Something I didn’t expect about PHPUnit’s XML output is that it can have multiple levels of <testsuite> elements when the suites are organized into groups, which I haven’t seen in other tools. The framework also includes attributes that aren’t typical, such as assertions on some elements and separate class and classname attributes on <testcase>.

The Minimum to Get the Most Out of JUnit XML Files

After spending days running tests using different programming languages and the more popular frameworks in use today, it’s clear to me that we’re never going to get a single standard to follow. But there’s some good news if you want to use JUnit XML files for the analysis of your automated test runs. Even with all the varying structures, elements, and attributes, a short list of them will give you the most information, and in most cases that’s all you need to focus on retrieving.

Starting from the top, we can safely ignore the <testsuites> root element, since it’s one of the elements that isn’t consistently generated in the output. The <testsuite> element, however, is consistent, and it has some useful attributes worth parsing out. The name, tests, failures, and errors attributes are usually set to summarize a given test suite, as defined by the testing tool. The time attribute is optional but contains the total time in seconds to execute all the suite’s test cases.

For each individual <testcase> element, there are three attributes that contain the most important information used for most tools that consume JUnit XML files. The name attribute has the identifier for the test, the classname attribute is used mainly for grouping, and the time attribute tells you how long it took to run that test. While some of these attributes are marked as optional in some of the main schemas, I’ve seen them appear on almost all the JUnit XML files I generated during my investigation.

Test cases will also often have details about their outcome. If a <testcase> element has no child elements, we can safely assume the test passed. However, most tools will include a child element such as <failure> (a failed assertion), <error> (an unexpected exception), or <skipped> (the test didn’t run) if the test didn’t run successfully. Depending on the testing tool, these child elements can contain details about what happened, like a stack trace, which can be shown to help with debugging the issue.
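Putting those points together, here’s a minimal sketch of how this parsing could look using Python’s standard-library ElementTree. The element and attribute names come from the schemas discussed above; the sample XML, the function name, and its output shape are my own invention for illustration:

```python
import xml.etree.ElementTree as ET

# A made-up report for demonstration purposes.
SAMPLE = """\
<testsuite name="LoginTests" tests="2" failures="1" errors="0" time="0.33">
  <testcase name="valid_login" classname="LoginTests" time="0.21"/>
  <testcase name="invalid_login" classname="LoginTests" time="0.12">
    <failure message="expected redirect">AssertionError</failure>
  </testcase>
</testsuite>
"""

def parse_report(xml_text):
    root = ET.fromstring(xml_text)
    # Some tools emit a <testsuites> wrapper, others start at <testsuite>;
    # normalize both shapes into a flat list of suites.
    suites = [root] if root.tag == "testsuite" else root.iter("testsuite")
    results = []
    for suite in suites:
        for case in suite.iter("testcase"):
            # A <failure>, <error>, or <skipped> child marks the outcome;
            # no such child means the test passed.
            outcome = "passed"
            for child in case:
                if child.tag in ("failure", "error", "skipped"):
                    outcome = child.tag
                    break
            results.append({
                "suite": suite.get("name"),
                "name": case.get("name"),
                "classname": case.get("classname"),   # optional in some schemas
                "time": float(case.get("time", 0.0)), # seconds; also optional
                "outcome": outcome,
            })
    return results

for result in parse_report(SAMPLE):
    print(result["name"], result["outcome"], result["time"])
```

A real parser would need more defensive handling (missing attributes, encoding quirks, nested suites), but this covers the core elements that nearly every tool agrees on.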

Anything else a JUnit XML file can include, from <system-out> and <properties> elements to timestamp and hostname attributes, should be treated as optional, since testing tools produce these details inconsistently. Some tools will display them since they’re nice to have when available, but any tool that processes these XML files shouldn’t rely on their existence.

Why Bother with JUnit XML Files?

So, given that JUnit XML files are a bit of a mess when it comes to consistency, why should we care to use them? The answer to that question becomes clear over time. A single JUnit XML file by itself is rather limited in its utility. It’ll tell you what happened on a single test run, but nothing beyond that. But when you begin gathering the information over time and aggregate the data across multiple runs, you’ll begin to see patterns that you may miss in your day-to-day work.

One of the more useful applications of the data gathered from JUnit XML files is detecting flaky tests. Flaky tests are difficult to spot since they fail randomly and sporadically. Unless you’re paying close attention to every single test run failure that happens, a flaky test can slip by for months without anyone realizing it. Using the data collected from multiple JUnit XML files, you can calculate exactly how unreliable a specific test case is over a rolling window, since you’ll have a clear pass/fail history for all your tests.
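The rolling-window calculation itself is simple once the pass/fail history exists. Here’s a sketch under the assumption that you’ve stored one (test name, passed?) record per test per run, accumulated from parsed reports; the function name and sample data are made up:

```python
from collections import defaultdict

def flakiness(history, window=50):
    """Failure rate per test over each test's most recent `window` runs.

    `history` is a list of (test_name, passed) tuples, oldest first, as
    you might accumulate by parsing many JUnit XML files over time.
    """
    runs = defaultdict(list)
    for name, passed in history:
        runs[name].append(passed)
    rates = {}
    for name, outcomes in runs.items():
        recent = outcomes[-window:]  # rolling window of the latest runs
        rates[name] = recent.count(False) / len(recent)
    return rates

# A test that fails one run in five is clearly flaky in aggregate,
# even though any single failure looks like a one-off.
history = [("test_checkout", i % 5 != 0) for i in range(100)]
print(flakiness(history))  # {'test_checkout': 0.2}
```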

Another great use for JUnit XML files is to see how your test execution times are trending over time. It’s especially important to keep track of this nowadays with AI changing how software development is done. I’ve seen teams double the size of their automated test suite in days thanks to AI, which means their tests began to take a lot longer to run. Since each test case contains a time attribute (which can be summed to get a total for the entire test suite), we can catch gradual slowdowns in our CI test runs before they get out of control due to the overeagerness of these new development methods.

JUnit XML files contain plenty of data to keep your automated test suites healthy and give you a clearer picture of their overall quality. Some teams use this information to ensure their tests aren’t degrading over time and to help deal with technical debt and other problems that accumulate. For instance, Slack had an initiative called Project Cornflake to help squash flaky tests. Part of that effort involved processing XML test results to surface which tests were inconsistent, and they were able to bring their test failure rate from 56.76% down to 3.85% in just a few months.

Wrap Up

What started for me as an investigation into why JUnit XML files are so inconsistent helped me better understand the reasons behind the mess. From what I can tell, JUnit XML was never meant to be a universal standard, and the lack of a formal spec for these files can be a challenge for tools that rely on them. Even if the inconsistencies are a pain to deal with at times, there’s enough core information that’s consistent across tooling to help you understand your automated test suite better, with clear evidence instead of the gut feelings most of us rely on (myself included).

The issues that may lurk underneath the surface of your automated tests might not show up in a day, a week, or even in a few months, but they always seem to show up at the worst time. To make the most out of JUnit XML files, the key is to think about your test suite over time instead of individual test runs, focusing on the few key elements and attributes that show you where things might be slipping. The format may be messy, but there’s gold to extract from these files if you know where to look, and that’s where I hope this article was helpful.

Check Out My New Service: TestNod

If you’re interested in getting the most out of your automated test runs, take a look at my new service: TestNod.

TestNod collects JUnit XML files from your CI runs and turns that historical data into insights you can use to improve the health of your automated tests. Over time, TestNod will connect the dots throughout your build history and show you what you’d otherwise miss, like intermittently failing tests, execution times slowly creeping up, and more. The service will be dead-simple to set up on the most popular CI services (GitHub Actions, CircleCI, GitLab, etc.) and can be used with any testing library or framework that outputs JUnit XML files. There's nothing custom to build or maintain.

TestNod is not open to the public yet, but if you want to be one of the first to try it out, sign up for early access at https://testnod.com/.