End-to-End Testing in Microservices: Where's the Real Issue?
"The problem is not in using end-to-end tests, but in believing that they are the only strategy for quality in a microservices architecture."
End-to-end testing has been widely adopted as a crucial strategy for ensuring quality in complex systems, especially in microservices architectures. The promise of validating complete business flows and identifying bugs before they reach production attracts many companies to heavily invest in this approach. However, as these systems grow in complexity and the interdependencies between services become more intricate, E2E testing starts to reveal its limitations.
In this post, we’ll discuss the problems that arise from relying exclusively on E2E tests in microservices environments. We’ll explore how prolonged wait times for feedback, test instability, high maintenance costs, and debugging challenges are compromising the agility and efficiency of development teams. Additionally, we’ll see how the slowdown in value delivery, the low efficiency in capturing bugs, and the persistence of errors in production highlight the need to rethink this strategy.
If you’ve ever wondered why your test pipeline is slowing down your deliveries or why bugs are still slipping into production, this article is for you.
The Central Purpose of Software Development
When we discuss quality in software engineering, we often limit our understanding to the absence of bugs or adherence to technical specifications. However, this narrow view neglects fundamental aspects that make a digital product truly valuable. Quality is not just about technical correctness; it also involves the software's ability to effectively solve real problems, provide a positive user experience, and meet business expectations.
The word "quality" has its roots in the Latin "qualitas," derived from "qualis," meaning "of what kind" or "of what nature." Originally, it referred to the properties or characteristics that define the essence of something. Over time, the concept of quality has evolved to represent not only the intrinsic properties of a product but also its ability to generate value and meet expectations.
While many digital products may have the same ultimate solution goal—such as solving a specific problem for a group of users or meeting a business need — the quality of each can vary significantly. The quality of a digital product is not determined solely by its ability to fulfill the final purpose but also by how it achieves that goal. Different products may solve the same problem, but the way they do it—the experience they provide to the user, their efficiency, reliability, aesthetics, and other factors—can differentiate one product from another.
For a portion of software engineers, business analysts, and even managers (a small one, I believe), quality can mistakenly be seen as an obstacle to agility. This is because the process of ensuring quality is, by nature, rigorous and meticulous. It involves not only testing and validation but also constant reviews, adjustments, and sometimes rework. This process can seem, at first glance, like something that delays the development and delivery of new features.
But this view overlooks a crucial point: quality is not just about testing to find flaws; it’s about ensuring that the delivered product truly meets the expectations and needs of users and the business. A system that is launched quickly but fails to deliver on its promise is not delivering value—and the cost of fixing these flaws after launch is significantly higher than the time invested in preventing them during development.
Think of a car. When an automaker launches a new model, the vehicle undergoes a series of rigorous tests before hitting the market—crash tests, safety tests, performance tests in various conditions. These tests are not conducted to delay the car’s release but to ensure that once in the hands of consumers, the car is safe, reliable, and capable of meeting the expectations of its future owners.
However, when a manufacturer fails to ensure quality before launch, the consequences can be severe. A problem identified after the car is already in circulation usually results in a recall—a complicated process that involves locating all affected vehicles, performing repairs or replacements, and managing customer dissatisfaction. The typical steps of a recall include:
Identifying the Problem: Detecting the defect that affects the vehicle's safety or functionality.
Communicating with Owners: Informing owners about the issue and the necessary steps to correct it.
Performing Repairs: Organizing the repairs or replacements, which may involve the logistics of bringing the cars to dealerships or authorized repair shops.
Managing Costs: Absorbing the costs of repair, logistics, and possibly compensating consumers.
Restoring Trust: Working to regain the trust of affected consumers, which may include addressing dissatisfaction and concerns that the vehicle could have other issues.
For the owners of vehicles involved in a recall, the process can generate a mix of frustration and concern. In addition to the inconvenience of having to bring the car in for repair, there’s the fear that other problems might arise, and confidence in the brand may be shaken. For the automaker, the recall is not only a significant financial burden but also a blow to its reputation.
Similarly, in software engineering, testing is an essential part of ensuring that the final product is solid, reliable, and valuable. If we fail to ensure this quality before launch, the consequences—such as bugs in production, security failures, and loss of user trust—can be equivalent to the impact of a recall in the automotive industry.
James Bach, a renowned software engineer and testing expert, defines software testing as "a process of questioning a product in order to evaluate its quality." In other words, testing is the tool we use to challenge the software, putting it to the test in different situations, to ensure it can handle real-world challenges.
When a new feature is developed, it must go through its own "crash tests." We should test it not only under ideal conditions but also in adverse situations—high user load, network failures, integration with other systems—to ensure it is robust enough to withstand these challenges. These tests should be seen not as a delay but as an investment in the product’s quality and, by extension, the business’s success.
The true purpose of testing in software engineering, then, is to ensure that each delivered feature is ready to face the real world. Testing doesn’t exist to hinder or slow down development but to ensure that when a feature reaches the end user, it will meet expectations and be capable of delivering the expected value. When we understand testing in this way, it becomes clear that it is an ally, not an enemy, of agility and product success.
The Importance of the Discovery Phase in Product Quality
Before any line of code is written, there is a crucial phase in software development: the discovery phase. During this phase, business specialists, product owners, and software engineers come together to explore and deeply understand the problem that needs to be solved. This is where the solution's objective is clearly defined, and where it is determined how this solution should create real value for the business and users.
A good analogy for this phase is the construction of a building. Imagine a team of engineers constructing a skyscraper. Before any brick is laid, intense planning and engineering work is necessary to ensure that the foundation is solid enough to support the structure. If the foundation is poorly designed or constructed, the entire building will be at risk, no matter how well the upper floors are built. Similarly, in software development, the discovery phase is like that foundation: it’s where the building (or software) is clearly defined in terms of what it should be and how it should function.
A clear understanding of the problem and objectives is essential for the final product to be of high quality. Without a solid foundation of well-defined requirements based on research and evidence, any solution developed runs the risk of being misdirected, failing to meet the real needs of the business or users. Quality, therefore, begins long before the testing phase; it has its roots in the quality of thought and analysis during the discovery phase.
In this phase, it is vital for software engineers to take an active role, closely collaborating with business specialists and product owners. They bring the technical perspective to the table, helping to shape the feasibility of proposed solutions and ensuring that requirements are clear, realistic, and implementable. Without this close collaboration, the risk of misunderstandings and misaligned expectations increases significantly, which can compromise the quality of the final product.
When the discovery phase is well conducted, with everyone involved having a deep understanding of the problem and the solution's objectives, the software development is built on a solid foundation. This, in turn, facilitates the creation of a product that not only fulfills its function but does so in a way that adds real and sustainable value to the business and users.
Therefore, before thinking about testing or agile deliveries, it’s crucial to remember that quality begins with a well-structured discovery phase. Just as a car is exhaustively tested to ensure its safety and performance, software must be solidly planned and validated from the outset, ensuring that each feature adds value in an efficient and relevant way for the business and the end user.
And Where Do End-to-End Tests Fit In?
With a solid foundation established during the discovery phase, software development can proceed with a clear sense of direction and purpose. But even with all the efforts to ensure that the planning is sound, it is still necessary to validate whether the developed solution truly delivers what was proposed. This is where end-to-end (E2E) testing comes into play.
E2E tests aim to verify that all components of a system work together as expected, ensuring that business processes function correctly from start to finish. In a microservices system, these tests can be essential for identifying issues that arise only when all services are operating together, simulating the end-user experience.
However, while E2E tests are crucial, they are not without challenges. When implemented improperly or excessively, they can become a significant bottleneck in the software development and delivery process. The need to orchestrate multiple services and systems can result in slow, unstable, and difficult-to-maintain tests. Additionally, in complex architectures, the time required to obtain feedback can be long, which in turn can compromise the agility of teams.
Another critical point is that while E2E tests are designed to cover complete business flows, they may fail to pinpoint the exact cause of an issue. When an E2E test fails, it can be difficult to determine whether the problem lies within a specific service, in the communication between services, or somewhere else in the system. This can lead to a lengthy and frustrating debugging process, often resulting in rework and a general slowdown in the delivery process.
Therefore, while E2E tests play an important role in validating complex systems, they must be used judiciously. Their implementation should be balanced with other forms of testing, such as unit, integration, and contract tests, which can provide quicker and more isolated feedback on specific components of the system. When combined effectively, these different types of tests can offer comprehensive coverage without compromising agility. Let’s dive deeper into this now.
Understanding the Key Considerations in End-to-End Testing
Tests are the tools we use to validate the observable behavior of software, that is, whether the solution we’ve developed truly fulfills its intended purpose. This includes testing whether the functionality meets business requirements, offers a satisfactory user experience, and is technically sound. To ensure that the observable behavior is correct, we must dedicate as much quality to the tests as we do to the software itself. This applies to all types of tests, whether they are unit tests, acceptance tests, contract tests, or end-to-end tests.
Each testing strategy has its advantages and disadvantages, and it’s essential to apply them in the right measure. Unit tests, for example, offer quick feedback and isolate problems in specific components, but they don’t capture integration failures. Acceptance tests validate whether the software meets business needs, but they may not cover all technical nuances. Contract tests ensure that communication between services is correct, but they don’t guarantee the system's complete behavior. And while end-to-end tests are fundamental for validating complete flows, they can become pitfalls if used improperly.
We want to avoid these pitfalls to maintain agility in delivering value to the business. Therefore, it’s important to understand that while end-to-end tests play an important role, they are not the definitive strategy and must be used with caution.
In this context, let’s focus on some common pitfalls we might face when blindly believing that end-to-end tests are the ultimate solution. We will examine six critical points that require special attention:
Prolonged Wait for Feedback: As the end-to-end test suite grows, the time required for the complete execution of tests also increases. This means that engineers have to wait longer to receive feedback on the changes they’ve made, which can delay development and reduce team efficiency.
Lack of Confidence in Results: Tests that fail inconsistently, also known as "flaky tests," are a major challenge. They can yield different results for the same code, leading to the need to rerun the tests to confirm whether there was actually a problem. This creates a lack of confidence in the test suite and can consume valuable time and resources.
High Maintenance Costs: Maintaining a stable and consistent test environment is a difficult task, especially in systems that require frequent manual configurations. Any manual changes can corrupt test data or environmental conditions, making test maintenance expensive and labor-intensive.
Difficulty in Identifying the Cause of Failures: In environments where asynchronous communication is prevalent, debugging test failures can become extremely complex. It’s often difficult to connect a failure to its real cause, such as a message not being sent to a queue, resulting in unexpected behaviors in other parts of the system. This complicates the process of finding and fixing the issue.
Delayed Value Delivery: When code commits are piled up waiting to pass through the end-to-end test suite, the deployment process can be significantly delayed. This delay in continuous integration can reduce the frequency of deliveries, slowing down the delivery of new features or fixes to the customer.
Low Efficiency in Identifying Bugs: Despite extensively covering and interacting with the system, end-to-end tests are not always effective in detecting behaviors that are not aligned with business expectations. In some cases, even after many executions, the number of detected failures can be disproportionately low compared to the effort and time invested, raising questions about the efficiency of this approach.
In the following sections, we will explore each of these points in depth, discussing the precautions we must take in distributed environments with complex integrations, constant asynchronous communication, and a wide variety of business rules.
The reason we invest time in testing is not just to ensure that the code works, but to verify that we are aligned with business objectives and that the software's behavior meets the expectations of both business analysts and end users.
Prolonged Wait for Feedback: The Impact on Engineering Teams
As end-to-end (E2E) test suites expand, the time required for their complete execution grows exponentially. This increase in execution time is not just a technical issue; it profoundly affects the working dynamics of engineering teams, product managers, product owners, executives, and QA professionals. To better understand the impact, we need to consider what’s at stake: time, value, and agility.
What’s Involved?
In software development, quick feedback is crucial. E2E tests are designed to validate the system's behavior as a whole, ensuring that all components interact correctly. However, as the complexity of the system increases, more tests are added to the suite, which naturally leads to an increase in execution time.
Chris Richardson, author of Microservices Patterns, highlights that in microservices architectures, the need to validate integrations between distinct services can generate a massive number of test cases. Each new service introduced into the architecture increases the need for end-to-end tests to ensure it works correctly with the other services. This can result in an E2E test suite that, despite its coverage, becomes a bottleneck, delaying deployment and undermining one of the main benefits of microservices: agility.
Richardson comments that “end-to-end tests are often a bottleneck that reduces deployment frequency and negates the purpose of using microservices. You have a monolith—the deployment unit—that is composed of services. Or, in other words, a distributed monolith.” This critical view underscores how the misuse or overuse of E2E tests can transform an architecture that should be agile and modular into a disguised monolith.
Martin Fowler, one of the leading advocates of agile development and continuous testing, also points out that “tests are the anchor of quality, but that anchor can become a burden if not managed properly.” When end-to-end tests are poorly scaled or overly relied upon, they can create exactly the type of bottleneck that Richardson mentions, creating a system where speed and flexibility are sacrificed.
Fowler goes further by suggesting that an unbalanced approach to testing can inflate the number of end-to-end tests to a point where they become a burden rather than a benefit. He states that the higher you go up the testing pyramid, the fewer tests you should have. This means that E2E tests, being high-level, should be used sparingly.
Fowler warns that an excessive set of E2E tests may indicate bloated test flows, resulting in a test suite that is slow, fragile, and expensive to maintain.
This continuous expansion of the test suite creates a cycle where the feedback time lengthens, resulting in a prolonged wait for engineers who are waiting for confirmation that their changes have not introduced new issues. This delay directly impacts the team's workflow and can have several negative repercussions.
The Challenge for Software Engineers
For software engineers, time is a crucial resource. When a change is implemented, the ideal scenario is to receive feedback almost immediately. This allows for quick adjustments while the change is still fresh in the developer's mind. However, when the end-to-end test suite takes hours to execute, this feedback is delayed, forcing engineers to shift their focus to other tasks while waiting for the results. This situation introduces several challenges:
Loss of Context: When feedback is slow, engineers may lose the context of the change made. This means that if an issue is detected, they will need to spend time recalling what was changed and why, which decreases efficiency.
Inefficient Multitasking: To get around the wait, engineers might try to work on other tasks while awaiting feedback, but this can lead to inefficient multitasking. Frequently, switching focus between complex tasks diminishes the quality of work and increases the risk of errors.
Delay in Bug Fixing: If a failure is detected after a long wait, immediate correction becomes difficult, resulting in more time spent identifying and fixing the issue, which can delay development.
Impact on Product Managers and Product Owners
For Product Managers and Product Owners, agility in delivering new features and fixes is crucial for maintaining competitiveness and meeting market demands. When end-to-end test feedback is slow, the development and deployment cycle is extended. This can have several consequences:
Reduced Agility: The ability to respond quickly to market changes or new user insights is diminished. Features take longer to reach the market, which can result in missed opportunities.
Compromised Planning: A prolonged feedback cycle makes it more difficult to plan the next steps in development, as time estimates become less accurate. This can affect communication with stakeholders and jeopardize deadlines.
Decisions Based on Outdated Data: The delay in feedback may mean that decisions are being made based on data and situations that have already changed, leading to less informed choices.
The Challenge for QAs and Executives
QA professionals, responsible for ensuring the final quality of the product, face similar challenges. When feedback is slow, they have less time to identify and report issues before the next iteration begins. This can lead to reduced testing effectiveness and a higher risk of undetected issues in production.
Executives, in turn, are always focused on efficiency and return on investment (ROI). A slow feedback cycle can mean higher development costs and lower team efficiency, ultimately impacting the company’s financial results. Additionally, delays in the development cycle can affect the company's market reputation, especially if competitors are more agile.
Losing Time, Value, and Agility
Time lost waiting for end-to-end test feedback is time that could be spent creating new features, improving the product, or fixing issues. This reduces the engineering team’s agility and diminishes the team’s ability to deliver value quickly. Agile development is based on the premise of short feedback cycles to enable rapid and frequent iterations. When feedback is delayed, the ability to iterate quickly is compromised, which can result in a less competitive product and one less adapted to market needs.
Quoting the author and software consultant Chris Richardson again, he warns us:
“End-to-end tests can become a significant bottleneck, especially where interdependence between services increases the complexity of scenarios and journeys.”
This complexity, if not carefully managed, can turn a test suite designed to ensure quality into a barrier that impedes agility and the continuous delivery of value.
Richardson further points out that many organizations resort to end-to-end tests because they find that service-level tests are not sufficient to guarantee application correctness. However, he emphasizes that this is often a symptom of a flawed architecture, where services are not designed to be deployable independently, resulting in the need for an E2E suite that, in practice, creates a "distributed monolith". He suggests that instead of blindly relying on E2E tests, the solution is to fix the architecture, reduce the number of services, and ensure that each service can be deployed independently.
Martin Fowler, in his writings on continuous integration, also emphasizes the importance of "optimizing for fast feedback." He argues that without fast feedback, the team’s ability to react to issues and iterate effectively is severely limited. He advocates for the use of a testing pyramid, where most tests are unit tests — fast and cheap — while E2E tests are minimized to avoid these bottlenecks.
The concept of fast feedback is not new, but its importance cannot be overstated. In an environment where decisions need to be made quickly and changes need to be implemented with agility, the slowness of end-to-end tests can paralyze the development process. When feedback is fast, engineers can act immediately, fix issues, adjust functionalities, and keep moving forward. This keeps the workflow moving and allows the product to evolve continuously, remaining relevant and competitive.
Sam Newman's Perspective on End-to-End Testing
Consultant and author Sam Newman also extensively discusses the importance of feedback cycles in testing. He argues that in an agile development environment, where rapid iteration is essential, the feedback time for tests should be as short as possible. This is crucial to enable developers to identify and fix issues immediately without having to wait long periods for test results.
For Newman, long feedback cycles, such as those often associated with E2E tests, can be detrimental to the development process. When developers have to wait a long time to know whether their changes were successful, it impacts the team's productivity and morale. Moreover, prolonged feedback cycles can result in a larger amount of code being changed before any issues are detected, making debugging and error correction even more complicated.
Newman advocates for the use of a "testing pyramid," where the base is composed of unit tests, which are fast and provide almost instant feedback. Above the unit tests, he places integration tests, which validate interactions between components within a service. At the top of the pyramid are the E2E tests, which are slower and more expensive to execute but still have their place in validating the system's critical flows. This structure allows most feedback to be obtained quickly, maintaining agility in development.
Sam Newman, in his book Building Microservices, offers a critical and pragmatic view on the use of end-to-end (E2E) tests in microservices architectures. He acknowledges that while E2E tests have their value, they should be applied with caution due to the challenges they present, especially in distributed environments.
The problem with E2E tests, according to Newman, is that they tend to introduce coupling between services, which can hinder their independence. As a result, E2E tests can become a bottleneck in the development cycle, delaying the deployment of new features and fixes.
Newman shares a practical experience to illustrate this point:
“I worked on a monolithic system, for example, where we had 4,000 unit tests, 1,000 service tests, and 60 end-to-end tests. We decided that, from a feedback perspective, we had too many service and end-to-end tests (the latter of which were the worst offenders in impacting feedback loops), so we worked hard to replace the test coverage with tests of smaller scope.”
He highlights that the end-to-end tests, despite being only 60 in number, were identified as the main culprits for delaying feedback cycles. The impact of these tests was so significant that the team opted to reduce the number of large-scope tests and replace them with smaller-scope tests, such as unit and integration tests.
Newman also warns of a common anti-pattern, which he calls the "test snow cone" or "inverted pyramid," where there are few or no small-scope tests, and all coverage is done by large-scope tests. He notes that this approach results in extremely slow test executions and long feedback cycles, which severely compromises the efficiency of the development team.
He continues:
"These projects often have glacially slow test runs and very long feedback cycles. If these tests are run as part of continuous integration, you won't get many builds, and the nature of the build times means that the build can remain broken for a long period when something breaks.”
Newman's observation reinforces the need to avoid excessive reliance on end-to-end tests, especially in microservices environments. When large-scope tests dominate the testing strategy, development agility is compromised. The time required to run these tests can become so long that when something goes wrong, the process of fixing and rerunning them becomes ineffective, resulting in long periods of downtime.
Lack of Confidence in Results: The Impact of "Flaky Tests" on Product Quality
One of the biggest challenges development teams face when implementing test suites, especially end-to-end tests, is the inconsistency of results, commonly referred to as "flaky tests." These tests, which fail unpredictably and without an apparent cause, represent a significant problem. They can produce different results for the same code in different runs, creating uncertainties and undermining confidence in the test suite.
The Nature of Unstable Tests
"Flaky tests" are a headache for any development team, and their impact is amplified in complex systems, such as microservices architectures. These tests may pass in one run and fail in the next, even without any changes to the code. They fail due to factors such as unstable external dependencies, race conditions, or synchronization issues. So, what does the engineering community say about this type of problem?
Sam Newman warns that "flaky tests" can be an indication that the test is poorly designed or that the system being tested is inherently fragile. When a team cannot trust the results of their tests, it diminishes the effectiveness of tests as a tool for ensuring software quality. The time and resources spent investigating test failures that do not reflect real problems in the code are wasted, and this can significantly slow down development. Below are some of his words:
"Unstable tests are the enemy. When they fail, they don't tell us much. We rerun our CI builds hoping they'll pass later, only to see check-ins pile up, and suddenly we find ourselves with a load of broken functionality.
When we detect unstable tests, it is essential that we do our best to remove them. Otherwise, we start losing faith in a test suite that 'always fails like that'. A test suite with unstable tests can fall victim to what Diane Vaughan calls the normalization of deviance—the idea that, over time, we can become so accustomed to things being wrong that we start to accept them as normal and not as a problem." - Sam Newman
Quoting Martin Fowler again, in his article, "Eradicating Non-Determinism in Tests", he describes "flaky tests" as one of the biggest enemies of an efficient test suite. He asserts that non-deterministic tests — those that produce inconsistent results — must be identified and fixed as quickly as possible. Fowler also discusses how these tests can arise from various sources, such as external dependencies (e.g., third-party services), failures in the test environment configuration, or even improper timing synchronization. He suggests that the presence of "flaky tests" is a warning sign, indicating that something is wrong with the testing strategy or the application itself.
Operational Cost and Agility
My view on this topic is as follows: there is no room for "flaky tests" in any testing strategy, especially in the more costly ones at the top of the testing pyramid, such as E2E tests. The logic is simple: unit and integration tests can be executed quickly and adjusted relatively easily. However, E2E tests are expensive in terms of execution time, computational resources, and maintenance. If an E2E test becomes "flaky," the operational cost of dealing with it is exponentially higher than it would be for a unit or integration test.
This cost is not limited to just time and financial resources; it also directly affects the team's agility. When a team cannot trust the test results, confidence in the continuous delivery pipeline is eroded. Engineers begin to question every failure, rerunning tests unnecessarily and delaying deliveries. This not only slows down the speed at which new features can be implemented but also harms team morale, leading to frustration and demotivation.
Impact on Final Product Quality
From a product quality perspective, the presence of "flaky tests" is a serious risk. When a test fails inconsistently, it can mask real issues in the code. A test that occasionally passes may allow a significant bug to go unnoticed, resulting in critical failures in production. Imagine a scenario where a test meant to validate the integrity of a financial transaction fails intermittently due to a "flaky test." If this behavior reaches production, it could cause substantial damage, both financially and reputationally.
Moreover, confidence in the test suite is crucial for the refactoring process and continuous code improvement. If developers cannot trust that the tests will consistently catch issues, they may become reluctant to make necessary changes to the code, fearing that new bugs will be introduced and not caught by the tests.
The reliability of end-to-end tests is fundamental to the success of any software project: their whole purpose is to provide trustworthy feedback. When that feedback is compromised, the entire development cycle is affected.
To address the problem of "flaky tests," it is essential to adopt rigorous testing practices and ensure that each test has a clear purpose and is executed under controlled conditions. This may involve eliminating external dependencies, using mocks and stubs to isolate the code being tested, and continuously reviewing tests to ensure they remain relevant and effective.
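To make this concrete, here is a minimal sketch of the most common timing-related fix: replacing a fixed sleep with polling for the observable outcome. The endpoints and the `requests` client are assumptions for illustration; the pattern is what matters.

```python
import time
import requests  # assumed HTTP client; any equivalent works

BASE_URL = "https://staging.example.com"  # hypothetical test environment

def wait_for(condition, timeout=30.0, interval=0.5):
    """Poll until `condition` returns truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    raise AssertionError(f"condition not met within {timeout}s")

def test_order_is_confirmed_eventually():
    # Flaky version (avoid): fire the request, sleep a fixed 5 seconds, assert.
    # If the asynchronous pipeline takes 5.1 seconds that day, the test fails
    # without any real bug in the code.
    order = requests.post(f"{BASE_URL}/orders", json={"sku": "ABC", "qty": 1}).json()

    # Deterministic version: poll for the observable outcome with a generous
    # timeout, and pass as soon as the condition is met.
    assert wait_for(
        lambda: requests.get(f"{BASE_URL}/orders/{order['id']}").json()
                        .get("status") == "CONFIRMED"
    )
```

The polling version is both faster on the happy path and far less sensitive to environmental jitter, which is exactly the kind of non-determinism Fowler describes.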
Eradicating "Flaky Tests" is a Priority
Eradicating "flaky tests" should be a priority for any development team striving to maintain quality and agility. As you pointed out, there is no room for these tests in an effective quality assurance strategy, especially in tests at the top of the pyramid, which are more costly. Trust in test results is the foundation upon which trust in the code and final product is built. Without that trust, the entire development process is compromised, resulting in inferior products, longer delivery cycles, and a less efficient and motivated team.
The adoption of more rigorous testing practices, combined with continuous review and the eradication of "flaky tests," will allow development teams to maintain the integrity of their tests and continue delivering high-quality software in an agile and efficient manner.
High Maintenance Costs: The Challenge of Sustaining an E2E Environment in Microservices
Maintaining a stable and consistent testing environment is a daunting task in any system, but this complexity is amplified in microservices ecosystems. In an architecture where each service is independent but interconnected, maintaining tests becomes a monumental task. This effort involves not only manual work and technical resources but also carries significant costs in terms of time, quality, and potentially the success of the product as a whole.
Complexity in Microservices Ecosystems
In the context of a distributed architecture, each service can have its own dependencies, configurations, and requirements. These services often interact with multiple databases, external APIs, and other services within the same ecosystem. Maintaining a stable testing environment for each service and ensuring that all dependencies are correctly configured and synchronized can be a real challenge.
For example, imagine a service that depends on three other services to function correctly. If any of these dependent services is down, has inconsistent data, or is running an outdated version, this can cause test failures that do not reflect real issues in the service being tested. Even worse, manual changes in the testing environment, such as temporary adjustments to "fix" issues, can introduce inconsistencies that are difficult to track and correct.
This scenario is not uncommon in microservices, where the distributed nature of the architecture complicates quality control even further. Every change in a service can have cascading effects on others, making debugging and verification extremely difficult. The combination of heterogeneous environments and the need for specific configurations increases the maintenance cost and makes it harder to identify real issues.
Maintenance Costs: Time, Value, and Quality
The cost of maintaining a test suite is not limited to just the time and financial resources needed to keep it up. It also includes the impact on the team's ability to deliver value continuously and efficiently. When tests require frequent maintenance, the team can find itself trapped in an endless cycle of adjustments and fixes, consuming time that could be dedicated to implementing new features or improvements.
Even more concerning, the constant need for maintenance can lead to neglect in the quality of the tests. When engineers and QA professionals are overwhelmed with the task of maintaining the testing environment, there may be a tendency to "do whatever it takes" to pass the tests, even if it means compromising quality. This approach can result in critical features not being adequately tested, increasing the risk of production issues.
These failures can be devastating. Imagine a payment feature that was released with a bug because the tests were neglected. This not only directly impacts user trust but can also result in significant financial losses and damage to the company's reputation.
The QA Perspective
For QA professionals, the challenge of maintaining a stable testing environment is even more pronounced. They are often the guardians of quality, responsible for ensuring that the final product meets the required standards. In the book "The Art of Software Testing" by Glenford Myers, the importance of stable and well-maintained testing environments is highlighted as crucial for ensuring accurate and reliable results. Myers emphasizes that without a controlled and consistent environment, test results can be misleading, leading to a false sense of security.
The author does not directly discuss microservices, but his principles on the importance of maintenance and quality in testing are highly relevant to this architecture. In an environment where systems are distributed and each service can be developed and deployed independently, the challenge of maintaining a stable and reliable testing environment becomes even more critical.
Maintenance Costs: Myers notes that the cost of maintaining a test suite can be high, but the cost of not maintaining it is even higher. In microservices, this translates to ensuring that each service can be tested in isolation, without relying on manual configurations that might introduce errors or inconsistencies.
Testing Environment Quality: The reliability of the testing environment is crucial to the quality of the software. This means that testing environments need to be configured automatically and consistently, avoiding manual interventions that might corrupt test data or environmental conditions.
Impact on Quality: As Myers points out, the quality of the tests is directly related to the quality of the final product. Therefore, ensuring the integrity of the tests is essential to avoid critical issues in production.
QA professionals must remain constantly vigilant to ensure that changes in the testing environment do not corrupt the test data or introduce new variables that were not considered. The manual work involved in setting up and maintaining these environments can be exhausting, especially in large organizations where multiple services are constantly evolving.
Impact on Product Managers and Product Owners
In software engineering, everything involves costs. While Product Managers and Product Owners may not be directly involved in test maintenance, they are deeply affected by these challenges. The quality of the tests has a direct impact on the ability to deliver new features with confidence. When the testing environment is unstable, delivery timelines become uncertain, and the ability to respond quickly to changes in market needs is drastically reduced.
Everyone involved in a software project relies on fast and reliable feedback cycles to plan and prioritize the next steps in development. If the tests are difficult to maintain and generate instability, timelines stretch, leading to frustration among teams. This continuous strain, caused by constant re-planning and adjustments, can lead to decreased team morale and a loss of confidence in the ability to deliver value efficiently.
Moreover, the pressure to meet deadlines can lead to rushed decisions, where quality is sacrificed in the name of "agility." This creates a vicious cycle: production issues lead to more maintenance and adjustments, consuming even more time and resources. At this point, the perception of agility can become an illusion, where the time gained in quick deliveries is lost in resolving issues that could have been avoided with a more robust testing environment.
The Inevitable Cost of Maintenance
Maintaining quality and stability in microservices environments is a task that involves significant costs, and ignoring these costs can lead to even more serious consequences. These costs are not just technical; they permeate the entire organization. From the engineers and QA professionals struggling to keep the tests running to the Product Managers and Product Owners who have to deal with the consequences of delays and compromised quality, everyone feels the impact.
Ensuring a stable and efficient testing environment requires a strategic approach. This includes automating wherever possible, eliminating manual dependencies, and creating testing environments that can be easily reproduced and configured consistently. Investing in the quality of tests and maintaining the environment from the beginning may seem like a high cost, but it is a necessary investment. The cost of not doing so is much higher, both in terms of time and the impact on the final product and customer trust.
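As an illustration of "eliminating manual dependencies", the sketch below shows a pytest fixture that provisions its own test data through the API and cleans up afterwards, so no test depends on manually pre-loaded state. The endpoints and resource names are hypothetical:

```python
import uuid
import pytest
import requests  # assumed; the endpoints below are hypothetical

BASE_URL = "https://staging.example.com"

@pytest.fixture
def isolated_account():
    """Create a dedicated account for this test and remove it afterwards.

    Every test gets its own data, tagged with a unique identifier, so runs
    never depend on manually prepared state and cannot corrupt each other.
    """
    email = f"test-{uuid.uuid4().hex[:8]}@example.com"
    resp = requests.post(f"{BASE_URL}/accounts", json={"email": email})
    resp.raise_for_status()
    account = resp.json()
    yield account
    # Teardown: leave the environment exactly as we found it.
    requests.delete(f"{BASE_URL}/accounts/{account['id']}")

def test_account_starts_active(isolated_account):
    resp = requests.get(f"{BASE_URL}/accounts/{isolated_account['id']}")
    assert resp.json().get("status") == "ACTIVE"
```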
Therefore, the lesson is clear: quality cannot be compromised. The cost of maintaining a stable and effective testing environment is inevitable, but the cost of not maintaining this quality is even greater. Ensuring that tests are reliable, consistent, and well-maintained is essential to delivering a product that not only meets expectations but is also able to withstand the test of time and the inevitable changes in the world of software development.
Calculating End-to-End Test Execution Time: A Theoretical Example
Let's take a hypothetical example, which could easily apply to many real-world scenarios. Suppose we're working on a microservice responsible for managing vouchers. This service has five main endpoints:
Create Vouchers (POST /vouchers)
Validate Voucher (GET /vouchers/{id}/validate)
Apply Voucher (POST /vouchers/{id}/apply)
Cancel Voucher (POST /vouchers/{id}/cancel)
Query Vouchers (GET /vouchers)
Now, consider that a change has been made to the voucher application logic (on the /vouchers/{id}/apply endpoint). Although the change was specific to this endpoint, since we're dealing with a legacy system without a clear segregation of business rules, it’s prudent to test all the endpoints to ensure that the change didn’t introduce issues in other areas of the service.
In scenarios like this, the business rules can be quite complex, especially when it comes to ensuring voucher integrity. For example, a user might need to be authenticated or authorized to access certain resources, and a simple GET call may have numerous security checks before returning a result. Additionally, the service may make asynchronous calls to other systems to validate information in real-time, which adds more time to the testing process.
Let’s consider the following details for the E2E tests, noting that this is a completely hypothetical example:
Number of test scenarios per endpoint: 10 scenarios for each endpoint.
Average execution time per step: 800ms (0.8 seconds) per step.
Number of steps per scenario: 10 steps (2 of which are 15-second pauses for asynchronous calls or other checks, replacing the usual 800ms step time).
Now, let’s calculate the total time needed to run all test scenarios for this service:
Time per scenario (without the 15-second pauses): 8 regular steps × 0.8 s = 6.4 seconds
Execution time of the 15-second pauses: 2 pauses × 15 s = 30 seconds
Total time per scenario: 6.4 s + 30 s = 36.4 seconds
Total time for all scenarios of one endpoint (10 scenarios): 10 × 36.4 s = 364 seconds (≈ 6.07 minutes)
Total time for all endpoints (5 endpoints): 5 × 364 s = 1,820 seconds (≈ 30.33 minutes)
Thus, the total time to execute all end-to-end test scenarios for this service would be approximately 30.33 minutes. While this may seem manageable in a single run, remember that this is just one service in a potentially large microservices ecosystem. If we apply this logic to a system with many microservices, the time required to validate all integrations can quickly become a significant bottleneck.
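For the skeptical reader, the arithmetic above fits in a few lines of Python, and extending it shows how quickly the numbers escalate across an ecosystem. The service counts below are, of course, hypothetical:

```python
# A quick sanity check of the numbers above, and how they scale when the
# same suite structure is repeated across many services.

REGULAR_STEPS = 8          # steps at normal speed
STEP_TIME_S = 0.8          # 800 ms per regular step
PAUSE_STEPS = 2            # steps that wait on asynchronous processing
PAUSE_TIME_S = 15.0        # seconds per pause
SCENARIOS_PER_ENDPOINT = 10
ENDPOINTS = 5

scenario_s = REGULAR_STEPS * STEP_TIME_S + PAUSE_STEPS * PAUSE_TIME_S  # 36.4 s
service_s = scenario_s * SCENARIOS_PER_ENDPOINT * ENDPOINTS            # 1820 s

print(f"One service: {service_s / 60:.2f} min")   # ~30.33 min
for services in (5, 10, 20):
    print(f"{services} services: {service_s * services / 3600:.1f} h")
# 5 services: 2.5 h, 10 services: 5.1 h, 20 services: 10.1 h per full run
```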
Impact on Productivity and Development Cycle
What happens when you multiply this by several services? And what if there’s a test failure that requires multiple runs? Those 30.33 minutes can quickly turn into hours, especially when debugging and rerunning tests is necessary.
More importantly, if the test suite fails due to a poorly defined business rule or an incorrect environment configuration, the entire development flow can be disrupted. This is particularly frustrating in legacy systems, where the time needed to isolate and fix the problem can be significant.
These delays not only affect the productivity of engineers, who are left waiting for test feedback, but also impact the delivery of value to the customer. Every minute spent waiting for test results that could have been optimized is a minute less dedicated to developing new features or improving code quality.
The Reality of Complex Business Rules
Another crucial point to consider is that, depending on the rules of each corporation, a voucher service may have many more error scenarios and complex rules to ensure voucher integrity. For example, a simple GET operation to list vouchers may require the user to be authenticated, have specific permissions, and for the system to validate the status of the vouchers in real time. Each of these checks adds layers of complexity to the tests and increases the total time needed to validate them.
Additionally, in environments where asynchronous communication is prevalent, such as in microservices that use message queues or events to process data, debugging test failures can become extremely complex. It is often difficult to connect a failure to its real cause, such as a message that wasn’t sent to a queue, resulting in unexpected behaviors in other parts of the system.
For software engineers and team leaders, measuring the execution time of E2E tests and monitoring the amount of time spent waiting or debugging failures is essential to identifying bottlenecks that affect productivity. By analyzing these times, it becomes easier to understand where the testing process can be optimized.
For example, if test execution time starts impacting continuous delivery, it may be necessary to split the test suite or adopt different testing approaches to verify basic behaviors. We will be discussing this topic in depth soon.
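As a starting point for that kind of measurement, pytest already ships a `--durations=N` flag that lists the slowest tests. The sketch below goes one step further and persists every slow test to a file so trends can be tracked over time; the threshold is an arbitrary value chosen for illustration:

```python
# conftest.py: a minimal sketch for surfacing slow end-to-end tests.
import json

SLOW_THRESHOLD_S = 10.0  # arbitrary threshold for illustration
_durations = {}

def pytest_runtest_logreport(report):
    # "call" is the test body itself; setup and teardown report separately.
    if report.when == "call":
        _durations[report.nodeid] = report.duration

def pytest_sessionfinish(session, exitstatus):
    slow = {k: round(v, 1) for k, v in _durations.items() if v > SLOW_THRESHOLD_S}
    # Persist the offenders so the team can watch feedback-time trends.
    with open("slow_tests.json", "w") as f:
        json.dump(dict(sorted(slow.items(), key=lambda kv: -kv[1])), f, indent=2)
```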
Difficulty in Identifying the Cause of Failures: The Challenge of Debugging in Asynchronous Environments
Where asynchronous communication exists, debugging failures can become an extremely complex and frustrating task. Imagine a hypothetical scenario where a large application is composed of dozens of microservices, many of which communicate through message queues like RabbitMQ or Kafka. Now, picture that you're running an end-to-end test suite to validate a critical business flow, such as processing a financial transaction.
The Payment Process
Let's imagine a payment flow. Suppose Service A receives a payment request, processes the initial data, and then sends a message to a queue for Service B to perform anti-fraud validation. Service B, upon completing its validation, sends another message to Service C, which debits the user's account and finalizes the transaction.
Now, during the execution of the end-to-end tests, a failure occurs. The system didn’t complete the payment, and the test failed. But why did this happen? These are some questions a software engineer would need to answer:
Did Service A fail to send the message to the queue?
Or was Service B unavailable at the time, causing the message to be lost or ignored?
Or perhaps Service C received the message but couldn’t access the database to complete the debit?
Each of these steps is asynchronous and may have failed independently, complicating the identification of the root cause.
Put yourself in that position: how would you identify the exact cause of this failure? How would you connect a message that wasn’t sent, or was received out of order, to an error in the final transaction? How much time would you spend trying to trace the source of the problem? And more importantly, how does this affect your ability to focus on developing new features or improving the existing architecture?
This scenario reveals one of the pitfalls of end-to-end testing in complex systems. The distributed nature of microservices, combined with asynchronous communication, creates a web of interactions that are difficult to monitor and debug. When a test fails, it may not be immediately clear where the problem lies. This not only consumes time but also frustrates engineers and quality analysts who must deal with failures that may not reflect real issues in the code.
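One practical mitigation is to propagate a correlation ID through the whole flow and assert on each observable checkpoint, so a failing test at least tells you which stage stalled. A minimal sketch, assuming hypothetical endpoints for Services A, B, and C:

```python
import time
import uuid
import requests  # endpoints for Services A, B, and C are hypothetical

def stage_done(url, correlation_id, label, timeout=30):
    """Poll one observable checkpoint; report WHERE the flow stopped."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        r = requests.get(url, params={"correlationId": correlation_id})
        if r.ok and r.json().get("status") == "DONE":
            return
        time.sleep(1)
    raise AssertionError(f"flow stalled at: {label} (correlationId={correlation_id})")

def test_payment_flow_end_to_end():
    correlation_id = str(uuid.uuid4())  # propagated through every message
    requests.post("https://service-a.example.com/payments",
                  json={"amount": 100},
                  headers={"X-Correlation-Id": correlation_id})
    # Without per-stage checkpoints, a failure only says "payment not completed".
    # With them, the assertion message names the stage that stalled.
    stage_done("https://service-a.example.com/outbox", correlation_id, "Service A publish")
    stage_done("https://service-b.example.com/fraud-checks", correlation_id, "Service B anti-fraud")
    stage_done("https://service-c.example.com/debits", correlation_id, "Service C debit")
```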
The Headache of Messages in a DLQ
Now, consider a specific scenario: Service A sends a message to the queue, but for some reason—perhaps a wrong configuration, a temporary network failure, or an unavailability of Service B—that message never reaches its destination. Service B, therefore, doesn’t process the transaction, and Service C never receives the command to debit the customer’s account. When the test fails, all you see is that the transaction wasn’t completed and the message is in the broker's DLQ. But the root cause, a lost message, may be hidden several layers beneath the surface.
This type of problem is not only difficult to identify; it’s also a headache to fix. You might spend minutes or hours checking logs, retesting, and adjusting configurations only to find out that the problem was a small communication failure between the services. And if that failure is intermittent, it might go unnoticed in some tests but not in others, making debugging even more complex.
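When the trail does end in a dead-letter queue, inspecting why the broker parked each message is usually the fastest clue. A minimal sketch using the `pika` client, assuming a RabbitMQ DLQ named `payments.dlq`:

```python
import pika  # RabbitMQ client; the queue name below is hypothetical

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()

while True:
    method, props, body = channel.basic_get(queue="payments.dlq", auto_ack=False)
    if method is None:
        break  # no more messages to inspect
    # RabbitMQ records why a message was dead-lettered in the x-death header
    # (reason, original queue, retry count, timestamps).
    death_history = (props.headers or {}).get("x-death", [])
    print(f"message: {body[:120]!r}")
    print(f"  dead-letter history: {death_history}")

# Closing without acking returns every message to the DLQ untouched.
conn.close()
```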
Simulation versus Reality
Faced with this challenge, some engineers may opt to simulate certain services or queues to make tests more predictable and less prone to failures caused by temporary unavailability or configuration errors. But here’s an important question: by simulating these interactions, are we really testing end-to-end?
Simulation may be appropriate for unit or integration tests, where you want to isolate components and ensure they work correctly independently. However, in end-to-end tests, the goal is to verify that the entire business flow, including communication between services, works correctly in an environment as close to reality as possible. If you start replacing critical parts of the system with simulations, you’re compromising the integrity of these tests.
So stay alert: always question whether a simulation is truly necessary, and weigh what it costs you in test fidelity.
The Time and Complexity of Asynchronous Tests
Considering that end-to-end tests in asynchronous environments can take significantly longer to run, the impact on the development cycle is real. Each time a test fails, engineers need to spend time investigating, which slows down the delivery of new features and can increase the pressure to compromise on quality. Additionally, E2E tests that rely on asynchronous communication can be difficult to parallelize, further increasing the total execution time.
If the tests take too long to run and the failures are not clearly identifiable, you’re facing a bottleneck that can compromise the entire agile development process. The time you spend debugging these tests is time that could be used to improve the code, refactor components, or add new features that bring value to the business.
The Hidden Cost of Failures in Asynchronous Environments
Debugging failures in systems with asynchronous communication is a challenge that requires care, patience, and a structured approach. The complexity of identifying the root cause of failures not only affects development time but can also demotivate engineers and quality analysts. The lack of clarity about where the problem lies can lead to longer development cycles, greater team frustration, and ultimately a reduction in the quality of the final product.
It is crucial that, when planning an end-to-end test suite, teams consider these challenges and weigh the best way to test them. Simulations can be useful in certain contexts, but it’s essential to understand the limitations they impose. The ultimate goal should always be to ensure that the system works as expected in the real world, where asynchronous communications and potential errors are inevitable.
Delays in Value Delivery: The Impact of End-to-End Testing on the Development Cycle
When an organization adopts a microservices architecture, one of the main goals is to enable different teams to work independently, delivering value continuously and agilely. However, in large organizations, where multiple teams may be working on interdependent services, end-to-end tests can become a significant point of contention. This is especially true when code commits accumulate, waiting to pass through the E2E test suite, resulting in delays in continuous integration and, consequently, in delivering new features or fixes to the customer.
The Dilemma of End-to-End Testing in Distributed Environments: Conflicts of Interest and Incompatibilities
Imagine a large corporation (if you don’t already work in one) where multiple teams are developing services that, while independent, need to integrate to deliver a complete functionality. Each team makes its commits, and before anything is deployed to production, end-to-end tests must be run. If one of these tests fails, whether due to a configuration issue, an unresolved dependency, or even a "flaky test," all commits may be held up until the problem is resolved.
This creates a scenario where one team’s progress depends on the success of another team’s tests. Even if one team has completed its work, it cannot move forward if another service’s test fails. As a result, value delivery to the customer is delayed, and the promise of continuous integration is compromised.
This scenario can lead to conflicts of interest between teams. For example, one team may be ready to release a new version of its service, but another team is still adjusting its E2E tests to accommodate a new feature or fix a bug. This type of incompatibility can result in significant delays, where code ready for deployment is held back, possibly for days or even weeks, until all teams are aligned.
Martin Fowler, in his article on Continuous Integration, highlights that the goal of continuous integration is to avoid exactly this type of build-up of non-integrated code. He suggests that one of the greatest benefits of continuous integration is detecting conflicts and issues as early as possible, but when E2E tests become a bottleneck, this benefit is lost. Instead, issues are discovered late in the process, increasing the complexity and time needed to resolve them.
A Real Problem
Imagine that a team is developing a new multi-factor authentication (MFA) login feature in a large microservices system. The security team has already implemented and tested the MFA logic in its service, but now E2E tests need to be run to ensure that this new functionality works in conjunction with all other services, such as account management, notification services, and the payment system.
However, the E2E tests fail due to an issue with the integration between the MFA service and the notification system, which hasn’t yet been updated to handle the new messages sent by the MFA. Until this issue is resolved, no commits can move forward to production. As a result, all other services that depend on MFA, such as the payment service, are also blocked.
This delay can have a direct impact on the company’s ability to respond quickly to compliance requirements, affecting not only value delivery to the customer but also the overall security of the system.
The Impact on Engineers and Product Managers
This type of bottleneck can be frustrating for both engineers and Product Managers and Product Owners. For engineers, the feeling of having their work blocked by an issue beyond their control can be demotivating and lead to a loss of focus. Instead of moving on to the next task, they find themselves stuck resolving problems that are often not directly related to what they were developing.
For Product Managers and Product Owners, these delays can compromise deadlines and goals, making it difficult to deliver value to customers on time. They need to deal with anxious stakeholders, replan releases, and possibly adjust development priorities, all due to issues that arose late in the development cycle.
Optimizing Value Delivery
The use of end-to-end tests is essential to ensure that all services work in harmony, but when these tests become a bottleneck, it’s time to reconsider the strategy. One possible solution is to integrate contract and integration tests that can validate interactions between services in a more isolated and efficient way, allowing teams to move forward independently without waiting for E2E test results.
Michael Feathers, in Working Effectively with Legacy Code, mentions that the true effectiveness of tests is not in testing everything at once, but in testing so that each component is validated in its own context. This can relieve the pressure on E2E tests and ensure that they are used only to validate critical business flows without hindering the continuous delivery of value.
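To make this idea concrete, here is a minimal sketch of such an isolated test, assuming WireMock as an in-process stub for a collaborating service; the notification endpoint and all names below are hypothetical, used only for illustration:

// A minimal sketch of an isolated integration test, assuming WireMock as an
// in-process stub for the collaborating service; endpoint and payload are
// hypothetical
import static com.github.tomakehurst.wiremock.client.WireMock.aResponse;
import static com.github.tomakehurst.wiremock.client.WireMock.get;
import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;
import static org.junit.jupiter.api.Assertions.assertEquals;

import com.github.tomakehurst.wiremock.WireMockServer;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.web.reactive.function.client.WebClient;

class NotificationStatusClientTest {

    private WireMockServer notificationStub;

    @BeforeEach
    void startStub() {
        // Local stub standing in for the real notification service
        notificationStub = new WireMockServer(8089);
        notificationStub.start();
    }

    @AfterEach
    void stopStub() {
        notificationStub.stop();
    }

    @Test
    void readsNotificationStatusWithoutTheRealService() {
        notificationStub.stubFor(get(urlEqualTo("/api/notifications/42/status"))
                .willReturn(aResponse()
                        .withStatus(200)
                        .withHeader("Content-Type", "application/json")
                        .withBody("{\"status\": \"SENT\"}")));

        String body = WebClient.create("http://localhost:8089").get()
                .uri("/api/notifications/42/status")
                .retrieve()
                .bodyToMono(String.class)
                .block();

        assertEquals("{\"status\": \"SENT\"}", body);
    }
}

A test like this runs in seconds and fails only for reasons local to the service under test, which is exactly what keeps teams unblocked while the E2E suite stays focused on critical flows.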
Low Efficiency in Bug Detection: The Dilemma of End-to-End Testing
The statement that end-to-end tests can be ineffective in detecting bugs might seem contradictory at first glance. After all, these tests comprehensively cover the system, simulating user interactions and validating complete business flows. However, this perception becomes more understandable when analyzed from both technical and business perspectives.
The Complexity of End-to-End Testing and Inefficiency in Bug Detection
From a technical standpoint, E2E tests are undoubtedly labor-intensive to write and maintain. This is due to the need to cover multiple scenarios and ensure that all possible interactions between system components are tested. However, it is precisely this complexity that can limit their effectiveness in detecting bugs.
For example, a feature that might seem simple on the surface—such as updating user profile information—may, in fact, have several underlying business rules that need to be considered. Have you ever worked on a feature like this? One that seemed easy, but upon diving into the code, you discovered several important checks that couldn’t be broken? Consider this: a feature allows the user to update their address, but only if they are verified, and the update can only occur if there are no pending transactions. Each of these business rules needs to be tested, and an E2E test must cover all these conditions. This means that multiple test scenarios need to be created, each with its own dependencies and interactions.
Writing an E2E test that covers all these nuances can be extremely difficult and time-consuming. And even if the test is well-written, it may still fail to capture certain behaviors, especially those that occur under specific or rare conditions. This can result in a low number of detected bugs despite the significant effort invested in creating and maintaining these tests.
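For contrast, consider how those same address-update rules could be pinned down by fast, focused unit tests instead of full E2E runs. The sketch below assumes a plain domain service; AddressService, User, and the exception types are hypothetical names used only for illustration:

// A minimal sketch: each business rule gets its own fast, deterministic test.
// AddressService, User, and the exception types are hypothetical.
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

class AddressUpdateRulesTest {

    private final AddressService service = new AddressService();

    @Test
    void rejectsUpdateWhenUserIsNotVerified() {
        User user = new User(123L, /* verified */ false, /* pendingTransactions */ 0);
        assertThrows(UserNotVerifiedException.class,
                () -> service.updateAddress(user, "221B Baker Street"));
    }

    @Test
    void rejectsUpdateWhileTransactionsArePending() {
        User user = new User(123L, true, 2);
        assertThrows(PendingTransactionsException.class,
                () -> service.updateAddress(user, "221B Baker Street"));
    }

    @Test
    void updatesAddressForVerifiedUserWithNoPendingTransactions() {
        User user = new User(123L, true, 0);
        service.updateAddress(user, "221B Baker Street");
        assertEquals("221B Baker Street", user.address());
    }
}

Each rule gets its own cheap, deterministic check, leaving the broader suites to confirm that the pieces work together rather than re-proving every rule end to end.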
From a Business Perspective: Clarity in Business Rules and Communication
The efficiency of E2E tests also depends on the clarity of the business rules and the communication between the development and quality teams. If the business rules are not well-defined or if there is no shared understanding between engineers and QAs about what needs to be tested, the tests may end up being superficial and fail to capture critical details.
In a business context, low efficiency in bug detection means that critical problems may go unnoticed until they reach the production environment, where the cost to fix them is much higher. Moreover, the time and resources invested in running these tests may not be justified if the return in terms of detected bugs is low.
This situation raises an important question: are end-to-end tests really the best approach to ensuring software quality? Or could other strategies, such as contract testing, complement E2E tests and offer a more effective way to capture bugs?
Martin Fowler’s Perspective: Contract Testing as a Complement
Martin Fowler, in his writings on the testing pyramid, suggests that a balanced approach may be more effective than relying solely on end-to-end tests. He cites the use of contract testing as a complementary strategy. Contract tests verify the interactions between different services, ensuring that the expectations of each service are met.
This approach is particularly useful in microservices architectures, where communication between services can be complex and prone to failure. Contract tests can help capture bugs related to integration issues before they cause failures in E2E tests, which can make the debugging process much simpler and more efficient.
Another point to consider is that writing truly effective E2E tests requires a deep understanding of business flows and possible exceptions and edge cases. This can be a significant challenge, especially in complex systems. The attempt to capture all possible scenarios can result in an inflated test suite, where many cases are superficially covered without really adding significant value to the validation process.
The inefficiency of end-to-end tests in detecting bugs can, therefore, result from a combination of factors: technical complexity, lack of clarity in business rules, and the inherent difficulty of writing tests that effectively cover all possible scenarios.
Contract Testing: Challenges, Concerns, and a Practical Example
When it comes to contract testing, some software engineers and technical leaders may have reservations. The shift to this approach can seem intimidating, and it’s natural for doubts to arise. Let’s address some of these common concerns and explore what’s really involved in adopting contract testing.
Misconceptions About Contract Testing
Does Contract Testing Add Unnecessary Complexity? One of the first reactions you might hear is: "This is going to complicate everything." At first glance, adding contract tests looks like extra work just to keep more tests running. But what actually happens is the opposite: think of contract tests as a way to distribute the verification responsibility among services, relieving the pressure on end-to-end tests.
Imagine you have a network of interconnected services. Without contract tests, all potential problems have to be caught in E2E tests, which become heavy and time-consuming. With contract tests, these integration failures are caught early on, long before they reach the stage of an E2E test.
Are Contract Tests Difficult to Write? Another concern is that writing contract tests is an arduous task that requires extra effort. It may seem complicated at first, especially because you need to understand the roles of consumer and provider. But the reality is that, with tools like Pact, this process becomes much simpler. These tools help automate the creation and verification of contracts, making the testing process smoother and more efficient.
For example, with Pact, you can create tests for both the consumer and the provider. Here’s how this might work in a real scenario:
// Contract test for a frontend consumer that fetches user profile data
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotNull;

import au.com.dius.pact.consumer.dsl.PactDslWithProvider;
import au.com.dius.pact.consumer.junit5.PactConsumerTestExt;
import au.com.dius.pact.consumer.junit5.PactTestFor;
import au.com.dius.pact.core.model.RequestResponsePact;
import au.com.dius.pact.core.model.annotations.Pact;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.springframework.web.reactive.function.client.WebClient;

@ExtendWith(PactConsumerTestExt.class)
@PactTestFor(providerName = "UserProfileAPI", port = "8080")
class UserProfileConsumerPactTest {

    // Defines the contract: given this provider state and this request,
    // the provider must respond with this payload
    @Pact(consumer = "UserProfileFrontend", provider = "UserProfileAPI")
    public RequestResponsePact createPact(PactDslWithProvider builder) {
        return builder
                .given("User with ID 123 exists")
                .uponReceiving("A request to retrieve user profile details")
                .path("/api/users/123")
                .method("GET")
                .willRespondWith()
                .status(200)
                .body("{\"id\": 123, \"name\": \"Alice\", \"email\": \"alice@example.com\", \"status\": \"ACTIVE\"}")
                .toPact();
    }

    // Exercises the consumer against the Pact mock server; UserProfile is
    // the application's own response DTO
    @Test
    public void testGetUserProfilePact() {
        WebClient webClient = WebClient.create("http://localhost:8080");
        UserProfile response = webClient.get()
                .uri("/api/users/123")
                .retrieve()
                .bodyToMono(UserProfile.class)
                .block();

        assertNotNull(response);
        assertEquals(123, response.getId());
        assertEquals("Alice", response.getName());
        assertEquals("alice@example.com", response.getEmail());
        assertEquals("ACTIVE", response.getStatus());
    }
}
Here, the test validates that the consumer (in this case, a frontend) receives the expected data from the provider (the UserProfileAPI). The provider then verifies this same contract on its side to ensure that the API really returns the agreed-upon information.
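For completeness, here is a sketch of what that provider-side verification might look like with Pact's JUnit 5 support, assuming the generated contract files live in a local "pacts" folder and the provider is running on localhost:8081 (a real pipeline would typically fetch contracts from a Pact Broker instead):

// A sketch of provider-side verification under the assumptions above
import au.com.dius.pact.provider.junit5.HttpTestTarget;
import au.com.dius.pact.provider.junit5.PactVerificationContext;
import au.com.dius.pact.provider.junit5.PactVerificationInvocationContextProvider;
import au.com.dius.pact.provider.junitsupport.Provider;
import au.com.dius.pact.provider.junitsupport.State;
import au.com.dius.pact.provider.junitsupport.loader.PactFolder;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.TestTemplate;
import org.junit.jupiter.api.extension.ExtendWith;

@Provider("UserProfileAPI")
@PactFolder("pacts")
class UserProfileProviderPactTest {

    @BeforeEach
    void setTarget(PactVerificationContext context) {
        // Point the verification at a running instance of the provider
        context.setTarget(new HttpTestTarget("localhost", 8081));
    }

    @State("User with ID 123 exists")
    void userWithId123Exists() {
        // Seed a test database or stub repository so that user 123 exists
    }

    @TestTemplate
    @ExtendWith(PactVerificationInvocationContextProvider.class)
    void verifyPact(PactVerificationContext context) {
        // Replays every interaction recorded in the contract against the API
        context.verifyInteraction();
    }
}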
Are Contract Tests Unnecessary in Small Ecosystems? Some engineers believe that if the microservices ecosystem is small, contract tests are an unnecessary luxury. But this is a shortsighted view: as the system grows, the absence of contracts can result in communication failures that are only detected at later stages, such as in E2E tests or, worse, in production.
Even in a small environment, it is beneficial to introduce contract tests from the beginning. This not only establishes good practices but also lays the groundwork for more organized and secure growth.
Common Concerns About Adopting Contract Testing
Maintaining Consumer and Provider: One of the most common concerns is the need to maintain both sides of the contract: the consumer and the provider. And yes, this means there will be extra effort. However, the maintenance can be highly automated and integrated into the CI/CD pipeline. The real benefit here is the visibility that contract tests provide. When a service changes, the contract helps quickly identify which consumers will be impacted, facilitating coordination between teams.
Loss of Flexibility: Another concern is that contract tests might limit the ability to innovate or make quick changes. But contracts are designed to allow flexibility within acceptable limits. New contracts can be introduced while older versions are maintained until all consumers are ready for the transition. This allows services to evolve without disruptions.
Fear of Initial Overhead: Implementing contract tests in an already live system can seem like a monumental task. Many prefer to delay until a "more appropriate time," which often never comes. The truth is that contract tests don’t need to be implemented all at once. They can be introduced gradually, starting with the most critical services or new functionalities.
To illustrate this better, consider the example of an API consumer:
// Contract test for an API consumer that places purchase orders
// (imports and setup follow the same pattern as the previous example)
@ExtendWith(PactConsumerTestExt.class)
@PactTestFor(providerName = "OrderAPIProvider", port = "8080")
class OrderProcessingConsumerPactTest {

    // The contract: ordering an in-stock product must yield a confirmed order
    @Pact(consumer = "OrderProcessingService", provider = "OrderAPIProvider")
    public RequestResponsePact createOrderProcessingPact(PactDslWithProvider builder) {
        return builder
                .given("Product with ID 456 is available in stock")
                .uponReceiving("A request to place an order for a product")
                .path("/api/orders")
                .method("POST")
                .body("{\"productId\": 456, \"quantity\": 3, \"userId\": 789}")
                .willRespondWith()
                .status(201)
                .body("{\"orderId\": 1010, \"status\": \"CONFIRMED\", \"estimatedDelivery\": \"2024-09-15\"}")
                .toPact();
    }

    // OrderRequest and OrderResponse are the application's own DTOs
    @Test
    public void testCreateOrderPact() {
        WebClient webClient = WebClient.create("http://localhost:8080");
        OrderResponse response = webClient.post()
                .uri("/api/orders")
                .bodyValue(new OrderRequest(456, 3, 789))
                .retrieve()
                .bodyToMono(OrderResponse.class)
                .block();

        assertNotNull(response);
        assertEquals(1010, response.getOrderId());
        assertEquals("CONFIRMED", response.getStatus());
        assertEquals("2024-09-15", response.getEstimatedDelivery());
    }
}
In this second example, we have an order processing service that consumes an API to create new orders. The contract ensures that when creating an order for a product available in stock, the API will respond with a status of "CONFIRMED" and an estimated delivery date, thus validating the integrity of the transaction.
Once the initial concerns are overcome, contract tests offer undeniable advantages. They allow for early detection of problems, facilitate communication between teams, and reduce the load on E2E tests. Over time, they can become an integral and valuable part of the development cycle.
It’s clear that adopting contract tests requires an initial effort, but this investment pays dividends in terms of stability, reliability, and the ability to safely scale the system. For those who are still hesitant, the best approach is to start small with critical services and expand as the team gains confidence in the practice.
But perhaps you prefer an approach with multiple testing strategies.
A Combined Approach: Contract Testing and Acceptance Testing
When we talk about ensuring the quality of complex systems, we need several testing strategies that are both efficient and comprehensive. In the case of Nubank, this challenge became evident as the company grew, and they realized that their reliance on end-to-end testing was becoming a major bottleneck. In response, Nubank adopted a combined strategy of contract and acceptance testing, which proved to be more effective and scalable.
In the article "Why We Killed Our End-to-End Test Suite: How Nubank Switched to a Contract and Acceptance Testing Strategy to Scale to Over 1k Engineers", Nubank details the problems they faced by relying exclusively on E2E testing. With a test suite that covered a large number of end-to-end scenarios, they began to notice a series of challenges:
Slowness in Test Execution: As the codebase expanded, E2E tests began to take longer and longer to execute. This slowed down the feedback cycle, affecting the team’s agility.
Lack of Reliability in Results: Many of the end-to-end tests were unstable, failing inconsistently and generating a high number of false positives, which reduced the team’s confidence in the test suite.
High Maintenance Costs: Maintaining the E2E test suite was a laborious and costly task, especially as the number of microservices grew and the interactions between them became more complex.
Faced with these problems (which we have discussed in this article), Nubank decided to shift their approach to a combination of contract and acceptance testing.
What Are Acceptance Tests?
Acceptance tests are designed to validate whether a system or functionality meets business requirements and stakeholder expectations. They focus on verifying whether the software meets predefined acceptance criteria by simulating real-world usage scenarios to ensure that everything works as expected.
The structure of an acceptance test typically involves the following elements:
Acceptance Criteria: These are defined in collaboration with stakeholders such as Product Owners, business analysts, and software engineers. They specify exactly what needs to be validated for a feature to be considered complete and ready for delivery.
Testing Environment: Acceptance tests are conducted in an environment that closely replicates the production environment. This ensures that the test results are representative of the system’s actual behavior.
Test Scenarios: Each acceptance criterion is translated into one or more test scenarios that describe step-by-step user interactions with the system. These scenarios may include different variables, such as user types, permissions, or specific conditions.
Execution and Validation: The test scenarios are then executed, and the results are compared against the acceptance criteria to determine whether the functionality is correctly implemented.
For example, in a flight booking system, an acceptance test might ensure that, when selecting a flight and applying a voucher, the discount is correctly applied, the seat is reserved, and a confirmation is sent to the user. These tests are crucial for ensuring that business rules are met and that the final product aligns with stakeholder expectations.
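As a sketch, such an acceptance test can be expressed directly in code against a production-like environment. BookingClient, FlightSelection, and BookingResult below are hypothetical types standing in for whatever driver your team uses to exercise the system:

// A sketch of an acceptance test for the flight booking scenario above;
// all domain types and the staging URL are hypothetical
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.math.BigDecimal;
import java.time.LocalDate;
import org.junit.jupiter.api.Test;

class FlightBookingAcceptanceTest {

    private final BookingClient booking = new BookingClient("https://staging.example.com");

    @Test
    void applyingAValidVoucherDiscountsReservesAndConfirms() {
        // Acceptance criterion: selecting a flight and applying a voucher
        // must discount the fare, reserve the seat, and send a confirmation
        FlightSelection flight = booking.selectFlight("GRU", "LIS", LocalDate.of(2024, 9, 15));
        BookingResult result = booking.book(flight, "seat-12A", "VOUCHER10");

        assertEquals(new BigDecimal("90.00"), result.totalPrice()); // 10% off a 100.00 fare
        assertTrue(result.seatReserved());
        assertTrue(result.confirmationSentToUser());
    }
}

Notice that the test reads like the acceptance criterion itself, which makes it easier for Product Owners and engineers to agree on what "done" means.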
A Combined Strategy
This combined approach can also be applied in other contexts. Imagine you’re working with a microservice responsible for managing vouchers, as we discussed earlier. By adopting a combination of contract, acceptance, and E2E tests, you can:
Contract Tests: Ensure that the voucher service API works correctly with other services, such as authentication or payments, validating that all necessary parameters are present and correctly formatted.
Acceptance Tests: Validate that the main business rules are followed, such as ensuring that a voucher cannot be applied to a completed purchase or that users can only apply a voucher if they are authenticated.
End-to-End Tests: Validate the complete purchase flow with the application of a voucher, from product selection to payment confirmation, ensuring that all steps of the process work together as expected.
Nubank’s decision to reduce reliance on end-to-end tests and adopt a combined strategy of contract and acceptance testing was crucial for scaling their codebase and engineering team efficiently. This move not only improved testing efficiency but also allowed the company to maintain software quality as they continued to grow.
For financial corporations and other organizations facing similar challenges, the lesson is that a combined testing approach can offer a more effective way to ensure software quality, reduce bottlenecks, and maintain the agility needed to continuously deliver value to customers.
Conclusion
Are end-to-end tests a problem in microservices to the point where we should suggest their complete removal? The answer isn’t simple, and as we’ve discussed throughout this article, it’s not about attacking the E2E strategy or considering it obsolete. Instead, we’re highlighting that this approach, when misused or overloaded, can introduce significant challenges, especially in complex architectures.
E2E tests have their place in the development process, as they provide a comprehensive view of how different system components work together to fulfill a business flow. However, the problem arises when they are seen as the ultimate solution for ensuring quality. As we’ve seen through the opinions of renowned engineers and Nubank’s success story, simply inverting the testing pyramid, placing most of the trust in E2E tests, doesn’t necessarily result in higher quality or confidence.
Reflections on the Challenges of End-to-End Testing:
Complexity and Cost: E2E tests, by their nature, are complex and expensive to maintain. They involve multiple dependencies and often suffer from instabilities, such as "flaky tests," which can generate false positives or negatives. This not only consumes time and resources but also affects the development team’s agility.
Slow Feedback: Increasing reliance on E2E tests can lead to a slower feedback cycle. Instead of receiving quick responses on code changes, teams end up waiting for hours for the entire test suite to run. This delay can compromise the ability to iterate quickly and deliver continuous value.
Trust and Quality: Inverting the testing pyramid and relying heavily on E2E tests can create a false sense of security. While these tests validate the system’s behavior as a whole, they don’t always capture granular-level failures, such as integration issues between microservices that could be more easily identified with contract or unit tests.
Focus on Business Scenarios: E2E tests should be reserved for validating critical business flows, not for covering every possible scenario. By focusing E2E tests on essential tests while delegating integration and basic behavior checks to contract and unit tests, we achieve a more efficient testing strategy that is less prone to bottlenecks.
Questions for Reflection:
Is your team using E2E tests efficiently, or are they becoming a bottleneck in value delivery?
How can you balance the load between unit, contract, and E2E tests to get quick feedback and maintain quality without sacrificing agility?
Is there an opportunity to adopt practices like those implemented by Nubank, combining contract and acceptance tests to achieve a smoother and more reliable development cycle?
End-to-end tests are not the problem themselves; the problem lies in how they are applied and overloaded in complex systems like microservices. They are a powerful tool, but like any tool, their use should be carefully considered and balanced with other testing approaches. By integrating contract and unit tests and reserving E2E tests for truly critical business scenarios, we can maintain agility in development and ensure that the final product is robust and reliable.
The decision of how to apply end-to-end tests in your architecture depends on your specific context and the challenges you face. It’s important to reflect on your system’s objectives, business needs, and team structure to define the testing strategy that best suits your reality. The answer isn’t to eliminate E2E tests, but to use them sparingly and intelligently as part of a well-balanced testing strategy.
This article was quite extensive and deep. In a future post, we will conclude the discussion by addressing the following topic: Contract Testing vs. End-to-End Testing: Efficiency and Performance. See you soon! 👨🏻‍💻