I’m pretty sure we’ve all heard about the “global IT outage” from last week. Most of us were affected by it, either directly or indirectly. I couldn’t use the ATM and my wife’s flight was delayed for a couple of hours; these were mere inconveniences compared to the impacts that others felt. This CrowdStrike incident will be remembered for quite a while.
I suspect that CrowdStrike never anticipated something like this could or would occur. I also suspect that their previous experience with releasing this kind of content had been low-risk. But in this specific case, it wasn’t, was it? There are lots of people out on social media saying things like, “This is what happens when you don’t have testers”, “Why didn’t they test this?”, and “QA jobs are going to be a lot more plentiful now”. While these statements and questions may be valid, they are rather one-dimensional. I’m not here to dogpile on CrowdStrike; I’m here to talk about risk.
Why do we test? To ensure quality? No; we can’t ensure quality. To find bugs? No, though finding and reporting bugs is a valuable output of testing. We test to provide information to decision-makers: the people who decide whether our software is ready to release with a tolerable level of risk. As testers, we report what we tested and, just as importantly, what we did not test, including real-world configurations that we aren’t equipped to reproduce. We also report what we experienced while testing, including issues, potential issues, unexpected changes, and risks to the quality of our software systems.
At its core, testing is about uncovering and reporting risk. All the stuff about bugs and defects and issues is a by-product. It’s valuable, but the core value proposition is reporting risk.
Risk comes in many forms. If ordering one of our products fails, that’s potentially a lost sale; an order failure is a rather obvious risk to revenue, reputation, future orders, etc.
Here are some less obvious realized risks:
- It takes us a long time to debug and resolve issues, hurting our time to market and increasing our operating expenses.
- For each “significantly impactful” outage, we will owe one or more of our customers a financial penalty.
- Our feature delivery is slower than our competitors because our test automation is inconsistent, causing us to lose trust and to “redo some things by hand”.
- Our app allowed a child to order an age-inappropriate product and now we are trending on social media, but not for a positive reason.
- Our company is being talked about on every news outlet in the world, but not for a positive reason.
There is, however, one other big, but implicit, risk: treating testing, and by extension test automation, as a purely confirmatory activity. Make no mistake, confirmation is important; at a minimum, the software must be able to do the core things it was created to do. There are, however, other aspects of both testing and automation that are not just confirmatory, and those aspects can expose potential issues that might put our company at risk. We need to test scenarios where things go wrong or might go wrong. This week, I was invited to a meeting called a “pre-mortem”: let’s discuss the things that might go wrong. This should be a wonderful source of testing ideas and, hopefully, a place to describe opportunities to increase testability and automatability.
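To make that a bit more concrete, here is a minimal sketch, in Python with pytest, of pairing a confirmatory check with a couple of pre-mortem, “what might go wrong” checks. The order_service module, the place_order function, and the exception names are hypothetical, invented purely for illustration:

```python
# A minimal sketch, not a real implementation: order_service, place_order(),
# AgeRestrictionError, and PaymentDeclinedError are hypothetical names.
import pytest

from order_service import AgeRestrictionError, PaymentDeclinedError, place_order


def test_adult_can_order_age_restricted_product():
    # Confirmatory: the core flow the product was built to support.
    confirmation = place_order(customer_age=34, product_id="AGE-RESTRICTED-001")
    assert confirmation.status == "accepted"


def test_child_cannot_order_age_restricted_product():
    # Pre-mortem scenario: what happens when the customer is under age?
    with pytest.raises(AgeRestrictionError):
        place_order(customer_age=12, product_id="AGE-RESTRICTED-001")


def test_declined_payment_reports_an_actionable_error():
    # Pre-mortem scenario: the payment provider rejects the charge
    # (a made-up "declined" test token stands in for that condition here).
    with pytest.raises(PaymentDeclinedError) as excinfo:
        place_order(customer_age=34, product_id="STANDARD-001",
                    payment_token="test-token-declined")
    # The failure should carry enough detail to act on without a debugging session.
    assert "declined" in str(excinfo.value).lower()
```

The point isn’t the specific assertions; it’s that the failure scenarios a pre-mortem surfaces become explicit, automatable checks rather than things we hope never happen.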
An additional risk that many of us undertake, but that far fewer of us really discuss as a risk, is switching our outsourcing partners. There can be justifiable business reasons for a switch. Before I started consulting, I was part of a company whose contracting partner at the time was unable to keep up with our technological requirements; they just didn’t have the competencies in-house and seemed to have trouble obtaining them. We discussed our options, described the risks we saw, and decided that a switch was the most appropriate way forward for us. We reduced our risk by going with a partner that our leader had worked with before, so they were a known quantity, and this partner specialized in our industry, thereby reducing schedule risk by shortening their ramp-up time. Though cost is the driving force behind many companies’ decisions to switch partners, vendors, tools, etc., in our case the driving force was our previous partner’s capabilities. We wound up paying a bit more for some of the consultants from the new company, but that cost difference allowed us to work with consultants who had the appropriate skill set, which reduced our risk of defects. I’ve seen other companies make a switch exclusively for cost savings, and it didn’t turn out as well for them because they didn’t adequately describe and mitigate the risks that the switch presented.
Now, what about automation in all of this? Automation is here to help testers be more effective or more efficient at their jobs. Since testing is largely about uncovering and reporting risks, automation helps testers by allowing them to uncover and report those risks faster, in more detail, across a broader subset of an application. To gain these kinds of benefits, however, we must trust our automation and it must have appropriate logging and reporting so that we can minimize our time-to-feedback.
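As a rough illustration of the logging-and-reporting point, here is a minimal sketch in Python using the requests library; the check_order_api function, the endpoint path, and the payload are assumptions made up for this example:

```python
# A minimal sketch of automation output that shortens time-to-feedback.
# check_order_api(), the /orders endpoint, and the payload are hypothetical.
import logging

import requests

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("order-checks")


def check_order_api(base_url: str) -> None:
    """Place a test order and fail with enough context to debug without re-running."""
    payload = {"product_id": "STANDARD-001", "quantity": 1}
    log.info("POST %s/orders payload=%s", base_url, payload)

    response = requests.post(f"{base_url}/orders", json=payload, timeout=10)
    log.info("status=%s elapsed=%.2fs",
             response.status_code, response.elapsed.total_seconds())

    # A bare `assert response.ok` forces a re-run just to learn what went wrong;
    # including the response body in the failure message does not.
    assert response.status_code == 201, (
        f"Order creation failed: HTTP {response.status_code}, body={response.text[:500]}"
    )
```

The difference is small, but when a check fails overnight, the run’s own output tells us what was sent and what came back, instead of forcing a re-run just to find out what happened.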
Risk comes in many forms. A specific risk may be huge for one company but small and tolerable for another. We need to uncover as many risks as possible so that sound decisions can be made about product releases. I suspect we’d all like our respective companies to trend for positive reasons.
Like this? Catch me at, or book me for, an upcoming event!