"How did [some bug] pass QA?"

08/02/2013 08:37:01

When things break, people will always ask why. In a continuously delivered system, they have to understand that the "break things" in "move fast and break things" really does apply.

The answer to the question is an obvious one: 100% QA, unit, or regression coverage is a myth. We strive for a high degree of test coverage: we cover the things that are likely to break, and as we discover new failures, we expand coverage. Software is unpredictable, so we cover what we can. In any fast-moving system, things will break, and the measure of your maturity is the speed with which you react. The fact that you can deliver software that doesn't break is always a testament to the coverage you have.
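To make "as we discover things, we expand coverage" concrete, here is a minimal sketch of what that might look like. Everything in it is hypothetical: `order_total` and the bug it describes (a discount above 100% producing a negative total) are invented for illustration and are not taken from any real system.

```python
from decimal import Decimal


def order_total(prices, discount_percent=0):
    """Hypothetical checkout function, invented for this illustration."""
    subtotal = sum(prices, Decimal("0"))
    # The fix for the imagined bug: clamp the discount to 0..100
    # so the total can never go negative.
    pct = min(max(Decimal(discount_percent), Decimal("0")), Decimal("100"))
    return subtotal * (Decimal("100") - pct) / Decimal("100")


def test_discount_over_100_never_gives_negative_total():
    # Regression test added after the imagined bug slipped past QA.
    # It pins the fixed behaviour so the same breakage cannot return unnoticed.
    assert order_total([Decimal("10.00")], discount_percent=150) == Decimal("0")


def test_normal_discount_still_works():
    # Existing behaviour that the fix must not disturb.
    total = order_total([Decimal("10.00"), Decimal("5.00")], discount_percent=10)
    assert total == Decimal("13.50")
```

The point is the habit rather than the code: each breakage that reaches production leaves behind one more test, so coverage grows where the system has actually proven fragile.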

The responsibility is shared by everyone involved, and assigning blame is useless. Understanding how to react is what matters, so know how to react within the confines of your system. Ensure test coverage is very good over the critical aspects of the system, ensure your core functionality doesn't break, and never "break the till".

Software isn't perfect. Things break, we fix them, and we continue. This is the life of continuously delivered software. The reason it's powerful is that it allows us to change without friction. Don't panic. If you panic about small breakages, you absolutely cannot have big new features on a reasonable timescale and at a reasonable cost.

The more mature your test suite is and the more test-driven your code is, the lower this risk becomes. If you play it right and test-drive from the start, you can reduce the likelihood of hitting errors. Without full-stack coverage, though, you'll never be able to predict all the possible outcomes because, in the end, you're still maintaining software.