Testing 1, 2, 3, Infinity

This is an experiment in blogging, on various and sundry topics (though with more emphasis on software testing than some others), by Alex Groce and Josie Holmes.

The title is a take-off on the George Gamow book One Two Three... Infinity.

The title is intended to refer to software testing, microphones (thus old time radio), abstraction itself, old books, and exploration.


Popper and Testing

The idea that Karl Popper’s account of empirical science, demarcated by its emphasis on falsification, has something to say about software testing is pretty commonplace, at least among people who might think about such things. “Everyone” knows a test should aim not to demonstrate that a program works, but to falsify the claim that it works. Good tests have high potential to prove the program wrong, just as good experiments are those that might prove a theory wrong.

Popper as a way to think about mutation testing is not completely absent from the literature (see Aichernig’s work), but it has drawn far less comment. It’s great to use tests to doubt your code, but who tests the tests? Who tests the spec itself? Who watches the watchmen? A key point of Popper’s ideas is that there is no non-provisional, not-to-be-tested statement in science, not even “basic” empirical observations. Program mutations can falsify not just the program, but its test suite. That’s why they are useful, not just an additional hoop for people with 100% code coverage to jump through.

The more general statement is interesting to think about. Consider a program, with a passing test suite, that you think is correct. Any semantically meaningful change you make to the program that does not cause the test suite to fail forces you to accept at least one of three claims:

  • The test suite was not complete in specifying interesting behaviors of the program (maybe you didn’t even run the changed code). In the Popper/science analogy, you have, in a sense, falsified your “experimental apparatus.”
  • The specification was not complete in unambiguously specifying what the program should do. This could be harmless (genuine room for variance), or a hole in your concept of correctness. In this case the spec/oracle is in the role of a scientific theory for Popper.
  • The program and specification were both just wrong. You falsified your code (which is also in the role of a theory).

Such a change is exactly what an unkilled program mutant is, and a mutant that passes the tests can potentially falsify anything of interest in our “science of this here program.” That’s powerful Popper magic!
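
To make the first of the three claims concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the `clamp` function, its hand-written mutant `clamp_mutant`, and the two tiny test suites. A real mutation testing tool would generate the mutant automatically; here it is written out by hand so the surviving change is easy to see.

```python
def clamp(x, lo, hi):
    """Original program: restrict x to the closed interval [lo, hi]."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def clamp_mutant(x, lo, hi):
    """Hand-written mutant: the check `x > hi` becomes `x > hi + 1`."""
    if x < lo:
        return lo
    if x > hi + 1:  # mutated line: hi + 1 now slips through unclamped
        return hi
    return x


def weak_suite(f):
    """Passes for any f that handles these three comfortable inputs."""
    return f(5, 0, 10) == 5 and f(-3, 0, 10) == 0 and f(50, 0, 10) == 10


def stronger_suite(f):
    """Adds one input just past the upper bound, where the mutant differs."""
    return weak_suite(f) and f(11, 0, 10) == 10


if __name__ == "__main__":
    for name, suite in [("weak", weak_suite), ("stronger", stronger_suite)]:
        verdict = "survives" if suite(clamp_mutant) else "is killed by"
        print(f"original passes {name} suite: {suite(clamp)}; "
              f"mutant {verdict} the {name} suite")
```

The mutant really does behave differently from the original (it returns 11 for `clamp(11, 0, 10)`), yet the weak suite cannot tell the two apart. That survival falsifies the suite, not the program. Adding the single boundary input in `stronger_suite` kills the mutant while the original still passes.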