Friday, January 1, 2010

Tests can show the presence of bugs, but not their absence?

Some interesting discussions revolve around Dijkstra's famous statement:
"program testing can be used to show the presence of bugs, but never to show their absence!"
which is a variant of the principle that "absence of evidence is not evidence of absence".

The discussion starts from a simple example:
"the fact that there is no evidence of an elephant here in the room is not evidence that there is no elephant in the room".
We can drop the "not", because it is very unlikely that an elephant is here without leaving any evidence, so:
"the fact that there is no evidence of an elephant here in the room actually is evidence that there is no elephant in the room!".

For the elephant in this room, "absence of evidence is also evidence of absence" seems reasonable. But what if we talk about a fly?
"the fact that there is no evidence of a fly in this room is not evidence that there is no fly in this room".

The question is: how close to zero is the probability that I am making a mistake when I use the rule "absence of evidence implies evidence of absence"?
In other words, it is the probability that this abductive inference rule leads us to the wrong conclusion.

The probability of being mistaken, in terms of conditional probability:
p(there is an elephant | no evidence of elephant) is low (the probability that there is an elephant in this room, given that there is no evidence of an elephant in this room)
p(there is a fly | no evidence of fly) is higher (the probability that there is a fly in this room, given that there is no evidence of a fly in this room)

The fact that there is no evidence of a fly here does not mean that we can reasonably conclude that there is no fly here.
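
To make this concrete, here is a minimal Bayes' rule sketch in Python. The priors and detection probabilities are made-up numbers, purely for illustration: the point is that "no evidence" lowers the probability a lot when the thing would almost certainly leave evidence (the elephant), and only a little when it is easy to miss (the fly).

# Bayes' rule applied to "no evidence of X" -> "how likely is X there anyway?".
# All the numbers below are invented, just to show the shape of the reasoning.

def posterior_given_no_evidence(prior, p_evidence_if_present):
    # prior                 = P(X is present)
    # p_evidence_if_present = P(we see evidence | X is present)
    # Assumption: no false evidence, i.e. P(evidence | X absent) = 0.
    p_no_evidence = (1 - p_evidence_if_present) * prior + (1 - prior)
    return (1 - p_evidence_if_present) * prior / p_no_evidence

# An elephant almost certainly leaves evidence; a fly is easy to miss.
print(posterior_given_no_evidence(prior=0.01, p_evidence_if_present=0.999))  # elephant: ~0.00001
print(posterior_given_no_evidence(prior=0.01, p_evidence_if_present=0.10))   # fly: ~0.009, barely below the prior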

Let's go back to the software testing question. Is software testing closer to the elephant example or to the fly example?
How confident are we that "no evidence of bugs (i.e. all tests pass) is evidence of no bugs"?

If p(bug | tests pass) = 0, then "no evidence => evidence of absence" would be a valid logical rule.

But can p(bug | tests pass) be 0?

That would mean we are sure we are not wrong when we interpret "all tests pass" as "there is no bug".
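
Using the same Bayes' rule sketch, again with invented numbers for the prior probability of a bug and for how likely the test suite is to catch one, p(bug | tests pass) gets close to zero as the tests get close to catching every possible bug, and it is exactly zero only in that limit case:

# P(bug | tests pass), with made-up numbers.
# Assumption: the tests never fail when there is no bug.

def p_bug_given_tests_pass(p_bug, p_catch_if_bug):
    # p_bug          = prior probability that the program has a bug
    # p_catch_if_bug = P(some test fails | there is a bug)
    p_pass_if_bug = 1 - p_catch_if_bug
    p_pass = p_pass_if_bug * p_bug + (1 - p_bug)
    return p_pass_if_bug * p_bug / p_pass

print(p_bug_given_tests_pass(p_bug=0.3, p_catch_if_bug=0.9))  # ~0.04: much lower than the prior, but not zero
print(p_bug_given_tests_pass(p_bug=0.3, p_catch_if_bug=1.0))  # 0.0: only if the tests catch every possible bug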

Theoretical computer science tells us that we can never be sure of it, i.e. the previous statement is theoretically wrong.
Rice's theorem states that there cannot exist an automated procedure (a testing program) that decides whether an arbitrary other program (the tested program) has some non-trivial semantic property (such as being bug free).

However, this does not mean much in practice, for example for the following reasons:
- What do we mean by "bug"? From some point of view, no program is bug free.
- We need p(bug | tests pass) close to zero, not exactly zero.
- Rice's theorem is about the impossibility of a program that takes any other program as input, treating it as a black box. Here the situation is different: in TDD the program under test is not a black box. It is something I know, because it is what I am writing in order to solve a problem that I know. The program grows without breaking the existing tests, and that is a way to lower the probability of introducing a bug when changing the program (see the small example after this list).
- Tests are a way to rewrite the specifications in a language suitable for the coder and for the code under test. Tests cannot be perfect, just as the specifications cannot.
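
A tiny, invented illustration of the third point (both the function and the tests are hypothetical, not from any real project): the existing tests pin down behaviour that must keep holding while the program grows, and that is what keeps p(bug | tests pass) low in practice.

import unittest

def price_with_discount(price, percent):
    # Hypothetical function under test, written to satisfy the tests below.
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100

class TestPriceWithDiscount(unittest.TestCase):
    # Each test was added while growing the function; all of them must keep passing.
    def test_no_discount(self):
        self.assertEqual(price_with_discount(80, 0), 80)

    def test_half_price(self):
        self.assertEqual(price_with_discount(80, 50), 40)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            price_with_discount(80, 150)

if __name__ == "__main__":
    unittest.main()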
