Hello. My name is Mike Whalen and I'm here to talk about Test Selection and Test Adequacy. And these are two topics that are really intertwined. Test selection is choosing the right inputs to use to test my program. And test adequacy is, are those inputs any good? So, what we're really trying to do is we're trying to determine this thing called adequacy. So, what we want is we want to look at our program given a set of tests, and we want to determine whether or not the program is correct. Or if we can't do that, we want to determine whether or not it has the desired level of dependability that we want. But this is very difficult. It's very difficult to ascertain whether a program is dependable in all cases, and it's even harder to establish correctness. In fact, for correctness, it's generally impossible. There's a famous undecidability result, closely related to the halting problem, which shows that no procedure can determine in all cases whether or not our software is correct. So, we can't exactly measure adequacy. So, what do we do? We approximate it. So, we have a variety of different metrics that people have come up with that try to approximate whether or not we have adequately tested our program. And these metrics are based on things like the program structure: have we covered all the statements, all the branches, or other things? The program inputs: have we chosen an input from each of several different partitions that we think, if we cover all those partitions, will give us a good test suite? We can look at requirements: do we have at least one test for every requirement? That kind of thing. And so, what we're going to do is we're going to use this measurement as a way to determine the thoroughness of the test suite, or the lack of thoroughness if we don't get a very good score. So, in order to talk about this stuff we have to define some terms. The first thing is a test case, which is just a set of inputs and a pass-fail criterion.
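To make that first term concrete, here's a tiny sketch in Python. The function `add`, the inputs, and the oracle are all invented for illustration; the point is just that a test case bundles inputs with a pass/fail criterion.

```python
# A test case is a set of inputs plus a pass-fail criterion (the oracle).
# The function under test and the values here are made-up examples.
def add(a, b):
    return a + b

# The test case: some inputs...
inputs = (2, 3)

# ...and an oracle that decides pass or fail.
def oracle(result):
    return result == 5

passed = oracle(add(*inputs))
print(passed)  # True: this test passes
```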
So, we're going to throw some inputs at a program and we're going to have some oracle that's going to decide whether or not the test passes or fails. And then we also have a test case specification, which is a requirement that has to be satisfied by one or more tests; we're going to use these to define, essentially, our adequacy criteria. And then we have a test obligation, which you can think of as a partial test case specification. So, it could be that we have a strict requirement and then some aspect of it related to the program structure; that would be a test obligation. We have a test suite, which is just a collection of these test cases. We have a test execution. So, a test case is just the set of inputs that we have; a test execution is that set of inputs actually running on the program. And then finally, we have an adequacy criterion. And an adequacy criterion is a predicate, which means basically thumbs up or thumbs down: it's a boolean expression that's true if a program and a set of tests passes that adequacy criterion. Let me make this concrete. Suppose we have a criterion that says we have to have a test that covers every statement in the program. So, every statement in the program has to be executed by at least one test. For that adequacy criterion we can have a program and a test suite. And if that test suite does in fact execute every statement in the program, then it passes. Otherwise it doesn't. So, we can think of an adequacy criterion as a set of these test obligations. In the case of statement coverage there would be one obligation for each statement. And the test suite passes the adequacy criterion if, first of all, all the tests pass (so we can't satisfy the adequacy criterion if some of the tests fail, even if we cover all the statements), and if every test obligation in the criterion is satisfied by at least one of the test cases.
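The adequacy criterion as a predicate can be sketched in code. Here's a minimal, illustrative version of statement-coverage adequacy in Python, using the standard library's `sys.settrace` to record which lines each test executes; the function `abs_value`, the tests, and the line-number obligations are all made up for the example. It returns a thumbs-up only when every test passes and every statement obligation is covered by at least one test.

```python
# A sketch of statement-coverage adequacy as a predicate over
# (program, test suite), using sys.settrace to record executed lines.
import sys

def abs_value(x):          # the program under test (invented example)
    if x < 0:
        return -x
    return x

def run_with_trace(test):
    """Run one test; return (passed, set of executed line numbers)."""
    executed = set()
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is abs_value.__code__:
            executed.add(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        passed = test()     # the test's oracle returns True or False
    finally:
        sys.settrace(None)
    return passed, executed

def is_adequate(tests, obligations):
    """Adequacy predicate: all tests pass AND every statement
    obligation is executed by at least one test."""
    covered = set()
    for t in tests:
        passed, executed = run_with_trace(t)
        if not passed:
            return False    # a failing test means the criterion fails
        covered |= executed
    return obligations <= covered

# One obligation per executable statement of abs_value,
# expressed here as the line numbers after the def line.
first = abs_value.__code__.co_firstlineno
obligations = {first + 1, first + 2, first + 3}

suite_partial = [lambda: abs_value(5) == 5]   # never takes the x < 0 branch
suite_full = [lambda: abs_value(5) == 5,
              lambda: abs_value(-3) == 3]

print(is_adequate(suite_partial, obligations))  # False: a statement is never run
print(is_adequate(suite_full, obligations))     # True: all pass, all covered
```

Real coverage tools (such as coverage.py) do this instrumentation far more robustly, but the shape of the decision is the same: a boolean over the test suite and the set of obligations.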
So, we might have several test cases that execute the same statement within the program, but as long as there's at least one test that executes each statement we call it good. Then the question is, what do we know? What do we know when we satisfy one of these criteria or not? So, when a criterion is not satisfied, we know some information about the test suite: there may be some problem with it. So, for example, with the statement coverage metric, if there are some statements within the program that are not executed by any test in the test suite, we don't know anything about them. They may be faulty, they may be right, but our test suite doesn't have anything to say about them because they weren't executed. And so, for each of the criteria that we have, when we don't achieve full coverage of that criterion, we get back some information about the quality of our test suite. So, maybe our criterion is requirements coverage: we want to have one test case for each of the requirements. Well, if we have a requirement that doesn't have any test cases, then we know that we aren't testing that particular requirement of our program. So, that's what we know if a criterion is not satisfied. How about if it is satisfied? Well, we know something: we have some evidence that the test suite has some level of thoroughness. So, for statement coverage, again, if we have a test suite that's adequate for statement coverage, then we know that every statement in the program has been executed by at least one test. But that may or may not be enough to give us confidence in the correct execution of the program. Depending on how much dependability we want, that coverage metric may not be enough for us to definitively say this is a good program. So, to think of an analogy: you wouldn't buy a house just because it's up to code, but you might avoid a house that's not up to code.
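The requirements-coverage idea can be sketched just as simply. In this toy Python example, each test is tagged with the requirement IDs it exercises; the requirement IDs and test names are invented for illustration. An unsatisfied obligation shows up directly as a requirement with no test.

```python
# A toy sketch of requirements coverage: the criterion fails if any
# requirement has no test exercising it. IDs and tags are made up.
requirements = {"REQ-1", "REQ-2", "REQ-3"}

# test name -> requirements that test exercises (hypothetical tagging)
test_tags = {
    "test_login_ok": {"REQ-1"},
    "test_login_bad_password": {"REQ-1", "REQ-2"},
}

covered = set().union(*test_tags.values())
uncovered = requirements - covered
print(uncovered)  # {'REQ-3'}: this requirement is not tested at all
```

The useful signal here is exactly what the lecture describes: failing the criterion tells us something specific, namely which requirement our suite says nothing about.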
When we have a test suite that meets a certain criterion, we can at least say, well, it has some level of thoroughness to it. It doesn't mean that the program is right, but it means that we've at least checked a certain set of structures in the program and we have some confidence in them.