EVALUATING MEDICAL UTILITY
A laboratory study has medical utility if it meets
a clinical need as well as or better than the alternative
approaches used to address that need. To determine
if a particular test has utility as a classification study,
it is necessary to find out how well it performs its
role as a clinical classifier and how that performance
compares with the performance of the other means
used to achieve the classification. The investigation
of the classification performance of a laboratory
study is referred to as a performance evaluation.
Reference classification
A complete report of a performance evaluation
includes the seven components listed in Table 4.1.
The necessary start to the report is a clear statement
of the diagnostic classes or prognostic classes meant
to be distinguished by use of the study and of the
criteria used to assign study subjects to the classes.
Ideally, the method employed for the ultimate
classification of subjects, the reference method, should
be a perfect classifier, a so-called gold standard. In
reality, of course, reference methods usually fall
short of perfection. Often, reference methods for
diagnostic classification are completely specific but
not completely sensitive. This is true, for example,
for methods based upon pathologic examination.
They are not completely sensitive because a mild
form of a disorder or a small focus of disease can be
missed. Reference methods for prognostic
classification can be perfect classifiers, such as when the
prognostic groups are "dead in five years" and "alive
in five years". However, in many situations the
methods are not completely accurate. Consider, for
example, when "disease recurrence at five years"
and "no disease recurrence at five years" are the
prognostic groups. Even the best reference method
can be expected to misclassify some patients in
whom recurrent disease is present but not yet
clinically detectable. Imperfect reference methods that
are completely specific give a correct estimate of the
sensitivity of a diagnostic study but lead to
underestimation of its specificity (Statquet et al. 1981).
Similarly, for two-group prognostic studies, an
imperfect reference method that is completely
accurate in classifying members of the non-event
group leads to a correct estimate of the fraction
correctly classified in the event group but yields an
underestimate of the fraction correctly classified in
the non-event group (Table 4.2).
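
The direction of these biases can be checked with a short calculation. The following is a minimal sketch, not a method taken from the cited work. It assumes that the errors of the study and of the reference method are independent once the true status of the subject is fixed. The function name, its parameters, and the numerical values are hypothetical and chosen only for illustration.

```python
def apparent_performance(prevalence, se_test, sp_test, se_ref, sp_ref):
    """Return (apparent sensitivity, apparent specificity) of a study
    when it is judged against an imperfect reference method.

    Assumption (not from the text): study errors and reference errors
    are independent given the true status of the subject.
    """
    p, q = prevalence, 1.0 - prevalence
    # Joint probabilities of (study result, reference result)
    pos_pos = p * se_test * se_ref + q * (1 - sp_test) * (1 - sp_ref)
    pos_neg = p * se_test * (1 - se_ref) + q * (1 - sp_test) * sp_ref
    neg_pos = p * (1 - se_test) * se_ref + q * sp_test * (1 - sp_ref)
    neg_neg = p * (1 - se_test) * (1 - se_ref) + q * sp_test * sp_ref
    # Performance as it would be estimated using the reference classes
    apparent_se = pos_pos / (pos_pos + neg_pos)
    apparent_sp = neg_neg / (neg_neg + pos_neg)
    return apparent_se, apparent_sp


# Hypothetical values: a completely specific reference that misses
# 20% of true cases, used to evaluate a study with true sensitivity
# 0.90 and true specificity 0.95 at a prevalence of 0.30.
se, sp = apparent_performance(0.30, 0.90, 0.95, se_ref=0.80, sp_ref=1.00)
print(f"apparent sensitivity = {se:.3f}, apparent specificity = {sp:.3f}")
```

With these illustrative values the apparent sensitivity equals the true value of 0.90, while the apparent specificity falls to about 0.88, below the true value of 0.95. The shortfall arises because the cases missed by the reference method are counted among the reference negatives, where most of them give positive study results.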
Sometimes the reference methods that are used
are neither completely specific nor completely sensitive, when
evaluating a diagnostic study, or are inaccurate in
classifying the members of both the event and
non-event groups, when evaluating a prognostic
study. This is often the case when more definitive
reference methods are unduly invasive, painful,
expensive, or inconvenient. Also, there are disorders
for which no widely accepted gold standard
classification method exists. Estimates of the classification
performance measures determined in an evaluation
using such reference methods are subject to error
and thus, if uncorrected, must be considered rough
approximations (Walter and Irwig 1988). Correction
of the estimates is sometimes possible, however.
For instance, if the performance measures of a
reference method have been determined at some other
time by comparison with a true gold standard, those
values can be used to calculate the corrected
estimates of the performance measures of the method
being evaluated (Statquet et al. 1981). In addition,
corrected performance measure estimates can be
derived using other more elaborate evaluation
designs such as repeat testing (Yanagawa and Gladen
1984, Schulzer et al. 1991), testing with multiple
studies (Yang and Becker 1997, Torrance-Rynard
and Walter 1997, Qu et al. 1996), and testing in two
Chapter 4
EVALUATING CLASSIFICATION STUDIES
© 2001 Dennis A. Noe
Table 4.1
Components of a Performance Evaluation Report
1. Definition of the diagnostic or prognostic classes
2. Description of the reference method or technique used to
assign subjects to the diagnostic or prognostic classes
3. Definition of the clinical setting
4. Description of the study population
5. Description of the analytic procedures and mathematical
techniques used
6. Description of performance
7. Description of validation study
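
The correction mentioned in the discussion of reference classification, in which previously determined performance measures of the reference method are used to adjust the estimates for the study being evaluated, can be sketched as follows. This is an illustrative derivation under the assumption that study and reference errors are independent given true status. It is not necessarily the exact procedure of the cited authors. The function name, the argument names, and the counts in the usage example are hypothetical.

```python
def correct_for_imperfect_reference(a, b, c, d, se_ref, sp_ref):
    """Return (sensitivity, specificity) of the study corrected for
    known errors of the reference method.

    Counts in the 2 x 2 table of study result by reference result:
        a = study positive, reference positive
        b = study positive, reference negative
        c = study negative, reference positive
        d = study negative, reference negative

    Assumption (not from the text): study errors and reference errors
    are independent given the true status of the subject.
    """
    n = a + b + c + d
    fp_ref = 1.0 - sp_ref  # false-positive rate of the reference
    fn_ref = 1.0 - se_ref  # false-negative rate of the reference

    se_den = (a + c) - fp_ref * n
    sp_den = (b + d) - fn_ref * n
    if se_den <= 0 or sp_den <= 0:
        raise ValueError("counts are inconsistent with the stated "
                         "performance of the reference method")

    sensitivity = (a - fp_ref * (a + b)) / se_den
    specificity = (d - fn_ref * (c + d)) / sp_den
    return sensitivity, specificity


# Hypothetical counts from 10,000 subjects, with a reference method
# of known sensitivity 0.80 and specificity 0.90.
se, sp = correct_for_imperfect_reference(2195, 855, 905, 6045,
                                         se_ref=0.80, sp_ref=0.90)
print(f"corrected sensitivity = {se:.3f}, corrected specificity = {sp:.3f}")
```

The uncorrected estimates from the same hypothetical counts are 2195/3100 for sensitivity and 6045/6900 for specificity, both lower than the corrected values. The sketch returns point estimates only and does not address their sampling variability, and the corrected values are only as trustworthy as the independence assumption and the stated performance of the reference method.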