If the evaluation recommends the use of a certain
critical value, the performance of the method at that
value should be subjected to a statistical test of
significance. This amounts to asking the statistical
question: at a stipulated level of confidence (again,
usually 95 percent certainty), is the number of
misclassifications at the critical value smaller than
would be expected on the basis of chance alone? If
the critical value has been specified prior to review-
ing the performance data, the appropriate statistical
tools are the Fisher exact test, for small sample
numbers, and the chi-square test, for large sample
numbers. If, however, the critical value has been
selected after examining the data, as happens when it
is chosen so as to maximize study efficiency, statisti-
cal significance should be assessed by using p values
corrected for post hoc selection of the critical value
or by using a statistical test designed for such
circumstances, such as the one developed by Gail and
Green (1976).
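The two tests named above for a prespecified critical value can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2x2 layout (true positives, false negatives, false positives, true negatives) and the function names are hypothetical and not taken from the text, the Fisher test is computed one-sided (fewer misclassifications than chance), and the chi-square test is the uncorrected Pearson statistic, whose p value is two-sided. Neither function addresses a critical value chosen after examining the data.

```python
from math import comb, erfc, sqrt


def fisher_exact_one_sided(tp, fn, fp, tn):
    """One-sided Fisher exact p value for a 2x2 classification table.

    With all row and column totals held fixed, returns the probability of
    seeing at least `tp` true positives if the classification were
    unrelated to the true class (hypergeometric upper tail).
    """
    n_pos_class = tp + fn      # subjects truly in the positive class
    n_called_pos = tp + fp     # subjects classified as positive
    n = tp + fn + fp + tn
    upper = min(n_pos_class, n_called_pos)
    tail = sum(
        comb(n_pos_class, k) * comb(n - n_pos_class, n_called_pos - k)
        for k in range(tp, upper + 1)
    )
    return tail / comb(n, n_called_pos)


def chi_square_2x2(tp, fn, fp, tn):
    """Pearson chi-square statistic (no continuity correction) and its
    two-sided p value on 1 degree of freedom."""
    n = tp + fn + fp + tn
    margins = (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn)
    if margins == 0:
        raise ValueError("every row and column total must be nonzero")
    chi2 = n * (tp * tn - fn * fp) ** 2 / margins
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(fisher_exact_one_sided(3, 1, 1, 3))   # small sample
    print(chi_square_2x2(20, 5, 5, 20))         # larger sample
```

A p value below 0.05 corresponds to the 95 percent level mentioned above.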
Werner et al. report the statistical method,
stepwise discriminant function analysis, and the
computer program, BMDP7M, they employed for
their investigation of combination testing. The use
of stepwise regression as a tool in the analysis of
multivariate rules is very common even though the
method can be problematic (Harrell et al. 1985,
Diamond 1989, Simon and Altman 1994). Difficulties
associated with the method include inconsistency
in the selection of the significant variables, bias in
the estimation of the regression coefficients, and
overestimation of the statistical significance of the
coefficients. A validation study (vide infra) is an
absolutely essential component of any performance
evaluation in which this analytic technique is used.
Presentation of findings
The presentation of the performance data, the
sixth component of the evaluation report, should be
in as complete a form as possible. Ideally, the refer-
ence result frequency distributions should be given.
From these, readers can construct the ROC curve
and the likelihood ratio curve for the study. The
ROC curve describes the performance of a study in
the form appropriate for comparing with alternative
studies and the likelihood ratio curve describes the
performance in the form needed for the Bayesian
assessment of classification probabilities. It is desir-
able, of course, that the ROC and likelihood ratio
curves be presented in the report rather than having
the readers generate them (Jaeschke et al. 1994). In
the evaluation of a multivariate diagnostic or
prognostic rule, it is impossible to present the joint
reference result frequency distributions if more than
two studies are involved. It is possible, however, to
display the reference result frequency distributions
defined by the rule, and this should be done. The ROC
and likelihood ratio curves for the rule derived from these
distributions should also be presented. Werner et al.
present their performance data as ROC curves, one
for each of the clinical settings considered in the
article:
The method for graphing ROC curves is not
standardized. In this example, the horizontal and
vertical axes are, respectively, specificity and sensi-
tivity. This agrees with the convention used in this
book. Graphs in which the horizontal axis is one
minus specificity (usually called the "false positive
rate") and the vertical axis is sensitivity are often
found. They give curves that are left-to-right mirror
images of those obtained when the horizontal axis is
specificity. Identifying at least some
of the points on the curve with the corresponding
critical values is not standard practice either.
Here, the point associated with 120 U/L is circled.
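How a reader might construct ROC points and likelihood ratios from a pair of reference result frequency distributions can be sketched as follows. This is a minimal illustration, not the procedure of any cited study: it assumes the results are grouped into ordered intervals, that higher results indicate the positive class, and that a result is classified positive when it falls at or above the critical value. The function names and the counts are hypothetical. Points are returned as (specificity, sensitivity), the axis convention used in this book, with a helper for the mirror-image (one minus specificity) convention.

```python
from math import inf


def roc_points(positive_counts, negative_counts):
    """ROC points as (specificity, sensitivity), one per candidate critical value.

    The i-th point treats intervals i, i+1, ... as positive classifications,
    so the first point classifies everything positive and the last nothing.
    """
    total_pos = sum(positive_counts)
    total_neg = sum(negative_counts)
    points = []
    for i in range(len(positive_counts) + 1):
        sensitivity = sum(positive_counts[i:]) / total_pos
        specificity = sum(negative_counts[:i]) / total_neg
        points.append((specificity, sensitivity))
    return points


def to_false_positive_axis(points):
    """Convert to (1 - specificity, sensitivity), the mirror-image convention."""
    return [(1 - spec, sens) for spec, sens in points]


def interval_likelihood_ratios(positive_counts, negative_counts):
    """Likelihood ratio for each result interval: the fraction of the positive
    class in the interval divided by the fraction of the negative class."""
    total_pos = sum(positive_counts)
    total_neg = sum(negative_counts)
    ratios = []
    for pos, neg in zip(positive_counts, negative_counts):
        if neg == 0:
            ratios.append(inf)  # no negative-class results in this interval
        else:
            ratios.append((pos / total_pos) / (neg / total_neg))
    return ratios


if __name__ == "__main__":
    # Hypothetical frequency distributions over four ordered result intervals.
    positive_class = [1, 3, 6, 10]
    negative_class = [10, 6, 3, 1]
    for spec, sens in roc_points(positive_class, negative_class):
        print(f"specificity={spec:.2f}  sensitivity={sens:.2f}")
    print(interval_likelihood_ratios(positive_class, negative_class))
```

Interval likelihood ratios are a coarse stand-in for a continuous likelihood ratio curve; finer intervals trace the curve more closely but need more reference results per interval.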
Unfortunately, one does not always find a
complete presentation of the performance results.
Not uncommonly, study evaluations report the
classification performance of the study at a single
critical value. In that case, the performance descrip-
tion should state explicitly the basis for the authors'
selection of the value. The three most frequently