critical values, thereby making a comprehensive
performance evaluation impossible.
The subjects comprising the population studied
by Werner et al. are described thus:
The sample represents the aggregate of admis-
sions to the Cardiac Care Unit (GWU Med.
Center) during the period for study, and in
this sense reflects the prevalence of myocar-
dial disease in a "complaining" population.
However, from this sample, only individuals
for whom enzyme assays were done on at
least two consecutive days were included in
this study ... Table 1 lists the patients used in
the former analyses by disease state, age, and
sex.
The authors have attempted to avoid sampling
bias in the assembly of the study population by using
a naturalistic sampling scheme with enrollment of all
of the patients admitted to their Cardiac Care Unit.
Unfortunately, not all of the patients had plasma
enzyme studies performed on at least two consecu-
tive days, so some of the potential subjects were
excluded from the evaluation. Because the patients
in whom repeat diagnostic testing was not pursued
were probably not selected in a random fashion from
the Unit's population, their exclusion does introduce
a bias into the evaluation. If the excluded patients
were those in whom the initial clinical findings
indicated only a small likelihood of myocardial
infarction, the bias is of the work-up type. The
information that the investigators collected regarding
the clinical and biologic spectrum represented in the
study population is summarized in Table 1 of the
report. There are subjects in all of the indicated
clinical subgroups, but for some, such as infarct-free
patients with prior myocardial infarction, the number
of subjects studied is small. This makes subgroup
differences difficult to demonstrate by statistical
analysis. More importantly, though, small numbers
necessarily limit the degree of biologic variability
among the subjects and thereby lessen the reliability
of the performance estimates.
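
One facet of this reduced reliability, the sampling imprecision of a proportion estimated from few subjects, can be illustrated with a confidence interval. The sketch below is not taken from the Werner et al. report: the counts are hypothetical, and the Wilson score interval is simply one common choice for a binomial proportion.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# Hypothetical counts: the same observed sensitivity (0.80) in a small
# subgroup and in a larger one.
for true_positives, n in [(8, 10), (80, 100)]:
    low, high = wilson_interval(true_positives, n)
    print(f"n={n:3d}  sensitivity={true_positives / n:.2f}  "
          f"95% CI=({low:.2f}, {high:.2f})")
```

The interval for the ten-subject subgroup is roughly three times as wide as the interval for the hundred-subject subgroup, although the point estimates are identical.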
Analytic methodology
The final component of study design considered
in the evaluation report is the description of the
analytic procedures used to make the study measure-
ments. Although it often happens that little attention
is given to this description, the procedures chosen
can affect the clinical utility of the study dramati-
cally. Patient preparation, the manner of specimen
collection and handling, and the analytic methodol-
ogy, including instrumentation, need to be specified.
The use of inaccurate methods, especially those
suffering from poor analytic specificity, or imprecise
methods will lead to underestimation of study
performance. As was mentioned in the description
of diagnostic-review bias, study performance will be
overestimated if there is a bias in favor of having
study results agree with reference classifications. In
the case of diagnostic-review bias, this can happen
when study results are known at the time the refer-
ence classifications are made. A similar bias may
arise if the test under evaluation has a subjective
element to its interpretation and the reference classi-
fications of the subjects are known to the individual
reviewing the test results at the time the interpreta-
tions are made. This is called test-review bias
(Ransohoff and Feinstein 1978). "Blind" interpreta-
tion of study results protects against this bias.
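
The effect of analytic imprecision on study performance can be sketched numerically. The example below assumes, purely for illustration, that true analyte values are normally distributed in infarct-free and infarct subjects and that measurement error is independent and normally distributed; the means, standard deviations, and cutoff are hypothetical and do not come from any study discussed here.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def performance(cutoff, analytic_sd,
                healthy=(100.0, 20.0), diseased=(200.0, 40.0)):
    """Sensitivity and specificity of 'result > cutoff' under the assumed model.

    healthy and diseased are hypothetical (mean, biologic SD) pairs.
    Independent analytic error adds in quadrature to the biologic SD.
    """
    sd_healthy = math.hypot(healthy[1], analytic_sd)
    sd_diseased = math.hypot(diseased[1], analytic_sd)
    sensitivity = 1 - normal_cdf(cutoff, diseased[0], sd_diseased)
    specificity = normal_cdf(cutoff, healthy[0], sd_healthy)
    return sensitivity, specificity

for analytic_sd in (0.0, 10.0, 30.0):
    sens, spec = performance(cutoff=140.0, analytic_sd=analytic_sd)
    print(f"analytic SD={analytic_sd:4.0f}  "
          f"sensitivity={sens:.3f}  specificity={spec:.3f}")
```

With the cutoff lying between the two group means, both sensitivity and specificity fall as the analytic standard deviation grows, which is the underestimation of performance described above.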
Information about the analytic methods should
be made available in the evaluation report either in
the form of summary statements of their technical
performance attributes or by referencing separate
technical method evaluations. Werner et al. use the
latter approach:
All enzymes were assayed at 37° C. A
mechanized system (System TR; Beckman
Instruments, Inc., Fullerton, CA 92634) was
used to perform the following kinetic assays:
the CK assay of Oliver
The mathematical techniques used for data
exploration and analysis are crucial methodological
elements of a performance evaluation. They should
be identified and, when necessary, the appropriateness
of their use should be discussed (Wasson et al. 1985,
Concato et al. 1993, Simon and Altman 1994). It is
especially important that statistical
assumptions that may be violated by the data be
addressed. An example would be the need to
demonstrate the approximate normality of the data if
statistical methods based upon normal distributions
are used. When a multivariate analytic approach
such as discriminant analysis or logistic regression is
used for interpreting combinations of study results,
the goodness-of-fit of the derived classification rule
should be assessed (Hosmer et al. 1991, Harrell et al.
1996, Hosmer et al. 1997).
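
As an illustration of a goodness-of-fit assessment, the sketch below computes the Hosmer-Lemeshow statistic, one widely used check for rules that yield predicted probabilities. The text does not prescribe this particular statistic; its choice, the function name, and the data are assumptions made for the example.

```python
def hosmer_lemeshow(y_true, p_pred, groups=10):
    """Return the Hosmer-Lemeshow chi-square statistic and degrees of freedom.

    Subjects are sorted by predicted probability and split into equal-sized
    groups; observed and expected event counts are compared in each group.
    """
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    pairs = sorted(zip(p_pred, y_true))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / size
        if 0 < mean_p < 1:
            statistic += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return statistic, groups - 2

# Hypothetical data: 200 subjects in four risk strata of 50 each.
outcomes = ([1] * 5 + [0] * 45 + [1] * 15 + [0] * 35 +
            [1] * 30 + [0] * 20 + [1] * 45 + [0] * 5)
calibrated = [0.1] * 50 + [0.3] * 50 + [0.6] * 50 + [0.9] * 50
miscalibrated = [0.2] * 50 + [0.4] * 50 + [0.5] * 50 + [0.8] * 50

for label, probs in [("calibrated", calibrated), ("miscalibrated", miscalibrated)]:
    stat, df = hosmer_lemeshow(outcomes, probs, groups=4)
    print(f"{label:14s} statistic={stat:.2f}  df={df}")
```

The statistic is referred to a chi-square distribution with the returned degrees of freedom; a large value suggests that the predicted probabilities do not match the observed event rates.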