The Logic of Laboratory Medicine - page 69

consist entirely of patients who are in the advanced
stages of a disorder or who have an extreme progno-
sis. There should also be considerable heterogeneity
among the individuals in favorable diagnostic and
prognostic groups. Some of the subjects should
suffer from related disorders and illnesses that can
be confused with the condition under investigation.
Indeed, there should be many patients in whom the
reference classification cannot be made without use
of the reference technique. Also, the subjects should
represent a diverse sample in terms of biologic
attributes, such as age and gender. Failure to assem-
ble a population with an adequate biologic spectrum
usually results in overestimation of study
performance.
Sensitivity will be overestimated if the study
population consists of subjects selected because their
chances of having a particular disorder are great
enough to justify the use of an invasive, painful, or
costly reference method of classification. This form
of bias is called work-up bias (Ransohoff and
Feinstein 1978). In the evaluation of a prognostic
study, work-up bias originates from the preferential
selection of subjects with a high likelihood of
belonging to prognostic classes that are serious
enough to warrant the use of an impractical or
expensive reference method. Work-up bias leads to
overestimation of the fraction correctly classified in
those prognostic classes. Related to work-up bias is
selection bias (also called verification bias), which
arises when the results of the classification method
under study determine which subjects will undergo
reference classification and thereby be included in
the evaluation. This form of bias is particularly
likely to appear in evaluations of screening tests
because these investigations often have a study
design in which performance of the reference
method is limited to those individuals who test
positive when screened. If selection bias exists, the
sensitivity of a diagnostic method will be overesti-
mated and its specificity underestimated; similarly,
for prognostic studies, the fraction correctly classified
will tend to be overestimated in classes with poor
prognoses and underestimated in classes with good
prognoses. There are parametric (Begg and Greenes 1983,
Gray et al. 1984) and nonparametric (Zhou 1996) methods
for obtaining unbiased estimates of study performance
when selection bias exists.
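
The direction of selection bias, and the idea behind a correction of the Begg and Greenes type, can be illustrated with a small calculation. The Python sketch below is illustrative only: the counts are invented, the function names are hypothetical, and the correction assumes that the decision to perform the reference method depends only on the result of the study under evaluation, an assumption that is not stated in the text above.

```python
def naive_estimates(tp, fp, fn, tn):
    """Sensitivity and specificity computed from verified subjects only."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def corrected_estimates(tp, fp, fn, tn, n_pos, n_neg):
    """Correction in the spirit of Begg and Greenes (1983).

    tp, fp: verified test-positive subjects with / without the disorder
    fn, tn: verified test-negative subjects with / without the disorder
    n_pos, n_neg: all test-positive / test-negative subjects, verified or not
    Assumes verification depends only on the test result.
    """
    # Frequency of the disorder within each test-result group,
    # estimated from the verified subjects in that group.
    p_disorder_given_pos = tp / (tp + fp)
    p_disorder_given_neg = fn / (fn + tn)

    # Project those frequencies onto everyone who was tested.
    disorder_pos = p_disorder_given_pos * n_pos
    disorder_neg = p_disorder_given_neg * n_neg
    healthy_pos = n_pos - disorder_pos
    healthy_neg = n_neg - disorder_neg

    sensitivity = disorder_pos / (disorder_pos + disorder_neg)
    specificity = healthy_neg / (healthy_neg + healthy_pos)
    return sensitivity, specificity


# Invented example: 1000 subjects tested, prevalence 0.2,
# true sensitivity 0.80 and true specificity 0.90.
# All 240 test-positive subjects are verified (160 with the disorder, 80 without),
# but only 76 of the 760 test-negative subjects are verified (4 with, 72 without).
naive_sens, naive_spec = naive_estimates(tp=160, fp=80, fn=4, tn=72)
corr_sens, corr_spec = corrected_estimates(
    tp=160, fp=80, fn=4, tn=72, n_pos=240, n_neg=760
)

print(f"naive:     sensitivity {naive_sens:.2f}, specificity {naive_spec:.2f}")
print(f"corrected: sensitivity {corr_sens:.2f}, specificity {corr_spec:.2f}")
```

With these invented counts the naive sensitivity is about 0.98 and the naive specificity about 0.47, whereas the corrected values return to the 0.80 and 0.90 used to construct the example. The sketch covers only the simplest case, with no covariates; the published methods cited above are more general.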
The schemes that are employed to sample
individuals from a stipulated clinical setting are of
three general types, referred to as naturalistic,
retrospective, and prospective by Kraemer (1992).
Naturalistic sampling is characterized by either
random sampling or strict consecutive sampling of
the population of interest. Such sampling results in
a study population with a prevalence of disease
comparable to that of the clinical population. Using
this scheme, the estimate of the efficiency of the
study is unbiased but the estimates of sensitivity and
specificity or fraction correctly classified are biased
in inverse proportion to the number of subjects
studied and the prevalence of the diagnostic or
prognostic class. With retrospective sampling,
members of the pertinent clinical population are
screened at random or consecutively using the refer-
ence method and then random subsets of individuals
in each diagnostic or prognostic class are tested
using the study under evaluation. This sampling
approach yields estimates of sensitivity and specific-
ity or fraction correctly classified that are unbiased.
A practical and financial advantage of this approach
compared to that of naturalistic sampling is that the
study under evaluation needs to be performed in
considerably fewer individuals in the diagnostic or
prognostic groups that are common. For instance,
in the evaluation of a study used to diagnose a disor-
der with a prevalence of 0.2, if all of the screen-
positive individuals are subsequently studied, only
one quarter of the screen-negative individuals need
to be studied to have the same number of data points
for the estimation of specificity as there are for the
estimation of sensitivity. The last type of sampling
scheme, prospective sampling, is, as its name
implies, the inverse of retrospective sampling.
Here, the clinical population is screened using the
study being evaluated and then subsets of individuals
classified as to their diagnostic or prognostic class
according to the study are further tested using the
reference method. The unbiased performance
measures obtained using this scheme are the predic-
tive values of a test result. Sensitivity and specific-
ity or fraction correctly classified must be derived
from the respective predictive values and an indirect
estimate of prevalence by using Bayes' formula
(Choi 1992, Kraemer 1992). When compared to
naturalistic sampling, this scheme results in the
reference method being performed on fewer
individuals in the diagnostic or prognostic groups
that are common. This is advantageous when the
reference method is expensive or risky. The disad-
vantage of prospective sampling is that the perform-
ance of a study can only be evaluated at a few
Evaluating Classification Studies