curves can be constructed. This is ideal. If result
frequency distributions are not given but ROC
curves are presented, ordinal regression methods can
be used to model an aggregate ROC curve (Tosteson
and Begg 1988). In addition, an aggregate likeli-
hood ratio curve can be modeled using logistic
regression techniques (Irwig 1992).
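When result frequency distributions *are* reported, the construction of operating points is direct: each ordinal cutoff yields one (false-positive rate, true-positive rate) pair. A minimal sketch, using hypothetical pooled frequencies for a five-category ordinal result (the counts are illustrative, not drawn from any cited study):

```python
import numpy as np

# Hypothetical pooled frequencies of an ordinal test result
# (categories 1..5, with 5 the most abnormal) in diseased and
# non-diseased subjects.
diseased     = np.array([ 5, 10, 20, 30, 35])
non_diseased = np.array([40, 30, 15, 10,  5])

# Each threshold "call positive if result >= k" yields one
# (FPR, TPR) operating point; cumulate from the abnormal end.
tpr = np.cumsum(diseased[::-1]) / diseased.sum()
fpr = np.cumsum(non_diseased[::-1]) / non_diseased.sum()

for f, t in zip(fpr, tpr):
    print(f"FPR = {f:.2f}  TPR = {t:.2f}")
```

Connecting these points traces the empirical aggregate ROC curve; the ordinal regression and logistic regression methods cited above are ways of smoothing or modeling the same information when only curves, not frequencies, are available.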
If each evaluation reports only one or a few
sensitivity and specificity pairs or, in the case of a
two-group prognostic study, only a few of the
fraction correctly classified pairs, the findings from
all the evaluations should be plotted together gener-
ating an aggregate ROC curve. The data can also be
modeled to yield a summary ROC curve (Littenberg
and Moses 1993, Irwig et al. 1994). Data pairs that
lie at some distance from a fitted summary ROC
curve are outliers. Explaining such outliers is an
essential component of a quantitative meta-analysis.
A thorough and systematic examination of the
methods employed in the evaluations must be
conducted to identify the methodological differences
that resulted in outlying findings (Charlson et al. 1987).
Sometimes it is not possible to generate aggre-
gate performance data as part of a meta-analysis
because the reported findings are not consistent with
the assumption of a shared underlying classification
performance. In that case, methodological review of
the evaluations should reveal the causes of the
variability in the data. It can also happen that the
assembled data do not combine in such a way as to
yield a complete description of the diagnostic or
predictive performance of a study. This happens,
for instance, when the evaluations are concerned
with study performance only in a restricted range,
such as when describing the performance of a
diagnostic study only at critical values for which the
specificity is near 0.95. Then all that can be done is
to average the data to produce a single performance
pair—not one associated with a stipulated critical
value but, rather, one associated with a predeter-
mined value of one of the members of the pair.
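In that restricted-range situation the averaging itself is elementary. A minimal sketch, with hypothetical sensitivities each reported at the critical value giving specificity near 0.95, weighted here by the number of diseased subjects (one reasonable weighting choice, not the only one):

```python
import numpy as np

# Hypothetical sensitivities, each observed at the critical value
# yielding specificity ~0.95, with the number of diseased subjects.
sens      = np.array([0.62, 0.55, 0.70, 0.58])
n_disease = np.array([ 120,  80,  200,  150])

# The single summary performance pair is (mean sensitivity, 0.95):
# the specificity is predetermined, not tied to any one critical value.
pooled_sens = np.average(sens, weights=n_disease)
print(f"summary pair: sensitivity = {pooled_sens:.3f}, specificity = 0.95")
```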
REFERENCES
Begg CB and Greenes RA. 1983. Assessment of diagnos-
tic tests when disease verification is subject to selec-
tion bias. Biometrics 39:207.
Charlson ME, Ales KL, Simon R, and MacKenzie R.
1987. Why predictive indexes perform less well in
validation studies. Arch Intern Med 147:2155.
Choi BC. 1992. Sensitivity and specificity of a single
diagnostic test in the presence of work-up bias. J Clin
Epidemiol 45:581.
Concato J, Feinstein AR, and Holford TR. 1993. The risk
of determining risk with multivariate models. Ann
Intern Med 118:201.
Dallman PR, Reeves JD, Driggers DA, and Lo EYT.
1981. Diagnosis of iron deficiency: the limitations of
laboratory tests in predicting response to iron treat-
ment in 1-year-old infants. J Pediatr 99:376.
Diamond GA. 1989. Future imperfect: the limitations of
clinical prediction models and the limits of clinical
prediction. J Am Coll Cardiol 14:12A.
Dickersin K and Berlin JA. 1992. Meta-analysis: state of
the science. Epidemiol Rev 14:154.
Fleiss JL. 1981. Statistical Methods for Rates and Proportions. 2nd edition. John Wiley and Sons, New York.
Gail MH and Green SB. 1976. A generalization of the
one-sided two-sample Kolmogorov-Smirnov statistic
for evaluating diagnostic tests. Biometrics 32:561.
Gray R, Begg CB, and Greenes RA. 1984. Construction
of receiver operating characteristic curves when
disease verification is subject to selection bias. Med
Decis Making 4:151.
Harrell FE, Lee KL, and Mark DB. 1996. Multivariable
prognostic models: issues in developing models,
evaluating assumptions and adequacy, and measuring
and reducing errors. Stat Med 15:361.
Harrell FE, Lee KL, Matchar DB, and Reichert TA.
1985. Regression models for prognostic prediction:
advantages, problems, and suggested solutions.
Cancer Treat Rep 69:1071.
Hosmer DW, Hosmer T, Le Cessie S, and Lemeshow S. 1997. A comparison of goodness-of-fit tests for the logistic regression model. Stat Med 16:965.
Hosmer DW, Taber S, and Lemeshow S. 1991. The
importance of assessing the fit of logistic regression
models: a case study. Am J Public Health 81:1630.
Hui SL and Walter SD. 1980. Estimating the error rates
of diagnostic tests. Biometrics 36:167.
Irwig L. 1992. Modelling result-specific likelihood ratios.
(Letter). J Clin Epidemiol 45:1335.
Irwig L, Tosteson ANA, Gatsonis C, Lau J, Colditz G,
Chalmers TC, and Mosteller F. 1994. Guidelines for
meta-analyses evaluating diagnostic tests. Ann Intern
Med 120:667.
Jaeschke R, Guyatt G, and Sackett DL. 1994. III. How to use an article about a diagnostic test. A. Are the results of the study valid? JAMA 271:389. B. What are the results and will they help me in caring for my patients? JAMA 271:703.
Jenicek M. 1989. Meta-analysis in medicine. J Clin
Epidemiol 42:35.
Kraemer HC. 1992. Evaluating Medical Tests. Sage Publications, Newbury Park, CA.
Evaluating Classification Studies