The Logic of laboratory Medicine - page 13

computation of the central range of values.
Patterned on conventional statistical practice for
determining significance, this range is defined as the
central 95% of the results. Thus, this representation
of the frequency distribution consists of a statement
of the 2.5th and 97.5th percentile values of the cumulative
frequency distribution of the study results. A
more informative description of the frequency distri-
bution involves the complete characterization of the
cumulative frequency distribution. Additionally, the
utility of the study in clinical classification is
increased if the frequency density distribution of
results is also characterized. The frequency density
distribution relates the frequency of occurrence of a
study result to the value of the result, whereas the
cumulative frequency distribution relates a study
result to the frequency of all study results equal to or
less than it.
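
The two descriptions defined above can be illustrated with a short computation. The following is a minimal Python sketch, not a method given in the text: the function names and the sample values are hypothetical, and the percentile interpolation rule (the "inclusive" method of the Python standard library) is an assumed choice among several in common use.

```python
from collections import Counter
from statistics import quantiles


def frequency_distributions(results):
    """Tabulate (value, relative frequency, cumulative frequency) rows.

    The relative frequency is the discrete analogue of the frequency
    density; the cumulative frequency is the fraction of results that
    are equal to or less than the value.
    """
    n = len(results)
    counts = Counter(results)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value] / n, running / n))
    return rows


def central_95_range(results):
    """Return the empirical 2.5th and 97.5th percentiles."""
    # n=40 yields cut points at every 2.5%; the first and last of them
    # are the 2.5th and 97.5th percentiles.
    cuts = quantiles(results, n=40, method="inclusive")
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    # Hypothetical study results, for illustration only.
    sample = [88, 93, 95, 95, 98, 100, 100, 100, 103, 104, 107, 112]
    for value, freq, cum in frequency_distributions(sample):
        print(f"{value:5d}  {freq:.3f}  {cum:.3f}")
    print(central_95_range(sample))
```

With so few results the two percentiles are determined almost entirely by the smallest and largest observations, which is why a central range estimated this way needs a large study.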
Frequency distributions may be described
empirically (the nonparametric approach), in which
case the frequencies assigned to a study result are
those observed among the subjects studied, or the
data may be mathematically modeled (the parametric
approach), in which case the assigned frequencies are
those predicted by the fitted model. The nonparametric
approach has the advantage of not depending
upon the appropriateness of the model chosen to
describe the relationship between the study results
and their frequencies. The major disadvantage of
the approach is the large number of study results that
must be collected in order to describe the distribu-
tion precisely. This shortcoming is particularly
troublesome when the distribution is defined only in
terms of the 2.5th and 97.5th percentiles of the distribution
because the precision of the approach is worst
in the distribution tails. In general, the nonparamet-
ric approach requires the enrollment of one and
one-half to two times as many subjects as the model-
ing approach in order to derive precise estimates of
the 2.5th and 97.5th percentile values (Linnet
1987). This disadvantage can be partly overcome by
mathematical smoothing of the distribution.
(Smoothing differs from modeling in that no global
formula for describing the distribution is assumed;
instead, the smoothed line is computed from the data
within successive intervals of the distribution.)
Methods are available for smoothing frequency
density distributions (e.g., Willard and Connelly
1992; Strike 1996) and cumulative frequency
distributions (e.g., Shultz et al. 1985). The modeling
of frequency distributions attempts to reveal the
form of the underlying "true" relationship between
the result values and their frequencies and thereby to
describe the frequency distribution more accurately
than is possible by empirical means. The advantages
of modeling include not only the potential for greater
accuracy in the description of the frequency distribu-
tion but also the ability to predict frequencies for all
possible study values, not just those appearing
among the study subjects, and the ability to describe
the full distribution using simply the values of the
parameters defining the model. This permits consid-
erable ease in the manipulation of the frequency data
for the purpose of generating quantitative estimates
of diagnostic and prognostic probabilities. The
major disadvantage of modeling is the potential for a
less accurate description of the frequency distribu-
tion owing to the use of an inappropriate model.
The application of statistical tests of the goodness-of-
fit of a model substantially reduces the chances that
an incorrect model will be used to describe a distribution
(Solberg 1987b).
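
The contrast between the two approaches, and the role of a goodness-of-fit check, can be sketched as follows. This is an illustrative Python example under stated assumptions, not the procedure of any of the cited authors: the data are simulated from a normal distribution (mean 100, standard deviation 10, a hypothetical choice), the model is a normal distribution fitted using the sample mean and standard deviation, and the fit is summarized by the largest gap between the empirical and model cumulative frequencies (the Kolmogorov-Smirnov distance) without a formal significance test. All function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, quantiles, stdev


def nonparametric_limits(results):
    """Empirical 2.5th and 97.5th percentiles of the observed results."""
    cuts = quantiles(results, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def parametric_limits(results):
    """2.5th and 97.5th percentiles predicted by a fitted normal model."""
    model = NormalDist(mean(results), stdev(results))
    return model.inv_cdf(0.025), model.inv_cdf(0.975)


def ks_distance(results):
    """Largest gap between empirical and fitted-normal cumulative frequencies."""
    model = NormalDist(mean(results), stdev(results))
    ordered = sorted(results)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        fitted = model.cdf(x)
        # The empirical step function jumps from i/n to (i+1)/n at x,
        # so both sides of the step are compared with the model.
        gap = max(gap, abs(fitted - i / n), abs((i + 1) / n - fitted))
    return gap


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    data = [rng.gauss(100, 10) for _ in range(200)]  # simulated results
    print("nonparametric:", nonparametric_limits(data))
    print("parametric:   ", parametric_limits(data))
    print("KS distance:  ", ks_distance(data))
```

A formal test would compare the distance with critical values appropriate for a model whose parameters were estimated from the same data; the sketch reports only the distance itself.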
Figure 1.4 illustrates the calculation of
frequency distributions using nonparametric and
modeling approaches. Panel A shows a set of 200
hypothetical study results generated from a normal
distribution with a mean of 100 and a standard
deviation of 10. The corresponding frequency
density distributions are shown in panel B. The
nonparametric distribution (symbols) has been
plotted without data grouping to demonstrate the
appreciable, and typical, degree of data irregularity.
In general, the grouping of frequencies into result
intervals (histogram bins) lessens the irregularity of
nonparametric distributions (Scott 1979). Because the
data are known to have arisen from a normal distribution, a
normal distribution has been used for the model
distribution (line). The mean and standard deviation
derived from the data set are 100.35 and 10.37,
respectively. The model distribution appears to
correspond fairly closely to the nonparametric distribution,
but the scatter in the data makes it difficult to
judge the quality of the fit. Panel C presents the
frequency data as cumulative frequency distri-
butions. The nonparametric distribution (symbols) is
quite well-behaved; although some data irregularity
persists, the scatter seen in the nonparametric
frequency density distribution is largely eliminated
when the data are plotted as cumulative frequencies.
The normal distribution model (line) clearly fits the
empirical data very well. The 2.5th and 97.5th percentiles
calculated for these data are 82.98 and 121.03,
Laboratory-based Medical Practice