Journal of Health Economics 9 (1990) 429-445. North-Holland

THE SIXTH STOOL GUAIAC TEST
$47 Million That Never Was*

Kaye BROWN and Colin BURROWS
Public Sector Management Institute, Faculty of Economics and Politics, Monash University, Clayton, Victoria 3168, Australia

Received August 1989, final version received April 1990
In a 1975 paper, Neuhauser and Lewicki analysed a colorectal cancer screening policy approved by the American Cancer Society. Their analysis yielded an incremental cost per case detected in excess of $47 million. This vivid demonstration of the impact of marginal analysis is frequently cited by health economists and is often used for pedagogic purposes. The analysis is incorrect because of two fundamental errors. We have reanalysed the protocol in two stages. After correction for these errors, the $47 million disappears, the marginal cost is quite modest and the policy appears to be defensible on economic grounds.
1. Introduction
The relevance of marginal analysis to the economic evaluation of health
care is not intuitively obvious to non-economists. Small wonder then that
Neuhauser and Lewicki’s (1975a) analysis of the six stool guaiac test in
screening for colorectal cancer is so widely cited by health economists and
used extensively for pedagogic purposes. It highlights the discrepancies that
can exist between marginal and average costs, and makes an overwhelming
case for the relevance of economics to health policy. Specifically, it illustrates
the need to make explicit the implicit opportunity cost of a seemingly
reasonable screening policy recommendation.
That the screening protocol
analysed by Neuhauser and Lewicki was endorsed by the American Cancer
Society (ACS) [see Leffall (1974)] is simply icing on the cake.
In brief, the analysis shows that, despite an unremarkable average cost of
$2,451 per case detected, the marginal cost of the recommended sixth test
exceeds $47 million (see table 1).
Although health economics references [e.g., Culyer (1985), Drummond
(1987), Drummond, Stoddart and Torrance (1987), Mooney and Drummond
*We are grateful to John Goss, Heather Mitchell and David Evans for comments on an earlier draft of this manuscript and to Kim Yong and Raymond Li for programming assistance. Thanks are also due to Christine Hennings who, as usual, remained affable through all the revisions to tables included herein.

0167-6296/91/$03.50 © 1991 - Elsevier Science Publishers B.V. (North-Holland)
Table 1
Screening outcomes and costs (sensitivity 0.9167, specificity 0.6351) and prevalence of 72 cases/10,000 people.*

Screening outcomes
Number     True positive results    False positive results    Incremental gain
of tests   Percent    Number of     Percent    Number of      in cases detected
                      cases                    cases
1          91.6667    65.9469       36.5079    309.1652       65.9469
2          99.3056    71.4424       59.6876    505.4606        5.4956
3          99.9421    71.9003       74.4048    630.0926        0.4580
4          99.9952    71.9385       83.7491    709.2240        0.0382
5          99.9996    71.9417       89.6819    759.4661        0.0032
6          99.9999    71.9420       93.4489    791.3660        0.0003

Screening costs ($)
Number
of tests   Total      Incremental   Marginal      Average
1           77,511    77,511             1,175    1,175
2          107,690    30,179             5,492    1,507
3          130,199    22,509            49,150    1,810
4          148,116    17,917           469,534    2,059
5          163,141    15,024         4,724,695    2,268
6          176,331    13,190        47,107,214    2,451

*Neuhauser and Lewicki (1975a, tables 1 and 2, p. 227).
(1982), Mooney, Russell and Weir (1984), Thompson and Fortess (1980)]
continue to cite Neuhauser and Lewicki’s results, their analysis is, in fact,
seriously incorrect. It is incorrect in the calculation of one of its parameters,
in its analysis of multiple-test screening and, dramatically so, in its conclusions. Although a number of people [e.g., Kelleher and Vautrain (1975),
Prescott, McPherson and Bell (1980)] have expressed misgivings about
aspects of the analysis in brief letters to the New England Journal of
Medicine, they seem to have been ignored by health economists.
The matter is not trivial. The distinction between average and marginal
costs is central to the analysis of health programmes and the colorectal
cancer screening protocol advocated by Greegor (1971) is typical of many
large-scale screening programmes. From the available evidence, it was
endorsed without formal evaluation of its efficacy or its economic consequences and this is to be deplored. However, it does little good to the
advocacy of economic analysis as an integral part of health services
evaluation if such an analysis is invalid, especially when, once corrected, it
suggests a contrary conclusion.
The Neuhauser and Lewicki analysis is invalid because two fundamental
errors were made. First, their calculations of the sensitivity and specificity of
the test are incorrect. Second, even if one accepts their stated sensitivity and
specificity assumptions, their calculation of the number of false positive tests
is wrong. Both errors flow from a misunderstanding of the mechanics of
multiple testing and together these misspecifications flow through into the
calculation of total, average and marginal costs. Correcting for the latter
error yields marginal costs that are even greater than those reported by
Neuhauser and Lewicki. Correcting for both errors yields average and
marginal costs of very modest size; not the startling results that made the
Neuhauser and Lewicki paper such a vivid illustration of the benefits of
marginal analysis.
To demonstrate how and why the errors occurred, this paper will reanalyse
the screening policy advocated by Greegor (1971) in two stages. The first
stage is more in the nature of a technical correction and shows the effect on
screening costs of Neuhauser and Lewicki’s underestimation of the test’s false
positive yield, given their specificity and sensitivity assumptions. The result is a
spectacular increase in the apparent marginal cost of the sixth test. The
second stage reanalyses the protocol using levels of sensitivity and specificity
correctly specified from Greegor’s data.
As a caveat, we stress that we do not put forward our reanalysis as a valid
evaluation of the screening policy. First, it is by no means proven that occult
blood testing will reduce mortality from colorectal cancer [Miller (1986),
Simon (1985), Knight, Fielding and Battista (1989), U.S. Preventive Services
Task Force (1989)]. Secondly, barium enema is no longer the only, or usual,
follow-up test. Thirdly, Greegor’s data came from a very small study
(N=278) and should not be regarded as providing valid estimates of
sensitivity, specificity or prevalence. Fourthly, the assumption that the results
of successive tests are conditionally independent may not be appropriate.
Finally, the protocol should not be analysed as a six-test series. Because the
test materials consist of three slides, each with two windows (see below), the
relevant choice should concern the number of slides collected and tested.
2. Neuhauser and Lewicki’s assumptions
2.1. The screening protocol
Greegor’s faecal occult blood test (FOBT) protocol involves placing
patients at standard risk of colorectal cancer (asymptomatic and usually over
40 years of age) on a meat-free, high-residue diet for 4 days. Subjects are
instructed to smear a small amount of stool onto a guaiac-impregnated slide
using the applicator supplied. Each slide contains two windows. Greegor
advised that six Haemoccult smears be collected for each subject, representing two different portions of stool on each of three consecutive days.
In the event that the FOBT series is positive (see below for positivity
criterion used), a follow-up barium enema study is performed. For the
purpose of their analysis Neuhauser and Lewicki treat the barium enema
study as a definitive test capable of detecting all cancers and yielding no false
positive results (i.e., 100% sensitivity and specificity).
2.2. Yield and cost assumptions
Neuhauser and Lewicki based their analysis on a specific set of assumptions regarding the yield and costs of faecal occult blood tests. For the
purposes of this exercise we have used the same data they used to calculate
test yield and adopted the same cost assumptions.
Neuhauser and Lewicki based their estimates of the sensitivity (true
positive rate) and specificity (true negative rate) on Greegor’s early findings
summarized in table 2. Thus, based on Greegor’s findings that 11 out of 12
tests in two patients shown to have cancer were positive, they assumed that
any single guaiac test would detect 91.67% of the cases of colorectal cancer
in the screening population. That is, the true positive rate (sensitivity) is
0.9167. Similarly, on the basis of Greegor’s finding of 46 false positive tests
out of a total of 126 tests obtained in 22 subjects, Neuhauser and Lewicki
assumed a false positive rate of 36.51%. The specificity (true negative rate) is
therefore 0.6349.
The prevalence of colorectal cancer was estimated to be 71.94 cases per
10,000 people screened given 2 surgically confirmed asymptomatic cancer
cases among the initial 278 people screened by Greegor.
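The arithmetic behind these three parameters can be checked directly. The following is only a sketch using the counts quoted above from Greegor's data; the variable names are ours and are not taken from either paper.

```python
# Counts as quoted in the text from Greegor's early findings.
positive_tests_in_cancer = 11      # positive tests among the two cancer patients
tests_in_cancer = 12               # all tests completed by those two patients
false_positive_tests = 46          # positive tests in 22 subjects without cancer
tests_in_those_subjects = 126      # all tests completed by those 22 subjects
cancers_found = 2
people_screened = 278

# Neuhauser and Lewicki's single-test assumptions.
sensitivity = positive_tests_in_cancer / tests_in_cancer
false_positive_rate = false_positive_tests / tests_in_those_subjects
specificity = 1 - false_positive_rate
prevalence_per_10k = 10_000 * cancers_found / people_screened

print(f"sensitivity         = {sensitivity:.4f}")
print(f"false positive rate = {false_positive_rate:.4f}")
print(f"specificity         = {specificity:.4f}")
print(f"prevalence / 10,000 = {prevalence_per_10k:.2f}")
```

Note that the denominator for the false positive rate here is the 126 tests of the 22 false positive subjects only, which is the point taken up later in the paper.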
Table 2
Summary of Greegor’s (1969) data - and Neuhauser and Lewicki’s sensitivity and specificity assumptions.

                 Disease status
Test result      Cancer (Ca)               No cancer (Ca')               Average(a)
Positive (+ve)   2 subjects (11 tests)     22 subjects (46 tests)
                 P(+ve|Ca) = 0.916667      P(+ve|Ca') = 0.365079         P(+ve) = 0.369047
                 (sensitivity)             (false +ve)
Negative (-ve)   0 subjects                254 subjects (1,524 tests)
                 P(-ve|Ca) = 0.083333      P(-ve|Ca') = 0.634921         P(-ve) = 0.630953
                 (false -ve)               (specificity)

(a) Assuming incidence of approximately 72 cases per 10,000 asymptomatic individuals.
Following Neuhauser and Lewicki, it is assumed that individual guaiac
test outcomes represent independent results in the sense that successive results
are not correlated.
The FOBT is assumed to have a variable cost of $1 per test. Overhead
costs of $3 are assumed for administrative expenses and directions about diet
and testing procedures.
2.3. Multiple tests and the choice of a positivity criterion
The FOBT screening protocol analysed by Neuhauser and Lewicki is not,
strictly speaking, a set of sequential tests. Multiple tests like the FOBT that
are performed and interpreted simultaneously (i.e., as one battery), are
referred to as parallel tests. On the other hand, multiple tests that are
ordered in a sequential manner, such that the nature and extent of
subsequent tests are contingent upon the normal or abnormal results of the
previous test(s), as with the follow-up barium enema examination in
Greegor’s screening protocol, are referred to as serial tests. The type of
analysis carried out by Neuhauser and Lewicki (and by us) is concerned,
essentially, with determining the number of tests to be included in the test
battery; that is, the length of the parallel test series.¹
In such cases, it is necessary to specify a positivity criterion or rule that
prescribes whether the test series is positive or negative, given that each
individual test may be normal (negative) or abnormal (positive) and the
results for a battery of n tests may include conflicting combinations of
positive and negative results. The choice of a positivity criterion has a direct
¹Although, strictly speaking, the alternatives are limited to batteries of 2, 4 or 6 tests, following Neuhauser and Lewicki, the present reanalysis assumes that the range of possibilities extends from one to six tests.
Table 3
‘Corrected’ true positives and false positives with Neuhauser and Lewicki’s sensitivity (0.916667) and specificity (0.634921).

Number                    True positive                False positive
of tests   Sensitivity    cases(a)       Specificity   cases(a)
1          0.916667       65.945023      0.634921      3,624.525879
2          0.993056       71.440422      0.403125      5,925.813477
3          0.999421       71.898376      0.255952      7,386.949707
4          0.999952       71.936531      0.162510      8,314.655273
5          0.999996       71.939720      0.103181      8,903.674805
6          0.999999 plus  71.939987      0.065512      9,277.656250

(a) Cases/10,000 people screened assuming an incidence of approximately 72 cases of cancer for this screening population.
effect on the nature and extent of the trade-off between the test battery’s
combined sensitivity and combined specificity as the number of tests involved
increases. In principle, the choice of a decision rule should depend on the
relative weighting of the consequences of true and false positive and negative
test classifications.
Greegor advocated, and Neuhauser and Lewicki followed, the ‘any-test-abnormal’ positivity criterion, that is, any one positive test result is treated as
prima facie indicative of the presence of the disease. This ‘conservative’
criterion will, of course, result in a greater number of false positives for any
given number of parallel tests and, consequently, higher follow-up costs per
case detected than alternative positivity criteria. It will also maximize the
number of true positive cases detected for a battery of n tests.
The effects of applying this positivity criterion, using the sensitivity and
specificity levels for individual guaiac tests used by Neuhauser and Lewicki,
are shown in table 3.² Noteworthy here is the large number of false
positive cases which follows from the comparatively low specificity.
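The yields in table 3 follow mechanically from the any-test-abnormal rule. The sketch below assumes n independent tests with identical single-test rates, in which case combined sensitivity reduces to 1 - (1 - sensitivity)^n and combined specificity to (specificity)^n. The function names and the default prevalence of 0.007194 (71.94 cases per 10,000) are our own choices for illustration.

```python
def combined_rates(sensitivity, specificity, n_tests):
    """Combined rates for n independent, identical tests when any single
    positive result makes the whole series positive."""
    combined_sensitivity = 1 - (1 - sensitivity) ** n_tests
    combined_specificity = specificity ** n_tests
    return combined_sensitivity, combined_specificity


def screening_outcomes(sensitivity, specificity, n_tests,
                       prevalence=0.007194, population=10_000):
    """Expected true positive and false positive cases per population screened."""
    cases = population * prevalence
    non_cases = population - cases
    sens_n, spec_n = combined_rates(sensitivity, specificity, n_tests)
    return sens_n * cases, (1 - spec_n) * non_cases


# Neuhauser and Lewicki's stated single-test assumptions.
for n in range(1, 7):
    tp, fp = screening_outcomes(0.916667, 0.634921, n)
    print(f"{n} test(s): true positives {tp:9.4f}, false positives {fp:9.2f}")
```

Because the paper reports values to six decimal places from what appears to be lower-precision arithmetic, the last digits printed here may differ slightly from table 3.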
2.4. Screening costs under Neuhauser and Lewicki’s assumptions
Table 4 shows the total, incremental, marginal and average costs associated with the screening outcomes summarised in table 3. This should be
compared with Neuhauser and Lewicki’s (1975a, table 2, Cancer detection
and screening costs with sequential guaiac tests, p. 227), which is reproduced
in our table 1 above. They estimated that the marginal cost of using FOBT
screening to detect colorectal cancer rose from $1,200 to $47.1 million per
cancer detected at the first and sixth tests, respectively. As shown in table 4,
²Under the any-test-abnormal criterion, combined sensitivity for the test series T1, T2, ..., Tn = (sensitivity of T1) + (sensitivity of T2) x [1 - (sensitivity of T1)] + ... + (sensitivity of Tn) x [1 - (sensitivity of T1, T2, ..., Tn-1)], and combined specificity for the test series T1, T2, ..., Tn = (specificity of T1) x (specificity of T2) x ... x (specificity of Tn).
Table 4
Screening outcomes and screening costs with sequential faecal occult blood tests using Neuhauser and Lewicki’s sensitivity (0.916667) and specificity (0.634921) assumptions.

Screening outcomes
Number     True         Incremental     False           Incremental
of tests   positive     true positive   positive        false positive
1          65.945023    65.945023       3,624.525879    3,624.525879
2          71.440422     5.495400       5,925.813477    2,301.287598
3          71.898376     0.457954       7,386.949707    1,461.136230
4          71.936531     0.038155       8,314.655273      927.705566
5          71.939720     0.003189       8,903.674805      589.019532
6          71.939987     0.000267       9,277.656250      373.981445

Screening costs ($)
Number
of tests   Total(a)     Incremental(b)   Marginal(c)    Average(d)
1            409,047    409,047                6,203     6,203
2            649,725    240,678               43,796     9,095
3            805,885    156,159              340,993    11,209
4            908,659    102,774            2,693,630    12,631
5            977,561     68,902           21,605,644    13,589
6          1,024,960     47,398          177,502,100    14,247

(a) Total = cost of faecal occult blood testing on 10,000 people plus cost of follow-up barium enema examination on all individuals with positive results.
(b) Incremental = change in total cost of screening programme associated with one more faecal occult blood test.
(c) Marginal = incremental cost per incremental true positive case detected.
(d) Average = total cost per true positive case detected.
the correct estimate of the marginal cost of guaiac screening, based on
Neuhauser and Lewicki’s stated assumptions, increases from approximately
$5,750 with the first test to approximately $177.5 million for the sixth test.³
Furthermore, the average cost per cancer detected where the series is made
up of six tests is $14,247, not $2,500 as Neuhauser and Lewicki calculate.
Hence, it would be cheaper to use the barium enema examination as a
screening test ($100 x 10,000/71.94 = $13,900) and capture all the true positives rather than to spend $14,247 per case detected on a six-test series and
leave some undetected.
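The cost columns of table 4 can be rebuilt from the stated assumptions. The sketch below rests on our reading of the cost model: $1 per guaiac test and $3 overhead, both taken per person screened, plus a $100 barium enema for every positive series. The per-person interpretation is an assumption, although it reproduces the table's totals. All names are ours.

```python
TEST_COST = 1.0        # $ per guaiac test (section 2.2)
OVERHEAD = 3.0         # $ overhead, assumed here to be per person screened
ENEMA_COST = 100.0     # $ per follow-up barium enema
POPULATION = 10_000
PREVALENCE = 0.007194  # 71.94 cases per 10,000


def cost_rows(sensitivity, specificity, max_tests=6):
    """Return (n, total, incremental, marginal, average) for n = 1..max_tests."""
    cases = POPULATION * PREVALENCE
    non_cases = POPULATION - cases
    rows, prev_total, prev_tp = [], 0.0, 0.0
    for n in range(1, max_tests + 1):
        tp = (1 - (1 - sensitivity) ** n) * cases
        fp = (1 - specificity ** n) * non_cases
        total = POPULATION * (OVERHEAD + TEST_COST * n) + ENEMA_COST * (tp + fp)
        incremental = total - prev_total
        marginal = incremental / (tp - prev_tp)   # per extra case detected
        average = total / tp                      # per case detected
        rows.append((n, total, incremental, marginal, average))
        prev_total, prev_tp = total, tp
    return rows


rows = cost_rows(0.916667, 0.634921)
for n, total, incremental, marginal, average in rows:
    print(f"{n}: total {total:12,.0f}  incremental {incremental:10,.0f}  "
          f"marginal {marginal:15,.0f}  average {average:8,.0f}")

# Screening everyone with the barium enema instead, per case detected.
enema_only_cost_per_case = ENEMA_COST * POPULATION / (POPULATION * PREVALENCE)
print(f"barium enema only: {enema_only_cost_per_case:,.0f} per case")
```

The marginal cost of the sixth test divides by a very small incremental yield, so it is sensitive to arithmetic precision; in double precision this sketch gives a figure of the same order as, but not identical to, the $177.5 million in table 4.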
The discrepancy between the two sets of screening cost calculations is
explained by Neuhauser and Lewicki’s underestimation of the number of
false positive classifications associated with the alternative test series. As
Prescott et al. (1980) note, by their own assumptions Neuhauser and Lewicki
underestimate the number of false positive results by a very large order of
magnitude. For example, given the assumed false positive rate of 36.51%,
there are actually 3,625 false positive results associated with a single guaiac
test, not 309 as Neuhauser and Lewicki report. Similarly, whereas Neuhauser
and Lewicki report only about 791 false positive results for a 6-test series
(1975a, table 1, p. 227), in fact, all but 650 of the 9,928 true negatives in the
screening population require a follow-up barium enema study, that is, there
are about 9,278 false positive results.
That there are so many false positive results should be intuitively obvious
from the initial specification of the test’s specificity, given the low prevalence
of colorectal cancer. Neuhauser and Lewicki tell us at the outset that the
false positive rate for the FOBT is 36.5% and we know there are 9,928
individuals who do not have the disease in the screening population. Each
person in the screening population has about a 36.9% chance of getting a
positive result with each test (see table 2). It would seem that in accepting
Neuhauser and Lewicki’s findings at face value we have, as we all do at
times, fallen prey to a common judgemental bias of underestimating the
disjunctive probability of false positive outcomes [Bar-Hillel (1973), Cohen,
Chesnick and Haran (1982)]. As we all should know, the probability of a
series of true negative test results is only (specificity)^n.
It is important to see how Neuhauser and Lewicki erred in their
determination of the number of false positives because it is an error of
methodology. Their calculation of the combined false positive rate for
T1, T2, ..., T6 is incorrect. Curiously, they proceed as if there is an overall
³Prescott et al.’s (1980) estimate of $157.9 million rather than our $177.5 million is explained by the number of decimal places to which screening outcomes, and the number of true positive cases in particular, is calculated (i.e., a ‘decimal dust’ factor), given a screening population of 10,000 people. We report our calculations to six decimal places. If the screening outcomes and costs were calculated with greater precision for the actual population that is the subject of the ACS recommendation (viz. all asymptomatic individuals over the age of 40 years), then the marginal cost of the sixth test would be even greater.
number of false positive ‘cases’ to be identified, just as there is a given number of cancer cases to be detected in the screening population. That is, they assume an ‘incidence’ for false positives, and each test is assumed to detect 36.51 percent of the false positive ‘cases’.
This adoption of the (odd) assumption of a false positive incidence rate is explained in Neuhauser’s reply to Prescott et al. (1980). Neuhauser states:
‘... our model assumed that 7.9 percent of the population screened with
six tests would have false positive tests. ... We assumed that each
additional test would find 36.5 percent of the remaining patients whose
tests would be false positive. Six tests would find 93.4 percent of all false
positive results, or 791 of every 10,000 patients screened. We assumed
that one test would find 309 false positive results per 10,000 or 3
percent’.
[Neuhauser (1980, p. 1306)]
The problem here lies in the logical implications of such reasoning; the
notion that, in order to determine the number of false positive cases
associated with a single guaiac test, one must first know the false positive
rate for a single test and the incidence of false positive cases for a test series.
Neuhauser reasons as if one calculates the number of false positive cases in
the same way one calculates the number of true positive cases under the any-test-abnormal criterion. The implication is that the false positive rate per
10,000 persons can be known a priori before the screening protocol (number
of tests in the series, positivity criterion, confirming tests, etc.) is specified.
In fact, the false positive rate for a series of n independent tests is simply
[1 - (specificity)^n] under the any-test-abnormal criterion. If the false positive
rate for a single test is 0.365, then the false positive rate for a series of six
tests is 0.9345, not 0.079 per 10,000 persons screened. Alternatively, if the
false positive rate is 0.079 per 10,000 persons, then the false positive rate for
a single test is 0.0135, not 0.365. Both statements cannot be true.
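The inconsistency can be verified in a few lines. This sketch assumes independent tests and uses the relation just stated, together with its inverse; the two helper names are ours.

```python
def series_false_positive_rate(single_rate, n_tests):
    """False positive rate of an n-test series under any-test-abnormal."""
    return 1 - (1 - single_rate) ** n_tests


def single_false_positive_rate(series_rate, n_tests):
    """Single-test rate implied by a given n-test series rate (the inverse)."""
    return 1 - (1 - series_rate) ** (1 / n_tests)


# Starting from the stated single-test rate of 0.365:
six_test_rate = series_false_positive_rate(0.365, 6)

# Starting instead from the reported six-test rate of 0.079:
implied_single_rate = single_false_positive_rate(0.079, 6)

print(f"six-test rate implied by 0.365 per test:  {six_test_rate:.4f}")
print(f"single-test rate implied by 0.079 overall: {implied_single_rate:.4f}")
```

Either starting point contradicts the other by more than an order of magnitude, which is the point of the paragraph above.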
However, as Kelleher and Vautrain (1975) pointed out, this 0.365 false
positive rate for a single test itself is incorrect, given Greegor’s (1969)
findings. Neuhauser and Lewicki calculate it by dividing the 46 positive test
results obtained by subjects who were shown not to have colorectal cancer
by the total number of tests completed by the 22 subjects concerned. As
indicated in table 2, there are a further 254 subjects in Greegor’s study who
did not have cancer. Based on the usual definition of the false positive rate,⁴
the probability of a positive test result given no disease, the correct rate is
46/[126 + (6 x 254)] = 0.027879, assuming Greegor’s results provide an
⁴Galen (1982) has stated that ‘the term “false positive rate” should be abandoned’ (p. 690) because there are no fewer than four definitions: (i) FP/(FP+TN); (ii) FP/(FP+TN+TP+FN); (iii) FP/(FP+TP); and (iv) FP/(FP+FN). This confusion notwithstanding, Neuhauser and Lewicki’s assumption of a specificity level of 63.49 percent does not conform to any of these definitions.
Fig. 1. Combined sensitivity of FOBT series: Neuhauser and Lewicki’s analysis versus Greegor’s data.
[Figure not reproduced. Horizontal axis: number of tests (length of test series), 1 to 6. Vertical axis ticks include 33.9, 50, 91.7 and 100. Curves are labelled ‘Neuhauser & Lewicki’s analysis’ and ‘Greegor’s data’.]
adequate basis for estimating test accuracy.⁵ But this is not the end of it.
This is an estimate of the combined false positive rate since the false positive
cases had one or more positive guaiac test results in a battery of six tests. In
fact, the false positive rate for a single test is 0.0047 cases, assuming statistical
independence of successive tests under the any-test-abnormal criterion.
Similarly, the sensitivity figure of 0.9167 is an estimate of the combined
sensitivity for a battery of six tests in which the sensitivity of a single test is
0.3391, again assuming that successive tests are statistically independent.⁶
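The single-test figures can be recovered by inverting the any-test-abnormal relations. The sketch below takes the six-test rates from Greegor's data as quoted in the text and assumes statistical independence of successive tests, as the paper does; the variable names are ours.

```python
N_TESTS = 6

# Six-test (combined) rates as derived in the text from Greegor's data.
combined_false_positive_rate = 46 / (126 + 6 * 254)
combined_sensitivity = 11 / 12

# Invert 1 - (1 - p)^n to obtain the rate p of a single test.
single_false_positive_rate = 1 - (1 - combined_false_positive_rate) ** (1 / N_TESTS)
single_sensitivity = 1 - (1 - combined_sensitivity) ** (1 / N_TESTS)
single_specificity = 1 - single_false_positive_rate

print(f"combined false positive rate = {combined_false_positive_rate:.6f}")
print(f"single-test false positive   = {single_false_positive_rate:.6f}")
print(f"single-test specificity      = {single_specificity:.6f}")
print(f"single-test sensitivity      = {single_sensitivity:.6f}")
```

These are the values (sensitivity 0.339099, specificity 0.995299) used for the 'correct Greegor assumptions' in the later tables.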
The effect of this error is illustrated by figs. 1 and 2. Neuhauser and
Lewicki have implicitly used the levels of sensitivity and specificity after six
tests to derive the screening outcomes and costs associated with doing six
more tests. The original Neuhauser and Lewicki assumption of a sensitivity
⁵The source of the 126 in the denominator of the expression 46/(126 + 6 x 254) is Greegor (1969). Evidently, not all the subjects with false positive results completed a full battery of six tests as per Greegor’s protocol, so there are 126 rather than 6 x 22 = 132 test results.
⁶Though we resolved at the outset not to debate the validity of Neuhauser and Lewicki’s analysis on the grounds that it is based on Greegor’s early data, we do note parenthetically that this estimate of the sensitivity of a single guaiac test accords more nearly with that reported by Applegate (1981). He cites a personal communication (from investigators conducting a randomized controlled trial of FOBT screening for colorectal cancer) to the effect that the number of cancer cases detected with one card (slide) is only 28%, while three cards increased the yield to 77%. This is the only reference we have found that provides any data on the way in which test yield changes with the length of the test series.
Fig. 2. Combined specificity of FOBT series: Neuhauser and Lewicki’s analysis versus Neuhauser and Lewicki’s stated assumption versus Greegor’s data.
[Figure not reproduced. Horizontal axis: number of tests (length of test series). Curves are labelled ‘Greegor’s data’ and ‘Neuhauser & Lewicki’s reported results’.]
value of 0.9167 for a single test makes it appear that Greegor was advocating
‘flat of the curve medicine’.⁷
3. Greegor revisited
It is possible now to calculate the average and marginal costs that do flow
from the six-test protocol, using values for sensitivity and specificity correctly
derived from Greegor’s data.
Table 5 indicates the sensitivity, specificity and screening outcomes associated with a series of up to six independent guaiac tests under the
assumptions that the sensitivity and false positive rate (1 -specificity) of a
single test are 0.3391 and 0.0047, respectively. This table invites direct
comparison with table 3, the ‘corrected’ yields from Neuhauser and Lewicki’s
stated sensitivity and specificity assumptions. The number of false positive
cases after the sixth test falls from 9,278 to 277. Similarly, but not quite so
dramatically, a comparison with their own analysis (table 1) shows a decline
from 791 false positives.
Table 6 gives the cost consequences of the screening outcomes using the
⁷A more detailed explanation of the complicated relationships among Neuhauser and Lewicki’s screening outcomes, and our corrections thereof, is available in K. Brown and C. Burrows (1989), Did Greegor get it right? Neuhauser and Lewicki revisited, Working paper no. 3, Public Sector Management Institute, Monash University.
Table 5
True positive and false positive test results using correct Greegor assumptions (sensitivity = 0.339099; specificity = 0.995299; prevalence = 0.007194).

Number                    True positive                False positive
of tests   Sensitivity    cases          Specificity   cases
1          0.339099       24.394783      0.995299       46.671810
2          0.563210       40.517323      0.990620       93.124214
3          0.711325       51.172722      0.985963      139.358249
4          0.809214       58.214882      0.981328      185.374936
5          0.873910       62.869051      0.976715      231.175298
6          0.916667       65.945007      0.972123      276.760354
correct Greegor assumptions and table 7 brings together the true positives
and false positives for the Neuhauser and Lewicki analysis; for their analysis
after correcting for their miscalculation of the number of false positives; and
for the correct Greegor assumptions. In table 8, the cost data from table 1,
table 4 and table 6 are brought together for ease of comparison.
Not surprisingly, there are striking differences. Average cost per case
detected is about $1,884 for the six-test series compared with $2,451 in
Neuhauser and Lewicki’s original analysis, and $14,247 when that analysis is
corrected for their underestimation of the number of false positives.
More dramatic, however, is the decline in marginal cost from their original
$47 million ($177 million ‘corrected’ for miscalculation of false positives) to
$4,833 - probably regarded as quite modest for a colorectal cancer screening
program.
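These figures can be reproduced from the single-test rates derived from Greegor's data. As before, this is a sketch under our reading of the cost model ($1 per test and $3 overhead per person screened, $100 per follow-up barium enema, 10,000 people, prevalence 0.007194); the function and parameter names are ours.

```python
def marginal_and_average_costs(sensitivity, specificity, max_tests=6,
                               population=10_000, prevalence=0.007194,
                               test_cost=1.0, overhead=3.0, enema_cost=100.0):
    """Return a list of (n, marginal cost, average cost) per case detected."""
    cases = population * prevalence
    non_cases = population - cases
    results, prev_total, prev_tp = [], 0.0, 0.0
    for n in range(1, max_tests + 1):
        tp = (1 - (1 - sensitivity) ** n) * cases       # true positive cases
        fp = (1 - specificity ** n) * non_cases         # false positive cases
        total = population * (overhead + test_cost * n) + enema_cost * (tp + fp)
        marginal = (total - prev_total) / (tp - prev_tp)
        results.append((n, marginal, total / tp))
        prev_total, prev_tp = total, tp
    return results


greegor = marginal_and_average_costs(0.339099, 0.995299)
for n, marginal, average in greegor:
    print(f"{n} test(s): marginal {marginal:7,.0f}  average {average:7,.0f}")
```

With a single-test sensitivity of about 0.34, each further test still finds a material number of additional cases, so the marginal cost rises gently instead of exploding.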
The explanation for the vanishing $47 million is, by now, straightforward.
Using the combined six-test sensitivity as the single test sensitivity means
that, after two tests, one is beginning to look at ‘decimal dust’ in terms of the
incremental number of true positive cases detected. The problem is compounded by the incorrect definition of specificity employed and the corresponding miscalculation of the false positive rate. The result is a spectacular
overestimation of the number of false positive cases with consequent
expenditure on follow-up testing.
4. Conclusion
The conclusion reached from all this is somewhat embarrassing. The
screening protocol that has been used as a vehicle for demonstrating the dire
consequences of not incorporating economic analysis into health programme
evaluations appears to be, in terms of cost effectiveness, quite defensible given Neuhauser and Lewicki’s assumptions regarding costs and test independence, and Greegor’s data on test diagnosticity and prevalence.
Prescott et al. (1980) have stated that ‘economic analyses of medical
Table 6
Screening outcomes and costs using correct Greegor assumptions (sensitivity = 0.339099; specificity = 0.995299; prevalence = 0.007194).

Screening outcomes
Number     True         Incremental     False         Incremental
of tests   positive     true positive   positive      false positive
1          24.394783    24.394783        46.671810    46.671810
2          40.517323    16.122540        93.124214    46.452404
3          51.172722    10.655399       139.358249    46.234041
4          58.214882     7.042160       185.374936    46.016663
5          62.869051     4.654175       231.175298    45.800369
6          65.945007     3.075957       276.760354    45.585052

Screening costs ($)
Number
of tests   Total(a)   Incremental(b)   Marginal(c)   Average(d)
1           47,107    47,107           1,931         1,931
2           63,364    16,257           1,008         1,564
3           79,053    15,689           1,472         1,545
4           94,359    15,306           2,173         1,621
5          109,404    15,045           3,233         1,740
6          124,271    14,866           4,833         1,884

(a),(b),(c),(d) For explanation of footnotes, see table 4.
Table 7
Summary of screening outcomes (prevalence = 71.94 cases/10,000 asymptomatic persons).

           Table 1                  Table 4                   Table 5
           [Neuhauser and Lewicki   [Neuhauser and Lewicki    [correct Greegor
           (1975a)]                 ‘corrected’]              assumptions]
Number     True       False         True       False          True       False
of tests   positive   positive      positive   positive       positive   positive
1          65.9469    309.1652      65.9450    3,624.5259     24.3948     46.6718
2          71.4424    505.4606      71.4404    5,925.8135     40.5173     93.1242
3          71.9003    630.0926      71.8984    7,386.9497     51.1727    139.3583
4          71.9385    709.2240      71.9365    8,314.6553     58.2149    185.3749
5          71.9417    759.4661      71.9397    8,903.6748     62.8691    231.1753
6          71.9420    791.3660      71.9400    9,277.6563     65.9451    276.7604

Notes: the table 1 and table 4 columns assume specificity = 0.634921 and sensitivity = 0.916667; the table 5 columns assume specificity = 0.995299 and sensitivity = 0.339099.
Table 8
Summary of marginal and average costs.

           Table 1                  Table 4                   Table 5
           [Neuhauser and Lewicki   [Neuhauser and Lewicki    [correct Greegor
           (1975a)]                 ‘corrected’]              assumptions]
Number
of tests   MC           AC          MC            AC          MC       AC
1               1,175   1,175             6,203    6,203      1,931    1,931
2               5,492   1,507            43,796    9,095      1,008    1,564
3              49,150   1,810           340,993   11,209      1,472    1,545
4             469,534   2,059         2,693,630   12,631      2,173    1,621
5           4,724,695   2,268        21,605,644   13,589      3,233    1,740
6          47,107,214   2,451       177,502,100   14,247      4,833    1,884

Notes: the table 1 and table 4 columns assume specificity = 0.634921 and sensitivity = 0.916667; the table 5 columns assume specificity = 0.995299 and sensitivity = 0.339099.
MC = marginal cost = incremental cost per true positive case detected.
AC = average cost = total cost/number of true positive cases detected.
procedures would be more readily accepted by the medical profession if
examples of its application were supported by accurate analysis of the data
available’ (p. 1306). We concur, but it does the argument for economic
evaluations no good if such a vivid demonstration is found to be invalid, especially when the demonstration case appears to be quite defensible.
We can, and should, also make the comment that most evaluations of
health programmes and medical procedures require expertise in a number of
disciplines. These are, par excellence, areas for multidisciplinary teams and, in
this instance, the analysis of the parallel test protocol is more than usually
complicated.
Finally, we can also point to the depressing likelihood that there is a long
way to go before doctors are convinced of the relevance of cost effectiveness
as an outcome measure - however vivid the effect. Certainly, Greegor did not
seem to be impressed by a marginal cost of $47 million per case detected
associated with a six- rather than a five-test series:
‘The study raised the question whether the third set of slides should be
omitted. The last 11 cancers I have detected using the slides were all
guaiac positive on the first day. Long-term statistics with large groups
may prove that Dr. Neuhauser’s suggestion can be adopted.
For the time being, however, I am unwilling to emasculate a life-saving test
to save 30 cents per patient (the cost of the third set of slides).'
[Neuhauser and Lewicki (1975b, p. 994, emphasis added)]
This response to what has been widely accepted as a graphic illustration of
the folly of ignoring the costs of a screening policy makes plain the problems
economists face in trying to demonstrate to doubting physicians that saving
‘statistical lives’ is necessarily reconcilable with saving the lives of individual
patients, albeit different patients. Physicians implicitly accept the notion of
maximising collective benefit when conducting randomised clinical trials to
determine efficacy but often do not see it in the application of economic
evaluation to health care programmes.
Appendix A
Formulas used in the recalculation of the sensitivity and specificity of
parallel test series comprising n independent tests are as follows:
\[ A = \sum_{x=k}^{n} \binom{n}{x} a^{x} (1-a)^{n-x}, \tag{A.1} \]
\[ D = 1 - B = 1 - \sum_{x=k}^{n} \binom{n}{x} b^{x} (1-b)^{n-x}, \tag{A.2} \]
where
A = combined sensitivity;
D = combined specificity;
B = 'combined' false positive rate = (1 - D);
a = sensitivity of the individual test;
b = false positive rate for a single test;
n = length of the parallel test series;
x = number of individual tests in the n test series that yield positive
results; and
k = positivity criterion, i.e., the number of individual tests that must
be positive in order for the test series to be classed as positive.
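As a rough illustration of eqs. (A.1) and (A.2), the binomial sums can be evaluated directly. This is a minimal sketch assuming independent tests, as the appendix does; the function names are hypothetical, and the example inputs are the rounded single-test values (a = 0.3391, b = 0.0047) derived later in this appendix.

```python
from math import comb


def series_positive_rate(single_rate, n, k):
    """Probability that at least k of n independent tests are positive.

    single_rate is the per-test probability of a positive result:
    the sensitivity a for persons with the disease (giving A in eq. A.1)
    or the false positive rate b for persons without it (giving B in eq. A.2).
    """
    return sum(
        comb(n, x) * single_rate**x * (1 - single_rate) ** (n - x)
        for x in range(k, n + 1)
    )


def combined_sensitivity(a, n, k):
    return series_positive_rate(a, n, k)


def combined_specificity(b, n, k):
    return 1 - series_positive_rate(b, n, k)


# Illustrative call: six tests, series positive if any one test is positive.
A = combined_sensitivity(0.3391, n=6, k=1)
D = combined_specificity(0.0047, n=6, k=1)
print(round(A, 4), round(D, 4))
```

Raising k makes the series harder to class as positive, which lowers the combined sensitivity and raises the combined specificity.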
Under the any-test-abnormal positivity criterion (i.e., k = 1), eqs. (A.1) and
(A.2) can be written as
\[ A = 1 - (1-a)^{n} = 1 - c^{n}, \tag{A.1'} \]
\[ D = 1 - B = (1-b)^{n} = d^{n}, \tag{A.2'} \]
where
c = the false negative rate associated with a single test; and
d = the true negative rate (specificity) of a single test.
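The step from eqs. (A.1) and (A.2) to the primed forms follows from the binomial theorem. A short sketch, reading the any-test-abnormal criterion as k = 1 (a single positive result is enough to class the series as positive) and writing c = 1 - a and d = 1 - b:

```latex
\[
\sum_{x=0}^{n} \binom{n}{x} a^{x}(1-a)^{n-x} = \bigl(a + (1-a)\bigr)^{n} = 1
\quad\Longrightarrow\quad
A = \sum_{x=1}^{n} \binom{n}{x} a^{x}(1-a)^{n-x} = 1 - (1-a)^{n} = 1 - c^{n},
\]
\[
B = \sum_{x=1}^{n} \binom{n}{x} b^{x}(1-b)^{n-x} = 1 - (1-b)^{n}
\quad\Longrightarrow\quad
D = 1 - B = (1-b)^{n} = d^{n}.
\]
```

In words, the series misses a case only if all n tests miss it, and it clears a healthy person only if all n tests are negative.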
Greegor’s data (summarized in table 1) reflect the disease status of
individuals screened after the any-test-abnormal positivity criterion has been
applied to their test results. Accordingly, the sensitivity and specificity rates
calculated, 0.9167 and 0.9721, respectively, can be interpreted as providing
estimates of the combined sensitivity and specificity, respectively, associated
with the six-test protocol, that is, A =0.9167 and D =0.9721.
The sensitivity and specificity values associated with a single (guaiac) test
are obtained by solving eqs. (A.1') and (A.2') for a and b, respectively, for
n = 6, as per Greegor's protocol. Thus,
\[ A = 0.9167 = 1 - (1-a)^{6} \iff a = 1 - (0.0833)^{1/6} = 0.3391, \]
\[ D = 0.9721 = (1-b)^{6} \iff (1-b) = (0.9721)^{1/6}, \]
and
b = 0.0047.
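The inversion above is easy to check numerically. The following sketch solves eqs. (A.1') and (A.2') for the single-test rates, using the combined values 0.9167 and 0.9721 quoted in the text; the function name is illustrative and not part of the original analysis.

```python
def single_test_rates(combined_sensitivity, combined_specificity, n_tests):
    """Invert A = 1 - (1 - a)^n and D = (1 - b)^n for a and b."""
    c = (1 - combined_sensitivity) ** (1 / n_tests)  # single-test false negative rate
    d = combined_specificity ** (1 / n_tests)        # single-test specificity
    a = 1 - c                                        # single-test sensitivity
    b = 1 - d                                        # single-test false positive rate
    return a, b


a, b = single_test_rates(0.9167, 0.9721, n_tests=6)
print(round(a, 4), round(b, 4))
```

Taking the sixth root is what turns a combined sensitivity of about 92 per cent into a single-test sensitivity of only about 34 per cent.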
References
American Cancer Society, 1980, Guidelines for the cancer-related checkup: Recommendations and rationale, A Cancer Journal for Clinicians 30, 194-240.
Applegate, W.B., 1981, Colorectal cancer screening, Journal of Community Health 7, 138-151.
Bar-Hillel, M., 1973, On the subjective probability of compound events, Organisational Behavior and Human Performance 9, 396-406.
Cohen, J.C., E.I. Chesnick and D. Haran, 1982, Evaluation of compound probabilities in sequential choice, in: D. Kahneman, P. Slovic and A. Tversky, eds., Judgment under uncertainty: Heuristics and biases (Cambridge University Press, Cambridge).
Culyer, A.J., 1985, Economics (Blackwell, Oxford).
Department of Clinical Epidemiology and Biostatistics (Stoddart, G.L. et al.), McMaster University Health Sciences Centre, 1984, How to read clinical journals: VII, To understand an economic evaluation (part B), Canadian Medical Association Journal 130, 1542-1549.
Drummond, M.F., 1987, Methods for economic appraisal of health technology, in: M.F. Drummond, ed., Economic appraisal of health technology in the European community (Oxford University Press, Oxford).
Drummond, M.F., G.L. Stoddart and G.W. Torrance, 1987, Methods for the economic evaluation of health programmes (Oxford University Press, Oxford).
Galen, R.S., 1982, Application of the predictive value model in the analysis of test effectiveness, Clinical and Laboratory Medicine 2, 685-699.
Greegor, D.H., 1969, Detection of silent colon cancer in routine examination, A Cancer Journal for Clinicians 19, 330-337.
Greegor, D.H., 1971, Occult blood testing for detection of asymptomatic colon cancer, Cancer 28, 131-134.
Greegor, D.H., 1975, [letter], New England Journal of Medicine 293, 994.
Kelleher, M. and R. Vautrain, 1975, [letter], New England Journal of Medicine 293, 995.
Knight, K.K., J.E. Fielding and R.N. Battista, 1989, Occult blood screening for colorectal cancer, Journal of the American Medical Association 261, 587-593.
Leffall, L.D., 1974, Early diagnosis of colorectal cancer, A Cancer Journal for Clinicians 24, 152-159.
Miller, A.B., 1986, Principles of screening for colorectal cancer, Frontiers in Gastrointestinal Research 10, 35-45.
Mooney, G.H. and M.F. Drummond, 1982, Essentials of health economics - What is economics? Part 1, British Journal of Medicine 285, 1021-1025.
Mooney, G.H., E.M. Russell and R.D. Weir, 1980, Choices for health care (McMillan, Houndmills).
Neuhauser, D., 1980, [letter], New England Journal of Medicine 303, 1306-1307.
Neuhauser, D. and A.M. Lewicki, 1975a, What do we gain from the sixth stool guaiac?, New England Journal of Medicine 293, 226-228.
Neuhauser, D. and A.M. Lewicki, 1975b, [letter], New England Journal of Medicine 293, 995.
Prescott, N., K. McPherson and J. Bell, 1980, Cost effectiveness of screening for occult blood in the stool: Another look, New England Journal of Medicine 303, 1306.
Simon, J.B., 1985, Occult blood screening for colorectal carcinoma: A critical review, Gastroenterology 88, 820-837.
Thompson, M.S. and E.E. Fortess, 1980, Cost-effectiveness analysis in health program evaluation, Evaluation Review 4, 549-568.
U.S. Preventive Services Task Force, 1989, Recommendations for fecal occult blood screening, Journal of the American Medical Association 261, 586.