Journal of Health Economics 9 (1990) 429-445. North-Holland

THE SIXTH STOOL GUAIAC TEST
$47 Million That Never Was*

Kaye BROWN, Public Sector Management Institute, and Colin BURROWS, Faculty of Economics and Politics, Monash University, Clayton, Victoria 3168, Australia

Received August 1989, final version received April 1990

In a 1975 paper, Neuhauser and Lewicki analysed a colorectal cancer screening policy approved by the American Cancer Society. Their analysis yielded an incremental cost per case detected in excess of $47 million. This vivid demonstration of the impact of marginal analysis is frequently cited by health economists and is often used for pedagogic purposes. The analysis is incorrect because of two fundamental errors. We have reanalysed the protocol in two stages. After correction for these errors, the $47 million disappears, the marginal cost is quite modest and the policy appears to be defensible on economic grounds.

1. Introduction

The relevance of marginal analysis to the economic evaluation of health care is not intuitively obvious to non-economists. Small wonder then that Neuhauser and Lewicki's (1975a) analysis of the six stool guaiac test in screening for colorectal cancer is so widely cited by health economists and used extensively for pedagogic purposes. It highlights the discrepancies that can exist between marginal and average costs, and makes an overwhelming case for the relevance of economics to health policy. Specifically, it illustrates the need to make explicit the implicit opportunity cost of a seemingly reasonable screening policy recommendation. That the screening protocol analysed by Neuhauser and Lewicki was endorsed by the American Cancer Society (ACS) [see Leffall (1974)] is simply icing on the cake. In brief, the analysis shows that, despite an unremarkable average cost of $2,451 per case detected, the marginal cost of the recommended sixth test exceeds $47 million (see table 1).
Although health economics references [e.g., Culyer (1985), Drummond (1987), Drummond, Stoddart and Torrance (1987), Mooney and Drummond (1982), Mooney, Russell and Weir (1984), Thompson and Fortess (1980)] continue to cite Neuhauser and Lewicki's results, their analysis is, in fact, seriously incorrect. It is incorrect in the calculation of one of its parameters, in its analysis of multiple-test screening and, dramatically so, in its conclusions.

*We are grateful to John Goss, Heather Mitchell and David Evans for comments on an earlier draft of this manuscript and to Kim Yong and Raymond Li for programming assistance. Thanks are also due to Christine Hennings who, as usual, remained affable through all the revisions to the tables included herein.

0167-6296/91/$03.50 © 1991-Elsevier Science Publishers B.V. (North-Holland)

Table 1
Screening outcomes and costs (sensitivity 0.9167, specificity 0.6351) and prevalence of 72 cases/10,000 people.
Source: Neuhauser and Lewicki (1975a, tables 1 and 2, p. 227).

Screening outcomes

Number     True positive results                        False positive results
of tests   Percent    Number of   Incremental gain      Percent    Number of
                      cases       in cases detected                cases
1          91.6667    65.9469     65.9469               36.5079    309.1652
2          99.3065    71.4424      5.4956               59.6876    505.4606
3          99.9421    71.9003      0.4580               74.4048    630.0926
4          99.9952    71.9385      0.0382               83.7491    709.2240
5          99.9996    71.9417      0.0032               89.6819    759.4661
6          99.9999    71.9420      0.0003               93.4489    791.3660

Screening costs ($)

Number
of tests   Total      Incremental   Marginal      Average
1           77,511    77,511             1,175    1,175
2          107,690    30,179             5,492    1,507
3          130,199    22,509            49,150    1,810
4          148,116    17,917           469,534    2,059
5          163,141    15,024         4,724,695    2,268
6          176,331    13,190        47,107,214    2,451
Although a number of people [e.g., Kelleher and Vautrain (1975), Prescott, McPherson and Bell (1980)] have expressed misgivings about aspects of the analysis in brief letters to the New England Journal of Medicine, they seem to have been ignored by health economists.

The matter is not trivial. The distinction between average and marginal costs is central to the analysis of health programmes, and the colorectal cancer screening protocol advocated by Greegor (1971) is typical of many large-scale screening programmes. From the available evidence, it was endorsed without formal evaluation of its efficacy or its economic consequences, and this is to be deplored. However, it does little good to the advocacy of economic analysis as an integral part of health services evaluation if such an analysis is invalid, especially when, once corrected, it suggests a contrary conclusion.

The Neuhauser and Lewicki analysis is invalid because two fundamental errors were made. First, their calculations of the sensitivity and specificity of the test are incorrect. Second, even if one accepts their stated sensitivity and specificity assumptions, their calculation of the number of false positive tests is wrong. Both errors flow from a misunderstanding of the mechanics of multiple testing, and together these misspecifications flow through into the calculation of total, average and marginal costs. Correcting for the latter error yields marginal costs that are even greater than those reported by Neuhauser and Lewicki. Correcting for both errors yields average and marginal costs of very modest size; not the startling results that made the Neuhauser and Lewicki paper such a vivid illustration of the benefits of marginal analysis.

To demonstrate how and why the errors occurred, this paper will reanalyse the screening policy advocated by Greegor (1971) in two stages.
The first stage is more in the nature of a technical correction and shows the effect on screening costs of Neuhauser and Lewicki's underestimation of the test's false positive yield, given their specificity and sensitivity assumptions. The result is a spectacular increase in the apparent marginal cost of the sixth test. The second stage reanalyses the protocol using levels of sensitivity and specificity correctly specified from Greegor's data.

As a caveat, we stress that we do not put forward our reanalysis as a valid evaluation of the screening policy. First, it is by no means proven that occult blood testing will reduce mortality from colorectal cancer [Miller (1986), Simon (1985), Knight, Fielding and Battista (1989), U.S. Preventive Services Task Force (1989)]. Secondly, barium enema is no longer the only, or usual, follow-up test. Thirdly, Greegor's data came from a very small study (N=278) and should not be regarded as providing valid estimates of sensitivity, specificity or prevalence. Fourthly, the assumption that the results of successive tests are conditionally independent may not be appropriate. Finally, the protocol should not be analysed as a six-test series. Because the test materials consist of three slides, each with two windows (see below), the relevant choice should concern the number of slides collected and tested.

2. Neuhauser and Lewicki's assumptions

2.1. The screening protocol

Greegor's faecal occult blood test (FOBT) protocol involves placing patients at standard risk of colorectal cancer (asymptomatic and usually over 40 years of age) on a meat-free, high-residue diet for 4 days. Subjects are instructed to smear a small amount of stool onto a guaiac-impregnated slide using the applicator supplied. Each slide contains two windows.
Greegor advised that six Haemoccult smears be collected for each subject, representing two different portions of stool on each of three consecutive days. In the event that the FOBT series is positive (see below for positivity criterion used), a follow-up barium enema study is performed. For the purpose of their analysis Neuhauser and Lewicki treat the barium enema study as a definitive test capable of detecting all cancers and yielding no false positive results (i.e., 100% sensitivity and specificity).

2.2. Yield and cost assumptions

Neuhauser and Lewicki based their analysis on a specific set of assumptions regarding the yield and costs of faecal occult blood tests. For the purposes of this exercise we have used the same data they used to calculate test yield and adopted the same cost assumptions.

Neuhauser and Lewicki based their estimates of the sensitivity (true positive rate) and specificity (true negative rate) on Greegor's early findings summarized in table 2. Thus, based on Greegor's findings that 11 out of 12 tests in two patients shown to have cancer were positive, they assumed that any single guaiac test would detect 91.67% of the cases of colorectal cancer in the screening population. That is, the true positive rate (sensitivity) is 0.9167. Similarly, on the basis of Greegor's finding of 46 false positive tests out of a total of 126 tests obtained in 22 subjects, Neuhauser and Lewicki assumed a false positive rate of 36.51%. The specificity (true negative rate) is therefore 0.6349. The prevalence of colorectal cancer was estimated to be 71.94 cases per 10,000 people screened, given 2 surgically confirmed asymptomatic cancer cases among the initial 278 people screened by Greegor.

Table 2
Summary of Greegor's (1969) data and Neuhauser and Lewicki's sensitivity and specificity assumptions.

                  Disease status
Test result       Cancer (Ca)                  No cancer (Ca')               Average^a
Positive (+ve)    2 subjects (11 tests)        22 subjects (46 tests)
                  P(+ve|Ca) = 0.916667         P(+ve|Ca') = 0.365079         P(+ve) = 0.369047
                  (sensitivity)                (false +ve)
Negative (-ve)    0 subjects                   254 subjects (1,524 tests)
                  P(-ve|Ca) = 0.083333         P(-ve|Ca') = 0.634921         P(-ve) = 0.630953
                  (false -ve)                  (specificity)

^a Assuming incidence of approximately 72 cases per 10,000 asymptomatic individuals.

Following Neuhauser and Lewicki, it is assumed that individual guaiac test outcomes represent independent results in the sense that successive results are not correlated.

The FOBT is assumed to have a variable cost of $1 per test. Overhead costs of $3 are assumed for administrative expenses and directions about diet and testing procedures.

2.3. Multiple tests and the choice of a positivity criterion

The FOBT screening protocol analysed by Neuhauser and Lewicki is not, strictly speaking, a set of sequential tests. Multiple tests like the FOBT that are performed and interpreted simultaneously (i.e., as one battery) are referred to as parallel tests. On the other hand, multiple tests that are ordered in a sequential manner, such that the nature and extent of subsequent tests are contingent upon the normal or abnormal results of the previous test(s), as with the follow-up barium enema examination in Greegor's screening protocol, are referred to as serial tests.
The type of analysis carried out by Neuhauser and Lewicki (and by us) is concerned, essentially, with determining the number of tests to be included in the test battery; that is, the length of the parallel test series.^1 In such cases, it is necessary to specify a positivity criterion or rule that prescribes whether the test series is positive or negative, given that each individual test may be normal (negative) or abnormal (positive) and the results for a battery of n tests may include conflicting combinations of positive and negative results. The choice of a positivity criterion has a direct effect on the nature and extent of the trade-off between the test battery's combined sensitivity and combined specificity as the number of tests involved increases. In principle, the choice of a decision rule should depend on the relative weighting of the consequences of true and false positive and negative test classifications.

^1 Although, strictly speaking, the alternatives are limited to batteries of 2, 4, or 6 tests, following Neuhauser and Lewicki, the present reanalysis assumes that the range of possibilities extends from one to six tests.

Table 3
'Corrected' true positives and false positives with Neuhauser and Lewicki's sensitivity (0.916667) and specificity (0.634921).

Number                   True positive                  False positive
of tests   Sensitivity   cases^a         Specificity    cases^a
1          0.916667      65.945023       0.634921       3,624.525879
2          0.993056      71.440422       0.403125       5,925.813477
3          0.999421      71.898376       0.255952       7,386.949707
4          0.999952      71.936531       0.162510       8,314.655273
5          0.999996      71.939720       0.103181       8,903.674805
6          0.999999+     71.939987       0.065512       9,277.656250

^a Cases/10,000 people screened assuming an incidence of approximately 72 cases of cancer for this screening population.
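The trade-off governed by the positivity criterion can be made concrete with a short sketch based on the binomial formulas of Appendix A. It assumes independent tests and uses the single-test rates stated in table 2; the function name `combined_rates` and the stricter "both tests positive" rule are illustrative assumptions, not part of the protocol analysed here.

```python
from math import comb


def combined_rates(n, k, sensitivity, false_positive_rate):
    """Return (combined sensitivity, combined specificity) for n independent
    parallel tests when the series is positive if at least k tests are positive."""

    def at_least_k(p):
        # Binomial tail: probability of k or more positive results out of n.
        return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))

    return at_least_k(sensitivity), 1 - at_least_k(false_positive_rate)


# Single-test rates as assumed by Neuhauser and Lewicki (table 2).
SENS, FPR = 0.916667, 0.365079

any_abnormal = combined_rates(2, 1, SENS, FPR)   # any one positive test suffices
both_abnormal = combined_rates(2, 2, SENS, FPR)  # hypothetical stricter rule

print("k=1 of 2: sensitivity %.6f, specificity %.6f" % any_abnormal)
print("k=2 of 2: sensitivity %.6f, specificity %.6f" % both_abnormal)
```

With k = 1 the two-test values agree with the second row of table 3; raising k lowers the combined sensitivity and raises the combined specificity, which is the trade-off described above.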
Greegor advocated, and Neuhauser and Lewicki followed, the 'any-test-abnormal' positivity criterion, that is, any one positive test result is treated as prima facie indicative of the presence of the disease. This 'conservative' criterion will, of course, result in a greater number of false positives for any given number of parallel tests and, consequently, higher follow-up costs per case detected than alternative positivity criteria. It will also maximize the number of true positive cases detected for a battery of n tests. The effects of applying this positivity criterion, using the sensitivity and specificity levels for individual guaiac tests used by Neuhauser and Lewicki, are shown in table 3.^2 Noteworthy here is the large number of false positive cases which follows from the comparatively low specificity.

^2 Under the any-test-abnormal criterion, combined sensitivity for the test series T_1, T_2, ..., T_n = (sensitivity of T_1) + (sensitivity of T_2) x [1 - sensitivity of T_1] + ... + (sensitivity of T_n) x [1 - sensitivity of T_1, T_2, ..., T_(n-1)], and combined specificity for the test series T_1, T_2, ..., T_n = (specificity of T_1) x (specificity of T_2) x ... x (specificity of T_n).

2.4. Screening costs under Neuhauser and Lewicki's assumptions

Table 4 shows the total, incremental, marginal and average costs associated with the screening outcomes summarised in table 3. This should be compared with Neuhauser and Lewicki's (1975a, table 2, Cancer detection and screening costs with sequential guaiac tests, p. 227), which is reproduced in our table 1 above. They estimated that the marginal cost of using FOBT screening to detect colorectal cancer rose from $1,200 to $47.1 million per cancer detected at the first and sixth tests, respectively. As shown in table 4, the correct estimate of the marginal cost of guaiac screening, based on Neuhauser and Lewicki's stated assumptions, increases from approximately $6,200 with the first test to approximately $177.5 million for the sixth test.^3 Furthermore, the average cost per cancer detected where the series is made up of six tests is $14,247, not $2,500 as Neuhauser and Lewicki calculate. Hence, it would be cheaper to use the barium enema examination as a screening test ($100 x 10,000/71.94 = $13,900) and capture all the true positives rather than to spend $14,247 per case detected on a six-test series and leave some undetected.

Table 4
Screening outcomes and screening costs with sequential faecal occult blood tests using Neuhauser and Lewicki's sensitivity (0.916667) and specificity (0.634921) assumptions.

Screening outcomes

Number     True         Incremental     False           Incremental
of tests   positive     true positive   positive        false positive
1          65.945023    65.945023       3,624.525879    3,624.525879
2          71.440422     5.495400       5,925.813477    2,301.287598
3          71.898376     0.457954       7,386.949707    1,461.136230
4          71.936531     0.038155       8,314.655273      927.705566
5          71.939720     0.003189       8,903.674805      589.019532
6          71.939987     0.000267       9,277.656250      373.981445

Screening costs ($)

Number
of tests   Total^a      Incremental^b   Marginal^c     Average^d
1            409,047    409,047               6,203     6,203
2            649,725    240,678              43,796     9,095
3            805,885    156,159             340,993    11,209
4            908,659    102,774           2,693,630    12,631
5            977,561     68,902          21,605,644    13,589
6          1,024,960     47,398         177,502,100    14,247

^a Total = cost of faecal occult blood testing on 10,000 people plus cost of follow-up barium enema examination on all individuals with positive results.
^b Incremental = change in total cost of screening programme associated with one more faecal occult blood test.
^c Marginal = incremental cost per incremental true positive case detected.
^d Average = total cost per true positive case detected.
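The arithmetic behind these cost figures can be sketched as follows. This is a minimal reconstruction under the stated assumptions (independent tests, any-test-abnormal criterion, $3 overhead per person, $1 per test, and a $100 barium enema as implied by the calculation in the text); the names `cost_table`, `CASES` and `HEALTHY` are illustrative.

```python
POPULATION = 10_000
CASES = 71.94                    # prevalence per 10,000 people screened
HEALTHY = POPULATION - CASES
SENS, SPEC = 0.916667, 0.634921  # Neuhauser and Lewicki's stated single-test rates
OVERHEAD, PER_TEST, BARIUM_ENEMA = 3.0, 1.0, 100.0  # dollars


def cost_table(max_tests=6):
    """Rows of screening outcomes and costs for series of 1..max_tests tests."""
    rows, prev_tp, prev_total = [], 0.0, 0.0
    for n in range(1, max_tests + 1):
        tp = CASES * (1 - (1 - SENS) ** n)   # any-test-abnormal sensitivity
        fp = HEALTHY * (1 - SPEC ** n)       # 1 - (specificity)^n
        total = POPULATION * (OVERHEAD + PER_TEST * n) + BARIUM_ENEMA * (tp + fp)
        rows.append({
            "tests": n, "tp": tp, "fp": fp, "total": total,
            "marginal": (total - prev_total) / (tp - prev_tp),
            "average": total / tp,
        })
        prev_tp, prev_total = tp, total
    return rows


for row in cost_table():
    print("{tests}  TP={tp:9.4f}  FP={fp:10.3f}  total={total:11,.0f}  "
          "marginal={marginal:13,.0f}  average={average:7,.0f}".format(**row))
```

Totals and average costs should track table 4 closely. The marginal cost of the fifth and sixth tests divides by an incremental yield of a few ten-thousandths of a case, so it is very sensitive to rounding and will differ somewhat from the tabulated value.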
The discrepancy between the two sets of screening cost calculations is explained by Neuhauser and Lewicki's underestimation of the number of false positive classifications associated with the alternative test series. As Prescott et al. (1980) note, by their own assumptions Neuhauser and Lewicki underestimate the number of false positive results by a very large margin. For example, given the assumed false positive rate of 36.51%, there are actually 3,625 false positive results associated with a single guaiac test, not 309 as Neuhauser and Lewicki report. Similarly, whereas Neuhauser and Lewicki report only about 791 false positive results for a 6-test series (1975a, table 1, p. 227), in fact all but 650 of the 9,928 true negatives in the screening population require a follow-up barium enema study; that is, there are about 9,278 false positive results.

That there are so many false positive results should be intuitively obvious from the initial specification of the test's specificity, given the low prevalence of colorectal cancer. Neuhauser and Lewicki tell us at the outset that the false positive rate for the FOBT is 36.5%, and we know there are 9,928 individuals who do not have the disease in the screening population. Each person in the screening population has about a 36.9% chance of getting a positive result with each test (see table 2). It would seem that in accepting Neuhauser and Lewicki's findings at face value we have, as we all do at times, fallen prey to a common judgemental bias of underestimating the disjunctive probability of false positive outcomes [Bar-Hillel (1973), Cohen, Chesnick and Haran (1982)]. As we all should know, the probability of a series of n true negative test results is only (specificity)^n.

It is important to see how Neuhauser and Lewicki erred in their determination of the number of false positives because it is an error of methodology. Their calculation of the combined false positive rate for T_1, T_2, ..., T_6 is incorrect.
Curiously, they proceed as if there is an overall number of false positive 'cases' to be identified, just as there is a given number of cancer cases to be detected in the screening population. That is, they assume an 'incidence' for false positives, and each test is assumed to detect 36.51 percent of the false positive 'cases'. This adoption of the (odd) assumption of a false positive incidence rate is explained in Neuhauser's reply to Prescott et al. (1980). Neuhauser states:

'... our model assumed that 7.9 percent of the population screened with six tests would have false positive tests.... We assumed that each additional test would find 36.5 percent of the remaining patients whose tests would be false positive. Six tests would find 93.4 percent of all false positive results, or 791 of every 10,000 patients screened. We assumed that one test would find 309 false positive results per 10,000 or 3 percent'. [Neuhauser (1980, p. 1306)]

^3 Prescott et al.'s (1980) estimate of $157.9 million rather than our $177.5 million is explained by the number of decimal places to which screening outcomes, and the number of true positive cases in particular, are calculated (i.e., a 'decimal dust' factor), given a screening population of 10,000 people. We report our calculations to six decimal places. If the screening outcomes and costs were calculated with greater precision for the actual population that is the subject of the ACS recommendation (viz. all asymptomatic individuals over the age of 40 years), then the marginal cost of the sixth test would be even greater.

The problem here lies in the logical implications of such reasoning: the notion that, in order to determine the number of false positive cases associated with a single guaiac test, one must first know the false positive rate for a single test and the incidence of false positive cases for a test series. Neuhauser reasons as if one calculates the number of false positive cases in the same way one calculates the number of true positive cases under the any-test-abnormal criterion. The implication is that the false positive rate per 10,000 persons can be known a priori, before the screening protocol (number of tests in the series, positivity criterion, confirming tests, etc.) is specified. In fact, the false positive rate for a series of n independent tests is simply [1 - (specificity)^n] under the any-test-abnormal criterion. If the false positive rate for a single test is 0.365, then the false positive rate for a series of six tests is 0.9345, not 0.079 per 10,000 persons screened. Alternatively, if the false positive rate is 0.079 per 10,000 persons, then the false positive rate for a single test is 0.0135, not 0.365. Both statements cannot be true.

However, as Kelleher and Vautrain (1975) pointed out, this 0.365 false positive rate for a single test is itself incorrect, given Greegor's (1969) findings. Neuhauser and Lewicki calculate it by dividing the 46 positive test results obtained by subjects who were shown not to have colorectal cancer by the total number of tests completed by the 22 subjects concerned. As indicated in table 2, there are a further 254 subjects in Greegor's study who did not have cancer. Based on the usual definition of the false positive rate,^4 the probability of a positive test result given no disease, the correct rate is 46/[126 + (6 x 254)] = 0.027879, assuming Greegor's results provide an adequate basis for estimating test accuracy.^5 But this is not the end of it. This is an estimate of the combined false positive rate, since the false positive cases had one or more positive guaiac test results in a battery of six tests. In fact, the false positive rate for a single test is 0.0047, assuming statistical independence of successive tests under the any-test-abnormal criterion. Similarly, the sensitivity figure of 0.9167 is an estimate of the combined sensitivity for a battery of six tests in which the sensitivity of a single test is 0.3391, again assuming that successive tests are statistically independent.^6

The effect of this error is illustrated by figs. 1 and 2. Neuhauser and Lewicki have implicitly used the levels of sensitivity and specificity after six tests to derive the screening outcomes and costs associated with doing six more tests. The original Neuhauser and Lewicki assumption of a sensitivity value of 0.9167 for a single test makes it appear that Greegor was advocating 'flat of the curve medicine'.^7

^4 Galen (1982) has stated that 'the term "false positive rate" should be abandoned' (p. 690) because there are no fewer than four definitions: (i) FP/(FP+TN); (ii) FP/(FP+TN+TP+FN); (iii) FP/(FP+TP); and (iv) FP/(FP+FN). This confusion notwithstanding, Neuhauser and Lewicki's assumption of a specificity level of 63.49 percent does not conform to any of these definitions.

^5 The source of the 126 in the denominator of the expression 46/(126 + 6 x 254) is Greegor (1969). Evidently, not all the subjects with false positive results completed a full battery of six tests as per Greegor's protocol, so there are 126 rather than 6 x 22 = 132 test results.

^6 Though we resolved at the outset not to debate the validity of Neuhauser and Lewicki's analysis on the grounds that it is based on Greegor's early data, we do note parenthetically that this estimate of the sensitivity of a single guaiac test accords more nearly with that reported by Applegate (1981). He cites a personal communication (from investigators conducting a randomized controlled trial of FOBT screening for colorectal cancer) to the effect that the number of cancer cases detected with one card (slide) is only 28%, while three cards increased the yield to 77%. This is the only reference we have found that provides any data on the way in which test yield changes with the length of the test series.

Fig. 1. Combined sensitivity of FOBT series: Neuhauser and Lewicki's analysis versus Greegor's data. [Figure not reproduced; horizontal axis: number of tests (length of test series).]

Fig. 2. Combined specificity of FOBT series: Neuhauser and Lewicki's analysis versus Neuhauser and Lewicki's stated assumption versus Greegor's data. [Figure not reproduced; horizontal axis: number of tests (length of test series).]

3. Greegor revisited

It is possible now to calculate the average and marginal costs that do flow from the six-test protocol, using values for sensitivity and specificity correctly derived from Greegor's data. Table 5 indicates the sensitivity, specificity and screening outcomes associated with a series of up to six independent guaiac tests under the assumptions that the sensitivity and false positive rate (1 - specificity) of a single test are 0.3391 and 0.0047, respectively. This table invites direct comparison with table 3, the 'corrected' yields from Neuhauser and Lewicki's stated sensitivity and specificity assumptions. The number of false positive cases after the sixth test falls from 9,278 to 277.
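The step from six-test rates to single-test rates can be checked with a small sketch. It assumes, as the text does, that successive tests are independent and that the any-test-abnormal criterion applies, so a combined rate R over n tests corresponds to a per-test rate of 1 - (1 - R)^(1/n). The helper names are illustrative, not taken from the original analysis.

```python
def single_test_rate(combined_rate, n):
    """Per-test probability p such that 1 - (1 - p)**n equals combined_rate."""
    return 1 - (1 - combined_rate) ** (1 / n)


N_TESTS = 6
COMBINED_SENS = 11 / 12                 # 0.9167, read as the six-test sensitivity
COMBINED_FPR = 46 / (126 + 6 * 254)     # 0.027879, the six-test false positive rate

a = single_test_rate(COMBINED_SENS, N_TESTS)  # single-test sensitivity
b = single_test_rate(COMBINED_FPR, N_TESTS)   # single-test false positive rate

CASES = 71.94              # cancer cases per 10,000 people screened
HEALTHY = 10_000 - CASES


def outcomes(n):
    """(true positives, false positives) per 10,000 screened after n tests."""
    return CASES * (1 - (1 - a) ** n), HEALTHY * (1 - (1 - b) ** n)


print(f"single-test sensitivity {a:.4f}, false positive rate {b:.4f}")
for n in range(1, N_TESTS + 1):
    tp, fp = outcomes(n)
    print(f"{n}  true positives {tp:8.4f}  false positives {fp:9.4f}")
```

Because the sketch carries the unrounded single-test rates forward, the false positive counts may differ from the tabulated ones in the second decimal place.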
Similarly, but not quite so dramatically, a comparison with their own analysis (table 1) shows a decline from 791 false positives.

^7 A more detailed explanation of the complicated relationships among Neuhauser and Lewicki's screening outcomes, and our corrections thereof, is available in K. Brown and C. Burrows (1989), Did Greegor get it right? Neuhauser and Lewicki revisited, Working paper no. 3, Public Sector Management Institute, Monash University.

Table 5
True positive and false positive test results using correct Greegor assumptions (sensitivity = 0.339099; specificity = 0.995299; prevalence = 0.007194).

Number                   True positive                  False positive
of tests   Sensitivity   cases           Specificity    cases
1          0.339099      24.394783       0.995299        46.671810
2          0.563210      40.517323       0.990620        93.124214
3          0.711325      51.172722       0.985963       139.358249
4          0.809214      58.214882       0.981328       185.374936
5          0.873910      62.869051       0.976715       231.175298
6          0.916667      65.945007       0.972123       276.760354

Table 6 gives the cost consequences of the screening outcomes using the correct Greegor assumptions, and table 7 brings together the true positives and false positives for the Neuhauser and Lewicki analysis; for their analysis after correcting for their miscalculation of the number of false positives; and for the correct Greegor assumptions. In table 8, the cost data from table 1, table 4 and table 6 are brought together for ease of comparison. Not surprisingly, there are striking differences. Average cost per case detected is about $1,884 for the six-test series compared with $2,451 in Neuhauser and Lewicki's original analysis, and $14,247 when that analysis is corrected for their underestimation of the number of false positives. More dramatic, however, is the decline in marginal cost from their original $47 million ($177 million 'corrected' for miscalculation of false positives) to $4,833, which would probably be regarded as quite modest for a colorectal cancer screening program.
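The marginal and average costs quoted for the correct Greegor assumptions can be reproduced directly from the increments contributed by each additional test. The sketch below assumes independent tests, the any-test-abnormal criterion, the rounded single-test rates 0.339099 and 0.004701, and the same cost assumptions as before ($3 overhead, $1 per test, $100 per barium enema); the function names are illustrative.

```python
CASES = 71.94                    # cancer cases per 10,000 people screened
HEALTHY = 10_000 - CASES
A, B = 0.339099, 0.004701        # single-test sensitivity and false positive rate
OVERHEAD, PER_TEST, BARIUM_ENEMA = 3.0, 1.0, 100.0  # dollars


def incremental(n):
    """Extra true positives, false positives and cost from adding the n-th test."""
    extra_tp = CASES * (1 - A) ** (n - 1) * A      # cases missed by the first n-1 tests
    extra_fp = HEALTHY * (1 - B) ** (n - 1) * B    # healthy people first flagged by test n
    extra_cost = 10_000 * PER_TEST + BARIUM_ENEMA * (extra_tp + extra_fp)
    if n == 1:
        extra_cost += 10_000 * OVERHEAD            # overhead is incurred once
    return extra_tp, extra_fp, extra_cost


def marginal_cost(n):
    """Incremental cost per incremental true positive for the n-th test."""
    extra_tp, _, extra_cost = incremental(n)
    return extra_cost / extra_tp


def average_cost(n):
    """Total cost per true positive case for a series of n tests."""
    steps = [incremental(i) for i in range(1, n + 1)]
    return sum(s[2] for s in steps) / sum(s[0] for s in steps)


for n in range(1, 7):
    print(f"{n}  marginal ${marginal_cost(n):,.0f}  average ${average_cost(n):,.0f}")
```

Writing the increments in closed form shows why the marginal cost stays modest here: with a single-test sensitivity of about 0.34, the sixth test still finds roughly three additional cases per 10,000 people, rather than a few ten-thousandths of a case.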
The explanation for the vanishing $47 million is, by now, straightforward. Using the combined six-test sensitivity as the single-test sensitivity means that, after two tests, one is beginning to look at 'decimal dust' in terms of the incremental number of true positive cases detected. The problem is compounded by the incorrect definition of specificity employed and the corresponding miscalculation of the false positive rate. The result is a spectacular overestimation of the number of false positive cases with consequent expenditure on follow-up testing.

Table 6
Screening outcomes and screening costs using correct Greegor assumptions (sensitivity = 0.339099; specificity = 0.995299; prevalence = 0.007194).

Screening outcomes

Number     True         Incremental     False         Incremental
of tests   positive     true positive   positive      false positive
1          24.394783    24.394783        46.671810    46.671810
2          40.517323    16.122540        93.124214    46.452404
3          51.172722    10.655399       139.358249    46.234041
4          58.214882     7.042160       185.374936    46.016663
5          62.869051     4.654175       231.175298    45.800369
6          65.945007     3.075957       276.760354    45.585052

Screening costs ($)

Number
of tests   Total^a    Incremental^b   Marginal^c   Average^d
1           47,107    47,107          1,931        1,931
2           63,364    16,257          1,008        1,564
3           79,053    15,689          1,472        1,545
4           94,359    15,306          2,173        1,621
5          109,404    15,045          3,233        1,740
6          124,271    14,866          4,833        1,884

^a, ^b, ^c, ^d For explanation of footnotes, see table 4.

Table 7
Summary of screening outcomes (prevalence = 71.94 cases/10,000 asymptomatic persons).

           Table 1 [Neuhauser and     Table 4 [Neuhauser and      Table 5 [correct Greegor
           Lewicki (1975a)]^a         Lewicki 'corrected']^a      assumptions]^b
Number     True        False          True        False           True        False
of tests   positive    positive       positive    positive        positive    positive
1          65.9469     309.1652       65.9450     3,624.5259      24.3948      46.6718
2          71.4424     505.4606       71.4404     5,925.8135      40.5173      93.1242
3          71.9003     630.0926       71.8984     7,386.9497      51.1727     139.3583
4          71.9385     709.2240       71.9365     8,314.6553      58.2149     185.3749
5          71.9417     759.4661       71.9397     8,903.6748      62.8691     231.1753
6          71.9420     791.3660       71.9400     9,277.6563      65.9450     276.7604

^a Specificity = 0.634921; sensitivity = 0.916667.
^b Specificity = 0.995299; sensitivity = 0.339099.

Table 8
Summary of marginal and average costs.

           Table 1 [Neuhauser and     Table 4 [Neuhauser and      Table 6 [correct Greegor
           Lewicki (1975a)]^a         Lewicki 'corrected']^a      assumptions]^b
Number
of tests   MC^c          AC^d         MC^c           AC^d         MC^c      AC^d
1               1,175    1,175              6,203     6,203       1,931     1,931
2               5,492    1,507             43,796     9,095       1,008     1,564
3              49,150    1,810            340,993    11,209       1,472     1,545
4             469,534    2,059          2,693,630    12,631       2,173     1,621
5           4,724,695    2,268         21,605,644    13,589       3,233     1,740
6          47,107,214    2,451        177,502,100    14,247       4,833     1,884

^a Specificity = 0.634921; sensitivity = 0.916667.
^b Specificity = 0.995299; sensitivity = 0.339099.
^c Marginal cost = incremental cost per true positive case detected.
^d Average cost = total cost/number of true positive cases detected.

4. Conclusion

The conclusion reached from all this is somewhat embarrassing. The screening protocol that has been used as a vehicle for demonstrating the dire consequences of not incorporating economic analysis into health programme evaluations appears to be, in terms of cost effectiveness, quite defensible given Neuhauser and Lewicki's assumptions regarding costs and test independence, and Greegor's data on test diagnosticity and prevalence.

Prescott et al. (1980) have stated that 'economic analyses of medical procedures would be more readily accepted by the medical profession if examples of its application were supported by accurate analysis of the data available' (p. 1306). We concur, but it does the argument for economic evaluations no good if such a vivid demonstration is found to be invalid, especially when the demonstration case appears to be quite defensible.
We can, and should, also make the comment that most evaluations of health programmes and medical procedures require expertise in a number of disciplines. These are, par excellence, areas for multidisciplinary teams and, in this instance, the analysis of the parallel test protocol is more than usually complicated.

Finally, we can also point to the depressing likelihood that there is a long way to go before doctors are convinced of the relevance of cost effectiveness as an outcome measure, however vivid the effect. Certainly, Greegor did not seem to be impressed by a marginal cost of $47 million per case detected associated with a six- rather than a five-test series:

'The study raised the question whether the third set of slides should be omitted. The last 11 cancers I have detected using the slides were all guaiac positive on the first day. Long-term statistics with large groups may prove that Dr. Neuhauser's suggestion can be adopted. For the time being, however, I am unwilling to emasculate a life-saving test to save 30 cents per patient (the cost of the third set of slides).' [Neuhauser and Lewicki (1975b, p. 994, emphasis added)]

This response to what has been widely accepted as a graphic illustration of the folly of ignoring the costs of a screening policy makes plain the problems economists face in trying to demonstrate to doubting physicians that saving 'statistical lives' is necessarily reconcilable with saving the lives of individual patients, albeit different patients. Physicians implicitly accept the notion of maximising collective benefit when conducting randomised clinical trials to determine efficacy, but often do not see it in the application of economic evaluation to health care programmes.
Appendix A

Formulas used in the recalculation of the sensitivity and specificity of parallel test series comprising n independent tests are as follows:

$$A = \sum_{x=k}^{n} \binom{n}{x} a^{x} (1-a)^{n-x}, \tag{A.1}$$

$$D = 1 - B = 1 - \sum_{x=k}^{n} \binom{n}{x} b^{x} (1-b)^{n-x}, \tag{A.2}$$

where
A = combined sensitivity;
D = combined specificity;
B = 'combined' false positive rate = (1 - D);
a = sensitivity of the individual test;
b = false positive rate for a single test;
n = length of the parallel test series;
x = number of individual tests in the n test series that yield positive results; and
k = positivity criterion, i.e., the number of individual tests that must be positive in order for the test series to be classed as positive.

Under the any-test-abnormal positivity criterion (i.e., k = 1), eqs. (A.1) and (A.2) can be written as

$$A = 1 - (1-a)^{n} = 1 - c^{n}, \tag{A.1'}$$

$$D = 1 - B = (1-b)^{n} = d^{n}, \tag{A.2'}$$

where c = (1 - a) = the false negative rate associated with a single test; and d = (1 - b) = the true negative rate (specificity) of a single test.

Greegor's data (summarized in table 1) reflect the disease status of individuals screened after the any-test-abnormal positivity criterion has been applied to their test results. Accordingly, the sensitivity and specificity rates calculated, 0.9167 and 0.9721, respectively, can be interpreted as providing estimates of the combined sensitivity and specificity associated with the six-test protocol, that is, A = 0.9167 and D = 0.9721. The sensitivity and false positive rate associated with a single (guaiac) test are obtained by solving eqs. (A.1') and (A.2') for a and b, respectively, for n = 6, as per Greegor's protocol. Thus,

$$A = 0.9167 = 1 - (1-a)^{6} \iff a = 1 - (0.0833)^{1/6} = 0.3391,$$

$$D = 0.9721 = (1-b)^{6} \iff (1-b) = (0.9721)^{1/6}, \text{ and } b = 0.0047.$$
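The appendix formulas can be sketched numerically as follows. This is a minimal illustration, not the authors' own computation: the function names (`series_positive_rate`, `single_test_rates`) are hypothetical, independence of the tests is assumed as in the appendix, and the inputs are the combined rates quoted above.

```python
from math import comb


def series_positive_rate(p, n, k=1):
    """Probability that at least k of n independent tests are positive,
    when each test is positive with probability p (eqs. A.1 and A.2)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


def single_test_rates(combined_sensitivity, combined_specificity, n):
    """Invert eqs. (A.1') and (A.2') under the any-test-abnormal criterion.
    Returns (a, b): single-test sensitivity and single-test false positive rate."""
    a = 1 - (1 - combined_sensitivity) ** (1 / n)
    b = 1 - combined_specificity ** (1 / n)
    return a, b


a, b = single_test_rates(0.9167, 0.9721, n=6)
print(round(a, 4), round(b, 4))  # 0.3391 0.0047

# Round trip: feeding a and b back into the general sums with k = 1
# recovers the combined sensitivity A and combined specificity D.
A = series_positive_rate(a, n=6, k=1)
D = 1 - series_positive_rate(b, n=6, k=1)
print(round(A, 4), round(D, 4))
```

Passing a larger k to `series_positive_rate` gives the combined rates for stricter positivity criteria, which is the general case covered by eqs. (A.1) and (A.2).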
References

American Cancer Society, 1980, Guidelines for the cancer-related checkup: Recommendations and rationale, A Cancer Journal for Clinicians 30, 194-240.
Applegate, W.B., 1981, Colorectal cancer screening, Journal of Community Health 7, 138-151.
Bar-Hillel, M., 1973, On the subjective probability of compound events, Organizational Behavior and Human Performance 9, 396-406.
Cohen, J.C., E.I. Chesnick and D. Haran, 1982, Evaluation of compound probabilities in sequential choice, in: D. Kahneman, P. Slovic and A. Tversky, eds., Judgment under uncertainty: Heuristics and biases (Cambridge University Press, Cambridge).
Culyer, A.J., 1985, Economics (Blackwell, Oxford).
Department of Clinical Epidemiology and Biostatistics (Stoddart, G.L. et al.), McMaster University Health Sciences Centre, 1984, How to read clinical journals: VII, To understand an economic evaluation (part B), Canadian Medical Association Journal 130, 1542-1549.
Drummond, M.F., 1987, Methods for economic appraisal of health technology, in: M.F. Drummond, ed., Economic appraisal of health technology in the European community (Oxford University Press, Oxford).
Drummond, M.F., G.L. Stoddart and G.W. Torrance, 1987, Methods for the economic evaluation of health programmes (Oxford University Press, Oxford).
Galen, R.S., 1982, Application of the predictive value model in the analysis of test effectiveness, Clinical and Laboratory Medicine 2, 685-699.
Greegor, D.H., 1969, Detection of silent colon cancer in routine examination, A Cancer Journal for Clinicians 19, 330-337.
Greegor, D.H., 1971, Occult blood testing for detection of asymptomatic colon cancer, Cancer 28, 131-134.
Greegor, D.H., 1975, [letter], New England Journal of Medicine 293, 994.
Kelleher, M. and R. Vautrain, 1975, [letter], New England Journal of Medicine 293, 995.
Knight, K.K., J.E. Fielding and R.N. Battista, 1989, Occult blood screening for colorectal cancer, Journal of the American Medical Association 261, 587-593.
Leffall, L.D., 1974, Early diagnosis of colorectal cancer, A Cancer Journal for Clinicians 24, 152-159.
Miller, A.B., 1986, Principles of screening for colorectal cancer, Frontiers in Gastrointestinal Research 10, 35-45.
Mooney, G.H. and M.F. Drummond, 1982, Essentials of health economics - What is economics? Part 1, British Journal of Medicine 285, 1021-1025.
Mooney, G.H., E.M. Russell and R.D. Weir, 1980, Choices for health care (Macmillan, Houndmills).
Neuhauser, D., 1980, [letter], New England Journal of Medicine 303, 1306-1307.
Neuhauser, D. and A.M. Lewicki, 1975a, What do we gain from the sixth stool guaiac?, New England Journal of Medicine 293, 226-228.
Neuhauser, D. and A.M. Lewicki, 1975b, [letter], New England Journal of Medicine 293, 995.
Prescott, N., K. McPherson and J. Bell, 1980, Cost effectiveness of screening for occult blood in the stool: Another look, New England Journal of Medicine 303, 1306.
Simon, J.B., 1985, Occult blood screening for colorectal carcinoma: A critical review, Gastroenterology 88, 820-837.
Thompson, M.S. and E.E. Fortess, 1980, Cost-effectiveness analysis in health program evaluation, Evaluation Review 4, 549-568.
U.S. Preventive Services Task Force, 1989, Recommendations for fecal occult blood screening, Journal of the American Medical Association 261, 586.