
Welcome! Here you can read about some of the fallacies in today's
statistical inference.
This text is written by dr. med. Branko Sorić, e-mail address: branko.soric@zg.tcom.hr

Note: In this text the Greek letter alpha is replaced by the following symbol: ¤. Exponents are denoted by ^; for example: 10^-6 = 0.000001 (= "ten to the minus sixth power"). Instead of subscripts, numbers follow after the letters, e.g.: r1, r2, ¤1, ¤2; likewise: Qmax.

Zagreb, March - June 2001

Branko Sorić: SCIENCE IS INSUFFICIENTLY VERIFIED STATISTICALLY (IT IS NECESSARY EITHER TO CALCULATE A MAXIMAL PERCENTAGE OF FALSE DISCOVERIES, OR TO ATTAIN HIGHER SIGNIFICANCE LEVELS IN SINGLE EXPERIMENTS)

A SURVEY (SUMMARY)

Contradictory opinions have been expressed about the reliability and correctness of statistical testing. We do not know today how much truth there is in the parts of medicine (and of other sciences) that have only been "verified" statistically, without any other, more reliable proofs. With the chosen statistical significance levels, such as 5% or 1%, the percentage of untruth can easily be greater than 10%, or 20%, or 50%... However, we should KNOW that the percentage of fallacies is smaller than 1% (or, perhaps, 5%), because science should comprise known facts, not the unknown! It is necessary to correct the statistical textbooks as well as the practice.

Karl Pearson, having investigated the correctness of the Monte Carlo roulette in 1894, discarded a null hypothesis on the ground of an extremely small probability (far less than 10^-9, or one in a thousand million) that the observed phenomena could occur randomly with an unbiased roulette. Namely, because such occurrences were "practically impossible" with an unbiased roulette, he inferred that the roulette must be biased. (Note: A null hypothesis is an assumption that some phenomenon or effect does NOT exist. To reject a true null hypothesis means to make a false discovery.)

Later, statisticians (unjustifiably!) greatly loosened the criterion for discarding a null hypothesis, in order to achieve "statistical verification" of statements more easily, i.e. to make more scientific discoveries. An event with a probability of about 0.1% could hardly be called "practically impossible"! Still, statisticians now say that even a 5-percent probability is good enough to discard a null hypothesis! This seems to have rendered today's science insufficiently credible. In the last ten or more years (as far as I know) neither the research practice nor the statistical textbooks have been made any better.

If there is a large number (a) of true null hypotheses in a very large number (n) of independent experiments, and if these n experiments produce r results ("discoveries") significant at the level ¤ (= alpha), then the probability that a discovery is false is not ¤ (as is often imagined), but rather:

¤a/r = ¤a/(¤a + fb)

(and this is different from ¤, except if a = r; but this may not be so, and it is also unknown to us, because a is an unknown number). (See Figure 1.)

¤a = number of false discoveries; fb = number of true discoveries; a + b = n (here b is the number of false null hypotheses and f is the average power of the tests on them).

Some links (added on July 17, 2009)
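To make the formula above concrete, here is a small simulation (my own illustrative sketch, not part of the original text; the values of n, a, ¤ and the average power f are assumed). It shows that with ¤ = 5% the proportion of false discoveries among all discoveries can be far larger than 5%, and that ¤n/r gives a computable upper bound in the spirit of Qmax, since a ≤ n:

```python
import random

# Illustrative simulation (assumed numbers): n independent experiments,
# of which a have a true null hypothesis.  Each true null is (falsely)
# rejected with probability alpha; each false null is (correctly)
# rejected with probability f, the average power.
random.seed(1)
n, a = 10_000, 9_000          # assumption: most tested effects do not exist
b = n - a
alpha, f = 0.05, 0.80

false_disc = sum(random.random() < alpha for _ in range(a))  # about alpha*a
true_disc = sum(random.random() < f for _ in range(b))       # about f*b
r = false_disc + true_disc

q = false_disc / r                            # realized proportion of false discoveries
q_expected = alpha * a / (alpha * a + f * b)  # the formula ¤a/(¤a + fb)
q_max = alpha * n / r                         # computable upper bound, since a <= n

print(f"expected ¤a/(¤a+fb) = {q_expected:.3f}")  # about 0.36, far above 0.05
print(f"realized proportion = {q:.3f}")
print(f"upper bound ¤n/r    = {q_max:.3f}")
```

With these assumed numbers, roughly 36% of the "discoveries" are false even though every single test used the conventional 5% level, which is exactly the gap between ¤ and ¤a/r that the text describes.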
http://www.jstor.org/pss/2289950
Sorić B.: "Statistical 'Discoveries' and Effect-size Estimation", Journal of the American Statistical Association, Vol. 84, No. 406 (Theory and Methods), 1989, pp. 608-610.
http://www.bepress.com/uwbiostat/paper259/
http://www.bepress.com/cgi/viewcontent.cgi?article=1092&context=uwbiostat
UW Biostatistics Working Paper Series, University of Washington, Year 2005, Paper 259 / Storey, John D.: The Optimal Discovery Procedure: A New Approach to Simultaneous Significance Testing / 3 Optimal Discovery Procedure: Theory / 3.1 Optimality goal (Page 5)
"(......) ....expected number of false positives (EFP). (......) Similarly, the sum of powers across all true alternative hypotheses is the ETP. Even though there are many ways in which one can combine these values in order to assess the performance of multiple significance tests, this particular one is focused on the overall 'discovery rate' (Soric 1989). (......) An exact equality exists for large numbers of tests with certain convergence properties (Storey et al. 2004), under Bayesian mixture model assumptions (Storey 2003), and under alternative definitions of the FDR (Benjamini & Hochberg 1995, Storey 2003) (......)"
6.2 False discovery rate optimality by the ODP (Page 16)
"(......) The FDR is the proportion of false positives among all tests called significant (Soric 1989, Benjamini & Hochberg 1995) (......)"
http://www.math.tau.ac.il/~ybenja/MyPapers/benjamini_hochberg1995.pdf
J. R. Statist. Soc. B (1995) 57, No. 1, pp. 289-300 / Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing / By YOAV BENJAMINI and YOSEF HOCHBERG / Tel Aviv University, Israel [Received January 1993. Revised March 1994] (Page 290)
"(......) In this work we suggest a new point of view on the problem of multiplicity. (......) ....a desirable error rate to control may be the expected proportion of errors among the rejected hypotheses, which we term the false discovery rate (FDR). This criterion integrates Spjotvoll's (1972) concern about the number of errors committed in multiple-comparison problems, with Soric's (1989) concern about the probability of a false rejection given a rejection. We use the term FDR after Soric (1989), who identified a rejected hypothesis with a 'statistical discovery' (......)"
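As an aside (my own sketch, not taken from the paper), the Benjamini-Hochberg step-up procedure quoted above can be written in a few lines; the p-values in the example are invented for illustration:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure: reject the null hypotheses
    whose p-values are at or below the largest sorted p-value p_(k)
    satisfying p_(k) <= k*q/m.  For independent tests this controls the
    expected false discovery rate at level q."""
    m = len(p_values)
    # sort the p-values, remembering their original positions
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])   # indices of the rejected hypotheses

# invented p-values for ten independent tests
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.44]
print(benjamini_hochberg(p, q=0.05))   # → [0, 1]
```

Note that with an ordinary 5% cut-off on each test separately, five of these ten hypotheses would be rejected; the step-up rule rejects only the two whose p-values survive the k*q/m comparison.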
http://genomebiology.com/2006/7/3/401
PMC Biophysics / Genome Biology
A reanalysis of a published Affymetrix GeneChip control dataset
Alan R Dabney and John D Storey, Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
Genome Biology 2006, 7:401, doi:10.1186/gb-2006-7-3-401, Published: 22 March 2006
"(......) False-discovery rates were originally proposed by Soric [2] and Benjamini and Hochberg [3]. The q-value was developed as the FDR analog of the p-value [4-7]. There is sound statistical justification behind both FDR and q-value methods (......)"
http://www.vanbelle.org/
Statistical Rules of Thumb
http://www.vanbelle.org/rom/ROM_2002_06.pdf#search=%22FDR%20Soric%22
(Page 2) "(......) Hochberg (see Benjamini and Hochberg, 1995) made a fundamental contribution to the multiple comparison problem by defining the False Discovery Rate (FDR). Their work can be linked to a seminal paper by Sorić (1989). Rather than fixing the Type I error rate he proposed fixing the rejection region. (......) The Hochberg approach has found particular usefulness in situations where there are many multiple comparisons, such as in microarray analysis with hundreds or even thousands of comparisons. Storey (2002) has sharpened the Hochberg procedure
(......)"

http://www.stat.berkeley.edu/techreports/633.pdf#search=%22FDR%20Soric%22
Resampling-based multiple testing for microarray data analysis
Yongchao Ge, Sandrine Dudoit, and Terence P. Speed; Jan. 2003, Technical Report # 633 / Department of Statistics, University of California, Berkeley / Division of Biostatistics, University of California, Berkeley / Division of Genetics and Bioinformatics, The Walter and Eliza Hall Institute of Medical Research, Australia /
(Page 3) "(......) Since a typical microarray experiment measures expression levels for several thousand genes simultaneously, we are faced with an extreme multiple testing problem. (......)"
(Page 4) "(......) The present paper introduces a new algorithm for computing the Westfall & Young (1993) step-down minP adjusted p-values. A second line of multiple testing is developed by Benjamini & Hochberg (1995). They propose procedures to control the false discovery rate. This was further developed by Storey (2002) (......)"
(Page 21) "(......) Benjamini & Hochberg (1995) suppose that V/R = 0 when R = 0, while Storey (2002) uses the conditional expectation of V/R given R > 0, termed the positive false discovery rate. Earlier ideas related to FDR can be found in Seeger (1968) and Sorić (1989) (......)"

http://www.stat.purdue.edu/~tlzhang/mathstat/pvaluerev.pdf
On the False Discovery Rates of a Frequentist: Asymptotic Expansions
Anirban DasGupta and Tonglin Zhang, Department of Statistics, Purdue University (2006)
(Page 1) "Abstract: Consider a testing problem for the null hypothesis (......) The standard frequentist practice is to reject the null hypothesis when the p-value is smaller than a threshold value (alpha), usually 0.05. We ask the question how many of the null hypotheses a frequentist rejects are actually true. (......) We show that the Benjamini-Hochberg FDR in fact converges to (delta n) almost surely under g for any fixed n. For one-sided null hypotheses, we derive a third order asymptotic expansion for (delta n).... (......)
1. Introduction - In a strikingly interesting short note, Sorić [19] raised the question of establishing upper bounds on the proportion of fictitious statistical discoveries in a battery of independent experiments. Thus, if m null hypotheses are tested independently, of which m0 are rejected at a significance level (alpha), and another S among the false ones are also rejected, Sorić essentially suggested E(V)/(V + S) as a measure of the false discovery rate in the chain of m independent experiments. Benjamini and Hochberg [3] then looked at the question in much greater detail and gave a careful discussion.... (......) The practical importance comes from its obvious relation to statistical discoveries made in clinical trials, and in modern microarray experiments. The continued importance of the problem is reflected in two recent articles, Efron [5], and Storey [21], who provide serious Bayesian connections and advancements in the problem. See also Storey [20], Storey, Taylor and Siegmund [23], Storey and Tibshirani [22], Genovese and Wasserman [10], and Finner and Roters [9], among many others in this currently active area.
"Around the same time that Sorić raised the issue of fictitious frequentist discoveries made by a mechanical adoption of the use of p-values, a different debate was brewing in the foundation literature. Berger and Sellke [2], in a thought-provoking article, gave analytical foundations to the thesis in Edwards, Lindman and Savage [4] that the frequentist practice of rejecting a sharp null at a traditional 5% level amounts to a rush to judgment against the null hypothesis. (......)"

THIS SITE IS NOT COMPLETE. Sorry, I do not know when I shall be able to complete this site in English. You can see the Croatian version at the following address:
http://soricb.tripod.com/statistickozakljucivanje/
 


