<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" >

<channel><title><![CDATA[YANCO LAB - Blog]]></title><link><![CDATA[https://www.scottyanco.com/blog]]></link><description><![CDATA[Blog]]></description><pubDate>Wed, 06 May 2026 23:12:19 -0500</pubDate><generator>Weebly</generator><item><title><![CDATA[The Risk of Type I Errors Might be Higher Than You Think]]></title><link><![CDATA[https://www.scottyanco.com/blog/the-risk-of-type-i-errors-might-be-higher-than-you-think]]></link><comments><![CDATA[https://www.scottyanco.com/blog/the-risk-of-type-i-errors-might-be-higher-than-you-think#comments]]></comments><pubDate>Thu, 30 Jan 2020 22:04:25 GMT</pubDate><category><![CDATA[Uncategorized]]></category><guid isPermaLink="false">https://www.scottyanco.com/blog/the-risk-of-type-i-errors-might-be-higher-than-you-think</guid><description><![CDATA[There are a host of problems associated with using null hypothesis significance testing (NHST) to make binary decisions about whether research results are true or false (see the ASA statement and references therein&nbsp;(Wasserstein and Lazar 2016)). Here we set most of those aside to focus on Type I error rates - specifically, we aim to show that Type I errors are probably more common than we&rsquo;d all like to think...&nbsp;Recall that a Type I error is when the null hypothesis is &ldquo;rejec [...] ]]></description><content:encoded><![CDATA[<div class="paragraph"><span><span style="color:rgb(51, 51, 51)">There are a host of problems associated with using null hypothesis significance testing (NHST) to make binary decisions about whether research results are true or false (see the ASA statement and references therein&nbsp;</span><a href="https://paperpile.com/c/4bONZ6/81Ex"><span style="color:rgb(0, 0, 0)">(Wasserstein and Lazar 2016)</span></a><span style="color:rgb(51, 51, 51)">). 
Here we set most of those aside to focus on Type I error rates - specifically, we aim to show that Type I errors are probably more common than we&rsquo;d all like to think...&nbsp;</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">Recall that a Type I error is when the null hypothesis is &ldquo;rejected&rdquo; but was in fact true - in other words, a false positive. Most of us were taught that we choose the significance level (&alpha;) for an individual significance test in order to guard against having too many Type I errors. In fact the most commonly used &alpha; (0.05, meaning that when the null is true we erroneously &ldquo;reject the null&rdquo; only 5% of the time) is directly the result of Neyman and Pearson trying to control errors in the long run. Whenever &alpha; is set, it anticipates a certain probability of a Type I error in each test. These probabilities accumulate across multiple individual tests so that the more significance tests one performs, the greater the overall probability of at least one Type I error becomes. In contexts where many significance tests are going to be run, researchers account for this problem by modifying &alpha; so that the &ldquo;family-wise&rdquo; error rate is more acceptable (e.g., the Bonferroni Correction). However, a researcher is unlikely to make such a family-wise correction over the course of their entire career - especially since you don&rsquo;t know in grad school how many significance tests you will perform during the rest of your life. This means that as a researcher performs more tests, the probability that one or more of them will be a Type I error increases. Hopefully none of this is news. 
What may be new, though, is what happens when we try to quantify the probability of making a Type I error.&nbsp;</span></span><br /><br /><span><span style="color:rgb(51, 51, 51); font-weight:700">The Probability of At Least One Type I Error</span></span><br /><span><span style="color:rgb(51, 51, 51)">Describing the probability of making at least one Type I error feels trivial, but it&rsquo;s a bit more nuanced than you might think. At first glance it appears this is a simple case of adding the probabilities together (since you could get a Type I error on the first test </span><span style="color:rgb(51, 51, 51)">OR</span><span style="color:rgb(51, 51, 51)"> the second test </span><span style="color:rgb(51, 51, 51)">OR</span><span style="color:rgb(51, 51, 51)"> the third test&hellip;). That looks like:</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">P(A or B) = P(A) + P(B)</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">So for total Type I error rates across a bunch of NHSTs we could just take &alpha; as the probability of a Type I error on each test and multiply by the number of tests so that:</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">P(Type I Error) = &alpha; &times; N</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">where N is the number of tests.</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">Not so fast! Unfortunately, this only works if the events are mutually exclusive (A and B can&rsquo;t both happen). We know that isn&rsquo;t the case here, since we could, of course, make a Type I error on more than one test. 
We can quickly see that this won&rsquo;t work by assuming that N = 21 and &alpha; = 0.05 so that:</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">P(Type I Error)&nbsp;= &alpha; &times; N&nbsp;</span></span><br /><span><span style="color:rgb(51, 51, 51)">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = 0.05 &times; 21&nbsp;</span></span><br /><span><span style="color:rgb(51, 51, 51)">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp;= 1.05</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">But we know this can&rsquo;t be right because probabilities must be bounded between 0 and 1.</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">The problem is that we haven&rsquo;t properly used the addition rule, which reminds us not to double-count the shared probability between the outcomes (e.g., the probability of more than one test resulting in a Type I error). What we should have used was:</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">P(A or B) = P(A) + P(B) - P(A and B)</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">So now we just need to know the value of P(A and B)&hellip; but we don&rsquo;t actually know that. To get around this problem we need to get a bit more creative.</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">We&rsquo;re going to use complementary probabilities to get around the fact that we don&rsquo;t know the joint probabilities of multiple Type I errors. This type of workaround is often showcased using the &ldquo;Birthday Paradox&rdquo;, which calculates (with some assumptions) the probability of at least one shared birthday between any two members of a group of a given size. (Amazingly, it only takes a group of 23 people for the probability of a shared birthday to exceed 0.5.) 
In our situation we are asking about the probability of at least one</span><span style="color:rgb(51, 51, 51)"> Type I error in N independent significance tests. For notational simplicity, let&rsquo;s call that probability P(R). We also know that the probability of </span><span style="color:rgb(51, 51, 51)">zero</span><span style="color:rgb(51, 51, 51)"> Type I errors (call that P(R&prime;)) is just the complement of P(R):</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">P(R&prime;) = 1 &minus; P(R)</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">It will be much easier to calculate the probability of no</span><span style="color:rgb(51, 51, 51)"> Type I errors across multiple tests because, with independent tests, we can use the multiplication rule: in order to have no Type I errors on multiple tests we must have no error on the first </span><span style="color:rgb(51, 51, 51)">AND</span><span style="color:rgb(51, 51, 51)"> no error on the second </span><span style="color:rgb(51, 51, 51)">AND</span><span style="color:rgb(51, 51, 51)"> no error on the third and so on. Since &alpha; is the probability of a Type I error on a given test, the probability of no error on that test is 1 &minus; &alpha;,</span><span style="color:rgb(51, 51, 51)"> </span><span style="color:rgb(51, 51, 51)">leaving us with:</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">P(R&prime;) = (1 &minus; &alpha;<sub>1</sub>) &times; (1 &minus; &alpha;<sub>2</sub>) &times; &hellip; &times; (1 &minus; &alpha;<sub>N</sub></span><span style="color:rgb(51, 51, 51)">)</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">We&rsquo;ll further assume that we use the same &alpha; in each test (which is common practise). This, then, further simplifies to:</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">P(R&prime;) = (1 &minus; &alpha;)<sup>N</sup></span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">That all gets us the probability of exactly zero Type I errors across N tests. 
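<br /><br />As a quick numerical check of the product above, here is a minimal Python sketch (an illustration, not part of the original analysis). It assumes N independent tests that all use the same &alpha; and whose null hypotheses are all true; the function name prob_no_type_i_errors is made up for this example.<br /><br />

```python
# Hypothetical helper: probability of zero Type I errors in n_tests
# independent tests, assuming every null is true and each test uses alpha.
def prob_no_type_i_errors(alpha, n_tests):
    return (1.0 - alpha) ** n_tests


alpha, n_tests = 0.05, 21

p_none = prob_no_type_i_errors(alpha, n_tests)
p_at_least_one = 1.0 - p_none  # complement rule

print(f"naive alpha * N       : {alpha * n_tests:.4f}")
print(f"P(no Type I errors)   : {p_none:.4f}")
print(f"P(at least one error) : {p_at_least_one:.4f}")
```

<br /><br />Unlike the naive sum, both of the last two numbers stay between 0 and 1 however large N gets.<br /><br />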
But we wanted the probability of at least one Type I error across those N tests. The complement rule comes to our rescue again since that probability is simply the complement of the probability of no errors:</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">P(R) = 1 &minus; P(R&prime;)</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">And we can substitute the equation we just derived for P(R&prime;) above and we get:</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">P(R) = 1 &minus; ((1 &minus; &alpha;)<sup>N</sup></span><span style="color:rgb(51, 51, 51)">)</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">So let&rsquo;s use this with our initial example where N = 21 and &alpha; = 0.05:</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">P(R) = 1 &minus; ((1 &minus; 0.05)<sup>21</sup></span><span style="color:rgb(51, 51, 51)">)</span></span><br /><span><span style="color:rgb(51, 51, 51)">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;= 1 &minus; (0.3405616)</span></span><br /><span><span style="color:rgb(51, 51, 51)">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&asymp; 0.66</span></span><br /><br /><span><span style="color:rgb(51, 51, 51); font-weight:700">Expected Number of Type I Errors</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">Let&rsquo;s say we want to think about this problem a little differently. Let&rsquo;s now consider how many Type I errors we should expect</span><span style="color:rgb(51, 51, 51)"> in a given set of NHSTs. It turns out we already saw the solution to this at the top in our first attempt to calculate the </span><span style="color:rgb(51, 51, 51)">probability </span><span style="color:rgb(51, 51, 51)">of a Type I error. 
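<br /><br />Both the 0.66 figure above and the &alpha; &times; N quantity from that first attempt can be sanity-checked by simulation. The sketch below is an illustration only: it assumes all 21 null hypotheses are true and the tests are independent, so each test produces a false positive with probability &alpha;. The function name, the seed, and the number of simulated researchers are arbitrary choices.<br /><br />

```python
import random


def simulate_type_i_errors(alpha=0.05, n_tests=21, n_researchers=20000, seed=0):
    """Simulate researchers who each run n_tests tests of true nulls.

    Returns (share of researchers with at least one Type I error,
             average number of Type I errors per researcher).
    """
    rng = random.Random(seed)
    total_errors = 0
    researchers_with_error = 0
    for _ in range(n_researchers):
        # Each test of a true null "rejects" with probability alpha.
        errors = sum(alpha > rng.random() for _ in range(n_tests))
        total_errors += errors
        if errors:
            researchers_with_error += 1
    return researchers_with_error / n_researchers, total_errors / n_researchers


p_any, mean_errors = simulate_type_i_errors()
print(f"share with at least one Type I error: {p_any:.3f}")
print(f"average number of Type I errors     : {mean_errors:.3f}")
```

<br /><br />The first number should land near 1 &minus; (1 &minus; &alpha;)<sup>N</sup> and the second near &alpha; &times; N, up to simulation noise.<br /><br />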
The expected value of a binomial distribution (which is what we get when we count true/false outcomes across repeated NHSTs) is:&nbsp;</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">E[x] = n &times; p</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">Or, in our case:&nbsp;</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">E[x] = N &times; &alpha;</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">Let&rsquo;s again use our example of &alpha; = 0.05 and N = 21.</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">E[x] = N &times; &alpha;</span></span><br /><span><span style="color:rgb(51, 51, 51)">&nbsp; &nbsp; &nbsp; &nbsp;= 21 &times; 0.05</span></span><br /><span><span style="color:rgb(51, 51, 51)">&nbsp; &nbsp; &nbsp; &nbsp;= 1.05</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">In other words, in 21 NHSTs we should expect, on average, 1.05 Type I errors. This reflects very common thinking about Type I errors: we should expect an error about once every 20 significance tests, and this is true.&nbsp; But it treats this expectation as a single value. When we consider the number of tests until an error as a random variable that can be described by a probability distribution, the problem gets more complex.</span></span><br /><br /><span><span style="color:rgb(51, 51, 51); font-weight:700">The Probability Distribution of the Number of Tests Before an Error</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">We&rsquo;ll now think probabilistically about how many trials a researcher should expect to run before encountering their first</span><span style="color:rgb(51, 51, 51)"> Type I error - this is a little different than just the average expectation in a given set. It turns out that we can use what&rsquo;s called the geometric distribution for this - it gives the probability that the first &ldquo;success&rdquo; occurs on trial k in a sequence of independent Bernoulli trials. 
It says that:</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">P(X = k) = (1 &minus; p)<sup>k&minus;1</sup></span><span style="color:rgb(51, 51, 51)"> p</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">Where X = k means that the first success occurs on the k<sup>th</sup></span><span style="color:rgb(51, 51, 51)"> trial. (Note that for our purposes we&rsquo;re calling &ldquo;success&rdquo; a Type I error (ironic as that may be) so that p = &alpha;.) Hopefully you can see that this is basically the same thing that we calculated before. The expected number of trials up to and including the first success for the geometric distribution is just:</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">E[X] = 1/p&nbsp;</span></span><br /><span><span style="color:rgb(51, 51, 51)">&nbsp; &nbsp; &nbsp; &nbsp;&nbsp;= 1/&alpha;</span></span><br /><span><span style="color:rgb(51, 51, 51)">&nbsp; &nbsp; &nbsp; &nbsp;&nbsp;= 1/0.05</span></span><br /><span><span style="color:rgb(51, 51, 51)">&nbsp; &nbsp; &nbsp; &nbsp;&nbsp;= 20</span></span><br /><br /><span><span style="color:rgb(51, 51, 51)">Which is another way to say that we should expect, on average, to have to run 20 significance tests to reach our first Type I error. Not much new here. 
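<br /><br />The geometric formulas above can be checked from first principles. The sketch below is a minimal Python illustration, not the code behind the original post. The helper names geom_pmf and geom_cdf are hypothetical, k counts the test on which the first error occurs (so k starts at 1), and the infinite sum for the mean is truncated at an arbitrary 2000 terms.<br /><br />

```python
ALPHA = 0.05  # per-test Type I error probability, i.e. p in the formula


def geom_pmf(k, p=ALPHA):
    """P(X = k): the first Type I error occurs on test k (k = 1, 2, ...)."""
    return (1.0 - p) ** (k - 1) * p


def geom_cdf(k, p=ALPHA):
    """P(X at most k): at least one Type I error within the first k tests."""
    return 1.0 - (1.0 - p) ** k


# Truncated version of the infinite sum defining E[X]; should be close to 1/p.
mean_trials = sum(k * geom_pmf(k) for k in range(1, 2001))

print(f"E[X] (truncated sum): {mean_trials:.4f}")
for k in (1, 5, 10, 20):
    print(
        f"P(first error on test {k}) = {geom_pmf(k):.4f}, "
        f"P(error within {k} tests) = {geom_cdf(k):.4f}"
    )
```

<br /><br />Note that geom_cdf is the same 1 &minus; (1 &minus; &alpha;)<sup>N</sup> expression derived earlier, with k playing the role of N.<br /><br />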
But now let&rsquo;s look at it distributionally to see how much any instance might deviate from that expectation.&nbsp;</span></span></div>  <div><div class="wsite-image wsite-image-border-none " style="padding-top:10px;padding-bottom:10px;margin-left:0;margin-right:0;text-align:center"> <a> <img src="https://www.scottyanco.com/uploads/1/1/8/5/118587946/1_orig.png" alt="Picture" style="width:auto;max-width:100%" /> </a> <div style="display:block;font-size:90%"></div> </div></div>  <div class="paragraph"><span style="color:rgb(0, 0, 0)">The cumulative distribution function plot shows us the number of tests on the x-axis and the cumulative probability of having made a Type I error on the y-axis.&nbsp; The first thing to note is that while the&nbsp;</span><span style="color:rgb(0, 0, 0)">expected</span><span style="color:rgb(0, 0, 0)">&nbsp;value of the geometric distribution for&nbsp;</span><span style="color:rgb(51, 51, 51)">&alpha; = 0.05 may be 20, the probability of getting a Type I error in fewer than 20 significance tests is certainly not zero.&nbsp;<br /><br />We can use the quantiles of the distribution to characterize this a few different ways.&nbsp;</span><span style="color:rgb(0, 0, 0)">First, we can ask for the value of X (number of tests) that corresponds to a given quantile (cumulative probability of a Type I error). For example, the median number of trials until the first Type I error is 14 (that is, 13 error-free tests followed by the error).</span><span style="color:rgb(51, 51, 51)">&nbsp;This means that&nbsp;roughly half of researchers will get a Type I error within their first 14 trials. This is far fewer than the intuitive E[X] = 20. That should feel a little scary.<br /><br />We can also use the probability distribution to ask basically the converse question: what is the quantile associated with a value of X; in other words, what is the probability of a researcher getting a Type I error within X trials? Let&rsquo;s try: What&rsquo;s the probability of getting a Type I error within your first two trials? 
Remember, the expected number of trials to error is still 20&nbsp;</span><span style="color:rgb(51, 51, 51)">on average, but some will hit this error right off the bat - what&rsquo;s that number? 0.0975 (that is, 1 &minus; 0.95<sup>2</sup>). Almost 10% of researchers will experience a Type I error within their first two trials; on the very first trial alone the probability is simply &alpha; = 0.05. That might put &alpha; = 0.05 in a different perspective for many people.</span><br /><br /><span style="color:rgb(51, 51, 51); font-weight:700">Conclusion</span><br /><span style="color:rgb(51, 51, 51)">As we stated at the top: there are plenty of reasons to be suspicious of NHST. Many of these reasons are even more compelling than the risk of a Type I error. That said, looking at Type I error rates probabilistically should give us pause - the chance of making at least one Type I error over many NHSTs is high, and we never know which of our tests is that error&hellip;<br /></span><br /><span style="color:rgb(0, 0, 0); font-weight:700">References</span><br /><span style="color:rgb(0, 0, 0)"><a href="https://paperpile.com/b/4bONZ6/81Ex">Wasserstein, R. L., and N. A. Lazar (2016). The ASA&rsquo;s Statement on p-Values: Context, Process, and Purpose. The American Statistician 70:129&ndash;133.</a></span></div>]]></content:encoded></item></channel></rss>