On the topic of condition checks, Josh Tabor made a claim that bore some proving. He said that the condition checks are not all created equal. In fact, some are essential and mandatory, and others are simply … well, optional.
For instance, the n < 10% of the population condition is completely optional and has not been required in the AP FRQ rubrics. Check it out. Totally true. The FRQ rubrics require the randomness condition and the large counts (or normality) condition, but not the n < 10% condition.
Mind blown. The textbook makes them all equal and all mandatory and doesn’t really explain why we do them, just that they are all required.
So here is the skinny on the conditions checks for z and t interval, why we check them, and the implications if the conditions are violated.
So let’s look at the formulas for the z and t confidence intervals and break them down.
1. The first part of the formula is the point estimator for the interval (x-bar for a mean, p-hat for a proportion). What is special about the point estimator? It is supposed to be an unbiased estimate of the parameter.
Wait, what’s that? An unbiased estimator? How do we get the sample mean to be an unbiased estimate of the population mean? Oh, yeah, we RANDOMLY SAMPLE from the population and calculate the mean, and the properties of random sampling tell us the mean of a random sample is an unbiased estimator. (The CLT is about the shape of the sampling distribution; that shows up in the next condition.)
What happens if we do not have an unbiased estimator, then? What if p-hat or x-bar is in the wrong spot on the number line? We get an interval (or a test statistic) that is completely wrong! Garbage In, Garbage Out. The VERY first check we have to do is a check for random sampling, so we have confidence in the point estimator!
What happens if we violate this condition? Automatic failure of the interval or test. This condition is essential and vital to the process of doing statistics. Without an unbiased estimator, we have garbage going in.
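To see that unbiasedness in action, here is a quick simulation sketch in Python (the population, its mean, and the sample sizes are all made up for illustration): the average of many random-sample means lands right on top of the population mean.

```python
import random

random.seed(1)

# Hypothetical population of 10,000 values (numbers are illustrative).
population = [random.gauss(50, 10) for _ in range(10_000)]
true_mean = sum(population) / len(population)

# Take many simple random samples and average their sample means.
n_samples, n = 2_000, 30
sample_means = []
for _ in range(n_samples):
    sample = random.sample(population, n)   # simple random sample
    sample_means.append(sum(sample) / n)

avg_of_means = sum(sample_means) / n_samples
# The average of the sample means should sit very close to the true mean:
print(round(true_mean, 1), round(avg_of_means, 1))
```

Any one sample mean wanders, but the center of the sampling distribution is the population mean, which is exactly what "unbiased" promises.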
2. Next up in the formulas we have the wonderful z-star and t-star. What do we have to check to make sure those values are appropriate to use? What needs to be in place?
Well, if the sampling distribution is unimodal and symmetric, then we should be very comfortable using the z or t value for the appropriate interval or test. How can we be assured of that? If we are working with a proportion, making sure we have ENOUGH expected successes and failures in our sample will make the sampling distribution unimodal and symmetric (approximately normal, even!). How many is enough? Some books say np > 5 and n(1 − p) > 5, some books say np > 10 and n(1 − p) > 10, but the simplest idea is that we are checking to make sure the sample is big enough for normality.
If our sample is composed of numerical quantities, then it is even easier. Graph the sample, look at it. Is it unimodal and symmetric? Good enough.
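The proportion version of the check is small enough to write as a helper. This is just a sketch of the idea; the threshold parameter reflects the same some-books-say-5, some-books-say-10 disagreement mentioned above.

```python
def large_counts_ok(n, p_hat, threshold=10):
    """Large counts condition for a proportion: both the expected
    number of successes and the expected number of failures should
    clear the threshold (some books use 5 instead of 10)."""
    return n * p_hat >= threshold and n * (1 - p_hat) >= threshold

print(large_counts_ok(100, 0.5))   # True: 50 expected successes, 50 failures
print(large_counts_ok(100, 0.05))  # False: only 5 expected successes
```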
What happens if we violate this condition? Through several Fathom sampling exercises, we discovered that if this condition is violated, the z-star or t-star we use OVERSTATES our confidence. We claim we are 95% confident, but in reality we are only 92% or 91% confident. That is bad. We are lying to people if we violate this condition. Not good. If we don’t check this, it is a failure.
3. Independence ….. Independence is tricky. For most of the problems we do, we find that checking n < 10% of the population is enough. Does that guarantee that every person answered independently? What about experiments, where independence is much, much more difficult to ensure? What Josh and the editors of the book he helped author are saying is that n < 10% is enough for the single-sample cases.
Why? Because we cannot guarantee independence, but we can try to make sure it is there. In the end, we don’t really care about the sample size as long as the sampling is done in a way that ensures independence.
Again, why? Because of what happens if we violate the sample size condition: um, nothing. We can adjust for it if necessary, but the REALLY cool thing is that if we violate the sample size condition, all we are doing is UNDERSTATING our confidence.
If we say that we are 95% confident but our sample is more than 10% of the population, what happens in reality is that we are 97% or 99% confident. We are lying, but we are lying to the GOOD side.
Which is why the AP exam grading rubric does not penalize learners for forgetting to check the third condition!
So to simplify and make it easy to understand: each condition matches up with the part of the formula it protects. Condition 3 is really about independence, not just n < 10%, but independence is very difficult to verify, and checking the sample size ends up being optional.
Besides, if we really want to, we can work the Finite Population Correction Factor into the standard error. This is not part of the AP curriculum, but it is nice to know WHY we check these conditions, and WHAT happens when we fail to check them.
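For the curious, the correction factor itself is sqrt((N − n)/(N − 1)), and a couple of made-up numbers show why the 10% guideline is forgiving: under 10% the correction barely moves the standard error at all.

```python
def fpc(N, n):
    """Finite population correction factor, sqrt((N - n) / (N - 1)).
    Multiplying the standard error by this shrinks it when the sample
    is a large fraction of the population."""
    return ((N - n) / (N - 1)) ** 0.5

# Sample is 1% of the population: correction is essentially 1.
print(round(fpc(10_000, 100), 3))   # 0.995
# Sample is 40% of the population: the SE shrinks noticeably.
print(round(fpc(1_000, 400), 3))    # 0.775
```

A smaller standard error means a narrower interval, so ignoring the correction when n is large relative to N leaves the interval wider than it needs to be, which is the understated-confidence effect described above.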
I know I will do a much better job teaching and making this understandable for learners knowing this.
4 responses to “Conditions Checks–Why and What if we don’t”
Glenn, Thank you so much for putting this on your blog. It is really helpful for me to see this all summarized here. I really appreciate it!
You are most welcome, Natalie. There is more coming, I promise!
I have been very anti-formula for a while, but this is making me come around quite a bit. This does a fantastic job of actually reducing memorization via the formula, instead of the other way around, because of how clearly it communicates the conditions.
I know. It makes so much more sense when you look at the conditions and the formulas generated together and for each other. This really changed my approach to thinking and teaching these.