## Question 821:

There are a few things you will need to consider to determine the sample size. First, as I understand your problem, you need to verify that a process is running at a quality level of at least .995. There are two ways to determine whether a proportion exceeds a criterion: using a confidence interval, or employing a hypothesis test and testing the sample proportion against the criterion of .995. I'll start with the confidence interval.

For example, if you observed 100 out of 100 devices without failure, you would want to know whether there is sufficient evidence that this sample proportion of 1 is above .995. The 90% confidence interval, using the adjusted-Wald formula, gives you an interval between .977 and 1, so at this sample size we cannot be 90% confident that the proportion is above .995. If you observe 500 out of 500, then you can be 90% confident the true proportion is in excess of .995. See the free calculator here to run some what-if scenarios: http://www.measuringusability.com/wald.htm
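When every trial succeeds, there is also a simple closed-form check you can run yourself. This is a sketch using the exact one-sided binomial lower bound (a simpler alternative to the adjusted-Wald formula the calculator uses; for zero failures the two give similar numbers): with n successes in n trials, the lower bound solves p^n = alpha, so p = alpha^(1/n).

```python
# Exact one-sided lower confidence bound on a proportion when all n
# trials succeed: solve p_lower ** n = alpha, giving p_lower = alpha ** (1/n).
# A quick closed-form check; the adjusted-Wald interval is similar here.

def lower_bound_all_successes(n: int, alpha: float = 0.10) -> float:
    """One-sided lower bound on the true proportion after n successes in n trials."""
    return alpha ** (1.0 / n)

if __name__ == "__main__":
    for n in (100, 500):
        lb = lower_bound_all_successes(n)
        verdict = "above" if lb > 0.995 else "not above"
        print(f"n={n}: 90% lower bound = {lb:.4f} -> {verdict} .995")
```

With alpha = .10 this gives roughly .977 for n = 100 (not enough to clear .995) and about .9954 for n = 500 (just clears it), consistent with the numbers above.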

The second approach, which I think will work for you, is to use a hypothesis test. What you want to determine is the sample size at which you will have at least a reasonable shot at seeing whether the proportion is above or below .995. For example, you could imagine seeing 50 devices without a defect, but when something only happens 1 time in 200 events (a failure rate of .005, i.e., a success rate of .995), your chance of even being able to see such an event is low when your sample size is 50. It would be like wanting to know whether a coin is biased towards heads when you only get 3 chances to flip it. You really can't tell much with only 3 flips.
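You can put a number on how unlikely it is to see a rare event at small sample sizes. A quick sketch, assuming the failure rate really is 1 in 200:

```python
# Probability of observing at least one failure in n trials when failures
# occur at a rate of 1 in 200 (p_fail = 0.005). With n = 50 you have only
# about a 22% chance of ever seeing a failure at all.

def p_at_least_one_failure(n: int, p_fail: float = 0.005) -> float:
    return 1.0 - (1.0 - p_fail) ** n

if __name__ == "__main__":
    for n in (50, 200, 500):
        print(f"n={n}: P(at least 1 failure) = {p_at_least_one_failure(n):.2f}")
```

So at n = 50 you will most likely see zero failures whether the process is running at .995 or well above it, which is why such a small sample tells you little.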

So conceptually, keep in mind that you don't know what proportion your process is running at, so at any sample size you can make two errors: saying the sample proportion exceeds .995 when it doesn't (called a Type I error), and saying it does not exceed .995 when in fact it does (a Type II error). Because your criterion proportion is so high, you are more likely to commit a Type II error, since detecting small departures from .995 requires large sample sizes.

So much for the theory. To find the answer, the two other pieces of information you need to determine your sample size are how sure you want to be of detecting a difference (.80, .85, .90, .95?) and how small a difference you want to be able to detect. For example, if the process is really running at, say, .90 and not .995, it will be easier to see this difference with smaller sample sizes. The difference here is 9.5 percentage points, which is considered a large effect size (in statistical jargon). If, however, your process is really running at, say, .98, then this is only a 1.5 percentage point difference (a small effect size) and you will need a larger sample size.
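One common way to quantify effect size for proportions is Cohen's h, which uses an arcsine transform. This is illustrative only: the effect-size column in the table below may follow a slightly different convention, so the values won't match it exactly.

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size for two proportions (arcsine transform)."""
    return abs(2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2)))

if __name__ == "__main__":
    # 9.5-point gap (.995 vs .90): a large effect, h around 0.5
    print(f".995 vs .90: h = {cohens_h(0.995, 0.90):.2f}")
    # 1.5-point gap (.995 vs .98): a small effect, h well under 0.2
    print(f".995 vs .98: h = {cohens_h(0.995, 0.98):.2f}")
```

The point of the transform is that a 1.5-point gap near a proportion of 1 is harder to detect than the same gap near .5, and h reflects that.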

Finding your sample size is then a matter of balancing:
1. How small a difference (if one exists at all) you feel you need to detect.
2. How much of a chance you want of being able to detect a difference of that size or larger (this is called power, and by default it is set to 80%).

So the parameters we'll use to determine your sample size are a confidence level of 90% (from what you mentioned in the email), a power of 80%, and a small effect size of .20 (which is what you get from the difference between .995 and .98). To determine the sample size we need to consult power tables with these parameters. Many can be found in, for example, Statistical Power Analysis for the Behavioral Sciences (Jacob Cohen, 1988), which is the one I consulted for this question.
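If you don't have Cohen's tables handy, a standard normal-approximation formula for a one-sided, one-sample proportion test gives sample sizes in the same general range. This is a sketch, not the method used for the table below: it won't reproduce those numbers exactly, since Cohen's tables are based on the arcsine-transformed effect size.

```python
import math
from statistics import NormalDist  # Python 3.8+ standard library

def sample_size(p0: float, p1: float, alpha: float = 0.10, power: float = 0.80) -> int:
    """Normal-approximation sample size for a one-sided one-sample proportion
    test of H0: p = p0 against the specific alternative p = p1."""
    z_a = NormalDist().inv_cdf(1 - alpha)   # about 1.28 for alpha = .10
    z_b = NormalDist().inv_cdf(power)       # about 0.84 for power = .80
    num = (z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(p1 * (1 - p1))) ** 2
    return math.ceil(num / (p1 - p0) ** 2)

if __name__ == "__main__":
    # .995 vs .98 (a 1.5 percentage point difference), alpha .10, power .80
    print(sample_size(0.995, 0.98))
```

This approximation lands in the low 190s for the 1.5-point case, somewhat below the 223 from the tables; the two methods agree on the order of magnitude, which is what matters for planning.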

Results
You would need to plan on testing 223 devices to have an 80% chance of detecting a difference of 1.5 percentage points or larger at a confidence level of 90%.

Here are some additional results for your sample-size planning, assuming you're using a 1-sided test (meaning you just want to see whether the value is lower than .995).

| Alpha | Power | Difference | Effect Size | Sample Size Needed |
|-------|-------|------------|-------------|--------------------|
| .10   | .70   | 1%         | .147        | 302                |
| .10   | .70   | 1.5%       | .20         | 162                |
| .10   | .70   | 2.0%       | .25         | 106                |
| .10   | .80   | 1%         | .147        | 417                |
| .10   | .80   | 1.5%       | .20         | 223                |
| .10   | .80   | 2.0%       | .25         | 146                |
| .10   | .90   | 1%         | .147        | 608                |
| .10   | .90   | 1.5%       | .20         | 325                |
| .10   | .90   | 2.0%       | .25         | 213                |

So you can see for example if you want to have a 90% chance of detecting a difference as small as 1 percentage point you would need to plan on a sample size of 608.

Finally, let's say you test 608 devices and observe no failures. You would then conduct a statistical test on your data called a 1-sample proportion test, testing the observed proportion of 1 against .995. Given that data, you could be 90% confident that the true proportion is not below .995.
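As a sanity check on that final step, here is a sketch using the exact binomial version of the test rather than a normal approximation. When all n trials succeed, the one-sided p-value under H0: p = .995 is just the probability of n straight successes, .995^n.

```python
# One-sided exact test of "n successes in n trials" against H0: p = 0.995.
# The p-value is the probability of seeing n straight successes if the
# true proportion really were .995, i.e. 0.995 ** n.

def p_value_all_successes(n: int, p0: float = 0.995) -> float:
    return p0 ** n

if __name__ == "__main__":
    pv = p_value_all_successes(608)
    print(f"p-value = {pv:.3f}")
    print("reject H0 at alpha = .10" if pv < 0.10 else "fail to reject H0")
```

At n = 608 the p-value comes out under .05, comfortably below the .10 cutoff implied by the 90% confidence level, so the data would support concluding the true proportion exceeds .995.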

You can see there's no one-line answer; rather, sample size planning touches on many aspects of statistical theory (which is why it is hard to apply). At the simplest level, though, remember that since you have such a rare event (1 failure in 200) you need a reasonably large sample size to have a decent chance of detecting any departure from .995.

If you're interested, I can put together some Excel calculators for you to run some of your own scenarios, or I would be happy to assist with your analysis. Let me know if you have any questions.