## Question 890:


A couple of suggestions. When you run an ANOVA, it outputs a p-value that tells you whether there is a difference somewhere among the groups; it won't tell you which groups are significantly different. This is further complicated by the fact that you're using a MANOVA. The MANOVA gives you p-values for the combination of measures as a whole, but you won't know which measure differs for which groups. You will ultimately have to do some pairwise comparisons using t-tests (assuming all the data are continuous).

You mention you're measuring (1) effectiveness, (2) efficiency and (3) satisfaction, but these aren't measures; they're sub-constructs of usability. I will assume that for effectiveness you'll use a completion rate (possibly binary), task time for efficiency, and a task-level questionnaire for satisfaction. Using binary data in a MANOVA or ANOVA is risky and typically not recommended; you would use a chi-square test instead.
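Because a completion rate is binary, each pairwise comparison of completion rates can be a chi-square test on a 2×2 table (group × completed/not completed). Below is a minimal sketch using only the Python standard library; the completion counts are hypothetical, `chi_square_2x2` is a hypothetical helper, and it computes the plain Pearson statistic without the Yates continuity correction.

```python
import math
from itertools import combinations


def chi_square_2x2(success_a, n_a, success_b, n_b):
    """Pearson chi-square test (no continuity correction) comparing two
    completion rates; returns (chi2 statistic, p-value)."""
    fail_a, fail_b = n_a - success_a, n_b - success_b
    successes, failures = success_a + success_b, fail_a + fail_b
    if successes == 0 or failures == 0:
        # Everyone (or no one) completed the task: no evidence of a difference.
        return 0.0, 1.0
    total = n_a + n_b
    chi2 = total * (success_a * fail_b - fail_a * success_b) ** 2 / (
        n_a * n_b * successes * failures
    )
    # With 1 degree of freedom, chi2 = Z**2, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical data: (users who completed the task, users who attempted it).
completions = {"A": (18, 20), "B": (10, 20), "C": (14, 20)}

# One comparison per pair of products, for this single binary measure.
for (name_a, (s_a, n_a)), (name_b, (s_b, n_b)) in combinations(completions.items(), 2):
    chi2, p = chi_square_2x2(s_a, n_a, s_b, n_b)
    print(f"{name_a} vs {name_b}: chi2 = {chi2:.2f}, p = {p:.4f}")
```

The number of pairs grows quickly with the number of products: 5 groups give 10 pairwise comparisons per measure.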

Fortunately, I can offer a partial solution for planning your sample size. You need an estimate of the standard deviation for some or all of the measures and some idea of the smallest meaningful difference you hope to detect. These are the necessary inputs to a power analysis. Because you will eventually need to make pairwise comparisons by measure (e.g., is there a difference in completion rate between Group A and Group B?), we can use the sample size for a two-proportion test. For completion rates we can simply assume the highest variability, which occurs at 50% (the binomial variance p(1 − p) is largest at p = 0.5). Then we only need to specify the difference in proportions.

I'm assuming your test is between-subjects, where different sets of users attempt tasks or answer questions on different products (not the same users on all products).

- A large difference would be something like a 90% vs. 50% completion rate (effect size, Cohen's h, of .9273) and would require a sample size of 27 users in each group.
- A medium difference, 70% vs. 50% (effect size .41), would require 118 in each group.
- A small difference, 60% vs. 50% (effect size .20), would require 489 in each group (or, for 5 groups, 2,445 subjects!).
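The effect sizes in the completion-rate figures are Cohen's h, the difference between arcsine-transformed proportions, h = |2·asin(√p₁) − 2·asin(√p₂)|. Here is a minimal standard-library Python sketch that reproduces those h values and applies the textbook normal-approximation sample size per group, n = ((z₁₋α/₂ + z₁₋β) / h)². The helper names are hypothetical, and alpha = 0.05 (two-sided) and power = 0.90 are assumptions: the answer doesn't state which settings or formula the package used, and this sketch's n values come out smaller than the quoted figures, so treat it as a way to see how n scales with h rather than as a replacement for those numbers.

```python
import math
from statistics import NormalDist


def cohens_h(p1, p2):
    """Cohen's h: absolute difference between arcsine-transformed proportions."""
    return abs(2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2)))


def n_per_group(h, alpha=0.05, power=0.90):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of proportions with effect size h (assumed settings)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil((z_sum / h) ** 2)


# Baseline of 50%, the worst case for variability.
for label, p1 in [("large", 0.90), ("medium", 0.70), ("small", 0.60)]:
    h = cohens_h(p1, 0.50)
    print(f"{label}: {p1:.0%} vs 50%, h = {h:.4f}, n per group = {n_per_group(h)}")
```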

If you're using a continuous measure like the System Usability Scale (SUS), we can use its historical standard deviation of 21 and work out sample sizes for a few differences, again assuming a between-subjects setup. Each effect size below is the difference divided by 21.

- For an 18-point difference in SUS scores (e.g., 80 vs. 62), you'd need 31 in each group (effect size .857).
- For a 10-point difference, you'd need 95 in each group (effect size .48).
- For a 5-point difference, you'd need 373 in each group (effect size .24).
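A similar sketch for a continuous measure such as SUS: Cohen's d is the score difference divided by the standard deviation of 21, and the normal-approximation sample size per group is n = 2((z₁₋α/₂ + z₁₋β) / d)². Alpha = 0.05 (two-sided) and power = 0.90 are assumptions, since the answer doesn't state them, and `n_per_group_means` is a hypothetical helper. With these settings it gives 29, 93 and 371 per group, a couple below the figures above; the gap may reflect the package using the t distribution rather than the normal approximation.

```python
import math
from statistics import NormalDist

SUS_SD = 21  # historical SUS standard deviation quoted in the answer


def n_per_group_means(difference, sd=SUS_SD, alpha=0.05, power=0.90):
    """Return (Cohen's d, normal-approximation n per group) for a two-sided,
    two-sample comparison of means with a common standard deviation."""
    d = difference / sd  # Cohen's d
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return d, math.ceil(2 * (z_sum / d) ** 2)


for diff in (18, 10, 5):
    d, n = n_per_group_means(diff)
    print(f"{diff}-point difference: d = {d:.3f}, n per group = {n}")
```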

As for your third question: yes, as long as the smallest group's sample size is close to the recommended sample size, you'll be fine.

For references, see Jacob Cohen (1988), *Statistical Power Analysis for the Behavioral Sciences*. I computed the sample sizes using the Usability Statistics Package Expanded http://www.measuringusability.com/products/expandedStats
and the SUS Guide and Calculator Package http://www.measuringusability.com/products/SUSpack

Let me know if this makes sense and if you have more questions. I'd be happy to assist you further with the setup and analysis of your study if this is something you have budget for.