## Question 197:


When no estimate of the population standard deviation is available (not even a sample standard deviation), one technique for normally distributed data is to take the range of the values and divide it by 6. This works because about 99.7% of values in a normal distribution fall within 3 standard deviations of the mean, so the observed range spans roughly 6 standard deviations. For this data the range is \$45,000 - \$30,000 = \$15,000, and dividing by 6 gives a ballpark estimate of the standard deviation of \$2,500.
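As a minimal sketch of the range rule described above (the function name `sd_from_range` and its `divisor` parameter are illustrative, not part of the original question), the arithmetic looks like this:

```python
def sd_from_range(low, high, divisor=6.0):
    """Rough SD estimate: (high - low) / 6.

    Assumes the data are approximately normal, so the observed range
    covers roughly +/- 3 standard deviations around the mean.
    """
    if high < low:
        raise ValueError("high must be at least as large as low")
    return (high - low) / divisor


# Salary range from the question: $30,000 to $45,000.
sd_estimate = sd_from_range(30_000, 45_000)
print(sd_estimate)  # 2500.0
```

This is only a ballpark figure; with real data in hand, the sample standard deviation would be the better estimate.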

The estimate of the mean is the center of the 95% CI range, or \$37,500. This assumes the data are normally distributed, which would also make the CI symmetrical. Salary data often have a positive skew and are not normally distributed (there are usually a few very highly paid individuals). To correct for this, one usually takes the log of each salary to generate a log-transformed distribution. The CI and mean are then estimated from the transformed data and reported by taking the antilog to get back to the original units (dollars).
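A rough sketch of the log-transform approach follows. The salary values are made up purely for illustration (they are not from the question), and the function name `log_scale_ci` is hypothetical. It uses the large-sample critical value 1.96; for a sample this small a t critical value would be more appropriate. Note that the back-transformed center is the geometric mean, which sits below the arithmetic mean for skewed data.

```python
import math
import statistics


def log_scale_ci(values, z=1.96):
    """Approximate CI computed on the log scale, then back-transformed.

    Returns (lower, center, upper) in the original units. The center is
    the antilog of the mean log value, i.e. the geometric mean.
    """
    logs = [math.log(v) for v in values]
    mean_log = statistics.mean(logs)
    sem_log = statistics.stdev(logs) / math.sqrt(len(logs))
    lower = math.exp(mean_log - z * sem_log)
    center = math.exp(mean_log)
    upper = math.exp(mean_log + z * sem_log)
    return lower, center, upper


# Hypothetical, positively skewed salaries (illustrative only).
salaries = [31_000, 33_500, 35_000, 36_000, 38_000, 41_000, 44_000, 95_000]
lower, center, upper = log_scale_ci(salaries)
print(f"{lower:.0f} < {center:.0f} < {upper:.0f}")
```

Because the interval is symmetric on the log scale, it comes out asymmetric in dollars, with the longer arm on the high side.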

To answer the next questions we'd need to work backward from the margin of error, given our estimate of the standard deviation.

1. The margin of error is the standard error of the mean (SEM) times a critical value for the chosen level of confidence. For a two-sided 95% interval with a large sample size, this value is 1.96.
2. So we have SEM*1.96 = 500.
3. The standard error of the mean is the standard deviation divided by the square root of the sample size. Let's fill in what we know and solve for the unknown n.
4. SEM = 500/1.96 = 255.102
5. 2500/SQRT(n) = 255.102
6. 2500 = SQRT(n)*255.102
7. SQRT(n) = 2500/255.102 = 9.8
8. n = 9.8^2 = 96.04

So we'd round up to the next whole person, 97, to obtain a margin of error of \$500. We'd follow the same procedure for the rest of the parts.

For \$200 we'd need a sample size of 600.25, which rounds up to 601.

For \$100 we'd need a sample size of 2401.
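The same back-calculation for all three margins can be sketched in a few lines. The function name `required_sample_size` is illustrative; it assumes the ballpark standard deviation of \$2,500 and the large-sample critical value of 1.96 used above.

```python
import math


def required_sample_size(sd, margin, z=1.96):
    """Smallest whole n with z * sd / sqrt(n) <= margin.

    Rearranging margin = z * sd / sqrt(n) gives n = (z * sd / margin) ** 2,
    which is then rounded up to the next whole person.
    """
    if sd <= 0 or margin <= 0:
        raise ValueError("sd and margin must be positive")
    return math.ceil((z * sd / margin) ** 2)


sd_guess = 2500  # range-based ballpark estimate, in dollars
for margin in (500, 200, 100):
    print(margin, required_sample_size(sd_guess, margin))
# prints 97, 601 and 2401 for margins of 500, 200 and 100
```

Halving the margin of error roughly quadruples the required sample size, which is why the \$100 target is so much more expensive than the \$500 one.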

I would not recommend aiming for a margin of \$100 since that sample size is very large and I don't think we'd need to be that precise in salaries. Perhaps knowing the salaries to within \$500 is sufficient.