SAMPLE SIZE DETERMINATION and examples
NOTE (NEW!): I have added some EXTRA DETAILS for the calculation of the first (blue) practice sheet. If you need to check how we arrived at a particular figure, please scroll down to the ADDED DETAILS section.
ALSO NEW: ANSWERS TO the third practice set of questions.
Hint: Re: finding the level of precision. Turn the above equation around so that the unknown (є) is on the left. Then substitute the terms on the right with the known values. So є becomes:

$$ \epsilon = \sqrt{\frac{(1.96)^2 \, P(1-P)}{n}} $$
Here is a worksheet using the methods we have just
examined.
Something like this is very likely
going to be part of the mid-term exam.
MID-TERM TEST PRACTICE SHEET #1
The following takes the form of a sequence of facts and decisions.
Please answer the questions or provide the information in the correct sequence.
Here are some notes to assist you in the
completion of the worksheet:
In A-D you are considering taking a single sample, but you soon realize that, with three groups present, you cannot study each group separately because the sample size for each group is too small (especially groups A and B). So from (E) onwards you consider taking three separate samples, one for each population group. Up to (I) you are predicting the outcome with some uncertainty. In (J) and (K) you are able to see the real results and clean up the final statement.
A) You intend to study the health-knowledge of 8,000 recent (<5yr) immigrants to a city. A single random sample would contain how many respondents (if you needed to be accurate to within +/- 5.5%, 95% of the time)? The questions are of the dichotomous type (right/wrong, yes/no, etc.), and as you have no idea what proportion will have the correct answer, you need to use P = 0.5 (so 1-P is also 0.5).
(all) 317.5 (= 318)
B) It becomes evident that almost all immigrants are Azerbaijanis, Balkans, and Catalans, and they are in the ratio 1:2:5. If you were to take a single random sample (as in A), you should end up with the sample in the same ratio as in the population. How many would you have from each group?
A: 39.70 (= 40) | B: 79.37 (= 80) | C: 198.43 (= 199)
C) What would the sample fraction be for each group?
A: 0.0397 | B: 0.0397 | C: 0.0397
D) If you reported the results separately for each group, what would be the level of precision for each group?
A: +/- 15.97% | B: +/- 11.12% | C: +/- 6.93%
E) Now the precision is clearly too poor (too wide), so you want +/- 5.5% precision for each group. How many would you need from each group?
A: 318 | B: 318 | C: 318
F) What would the sample fraction be for each group now?
A: 0.318 | B: 0.159 | C: 0.064
G) The budget will allow only 750 completed interviews for the whole study. At 250 per group, what would the sample fractions be now?
A: 0.250 | B: 0.125 | C: 0.050
H) What would the level of precision be now?
A: +/- 0.062 | B: +/- 0.062 | C: +/- 0.062
I) You expect the response rate to be 20% (people who complete the interviews). How many original attempts are needed to produce the final number that you need from each group?
A: 1250 | B: 1250 | C: 1250
J) Assume the study is now complete. You have sent out the number of questionnaires shown in (I) above, and the response was 20%. But it turns out that 72% knew the correct answer. Calculate the precision again for this more detailed information.
+/- 0.056
K) Now give the "72% were correct" statement showing the confidence limit around the answer.
72.0% (95% CL: 66.43% - 77.57%)
Note also that for B the answer is rounded up to the next whole person,
but the calculation for C uses an exact value.
ADDED DETAILS - How we calculated these results:
A) You intend to study the health-knowledge of 8,000 recent (<5yr) immigrants to a city. A single random sample would contain how many respondents (if you needed to be accurate to within +/- 5.5%, 95% of the time)? The questions are of the dichotomous type (right/wrong, yes/no, etc.), and as you have no idea what proportion will have the correct answer, you need to use P = 0.5 (so 1-P is also 0.5).
(all) 317.5 or 318
B) It becomes evident that almost all immigrants are Azerbaijanis, Balkans, and Catalans, and they are in the ratio 1:2:5. If you were to take a single random sample (as in A), you should end up with the sample in the same ratio as in the population. How many would you have from each group?
A: 39.70 (= 40) | B: 79.37 (= 80) | C: 198.43 (= 199)
For (B): We are told that the ratio of A:B:C is 1:2:5. This is solved by counting the 'total' number of 'units' or 'shares': 1+2+5 = 8, so A has 1/8, B has 2/8 and C has 5/8 (altogether 8/8). You have calculated a single sample 'n' as 318, so multiply 318 by 1/8 to obtain A's 'share', by 2/8 to obtain B's share, and by 5/8 to obtain C's share. (The figures shown above use the unrounded n of about 317.5.)
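The share calculation can be sketched in a few lines of Python. This is only an illustration: the function name `split_by_ratio` is made up for this example, and it uses the unrounded n of 317.5 so that the shares land close to the figures in the answer.

```python
import math

def split_by_ratio(n, ratio):
    """Split a sample of size n into shares proportional to `ratio`."""
    total_units = sum(ratio)  # 1 + 2 + 5 = 8 'shares'
    return [n * units / total_units for units in ratio]

# Unrounded single-sample n from question (A), split 1:2:5
shares = split_by_ratio(317.5, (1, 2, 5))

# Round each share up to the next whole person
whole_people = [math.ceil(s) for s in shares]

print(shares)
print(whole_people)
```

Rounding each share up gives 40, 80 and 199 people for groups A, B and C.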
C) What would the sample fraction be for each group?
A: 0.0397 | B: 0.0397 | C: 0.0397
For (C): The sample fraction is n (sample) divided by N (population), and in this case it is the n/N for EACH of the three groups. You have the numerators (n) for each, and need the denominators. We are told the WHOLE population is 8,000 people, so divide 8,000 into the 1:2:5 ratio as in the last question. In this way (because I have used VERY simple figures) you get, for A: 40/1,000, for B: 80/2,000 and for C: 200/5,000. They all turn out to be the same, of course.
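As a quick check of the n/N arithmetic, here is a small Python sketch. The variable and function names are chosen for this example only, and it uses the rounded sample sizes quoted in the explanation (40, 80, 200), so it gives 0.04 where the answer table shows the unrounded 0.0397.

```python
def sampling_fraction(n, N):
    """Sample fraction: sample size n divided by population size N."""
    return n / N

population = 8000
ratio = {"A": 1, "B": 2, "C": 5}
sample_n = {"A": 40, "B": 80, "C": 200}  # rounded figures from the text

total_units = sum(ratio.values())
fractions = {}
for group, units in ratio.items():
    # Group populations: 1,000, 2,000 and 5,000
    group_population = population * units // total_units
    fractions[group] = sampling_fraction(sample_n[group], group_population)

print(fractions)
```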
D) If you reported the results separately for each group, what would be the level of precision for each group?
A: 15.5%
For (D): Here you need to turn the equation around as shown in the slides. You are looking for the precision (e), so e goes on the left side and everything else on the right side. Starting with

$$ n = \frac{(1.96)^2 \, P(1-P)}{e^2} $$

For A (n = 40, P = 0.5):

$$ e = \sqrt{\frac{(1.96)^2 \, P(1-P)}{n}} = \sqrt{0.024} = 0.155 \text{ or } \pm 15.5\% $$

The other two groups are calculated similarly.
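A minimal Python sketch of the rearranged equation, applied to all three groups. It assumes the rounded group sizes 40, 80 and 200 and uses 1.96 throughout, so it reproduces the 15.5% worked out here for A. The slightly wider figures for A and B in the summary table appear to come from using a larger t value for the small groups.

```python
import math

def precision(n, p=0.5, z=1.96):
    """Precision e = z * sqrt(P(1-P)/n) for a proportion P and sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

group_sizes = {"A": 40, "B": 80, "C": 200}  # rounded sample sizes (assumed)

for group, n in group_sizes.items():
    print(group, round(precision(n), 4))
```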
E) Now the precision is clearly too poor (too wide), so you want +/- 5.5% precision for each group. How many would you need from each group?
A: 318 | B: 318 | C: 318
Here, you are not satisfied with the sometimes WIDE precision in the last calculation, and insist that the precision term (e) is 5.5%, or 0.055. So you need to calculate the new "n" using e as 0.055. BUT THIS IS THE SAME as question (A): EACH of the three groups would need n = 318.
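The calculation of n for a chosen precision can be sketched as follows. The function name `sample_size` is only for illustration, and the formula is the same one used throughout this page, with P = 0.5 and 1.96 for 95% confidence.

```python
import math

def sample_size(e, p=0.5, z=1.96):
    """Sample size n = z^2 * P(1-P) / e^2 for a desired precision e."""
    return z ** 2 * p * (1 - p) / e ** 2

n_exact = sample_size(0.055)
n_people = math.ceil(n_exact)  # round up to the next whole person

print(round(n_exact, 1), n_people)
```

With e = 0.055 this gives about 317.5, which rounds up to 318 for each group.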
F) What would the sample fraction be for each group now?
A: 0.318 | B: 0.159 | C: 0.064
You have the new numerators (318), and the original denominators (1,000, 2,000 and 5,000).
G) The budget will allow only 750 completed interviews for the whole study. At 250 per group, what would the sample fractions be now?
A: 0.250 | B: 0.125 | C: 0.050
Now you are restricted to a total of 750 (250 each), so calculate the new sample fractions.
H) What would the level of precision be now?
A: 0.062 | B: 0.062 | C: 0.062
Again, work with the equation for e. For A:

$$ e = \sqrt{\frac{(1.96)^2 \, (0.25)}{250}} = 0.062 \text{ (for each one)} $$
I) You expect the response rate to be 20% (people who complete the interviews). How many original attempts are needed to produce the final number that you need from each group?
A: 1250 | B: 1250 | C: 1250
(If only 1 in 5 respond, you need five times 250 to get 250 completed responses.)
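The same "scale up by the response rate" step, as a small Python sketch. The function name is hypothetical, and the rate is passed as a whole-number percentage to keep the arithmetic exact.

```python
import math

def attempts_needed(completed_target, response_rate_percent):
    """Contacts needed so the expected completed interviews reach the target."""
    return math.ceil(completed_target * 100 / response_rate_percent)

# 250 completed interviews per group at a 20% response rate
print(attempts_needed(250, 20))
```

For 250 completed interviews at a 20% response rate this returns 1250 attempts per group.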
J) Assume the study is now complete. You have sent out the number of questionnaires shown in (I) above, and the response was 20%. But it turns out that 72% knew the correct answer. Calculate the precision again for this more detailed information.
+/- 0.056
(J) Here, the response rate WAS 20%, but now we have the results and 72% knew the correct response, whereas we had taken 50% (0.5) for the calculations. Recalculate the precision using P = 0.72 (and 1-P = 0.28):

$$ e = \sqrt{\frac{(1.96)^2 \, (0.72)(0.28)}{250}} \approx 0.0557 $$
K) Now give the "72% were correct" statement showing the confidence limit around the answer.
"The survey showed that 72.0 percent of the sample were able to answer correctly, with 95% confidence limits: 66.43% to 77.57%."
(K) This could be described in greater detail as follows: While the sample showed 72% correct, we can be 95% certain that between 66.4% and 77.6% of the larger population from which the sample was taken would have responded correctly.
In reality, you should make this final statement separately for EACH of the 3 groups.
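Steps (J) and (K) together can be sketched in Python as below. The function name `confidence_limits` is made up for this example. It takes the observed proportion and the number of completed responses, then adds and subtracts the precision e.

```python
import math

def confidence_limits(p, n, z=1.96):
    """Return (lower, upper) limits: p minus e and p plus e."""
    e = z * math.sqrt(p * (1 - p) / n)
    return p - e, p + e

# 72% correct among 250 completed responses
low, high = confidence_limits(0.72, 250)
statement = f"72.0% correct, 95% confidence limits: {low:.2%} to {high:.2%}"
print(statement)
```

Run on the figures above, this reproduces the limits of 66.43% and 77.57%.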
MID-TERM TEST PRACTICE SHEET #2
Try this sheet yourself.
A) You intend to study the responses to a food-borne disease quiz among 10,000 food handlers. A single random sample would contain how many respondents (if you needed to be accurate to within +/- 4 percent, 95% of the time)? The questions are of the dichotomous type (right/wrong, yes/no, etc.), and as you have no idea what proportion will have the correct answer, assume that 50% will respond correctly.
600.25 (= 601 people)
B) It is reasonable to assume that those who have had some technical education are better prepared, but only 5% of the workers (type W) have had the full week course and another 15% have had the one day course (type D). The rest (80%) are untrained (type U). If you were to take a single random sample (the "n" as in A), you should end up with the sample in the same ratio as in the population. How many would you have from each group?
W: 30 | D: 90 | U: 480
C) What would the sample fraction be for each group?
W: 30/500 = 0.06 | D: 90/1500 = 0.06 | U: 480/8000 = 0.06
D) If you reported the results separately for each group, what would be the level of precision for each group?
W: (using t=2.04, df=30) +/- 0.1862 | D: (using t=1.97, df=90) +/- 0.1038 | U: (using t=1.96, df=480) +/- 0.0447
E) Now the precision is clearly too poor (too wide), so you want +/- 5.5% precision for each group. How many would you need from each group?
W: 318 | D: 318 | U: 318
F) What would the sample fraction be for each group now?
W: 318/500 = 0.636 | D: 318/1500 = 0.212 | U: 318/8000 = 0.040
G) The budget will allow only 720 completed interviews for the whole study. If you assume equal n for each subgroup, what would the sample fractions be now?
W: 240/500 = 0.480 | D: 240/1500 = 0.160 | U: 240/8000 = 0.030
H) What would the level of precision be now?
W: +/- 0.063 | D: +/- 0.063 | U: +/- 0.063
I) You expect the response rate to be 24% (people who complete the interviews). How many original attempts are needed to produce the final number that you need from each group?
W: 240/0.24 = 1000 | D: 1000 | U: 1000
J) Assume the study is now complete. You sent out the number of questionnaires shown in (I) above. It turns out that only 22% responded, but that 78% knew the correct answer. Calculate the precision again, using this more detailed information.
+/- 0.0547
K) Now give the "78% were correct" statement showing the confidence limit around the answer.
The proportion answering "yes" was 78%, with 95% confidence limits between 72.53% and 83.47%.
Note that where appropriate, the calculation was repeated with an adjusted t value corresponding
to the best estimate of n.
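To see the effect of the adjusted t value, the following Python sketch recomputes answer (D) of this sheet. The t values are simply the ones quoted in that answer (they are not looked up from a t table here), and the function name is only illustrative.

```python
import math

def precision_with_t(n, t, p=0.5):
    """Precision e = t * sqrt(P(1-P)/n), using a t value in place of 1.96."""
    return t * math.sqrt(p * (1 - p) / n)

# (group, n, t value) as quoted in answer (D) of practice sheet #2
groups = [("W", 30, 2.04), ("D", 90, 1.97), ("U", 480, 1.96)]

results = {name: round(precision_with_t(n, t), 4) for name, n, t in groups}
print(results)
```

The smaller the group, the further the t value sits above 1.96, so the precision for group W widens the most.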
MID-TERM TEST PRACTICE SHEET #3
SOLUTION
1. 384
2. 43 (4y) and 341 (2y)
3. Sample fractions: 0.028 for both
4. Precision (uncorrected for t value): 0.150, 0.053; (corrected for t value): 0.154, 0.053
5. (Stratify) Number needed for each group to be ±0.05 precise at 95% confidence: 384 for both
6. Sample fraction now: 0.256, 0.032
7. Precision: 0.05, 0.05
8. Number to be contacted with response rate of 20%: 1,920 (each group)
9. (Results) 0.23 and 0.21
10. Final precision: 0.0383, 0.0481
11. Four-yr graduates on average passed at rate of 79 percent (95% conf. limits: 75.2%, 82.8%)
12. Two-yr graduates on average passed at rate of 58 percent (95% conf. limits: 53.2%, 62.8%)