BIOSTATISTICS:
PRACTICE PROBLEM solutions
These are the solutions to the set of questions [1-56] for ENH440, as well as to the additional questions 57-81, which appear on THIS page below the answers (just keep scrolling). For most questions only the simple numeric answers have been given; you will need to supply the interpretation for each situation. Some of the more complex questions also link to a more detailed solution.
EXACT 'P' RESULTS: In some cases I have given the P value you should have extracted using the tables. In other cases an "exact" P value from a computerized calculation has been given to illustrate the interpretation of such a value. For example, if P is shown as P = 0.0347, this still agrees with P < 0.05 but not with P < 0.01. Likewise, P = 0.0017 is clearly < 0.005 but not < 0.001.
We WILL be calculating ONE example of an 'exact P' as part of the "Fisher's Exact Test".
|
1 |
OR: 1.94,
1.39, 1.19, 3.26 (CHI-SQ=4.06,
P=0.04), 0.63
detailed
solution |
2 |
t = 1.2905, 248 df, P > 0.05
detailed solution |
3 |
only #4 |
4/5 |
F(treat)
= 2.385, F(age)
= 5.474 F(Tr
x Age) =
0.987
detailed solution: |
6 |
OR:
5.00, P=0.215, (FET) |
7 |
y'
= 150.252 - 1.423(x), t= 11.917 |
8 |
CL: +0.2514, -0.2194 t=0.145, 328.8
|
9 |
River rafting ... (F.Exact.T) P=
0.0035
Detailed
solution |
10 |
y'= 0.72 + 0.27(x), y=2.88 cm, t=20.0
NEW SCATTERPLOT to
illustrate this solution
|
11 |
t=2.429 P<0.05, 76, 127, 178
μgPb/L
Detailed solution |
12 |
|
13 |
46 df, P < 0.05 |
14 |
7.512 P<0.01, 4.814 P<0.05,
1.951 P>0.05 |
15 |
t = 3.015, 8 df, P < 0.02
detailed
solution |
16 |
t = 3.953, 38 df, P < 0.001 |
17 |
t = 1.272,
21df
detailed solution
|
18 |
t=2.494, 18 df |
19 |
t=3.993, 5df |
20 |
chi-sq = 22.1
detailed
solution |
21 |
t (age)
= 1.62,
58 df, P>0.05; t
(weight )= 0.74, 58 df, P>0.05;
t (bp) =
6.04, 58 df, P<0.001 |
22 |
t = 0.5606, 11 df, P
= 0.59 |
23 |
t = 0.7845, 4 df, P = 0.527
(this is paired) |
24 |
t = 0.6978,
11 df, P > 0.05
detailed solution
|
25 |
t = 1.736, 16 df,
P >0.05
|
26 |
Ho: "No
difference between Nur and Eng students in terms of mean GPA" P < 0.05 |
27 |
P > 0.05 |
28 |
chi-sq
3.525 P = 0.06 OR = 2.85 |
29 |
chi-sq
6.764 1 df, P < 0.01 RR = 3.3 (protective) |
30 |
t = 4.202, 18 df, P < 0.001
detailed solution |
31 |
t = 2.583, 22 df,
P < 0.02
detailed solution
|
32 |
(individual
calculation) |
33 |
ms(species) =
73.44, F=4.54 ,
p>0.05 MS
CHEM= 13.083,
F = 0.808 p>0.05
|
34 |
|
35 |
chi-sq
= 7.18,
2df, P<0.05 (Actually P=0.0276) |
36 |
F =
0.693, 5.403, 6.500 |
37 |
This
is a paired t-test but requires a log transform because
the data are exponential t
= 1.109, P=0.3306 |
38 |
MPchi-sq:0.93,
P:0.335; HPchi-sq:7.10, P:0.0077(prot);
ESFET:
P:0.278 |
39 |
(1) RH r=0.75, y=0.654+0.0892(x), t=2.971, P<0.05;
(2)
VEL:
r=0.74, y=-0.510+0.0884(x); t=2.879, P<0.05
(3) CORR betw RH&Vel: r=0.22 |
40 |
[please change 6.0 to 60, and 7.0 to 70] Now this is correct: Pb = 233.4 - 31.73(pH) |
41 |
chi-sq: 4.43, P=0.035, OR 3.18, CL: 0.94 to 11.15 |
42 |
t = 3.315, 7 df, P = 0.0105 |
43 |
F=10.939, 2,12
df, P=0.002301
detailed solution |
44 |
(chi-sq goodness of fit) (You don't need this for the exam) |
45 |
F(freq):8.44, P<0.01 F(size): 11.25, P<0.01
F(Freq x Size): 4.85, P<0.05
see detailed solution here |
46 |
b = -0.1429, CL: -0.1074 to -0.1783, t = 34.65, 4 df, P < 0.001 |
47 |
d= 2.688,
t= 9.622, 7df, P
<0.001 |
48 |
(a) 0.13, (b) 0.81, (c) 0.61, (d) 0.18, (e) >.05,
<.002, >.05, >.05 (f) best return in 2 yrs
detailed solution
|
49 |
inverse,
r=0.58, 10 df, P<0.05 |
50 |
Fisher's Exact Test P
= 0.0204 |
51 |
đ = 7.4,
t = 2.608, P = 0.025 |
52 |
F=
6.17; P<0.01
Detailed solution
|
53 |
Completed in class. t(unpaired) =
1.077 t(paired) = 8.624
(The correct method is paired) detailed
solution |
54 |
(a) X² 0.47,
P=0.49; (b) X² 17.30, (prot) P=0.000032,
(c) X²
9.49, P=0.00207 |
55 |
(a) X²
0.08,
P=0.78; (b) X²
0.82, P=0.365, (c) (error:please
omit) |
56 |
X² 6.94,
P=0.0084 (or "P<0.01"); OR:
2.37
|
ANSWERS TO EXTRA QUESTION SET #57-81
(Scroll down for these questions)
57 |
OR: 1.0, 1.17, 0.63, 2.00, 5.44 X² 11.1, P=0.0009 (Scroll down for question) |
58 |
(c)
at
250M, Y= 480 ppm, at 500M, Y= 182 ppm, (d) t=9.88 (Scroll down for question)
please note the changes here
|
59 |
(a)
>0.05 (b) no (d) 26 (Scroll down for question) |
60 |
t = 9.7, 98 df, ss P<0.001
(Scroll down for question) |
61 |
RR=4.02 ChiSq: 18.9,
1 df P=0.000014
(P<0.001) Exposed
personnel >4x as likely to develop lung ca. reject Ho.
(Scroll down for question) |
62 |
Only expected values are important here. 2 of 9 is
22%, so >20% cells have E values 5 or less. ChiSq not valid |
63 |
lead(ppm) =
724
- 0.899 M 6df, t
= 9.90, P
<0.001
(inverse rel. stat sig.) (Scroll down for question) |
64 |
removed. (essentially the
same as #61) |
65 |
HC%
= 47.19 - 0.3695 (ppm LEAD). t =
13.5, 11df,
P <.001,
r= - 0.97,
r2 = 0.94 (Scroll down for question) |
70A |
SOLUTION: We have here TWO variables. The independent variable is the
'treatment' and is a categorical variable with two levels (a hand barrier
cream either antiseptic or not). The dependent variable is presence or
absence of E. coli, clearly a categorical variable, with two levels. So the
data will be displayed as a 2x2 table. Analysis would be by ChiSquare. (Of
course odds ratio would be useful for explaining the strength and direction of the
association). |
70B |
SOLUTION:
The independent variable is the same but the dependent variable is now
continuous. This is a candidate for either 1-way ANOVA or unpaired
t-test, either of which would be applicable. |
70C |
SOLUTION: This introduces a second independent variable, also categorical,
with two levels. If the dependent variable remains as in 70B
(continuous), then this is a 2-way ANOVA, and with 240 workers, there will
obviously be a number in each group, allowing the "factorial design with
interaction". |
70D |
SOLUTION: Now this takes on a complicated arrangement.
If the arrangement in 70B is used, we have a pre-post test (paired t-test) with
all 240 people tested before and after using the antiseptic hand cream, thus 240
pairs of data (239 df). But if the arrangement in 70C is used in this way
we have TWO independent variables, and the solution is beyond the scope of this
course, but might include Raw foods pre-post and Cooked foods pre-post. A
t-test in either case would be used. |
71 |
SOLUTION: The data would be appropriately analysed
by means of the paired t-test. Note that the 'pairing' is taking place on
the water samples, each of which is being assessed using BOTH tests. Thus
the 15 water samples with known lead content produce 30 separate results, or 15
pairs. (14 df)
|
72 |
SOLUTION: t-test for unpaired data. (18 df)
|
73 |
SOLUTION: Chi-square analysis in 2x2 contingency table.
True relative risk is appropriate here because you DO have the true incidence
data. The two exposure groups were all healthy at the start of the study 30
years ago, and have been followed. Therefore you have the incidence data.
RR = Ie/Io or Incidence rate for exposed group over the Incidence rate for non-exposed group. |
74 |
SOLUTION: Two Variables: Ind.var is categorical with 3
levels. Dep.var is continuous. 1-way ANOVA |
75 |
SOLUTION: first Ind.var is categorical (3 levels),
second Ind.var is also categorical with three levels, Dep.var. is
continuous. Two-way ANOVA. Block design if only one obs per
group, or Factorial if >1 obs per group |
76 |
SOLUTION: Chi-square analysis in 3x3 contingency
table. But things can get complex. If we do this and just have
the total count in each cell, we get a test of the relationship between haz
types and training groups. (No pass/fail) So we may stratify -
and that is beyond the scope of this course |
77 |
SOLUTION: Chi-square analysis in
2x3 contingency table.
|
78 |
Click
here for detailed solution |
79 |
F= 23.53, 2, 33 df, P<0.01
detailed solution |
80 |
F= 3.190, 2,12 df, P>0.05
detailed solution
|
81 |
detailed
solution |
EXTRA QUESTION SET [ #57-81 ]
57. You are investigating risk factors among farm workers
for contracting leptospirosis. A group of 30 patients has been identified, as well as a
group of 40 non-leptospirosis controls. The following data are the results from
enquiries about possible exposures in the previous three months. Data shown are the
number stating "yes" to each exposure:
(Q57) | Have you handled wild animals? | Do you have a mice infestation? | Have you visited a zoo? | Have you handled garden soil? | Have you repaired sewer pipes or drains? |
cases | 6 | 10 | 4 | 20 | 21 |
controls | 8 | 12 | 6 | 20 | 12 |
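As a rough illustration of how one column of the Q57 table becomes an odds ratio, the sketch below compares the odds of exposure among the 30 cases with the odds of exposure among the 40 controls. The function name `odds_ratio` and its argument names are illustrative only and are not part of the course material.

```python
def odds_ratio(cases_yes, cases_total, controls_yes, controls_total):
    """Odds ratio for one exposure in a case-control study."""
    cases_no = cases_total - cases_yes            # unexposed cases
    controls_no = controls_total - controls_yes   # unexposed controls
    odds_cases = cases_yes / cases_no             # odds of exposure among cases
    odds_controls = controls_yes / controls_no    # odds of exposure among controls
    return odds_cases / odds_controls

# Counts taken from the Q57 table: 30 cases and 40 controls.
or_wild = odds_ratio(6, 30, 8, 40)      # handled wild animals
or_sewer = odds_ratio(21, 30, 12, 40)   # repaired sewer pipes or drains
print(round(or_wild, 2), round(or_sewer, 2))
```

An odds ratio near 1 suggests no association with that exposure; a value well above 1 points to the exposure being more common among cases.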
58. Data: lead (in ppm) in soil samples (Y) measured at a
distance (X) metres from the smoke stack of a lead processing plant. The regression
coefficient is -1.1927; the standard error of the regression coeff. is 0.119488;
estimated lead concentration at the base of the stack is 778.05 ppm. (a) Plot
the data on a scattergram; (b) show the least-squares line; (c) predict the lead in the
soil at 250M and 500M distance from the stack; (d) test the null hypothesis of no
association; (e) clearly summarize. (Note: you do NOT have to calculate the parameters
from the original data.)
lead(ppm) y | 40 | 510 | 330 | 160 | 700 | 610 | 220 | 440 |
distance(M) x | 650 | 180 | 405 | 510 | 90 | 190 | 380 | 290 |
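Part (c) of Q58 only needs the regression equation built from the values given in the question. A minimal sketch, assuming the intercept is the estimated concentration at the base of the stack (x = 0) and the slope is the stated regression coefficient; the names `INTERCEPT`, `SLOPE` and `predict_lead` are illustrative:

```python
INTERCEPT = 778.05   # estimated lead (ppm) at the base of the stack, x = 0
SLOPE = -1.1927      # regression coefficient, ppm per metre of distance

def predict_lead(distance_m):
    """Predicted soil lead (ppm) at a given distance from the stack."""
    return INTERCEPT + SLOPE * distance_m

lead_250 = predict_lead(250)
lead_500 = predict_lead(500)
print(round(lead_250), round(lead_500))
```

Both distances lie inside the observed range (90 M to 650 M), so these are interpolations rather than extrapolations.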
59
The results of an investigation into the cadmium content in
the blood of children from two areas (A and B) concludes with a statement that the (mean
of A) minus (mean of B) was 8.3 µg Cd/100ml blood; 24 df, t=2.01. (a) What is the
probability that a mean difference of this amount could have been observed in these two
sample groups if there was really no difference between the two areas in terms of
children's blood-cadmium? (b) Is this difference statistically significant at
the 5% rejection level? (c) Describe clearly what you understand from the
results; (d) how many children participated in the study?
60. In correlating haemoglobin levels with estimated
exposure to toxic solvents among 100 victims of a chemical spill, the linear correlation
coefficient is found to be -0.70. Explain this relationship in detail,
including a test of Ho that rho=zero.
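The test of Ho: rho = zero in Q60 uses only r and the sample size. A small sketch of the usual formula t = r × sqrt((n - 2) / (1 - r²)), with n - 2 degrees of freedom; the function name `t_for_correlation` is illustrative:

```python
import math

def t_for_correlation(r, n):
    """t statistic and degrees of freedom for testing rho = 0."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

t_stat, df = t_for_correlation(-0.70, 100)
print(round(abs(t_stat), 1), df)
```

The sign of t simply follows the sign of r (an inverse relationship here); the size of t is what is compared with the tables.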
61. N=520 men who worked at a Northern Ontario asbestos
mill between 1950 and 1970 have been identified through employment records and medical
records, and their health/survival status since then is compared with paper mill workers
in the same region during the same period. It is found that of the asbestos workers,
25 have developed (or died from) lung cancer. Of 1005 paper mill workers, 12 have been
diagnosed (or died from) the disease. (a) Calculate an appropriate risk measurement,
and (b) explain what this measurement means.
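For Q61 the risk measurement and its significance test can be sketched as below. This is a minimal illustration, assuming a Pearson chi-square without continuity correction; for 1 df the upper-tail P equals erfc(sqrt(chi-square / 2)), so no statistics library is needed. The function names are illustrative.

```python
import math

def relative_risk(exp_cases, exp_total, unexp_cases, unexp_total):
    """Incidence in the exposed group over incidence in the non-exposed group."""
    return (exp_cases / exp_total) / (unexp_cases / unexp_total)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(chi2):
    """Upper-tail P for a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))

# Asbestos mill: 25 lung cancers among 520; paper mill: 12 among 1005.
rr = relative_risk(25, 520, 12, 1005)
chi2 = chi_square_2x2(25, 520 - 25, 12, 1005 - 12)
p = p_value_1df(chi2)
print(rr, chi2, p)
```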
62. Three cells in a 3x3 contingency table have observed
values less than 5, and two of them have expected values less than 5. (a) Is
chi-square analysis appropriate here? (b) Why or why not?
63. The effect of distance from an incinerator smoke stack is being investigated. The
following data show the lead content in the soil at various distances downwind from the
base of the stack. (A) construct a scattergram showing the data and the
least-squares line. (B) From your regression model (the equation, not the
scattergram) estimate the lead in soil at 650M and 400M. (C) test the null
hypothesis that beta=zero. (D) summarize fully.
Distance (M) | 100 | 180 | 220 | 340 | 430 | 500 | 610 | 700 |
Lead in soil (ppm) | 680 | 580 | 450 | 420 | 380 | 205 | 200 | 110 |
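The Q63 regression can be checked from the raw data with a short least-squares sketch. It uses the standard sums of squares (Sxx, Sxy, Syy) and tests the slope with t = b / SE(b) on n - 2 df; the function name `least_squares` and the variable names are illustrative.

```python
import math

distance = [100, 180, 220, 340, 430, 500, 610, 700]   # metres from the stack
lead = [680, 580, 450, 420, 380, 205, 200, 110]       # ppm in soil

def least_squares(x, y):
    """Return intercept, slope, t for the slope, and df."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_resid = syy - slope * sxy                     # residual sum of squares
    se_slope = math.sqrt(ss_resid / (n - 2) / sxx)   # standard error of the slope
    return intercept, slope, slope / se_slope, n - 2

intercept, slope, t_stat, df = least_squares(distance, lead)
lead_650 = intercept + slope * 650   # part (B) estimates
lead_400 = intercept + slope * 400
print(intercept, slope, t_stat, df, lead_650, lead_400)
```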
65. The following data describe the effect of blood lead on haematocrit (packed red
blood cells as % of whole blood). Calculate correlation and regression coefficients.
Plot the data and the least-squares line on the scattergram. Test the
hypotheses (2) that beta=zero and rho=zero
lead µg/dl | 5 | 11 | 13 | 18 | 21 | 27 | 32 | 38 | 41 | 44 | 50 | 58 | 60 |
H.crit% | 45 | 43 | 44 | 39 | 41 | 35 | 37 | 33 | 29 | 32 | 31 | 26 | 24 |
70.
What method would you use for these?
(70 a-b-c-d-)
70A... A study of
effectiveness of new skin antiseptic barrier cream to be used in food
manufacturing. Two hundred & forty workers are recruited and randomized
into two groups - one group to receive the antiseptic cream, and the other group
to receive a similar-looking product without antiseptic action. Each day a
hand swab test is taken from the workers and analyzed for
E. coli which are reported as "present" or "absent"
70B.... The above study
but the outcome (E.coli) is required
as a count.
70C...Someone has suggested that those people working with raw foods would be exposed to
a much greater bacterial load, so a further division is made between "raw" and
"cooked" foods. How would this change the analysis?
70D....As an alternative to the basic study in (70C) above, the investigators contemplate
taking all 240 workers and letting them work for a month (with hand swabs every day)
before they are asked to use the antiseptic hand cream daily. They are then swabbed daily
for another month. What is the statistical analysis you would recommend here?
71... A filter system for removing lead from the water supply is being compared with a
conventional filter system. Fifteen water samples with known (but different)
concentrations of lead are passed through each of the filters, and the resulting filtrate
is examined for lead content. There are 30 samples tested in this way. What
method of analysis would you use here?
72....The number of new cases of diarrhoea in infants in a six month period is the
outcome measure used in a study of the effectiveness (if any) of a strict double-handwash
procedure after changing diapers. Ten centres are selected and invited to
participate in the trial, along with another 10 which will undertake normal handwashing
practices.
73... In 1975, during
the replacement of fuel rods at a nuclear plant, 32 workers were accidentally
exposed to radiation for about 4 hours. They have been followed and their
health outcomes compared to a group of 48 welders and pipe fitters in a
conventional power station. After thirty years, 8 of the nuclear plant
workers and 9 of the conventional plant workers have been diagnosed with some form
of cancer. What is the appropriate risk measurement? What is the
method of analysis to be used?
74
Three methods of training ESL workers
on WHMIS procedures are being compared: lecture, power-point, and
animated cartoon. The outcome is a
knowledge test score out of 40.
What method of analysis is to be used here?
75
Same
three methods of WHMIS training but this time someone suggests the type of
work may play an important part. So
the workers are divided into low hazard, medium hazard, and high hazard
jobs.
What method of analysis is to be used here?
76
Same
project comparison between three WHMIS training groups x three
types of hazard
with which the people are working,
but this time the dependent var is just
‘pass/fail' the standard test.
What method of analysis is to be used here?
77
A
herbal treatment for contact dermatitis is being
evaluated against the conventional treatment of corticosteroids and
emollient creams. The outcome after 7 days is recorded as improved? unchanged?
or worse?
What method of analysis is to be used here?
78
You are investigating the extent to which INDOOR air quality (IAQ) in office buildings
without openable windows compares to air quality in similar buildings equipped with
windows that open for 5 hours/day. Measurements are taken in three types of EXTERNAL air
quality (EAQ). The dependent variable is the count of suspended particles (10 microns or
less) per 10 cc. Complete the ANOVA table and summarize.
GOOD | GOOD | MODERATE | MODERATE | POOR | POOR |
CLO | OPN | CLO | OPN | CLO | OPN |
N=10 | N=10 | N=10 | N=10 | N=10 | N=10 |
MEAN=49.6 | MEAN=36.3 | MEAN=49.4 | MEAN=45.6 | MEAN=49.7 | MEAN=54.5 |

source | SS | df | MS | F | P |
all groups | 1842.53 | | | | |
EAQ | 810.13 | | | | |
wind O/C | 240.00 | | | | |
interaction | | | | | |
residual | | | | | |
# 79
Three methods of teaching infection control are
being compared using three groups of students selected at random.
Their scores out of 20 are shown. Is there a significant difference
between the means of each group? Explain clearly.
Method A | 5.0 9.0 5.0 8.0 5.0 7.0 6.0 7.0 6.0 6.0 6.0 7.0 |
Method B | 12.0 6.0 11.0 7.0 10.0 7.0 10.0 8.0 10.0 9.0 9.0 9.0 |
Method C | 3.0 3.0 8.0 7.0 4.0 4.0 6.0 4.0 5.0 5.0 4.0 4.0 |
# 80
The pH of three types of canned tomatoes is being
compared; the pH readings are shown. Is there a
significant difference between the means of each group? Explain
clearly.
Method 1 | 1.9, 2.4, 2.5, 2.6, 3.0 |
Method 2 | 2.0, 2.1, 3.0, 3.1, 3.3 |
Method 3 | 2.8, 2.9, 3.3, 3.3, 3.8 |
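Questions 79 and 80 are both one-way ANOVA problems, and the F ratio can be sketched directly from the raw scores. The code below is a minimal illustration, assuming the #79 scores are read in order with twelve per method (A, then B, then C); the function name `one_way_anova` is illustrative.

```python
def one_way_anova(groups):
    """Return F, df between groups, and df within groups."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total
    # Between-groups SS: group size times squared distance of group mean from grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups SS: squared distance of each value from its own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Q79: assumed grouping, twelve scores per teaching method.
q79 = [
    [5, 9, 5, 8, 5, 7, 6, 7, 6, 6, 6, 7],
    [12, 6, 11, 7, 10, 7, 10, 8, 10, 9, 9, 9],
    [3, 3, 8, 7, 4, 4, 6, 4, 5, 5, 4, 4],
]
# Q80: pH readings, five per group.
q80 = [
    [1.9, 2.4, 2.5, 2.6, 3.0],
    [2.0, 2.1, 3.0, 3.1, 3.3],
    [2.8, 2.9, 3.3, 3.3, 3.8],
]
f79, df1_79, df2_79 = one_way_anova(q79)
f80, df1_80, df2_80 = one_way_anova(q80)
print(f79, df1_79, df2_79, f80, df1_80, df2_80)
```

The F values are then compared with the F tables at the stated degrees of freedom to obtain P.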
NEW
81.
In the following ANOVA analysis, two systems of purifying water (‘old’
and ‘new’) are being compared, with the pH of the water also being
studied. The dependent variable is the ppm of the contaminant (least is
better). Twenty-eight separate measurements have been made. Note that
you do not have the original data, just the means, total, and N for each
group. The ANOVA table has been completed and is shown below. |
| NEW EQUIPM | NEW EQUIPM | OLD EQUIPM | OLD EQUIPM |
| Low pH | High pH | Low pH | High pH |
TOTAL | 174.00 | 172.00 | 303.00 | 238.00 |
MEAN | 24.86 | 24.57 | 43.29 | 34.00 |
N | 7.00 | 7.00 | 7.00 | 7.00 |

source | SS | DF | MS | F | P |
ALL GRPS | | | | | |
EQUIP | | | | | |
pH | | | | | |
EQ X pH | | | | | |
RESIDUAL | | | | | |
TOTAL | 1850.11 | | | | |
_________________________________________________________________
[a]
Enter the missing values in the ANOVA table above
[b]
Now select the single correct statement that summarizes the
analysis
A. Old equipm and
new equipm perform equally in low pH situations
B. New equipm is
not affected by pH, but performs consistently better than the old
C. In high pH
older equipm performs better, but in low pH new equipm is better
D. Old equipm and
new equipm perform equally in high pH situations
E. The
performance of the old equipment is not affected by pH
|
|
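Because the design in #81 is balanced (7 measurements in each of the four groups), the sums of squares can be recovered from the group totals and the total SS alone. The following is a sketch under that assumption, using the usual "sum of squared totals over n, minus the correction factor" shortcut; all names in it are illustrative.

```python
# Group totals from the question: (equipment, pH) -> total ppm over 7 measurements.
cell_totals = {
    ("new", "low"): 174.0, ("new", "high"): 172.0,
    ("old", "low"): 303.0, ("old", "high"): 238.0,
}
n_per_cell = 7
ss_total = 1850.11   # given in the ANOVA table

n_total = n_per_cell * len(cell_totals)
correction = sum(cell_totals.values()) ** 2 / n_total   # correction factor

def ss_for(factor_index):
    """SS for one factor (0 = equipment, 1 = pH) from its level totals."""
    level_totals = {}
    for key, total in cell_totals.items():
        level = key[factor_index]
        level_totals[level] = level_totals.get(level, 0.0) + total
    n_per_level = n_total / len(level_totals)
    return sum(t ** 2 for t in level_totals.values()) / n_per_level - correction

ss_groups = sum(t ** 2 for t in cell_totals.values()) / n_per_cell - correction
ss_equip = ss_for(0)
ss_ph = ss_for(1)
ss_inter = ss_groups - ss_equip - ss_ph
ss_resid = ss_total - ss_groups
df_resid = n_total - len(cell_totals)
ms_resid = ss_resid / df_resid

# Each effect has 1 df here, so its MS equals its SS.
f_equip = ss_equip / ms_resid
f_ph = ss_ph / ms_resid
f_inter = ss_inter / ms_resid
print(ss_groups, ss_equip, ss_ph, ss_inter, ss_resid, f_equip, f_ph, f_inter)
```

The F ratios are read against the F tables with 1 and 24 df; the interaction term is the one to inspect before interpreting the main effects.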