In this section students will:
We have learned about estimating parameters by point estimation and interval estimation (specifically confidence intervals). More often than not, the objective of an investigation is not to estimate a parameter but to decide which of two (or more) contradictory claims about the parameter is correct.
This part of statistics is called hypothesis testing.
A statistical hypothesis is a claim or assertion about
Hypotheses
When stating the hypotheses, the notation used is always population parameter notation; inferences upon populations need population notation (the Greek letters).
\(\mu\) for the mean and \(p\) for the proportion
Since the null and alternative hypotheses are contradictory, evidence must be examined to decide if there is enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data. After it has been determined which hypothesis the sample supports, a decision is made.
There are two options for a decision:
All tests (when done by hand) include the following five steps:
\(H_0\) always has a symbol with an equal sign in it. \(H_a\) never has a symbol with an equal sign in it. The choice of symbol depends on the wording of the hypothesis test. However, it is common in practice to just use \(=\) in the null hypothesis, even with \(>\) or \(<\) as the symbol in the alternative hypothesis. This practice is acceptable because the decision is only ever to reject or not reject the null hypothesis.
Null hypothesis:
\[H_0: \mu=\mu_0\]
Alternative hypotheses (choose one):
(1) \(H_a: \mu \neq \mu_0\)
(2) \(H_a: \mu>\mu_0\)
(3) \(H_a: \mu<\mu_0\)
The hypotheses are the same regardless of whether \(z\) or \(t\) is used. Most often the null hypothesis will have \(=\) while the alternative will be one of \(\neq\), \(>\), or \(<\).
\(\mu_0\) is a specified value (a number that is given in the problem)
Null hypothesis:
\[H_0: p=p_0\]
Alternative hypotheses (choose one):
(1) \(H_a: p \neq p_0\)
(2) \(H_a: p>p_0\)
(3) \(H_a: p<p_0\)
Most often the null hypothesis will have \(=\) while the alternative will be one of either \(\neq\), \(>\), or \(<\).
\(p_0\) is a specified value (a number that is given in the problem)
If assumptions are violated, the results from the analyses are neither valid nor reliable.
1-sample test of the mean \(\mu\) when \(\sigma\) is known: Use \(Z\)
\[z=\frac{\overline{X}-\mu_0}{se}~~se=\frac{\sigma}{\sqrt{n}}\]
1-sample test of the proportion \(p\): Use \(Z\)
\[z=\frac{\hat p-p_0}{se}~~se=\sqrt{\frac{p_0q_0}{n}}~~\text{and}~q_0=1-p_0\]
1-sample test of the mean \(\mu\) when \(\sigma\) is unknown: Use \(t\)
\[t=\frac{\overline{X}-\mu_0}{se}~~se=\frac{s}{\sqrt{n}}\]
In practice (other than one example in class), most tests like this are done with \(t\) instead of \(z\)
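The three test statistics above can be sketched as small R helpers. This is only an illustration under stated assumptions: the function names `z_mean`, `z_prop`, and `t_mean` are hypothetical, the numeric inputs for the two \(z\) statistics come from the sprinkler and ingot examples in this section, and the vector `x` is made-up data used only to compare `t_mean` with R's built-in `t.test`.

```r
# Hypothetical helper names; each mirrors one formula above.

# z statistic for a mean when sigma is known
z_mean <- function(xbar, mu0, sigma, n) {
  se <- sigma / sqrt(n)
  (xbar - mu0) / se
}

# z statistic for a proportion (the standard error uses p0, not phat)
z_prop <- function(phat, p0, n) {
  se <- sqrt(p0 * (1 - p0) / n)
  (phat - p0) / se
}

# t statistic for a mean when sigma is unknown (uses the sample sd)
t_mean <- function(x, mu0) {
  se <- sd(x) / sqrt(length(x))
  (mean(x) - mu0) / se
}

# Inputs taken from the sprinkler and ingot examples
z_sprinkler <- z_mean(xbar = 131.08, mu0 = 130, sigma = 1.5, n = 9)
z_ingot <- z_prop(phat = 0.16, p0 = 0.20, n = 500)

# Made-up data, only to compare the manual t with t.test()
x <- c(12.1, 9.8, 11.4, 10.7, 13.2, 10.1)
t_manual <- t_mean(x, mu0 = 10)
t_builtin <- unname(t.test(x, mu = 10)$statistic)
```

Note that `z_prop` builds the standard error from \(p_0\), as in the formula above, because the test statistic is computed assuming \(H_0\) is true.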
The decision is based on the significance level \(\alpha\), where \(\alpha=1-CL\) and CL is the confidence level.
Always assume \(\alpha=0.05\) unless specified otherwise.
Two methods for rejection:
The alternative hypothesis (\(H_a\)) determines which tail(s) of the curve lead to rejection.
\[pvalue\leq \alpha\Rightarrow Reject~H_0\]
The \(pvalue\) approach: the null hypothesis is rejected if and only if \(pvalue\leq\alpha\) (most often with \(\alpha=0.05\)). This rule does not change with the sign of the alternative hypothesis; only the calculation of the \(pvalue\) depends on that sign. The \(pvalue\) is \(P(\)results at least as extreme as those observed \(| H_0\) is correct); in other words, it is the probability that results this extreme would occur by random chance if the null hypothesis is actually correct.
Assume that \(\alpha=0.05\) unless specified; any rejection of \(H_0\) means that the results (of the experiment, survey, etc.) are statistically significant. The smaller the \(pvalue\), the stronger the evidence against \(H_0\) (because the further the test statistic lies from the hypothesized value, the smaller the tail area beyond it).
The calculation of the \(pvalue\) is dependent on the type of test you are doing, as in one-tail upper, one-tail lower, or two-tail. The sign of the alternative hypothesis is the determining factor in calculation of the \(pvalue\).
Note that while all examples are with \(z\), it is interchangeable with \(t\) (\(df\) is needed). In this case, \(pvalue\) represents the area in the specified tail(s) of the distribution
\[\text{When }H_a: >~,~pvalue=P(Z\geq z_{calc})\]
\[\text{When }H_a: <~,~pvalue=P(Z\leq z_{calc})\]
\[\text{When }H_a: \ne~,~pvalue=2P(Z\geq |z_{calc}|)\]
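The three tail rules above can be sketched as one R function. This is a minimal illustration, not the course's own code: the name `p_value` and its arguments are hypothetical. Supplying `df` switches from the standard normal to the \(t\) distribution, and the test statistics passed in are the ones reported in the R output for the sprinkler, ingot, and hospital examples.

```r
# Hypothetical helper: tail area for a z or t test statistic.
# alternative is "greater", "less", or "two.sided";
# supply df to use the t distribution instead of the standard normal.
p_value <- function(stat,
                    alternative = c("two.sided", "greater", "less"),
                    df = NULL) {
  alternative <- match.arg(alternative)
  # lower-tail CDF of the reference distribution
  cdf <- if (is.null(df)) pnorm else function(q) pt(q, df = df)
  switch(alternative,
    greater   = 1 - cdf(stat),
    less      = cdf(stat),
    two.sided = 2 * (1 - cdf(abs(stat)))
  )
}

alpha <- 0.05
p_upper <- p_value(2.16, "greater")               # sprinklers, Ha: >
p_lower <- p_value(-2.23606798, "less")           # ingots, Ha: <
p_two_t <- p_value(-2.263, "two.sided", df = 19)  # hospitals, Ha: not equal

# The decision rule is the same for every alternative
reject_upper <- p_upper <= alpha
```

Only the tail calculation changes with the alternative; the comparison with \(\alpha\) stays the same.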
Type I=\(\alpha=P(reject~H_0|H_0~true)\). This is a conditional probability statement that reads as “the probability of rejecting the null given that the null is true.”
TLDR; we rejected a true null hypothesis (that’s a bad thing).
Type I can only happen when \(H_0\) is rejected
Type II=\(\beta=P(Fail~to~reject~H_0|H_0~false)\). This is a conditional probability statement that reads as “the probability of not rejecting the null given that the null is false.”
TLDR; we kept a false hypothesis (again, a bad thing).
Type II can only happen when \(H_0\) is not rejected
Power=\(1-\beta=P(reject~H_0|H_0~false)\). This is a conditional probability statement that reads as “the probability that the null is rejected given that it is false.”
TLDR; we correctly rejected \(H_0\) when \(H_0\) is false (a good thing. Finally!)
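To make \(\alpha\), \(\beta\), and power concrete, here is a hedged sketch for an upper-tail one-sample \(z\) test. The function name `power_upper_z` is hypothetical, and the "true" mean of 131 is an assumed value chosen only for illustration; the remaining inputs are borrowed from the sprinkler example.

```r
# Hypothetical helper: power of an upper-tail one-sample z test
# (H0: mu = mu0 versus Ha: mu > mu0) when the true mean is mu_true.
power_upper_z <- function(mu0, mu_true, sigma, n, alpha = 0.05) {
  se <- sigma / sqrt(n)
  z_crit <- qnorm(1 - alpha)       # reject H0 when z_calc >= z_crit
  shift <- (mu_true - mu0) / se    # mean of z_calc when mu = mu_true
  1 - pnorm(z_crit - shift)        # P(reject H0 | mu = mu_true)
}

# If H0 is true, the rejection probability is the Type I error rate
type1 <- power_upper_z(mu0 = 130, mu_true = 130, sigma = 1.5, n = 9)

# Assumed alternative: the true mean is 131 (illustrative value only)
power <- power_upper_z(mu0 = 130, mu_true = 131, sigma = 1.5, n = 9)
beta <- 1 - power                  # Type II error rate at mu_true = 131
```

Here `type1` equals \(\alpha\), and `beta` and `power` always sum to 1 for a given true mean.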
A \(t\)-table cannot give the \(pvalue\) for \(t\) as a single probability; only software (or a website) can. So, with \(t\), to find the \(pvalue\):
A manufacturer of sprinkler systems used for fire protection in office buildings claims that the true average system-activation temperature is \(130^{\circ}\)F. It is known from previous studies that the temperatures are normally distributed with standard deviation \(1.5^{\circ}\)F. A random sample of \(n=9\) systems yields a sample average activation temperature of \(131.08^{\circ}\)F. Is there sufficient evidence that the true mean activation temperature is more than what the manufacturer claims?
Provided information: \(\overline{x}=131.08\), \(\sigma=1.5\), \(n=9\), \(\mu_0=130\)
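A worked sketch of the calculation, assuming \(\alpha=0.05\) (the default stated earlier) and using the values from the R output for this example. The wording "more than" points to an upper-tail test:

\[H_0: \mu=130~~~~H_a: \mu>130\]
\[se=\frac{\sigma}{\sqrt{n}}=\frac{1.5}{\sqrt{9}}=0.5\]
\[z_{calc}=\frac{\overline{x}-\mu_0}{se}=\frac{131.08-130}{0.5}=2.16\]
\[pvalue=P(Z\geq 2.16)=0.0154\]

Since \(0.0154\leq 0.05\), \(H_0\) is rejected: there is sufficient evidence that the true mean activation temperature is more than \(130^{\circ}\)F.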
Ingots are huge pieces of metal often weighing more than 10 tons (20,000 lbs.). They must be cast in one large piece for use in fabricating large structural parts for cars and planes. If they crack while being made, the crack can propagate into the zone required for the part, compromising its integrity; metal manufacturers would like to avoid cracking if at all possible. In one plant, only about 80% of the ingots have been defect-free. In an attempt to reduce the cracking, the plant engineers and chemists have tried some new methods for casting the ingots, and in a random sample of 500 ingots cast with the new method, 16% were found to be defective (cracked). Is there sufficient evidence that the defective rate has decreased?
Provided information: \(\hat p=0.16\), \(n=500\), \(p_0=0.20\), and \(q_0=1-p_0=1-0.2=0.8\)
\(se=\sqrt{\frac{p_0q_0}{n}}=\sqrt{\frac{(0.2)(0.8)}{500}}=0.0179\)
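A worked sketch of the rest of the calculation, assuming \(\alpha=0.05\) and using the values from the R output for this example (the unrounded \(se\) is \(0.017889\)). The wording "decreased" points to a lower-tail test:

\[H_0: p=0.20~~~~H_a: p<0.20\]
\[z_{calc}=\frac{\hat p-p_0}{se}=\frac{0.16-0.20}{0.017889}=-2.236\]
\[pvalue=P(Z\leq -2.236)=0.0127\]

Since \(0.0127\leq 0.05\), \(H_0\) is rejected: there is sufficient evidence that the defective rate has decreased.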
Suppose a clinical audit of 20 randomly selected hospitals was conducted and a care quality score (0–100) was calculated for each. Researchers were interested in investigating whether these hospitals differ from the national average quality score. The sample mean quality score is \(85.31\) with sample standard deviation \(16.24\). Is there sufficient evidence that the mean quality score differs from the national average, 90.3? Let \(\alpha=0.01\).
Figure: Graphs of hospital quality scores (histogram and boxplot)
*: The outcome of a two-tailed test is also contained in a CI with the matching confidence level (\(CL=1-\alpha\)).
If your hypotheses are \(H_0: \mu=5\) and \(H_a: \mu\ne 5\) and your CI is \((3,7)\), then since the hypothesized value of 5 is in the CI, the null cannot be rejected. If your CI is \((2,4)\), then the hypothesized value of 5 is not in the CI and the null can be rejected.
This only works when the test is a two-tailed test.
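The agreement between a two-tailed test and the matching CI can be checked with a short R sketch. The function name `two_tail_check` is hypothetical. The sprinkler numbers are reused here as a two-tailed test purely for illustration; the example itself is one-tailed.

```r
# Hypothetical helper: decide a two-tailed one-sample z test two ways,
# once with the (1 - alpha) CI and once with the p-value.
two_tail_check <- function(xbar, mu0, sigma, n, alpha = 0.05) {
  se <- sigma / sqrt(n)
  zstar <- qnorm(1 - alpha / 2)
  ci <- c(lower = xbar - zstar * se, upper = xbar + zstar * se)
  zcalc <- (xbar - mu0) / se
  pvalue <- 2 * (1 - pnorm(abs(zcalc)))
  list(
    ci = ci,
    pvalue = pvalue,
    reject_by_ci = mu0 < ci[["lower"]] || mu0 > ci[["upper"]],
    reject_by_pvalue = pvalue <= alpha
  )
}

# Sprinkler numbers, treated as two-tailed for illustration only
res_05 <- two_tail_check(xbar = 131.08, mu0 = 130, sigma = 1.5, n = 9)
res_01 <- two_tail_check(xbar = 131.08, mu0 = 130, sigma = 1.5, n = 9,
                         alpha = 0.01)
```

At \(\alpha=0.05\) the hypothesized value 130 falls outside the 95% CI and both rules reject; at \(\alpha=0.01\) it falls inside the 99% CI and neither rule rejects.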
R
Sprinklers
xbar=131.08
sigma=1.5; n=9
se=sigma/sqrt(n); se
[1] 0.5
# CI
zstar=qnorm(1-.05/2)
bound=zstar*se
lower=xbar-bound
upper=xbar+bound
rbind(zstar,bound,lower,upper)
[,1]
zstar 1.959964
bound 0.979982
lower 130.100018
upper 132.059982
# Hypothesis test
mu0=130
zcalc=(xbar-mu0)/se
pvalue=1-pnorm(zcalc)
rbind(zcalc,pvalue)
[,1]
zcalc 2.16000000
pvalue 0.01538633
Ingots
phat=0.16; qhat=1-phat; n=500
se.phat=sqrt(phat*qhat/n)
# CI
zstar=qnorm(1-.05/2)
bound=zstar*se.phat
lower=phat-bound
upper=phat+bound
rbind(phat,qhat,se.phat,zstar,bound,lower,upper)
[,1]
phat 0.16000000
qhat 0.84000000
se.phat 0.01639512
zstar 1.95996398
bound 0.03213385
lower 0.12786615
upper 0.19213385
# Hypothesis test
p0=0.2; q0=1-p0
se.p0=sqrt(p0*q0/n)
zcalc=(phat-p0)/se.p0
pvalue=pnorm(zcalc)
rbind(phat,qhat,p0,q0,se.p0,zcalc,pvalue)
[,1]
phat 0.16000000
qhat 0.84000000
p0 0.20000000
q0 0.80000000
se.p0 0.01788854
zcalc -2.23606798
pvalue 0.01267366
Hospital quality score
hist(hospital,breaks=8)
boxplot(hospital,horizontal=T,main='Hospital Quality Scores')
t.test(hospital,mu=90.3,conf.level=.99)
One Sample t-test
data: hospital
t = -2.263, df = 19, p-value = 0.03554
alternative hypothesis: true mean is not equal to 90.3
99 percent confidence interval:
66.21827 93.11016
sample estimates:
mean of x
79.66422
When doing a test with R, if the test is a two-tailed test, R provides the usual two-sided CI (with two numbers). However, when the test is a one-tailed test, the output will have a one-sided CI (briefly discussed in the last module).
t.test(hospital,mu=90.3,conf.level=.99)
One Sample t-test
data: hospital
t = -2.263, df = 19, p-value = 0.03554
alternative hypothesis: true mean is not equal to 90.3
99 percent confidence interval:
66.21827 93.11016
sample estimates:
mean of x
79.66422
t.test(hospital,mu=90.3,conf.level=.99,alternative='l')
One Sample t-test
data: hospital
t = -2.263, df = 19, p-value = 0.01777
alternative hypothesis: true mean is less than 90.3
99 percent confidence interval:
-Inf 91.59939
sample estimates:
mean of x
79.66422
t.test(hospital,mu=90.3,conf.level=.99,alternative='g')
One Sample t-test
data: hospital
t = -2.263, df = 19, p-value = 0.9822
alternative hypothesis: true mean is greater than 90.3
99 percent confidence interval:
67.72904 Inf
sample estimates:
mean of x
79.66422