One-Sample Hypothesis Tests

In this section students will:

  1. Learn about hypothesis tests and notation
  2. Learn procedure for hypothesis tests
  3. Describe Type I and Type II errors and recognize when they occur
  4. Learn how to complete a test with \(z\) and \(t\)
  5. Learn and use the 5 step procedure for hypothesis testing
  6. Learn and use the 5 step procedure for hypothesis testing with R output

Introduction to hypothesis tests

We have learned about estimating parameters by point estimation and interval estimation (specifically confidence intervals). More often than not, the objective of an investigation is not to estimate a parameter but to decide which of two (or more) contradictory claims about the parameter is correct.

This part of statistics is called hypothesis testing.

A statistical hypothesis is a claim or assertion about one of the following:

  1. The value of a single parameter
  2. The values of several parameters
  3. The form of an entire probability distribution

Hypotheses

  1. Null hypothesis, denoted by \(H_0\) (“H-naught” or “H-zero”), is a statement about the population that either is believed to be true or is used to put forth an argument unless it can be shown to be incorrect beyond a reasonable doubt (the “prior belief” or “historical” claim)
  2. Alternative hypothesis, denoted by \(H_a\) (“H-a”), is the assertion that is contradictory to \(H_0\); it is the researcher’s claim, what they are trying to prove (and thus the reason behind the study)

When stating the hypotheses, the notation used is always population parameter notation; inferences about populations require population notation (the Greek letters).

\(\mu\) for the mean and \(p\) for the proportion

Since the null and alternative hypotheses are contradictory, evidence must be examined to decide if there is enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data. After it has been determined which hypothesis the sample supports, a decision is made.

There are two options for a decision:

  1. “Reject the null hypothesis” if the sample information favors the alternative hypothesis, or
  2. “Do not reject the null hypothesis” if the sample information is insufficient to reject the null hypothesis.

Hypothesis Testing Checklist

All tests (when done by hand) include the following five steps:

  1. State hypotheses, check assumptions
  2. Calculate the test statistic
  3. Find the rejection region
  4. Results and conclusion of the test
  5. State possible error in context

Hypotheses for inferences of mean, \(\mu\)

\(H_0\) always has a symbol with an equal sign in it; \(H_a\) never does. The choice of symbol depends on the wording of the hypothesis test. However, it is common in practice to just use \(=\) in the null hypothesis, even with \(>\) or \(<\) as the symbol in the alternative hypothesis. This practice is acceptable because the decision is only ever made to either reject or not reject the null hypothesis.

Null hypothesis:

\[H_0: \mu=\mu_0\] Alternative hypotheses (choose one):

(1) \(H_a: \mu \neq \mu_0\)
(2) \(H_a: \mu>\mu_0\)
(3) \(H_a: \mu<\mu_0\)

The hypotheses are the same regardless of whether \(z\) or \(t\) is used. Most often the null hypothesis will have \(=\) while the alternative will be one of \(\neq\), \(>\), or \(<\).

\(\mu_0\) is a specified value (a number that is given in the problem)

Hypotheses for inferences for the proportion, \(p\)

Null hypothesis:

\[H_0: p=p_0\]
Alternative hypotheses (choose one):

(1) \(H_a: p \neq p_0\)
(2) \(H_a: p>p_0\)
(3) \(H_a: p<p_0\)

Most often the null hypothesis will have \(=\) while the alternative will be one of either \(\neq\), \(>\), or \(<\).

\(p_0\) is a specified value (a number that is given in the problem)

Assumptions

  1. Randomization: proper randomization was used
    • Takes care of independence issue if there is one
  2. Independence: observations are independent from one another
  3. Normality
    1. Means need an approximate normal distribution
      • \(n \geq 30\)
      • Distribution is normal
      • Assess normality with graph
    2. Proportions need \(n \geq60\) (via CLT)

If the assumptions are violated, the results from the analyses are neither valid nor reliable.

Test Statistic

1-sample test of the mean \(\mu\) when \(\sigma\) is known: Use \(Z\)

\[z=\frac{\overline{X}-\mu_0}{se}~~se=\frac{\sigma}{\sqrt{n}}\]

1-sample test of the proportion \(p\): Use \(Z\) \[z=\frac{\hat p-p_0}{se}~~se=\sqrt{\frac{p_0q_0}{n}}~~\text{and}~~q_0=1-p_0\]

1-sample test of the mean \(\mu\) when \(\sigma\) is unknown: Use \(t\)
\[t=\frac{\overline{X}-\mu_0}{se}~~se=\frac{s}{\sqrt{n}}\]

In practice (other than one example in class), most tests like this are done with \(t\) instead of \(z\)
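As a sketch, the three test statistics above can be computed directly in R; the sample summaries below are made-up illustrative values, not from any example in this section:

```r
# Hypothetical sample summaries (illustrative only)
xbar <- 52; mu0 <- 50; sigma <- 4; s <- 4.5; n <- 25

# z statistic when sigma is known: (xbar - mu0) / (sigma / sqrt(n))
z <- (xbar - mu0) / (sigma / sqrt(n))

# t statistic when sigma is unknown (df = n - 1): uses the sample sd s
tstat <- (xbar - mu0) / (s / sqrt(n))

# z statistic for a proportion: se uses the hypothesized p0, not phat
phat <- 0.6; p0 <- 0.5
se0 <- sqrt(p0 * (1 - p0) / n)
zp <- (phat - p0) / se0

rbind(z, tstat, zp)
```

Note that the proportion test uses \(p_0\) in the standard error because the test is carried out under the assumption that \(H_0\) is true.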

Rejection Region

The rejection region is based on the significance level \(\alpha\), where \(\alpha=1-CL\) and CL is the confidence level.

Always assume \(\alpha=0.05\) unless specified otherwise.

Two methods for rejection:

  1. Critical value approach (not learning)
  2. \(pvalue\) approach

The alternative hypothesis (\(H_a\)) determines which part of the curve the rejection region falls in.

\(pvalue\) logistics

\[pvalue\leq \alpha\Rightarrow Reject~H_0\]

With the \(pvalue\) approach, the null hypothesis can be rejected \(iff\) (if and only if) \(pvalue\leq\alpha\) (with \(\alpha=0.05\) most often). This does not change, regardless of the sign of the alternative hypothesis. However, the calculation of the \(pvalue\) is dependent on the sign of the alternative hypothesis. The \(pvalue\) is \(P(\) the results of the test \(| H_0\) is correct); in other words, it is the probability that the results would occur by random chance if the null hypothesis is actually correct.

Assume that \(\alpha=0.05\) unless specified; any rejection of \(H_0\) means that the results (of the experiment, survey, etc.) are significant. The smaller the \(pvalue\), the more significant the results (the larger the magnitude of the test statistic, the further it is from the mean and the smaller the \(pvalue\)).

The calculation of the \(pvalue\) is dependent on the type of test you are doing, as in one-tail upper, one-tail lower, or two-tail. The sign of the alternative hypothesis is the determining factor in calculation of the \(pvalue\).

\(pvalue\) Calculations

Note that while all examples are with \(z\), it is interchangeable with \(t\) (\(df\) is needed). In this case, \(pvalue\) represents the area in the specified tail(s) of the distribution

\[\text{When }H_a: >~,~pvalue=P(Z\geq z_{calc})\]
\[\text{When }H_a: <~,~pvalue=P(Z\le z_{calc})\]
\[\text{When }H_a: \ne~,~pvalue=2P(Z\ge |z_{calc}|)\]
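These three cases map directly onto `pnorm()` in R (for \(t\), `pt()` with the \(df\)); as a sketch, using a hypothetical \(z_{calc}\):

```r
zcalc <- 2.16  # hypothetical calculated z score

p_upper <- 1 - pnorm(zcalc)             # Ha: >  , P(Z >= zcalc)
p_lower <- pnorm(zcalc)                 # Ha: <  , P(Z <= zcalc)
p_two   <- 2 * (1 - pnorm(abs(zcalc)))  # Ha: != , 2 P(Z >= |zcalc|)

rbind(p_upper, p_lower, p_two)
# For t, replace pnorm with pt and supply the df,
# e.g. 1 - pt(tcalc, df) for an upper-tail test
```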

Upper tail test \(H_a:~>\)


\[pvalue=P(Z\geq z_{calc})\]

Right tail pvalue

Lower tail test \(H_a:~<\)

\[pvalue=P(Z\leq z_{calc})\]

Left tail pvalue

Two-tailed test \(H_a:~\neq\)

\[pvalue=2[P(Z\geq |z_{calc}|)]\]
Two-tailed pvalue

Results and Conclusion

Errors

Type I=\(\alpha=P(reject~H_0|H_0~true)\). This is a conditional probability statement that reads as “the probability of rejecting the null given that the null is true.”
TLDR; we rejected a true null hypothesis (that’s a bad thing).
Type I can only happen when \(H_0\) is rejected

Type II=\(\beta=P(Fail~to~reject~H_0|H_0~false)\). This is a conditional probability statement that reads as “the probability of not rejecting the null given that the null is false.”
TLDR; we kept a false null hypothesis (again, a bad thing).
Type II can only happen when \(H_0\) is not rejected

Power=\(1-\beta=P(reject~H_0|H_0~false)\). This is a conditional probability statement that reads as “the probability that the null is rejected given that it is false.”
TLDR; we correctly rejected \(H_0\) when \(H_0\) is false (a good thing. Finally!)
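As a sketch of how \(\alpha\), \(\beta\), and power fit together, consider an upper-tail \(z\) test with a hypothetical true mean; all of the numbers below are made up for illustration:

```r
# Hypothetical setup: H0: mu = 130 vs Ha: mu > 130, sigma known
mu0 <- 130; sigma <- 1.5; n <- 9; alpha <- 0.05
se <- sigma / sqrt(n)

# H0 is rejected when xbar exceeds this cutoff (Type I rate = alpha)
cutoff <- mu0 + qnorm(1 - alpha) * se

# Suppose the true mean were actually 131 (a hypothetical alternative)
mu_true <- 131
beta  <- pnorm(cutoff, mean = mu_true, sd = se)  # P(fail to reject | H0 false)
power <- 1 - beta                                # P(reject | H0 false)
rbind(beta, power)
```

Making \(\alpha\) smaller moves the cutoff further out, which raises \(\beta\) and lowers power; the two error rates trade off against each other.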


\(pvalue\) for \(t\)

When using the t-table rather than software (or a website), it is impossible to find a single probability number for the \(pvalue\); it can only be bracketed. So, with \(t\), to find the \(pvalue\):

  1. Find the row on the table that corresponds to the \(df\)
  2. Look along the \(df\) row for the calculated t score
  3. Find the 2 numbers that enclose the calculated test statistic
  4. Then go to the top of the table and choose the row for one or two tails (depending on the hypotheses). The \(pvalue\) will be between 2 numbers
  5. Compare \(\alpha\) to the interval: if both endpoints of the \(pvalue\) interval are less than or equal to \(\alpha\), the null hypothesis can be rejected. If \(\alpha\) is below both endpoints, or falls inside the interval, the null hypothesis cannot be rejected from the table alone
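As a sketch, software gives the exact number that the table only brackets; using an illustrative \(t_{calc}=-1.37\) with \(df=19\):

```r
tcalc <- -1.37; df <- 19  # illustrative values

# Exact two-tailed pvalue from software
pvalue <- 2 * (1 - pt(abs(tcalc), df))
pvalue

# The t-table would only bracket this, e.g. 0.1 < pvalue < 0.2,
# so the decision comes from comparing the whole interval to alpha
```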

Test of \(\mu\) when \(\sigma\) is known

A manufacturer of sprinkler systems used for fire protection in office buildings claims that the true average system-activation temperature is \(130^{\circ}\)F. It is known from previous studies that the temperatures are normally distributed with standard deviation \(1.5^{\circ}\)F. A random sample of \(n=9\) systems yields a sample average activation temperature of \(131.08^{\circ}\)F. Is there sufficient evidence that the true mean activation temperature is more than what the manufacturer claims?

Provided information: \(\overline{x}=131.08\), \(\sigma=1.5\), \(n=9\), \(\mu_0=130\)

  1. Hypotheses: \(H_0: \mu=130\) and \(H_a: \mu > 130\)
    Assumptions are met: randomization (yes), independence (yes because random), normality (yes temps are normal)
  2. Test statistic: \(z=\frac{\overline{x}-\mu}{se}\) and \(se=\frac{\sigma}{\sqrt{n}}=\frac{1.5}{\sqrt{9}}=0.5\)
    \(z=\frac{131.08-130}{0.5}=2.16\)
  3. Rejection region: \(H_a:~>\), it is an upper-tail test (also called a one-tailed test). Reject \(H_0\) if \(pvalue\le\alpha\), \(\alpha=0.05\), with \(pvalue=P(Z\ge z_{calc})=0.015\)
  4. Results and conclusion: \(0.015\le \alpha~(0.05)\therefore\) (therefore) \(H_0\) is rejected. Since \(H_0\) was rejected, there is evidence the mean activation temperature is higher than the manufacturer’s claim of \(130^{\circ}F\)
  5. Error: since \(H_0\) was rejected, a Type I error (reject null when null is true) could have been made; we think the activation temperature is higher than the claim but it is not higher

Test of \(p\)

Ingots are huge pieces of metal often weighing more than 10 tons (20,000 lbs.). They must be cast in one large piece for use in fabricating large structural parts for cars and planes. If they crack while being made, the crack can propagate into the zone required for the part, compromising its integrity; metal manufacturers would like to avoid cracking if at all possible. In one plant, only about 80% of the ingots have been defect-free. In an attempt to reduce the cracking, the plant engineers and chemists have tried some new methods for casting the ingots, and in a random sample of 500 ingots cast with the new method, 16% of the casts were found to be defective (cracked). Is there sufficient evidence that the defective rate has decreased?

Provided information: \(\hat p=0.16\), \(n=500\), \(p_0=0.20\), and \(q_0=1-p_0=1-0.2=0.8\)

\(se=\sqrt{\frac{p_0q_0}{n}}=\sqrt{\frac{(0.2)(0.8)}{500}}=0.0179\)

  1. Hypotheses, assumptions
    \(H_0: p=0.2\) \(H_a: p<0.2\) Assumptions: randomization (yes), independence (yes because random), normality (yes, \(n=500\ge 60\))
  2. Test statistic: \(z=\frac{\hat p-p_0}{se}=\frac{0.16-0.2}{0.0179}=-2.23\)
  3. Rejection region (\(pvalue\)): Since \(H_a: <\), it is a lower-tail test (also called one-tailed test). \(pvalue=P(Z<z_{calc})=P(Z<-2.23)=0.013\)
  4. Results and conclusion: Reject \(H_0\) if \(pvalue\le\alpha~(0.05)\). \(0.013\le \alpha~(0.05)\therefore\) (therefore) \(H_0\) is rejected. Since \(H_0\) was rejected, there is sufficient evidence the new casting method’s defective rate is significantly less than the current method’s (in other words, there is a significant decrease in the defect rate).
  5. Error: since \(H_0\) was rejected, a Type I error (reject null when null is true) could have been made; we think the defect rate decreased from the new method but it did not

Test of \(\mu\) when \(\sigma\) is not known

Suppose a clinical audit of 20 randomly selected hospitals was conducted, a care quality score (0–100) was calculated for each, and researchers were interested in investigating whether these hospitals differ from a national average quality score. The sample mean quality score is \(85.31\) with sample standard deviation \(16.24\). The researchers want to know if there is sufficient evidence that the mean quality score differs from the national average of 90.3. Let \(\alpha=0.01\).

hist(hospital,breaks=8)
boxplot(hospital,horizontal=T,main='Hospital Quality Scores')

Graphs of hospital quality scores

  1. Hypotheses, assumptions
    \(H_0: \mu=90.3~vs.~H_a: \mu\neq90.3\)
    Assumptions: randomization (yes), independence (yes because random), normality (yes boxplot is normal)
  2. Test statistic, \(df\): \(se=\frac{s}{\sqrt{n}}=\frac{16.24}{\sqrt{20}}=3.63\), \(t=\frac{85.31-90.3}{3.63}=-1.37\), \(df=n-1=20-1=19\)
  3. Rejection region: \(pvalue=2P(T\ge |t_{calc}|)=2P(T_{df=19}\ge|-1.37|)=2(0.093)=0.187\). With the table, we will not get a single number for the \(pvalue\); it will be an interval. The \(pvalue\) from the table for this example is \(0.1<pvalue<0.2\)
  4. Results and conclusion: Reject null if \(pvalue \le \alpha~(0.01)\). \(0.187\nleq \alpha~(0.01)\therefore~H_0\) cannot be rejected. There is not enough evidence to say that the hospitals’ quality scores differ from the national average (there is no significant difference between the national score and the hospitals’ score)
  5. Error: since \(H_0\) was not rejected, a Type II error (not rejecting the null when the null is false) could have been made; we think the hospitals’ quality score is the same as the national average but they are different

Relationship between tests\(^*\) and CIs

*: A two-tailed test with significance level \(\alpha\) reaches the same conclusion as a CI with confidence level \(1-\alpha\).

If your hypotheses are \(H_0: \mu=5\) and \(H_a: \mu\ne 5\) and your CI is \((3,7)\), since the hypothesized value of 5 is in the CI, the null cannot be rejected. If your CI is \((2,4)\), then the hypothesized value of 5 is not in the CI and the null can be rejected.

This only works when the test is a two-tailed test
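The check above can be sketched in R, using the same hypothetical intervals:

```r
mu0 <- 5  # hypothesized value from H0

ci1 <- c(3, 7)  # mu0 inside  -> cannot reject H0
ci2 <- c(2, 4)  # mu0 outside -> reject H0

# Reject H0 exactly when mu0 falls outside the interval
reject1 <- (mu0 < ci1[1]) | (mu0 > ci1[2])
reject2 <- (mu0 < ci2[1]) | (mu0 > ci2[2])
rbind(reject1, reject2)
```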

Analyses in R

Sprinklers

xbar=131.08
sigma=1.5; n=9
se=sigma/sqrt(n); se
[1] 0.5
# CI
zstar=qnorm(1-.05/2)
bound=zstar*se
lower=xbar-bound
upper=xbar+bound
rbind(zstar,bound,lower,upper)
            [,1]
zstar   1.959964
bound   0.979982
lower 130.100018
upper 132.059982
# Hypothesis test
mu0=130
zcalc=(xbar-mu0)/se
pvalue=1-pnorm(zcalc)
rbind(zcalc,pvalue)
             [,1]
zcalc  2.16000000
pvalue 0.01538633
  1. Hypotheses, assumptions
    \(H_0: \mu=130~H_a: \mu>130\)
    Assumptions: random (yes), independence (yes because random), normality (yes temps are normal)
  2. Test statistic, \(df\), \(pvalue\)
    \(z=2.16\), \(pvalue=0.015\)
  3. Results: \(pvalue=0.015\leq\alpha(0.05) \therefore\) (therefore) \(H_0\) is rejected
  4. Conclusion: since the null is rejected, that means that there is evidence that the sprinkler activation temperature is higher than the manufacturer’s claim of \(130^{\circ}\)F
  5. Error: since \(H_0\) was rejected, a Type I error (reject null when null is true) could have been made; we think the activation temperature is higher than the claim but it is not higher

Ingots

phat=0.16; qhat=1-phat; n=500
se.phat=sqrt(phat*qhat/n)
# CI
zstar=qnorm(1-.05/2)
bound=zstar*se.phat
lower=phat-bound
upper=phat+bound
rbind(phat,qhat,se.phat,zstar,bound,lower,upper)
              [,1]
phat    0.16000000
qhat    0.84000000
se.phat 0.01639512
zstar   1.95996398
bound   0.03213385
lower   0.12786615
upper   0.19213385
# Hypothesis test
p0=0.2; q0=1-p0
se.p0=sqrt(p0*q0/n)
zcalc=(phat-p0)/se.p0
pvalue=pnorm(zcalc)
rbind(phat,qhat,p0,q0,se.p0,zcalc,pvalue)
              [,1]
phat    0.16000000
qhat    0.84000000
p0      0.20000000
q0      0.80000000
se.p0   0.01788854
zcalc  -2.23606798
pvalue  0.01267366
  1. Hypotheses, assumptions
    \(H_0: p=0.2~vs.~H_a: p<0.2\)
    Assumptions: random (yes), independence (yes because random), normality (yes \(n=500\geq60\))
  2. Test statistic, \(pvalue\)
    \(z=-2.24\), \(pvalue=0.013\)
  3. Results: \(pvalue=0.013\leq\alpha(0.05) \therefore~H_0\) is rejected
  4. Conclusion: since the null is rejected, that means that there is evidence that the defect rate has significantly decreased (because of the new method)
  5. Error: since \(H_0\) was rejected, a Type I error (reject null when null is true) could have been made; we think the defect rate decreased from the new method but it did not

Hospital quality score

hist(hospital,breaks=8)

Boxplot of hospital quality score

boxplot(hospital,horizontal=T,main='Hospital Quality Scores')

Boxplot of hospital quality score

t.test(hospital,mu=90.3,conf.level=.99)

    One Sample t-test

data:  hospital
t = -2.263, df = 19, p-value = 0.03554
alternative hypothesis: true mean is not equal to 90.3
99 percent confidence interval:
 66.21827 93.11016
sample estimates:
mean of x 
 79.66422 
  1. Hypotheses, assumptions
    \(H_0: \mu=90.3~vs.~H_a: \mu\neq90.3\)
    Assumptions: random (yes), independence (yes because random), normality (yes boxplot is normal)
  2. Test statistic, \(df\), \(pvalue\)
    \(t=-2.263\), \(df=19\), \(pvalue=0.03554\)
  3. Results: Reject null if \(pvalue \le \alpha\). \(pvalue=0.03554\nleq\alpha~(0.01) \therefore~H_0\) is not rejected
  4. Conclusion: since the null is not rejected, that means that there is not enough evidence to say that the hospitals’ quality scores differ from the national average
  5. Error: since \(H_0\) was not rejected, a Type II error (not rejecting the null when the null is false) could have been made; we think the hospitals’ quality scores are the same as the national average but they are different

Output differences

When doing a test with R, if the test is a two-tailed test, R provides the usual CI (with two numbers). However, when the test is a one-tailed test in R, the output will have a one-sided CI (briefly discussed in the last module).

t.test(hospital,mu=90.3,conf.level=.99)

    One Sample t-test

data:  hospital
t = -2.263, df = 19, p-value = 0.03554
alternative hypothesis: true mean is not equal to 90.3
99 percent confidence interval:
 66.21827 93.11016
sample estimates:
mean of x 
 79.66422 
t.test(hospital,mu=90.3,conf.level=.99,alternative='l')

    One Sample t-test

data:  hospital
t = -2.263, df = 19, p-value = 0.01777
alternative hypothesis: true mean is less than 90.3
99 percent confidence interval:
     -Inf 91.59939
sample estimates:
mean of x 
 79.66422 
t.test(hospital,mu=90.3,conf.level=.99,alternative='g')

    One Sample t-test

data:  hospital
t = -2.263, df = 19, p-value = 0.9822
alternative hypothesis: true mean is greater than 90.3
99 percent confidence interval:
 67.72904      Inf
sample estimates:
mean of x 
 79.66422 

Haha I could not resist

p>0.05