2-sample Methods

Two-Sample Analyses

In this section students will:

Learn about 2-sample CIs and hypothesis tests
Learn notation for 2-sample CIs and hypothesis tests
Calculate 2-sample hypothesis tests and CIs
See output from a test done with the program R
Learn how to interpret R output for 2-sample methods
Learn and use the 5 step procedure for hypothesis testing with R output

Comparing two groups

Comparisons:

Two independent means$^*$
- When $\sigma^2_1\approx\sigma^2_2$: Pooled (not learned)
- When $\sigma^2_1\neq\sigma^2_2$: Unpooled (also called Welch or Satterthwaite)
Dependent means
Two proportions (independent)

$^*$There are two cases (equal and unequal variances), but we will only use the unequal-variances (unpooled) method, which is appropriate whether or not the two variances are equal. (In practice, a variance test is often done first to choose between pooled and unpooled; we will learn only the unpooled method and skip the variance test.)

Independent Means Formula: $df$ for use of $t$ and CI

This compares the means of two distinct (separate) groups of units or subjects. The wording used is the difference of two (2) means

Software usually calculates the degrees of freedom for (unpooled) independent means with the following formula rather than using $n-1$ or something similar:

\[df=\frac{\left(\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}\right)^2}{\frac{\left(s^2_1/n_1\right)^2}{n_1-1}+\frac{\left(s^2_2/n_2\right)^2}{n_2-1}}\]

When calculating by hand, we will instead use the smaller of the two sample sizes minus one: \[df=min(n_1-1,n_2-1)\]
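The two choices of $df$ can be compared numerically. The following is a minimal sketch in R; the function names `welch_df` and `conservative_df` are illustrative (not from the course materials), and the inputs are the summary statistics from the skull-breadth example later in this section.

```r
# Welch-Satterthwaite degrees of freedom from summary statistics.
# (Function names are hypothetical helpers for illustration.)
welch_df <- function(s1, n1, s2, n2) {
  v1 <- s1^2 / n1   # estimated variance of the first sample mean
  v2 <- s2^2 / n2   # estimated variance of the second sample mean
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

# Conservative by-hand choice: smaller sample size minus one.
conservative_df <- function(n1, n2) min(n1 - 1, n2 - 1)

# Summary statistics from the skull-breadth example (200 BCE vs 4000 BCE).
df_welch <- welch_df(s1 = 4.038, n1 = 30, s2 = 5.129, n2 = 30)
df_hand  <- conservative_df(30, 30)

print(round(df_welch, 2))
print(df_hand)
```

The formula gives a $df$ of about 55 (the small gap from the software value of 54.973 comes from rounding in the summary statistics), while the by-hand rule gives 29. The smaller $df$ makes the by-hand p-values and CIs slightly more conservative.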

CI for the difference of two (independent) means:

\[\overline{X}_1-\overline{X}_2 \pm t^{\star}(se)\text{ with }se=\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}\]

$t^{\star}$ is found the same way as in the one-sample methods

Interpretation: “With ___% confidence, the true difference of (independent) means of <insert context> is between (lower) and (upper) <units of measurement>.”
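As a sketch of how this CI could be computed from summary statistics in R: the helper name `ci_two_means` is hypothetical, it assumes the by-hand rule $df=min(n_1-1,n_2-1)$, and the inputs are taken from the skull-breadth example in this section.

```r
# CI for the difference of two independent means (unpooled),
# using the conservative df = min(n1 - 1, n2 - 1).
ci_two_means <- function(xbar1, s1, n1, xbar2, s2, n2, conf = 0.95) {
  se    <- sqrt(s1^2 / n1 + s2^2 / n2)   # standard error of the difference
  df    <- min(n1 - 1, n2 - 1)           # by-hand degrees of freedom
  tstar <- qt(1 - (1 - conf) / 2, df)    # critical value t*
  est   <- xbar1 - xbar2                 # point estimate
  c(lower = est - tstar * se, upper = est + tstar * se)
}

# Skull-breadth summary statistics: group 1 = 200 BCE, group 2 = 4000 BCE.
ci <- ci_two_means(135.633, 4.038, 30, 131.367, 5.129, 30, conf = 0.95)
print(round(ci, 2))
```

Software will normally use the calculated (Welch) $df$ instead, so its interval is slightly narrower than this one.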

Hypotheses for Difference of Two Independent Means

For the difference of two (independent) means¹:

\[H_0: \mu_1=\mu_2~~H_a: \mu_1\left(\begin{array} {lll} \neq \\ > \\ < \\ \end{array}\right)\mu_2\]

\[Or\]

\[H_0: \mu_1-\mu_2=0~~H_a: \mu_1-\mu_2\left(\begin{array} {lll} \neq \\ > \\ < \\ \end{array}\right)0\]

Assumptions

Randomization
Independence (if random met, this is met)
Normality

$n_i\ge 30$
graphs
statement of normality

Formula: Test Statistic

\[t=\frac{\overline{X}_1-\overline{X}_2}{se}~~where~se=\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}\]

With
\[df=min(n_1-1,n_2-1)\]
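The test statistic and an upper-tail p-value can be sketched in R as follows. The helper `t_two_means` is a hypothetical name, the conservative $df$ is assumed, and the numbers come from the skull-breadth example in this section.

```r
# Test statistic for the difference of two independent means (unpooled).
t_two_means <- function(xbar1, s1, n1, xbar2, s2, n2) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)      # standard error
  t  <- (xbar1 - xbar2) / se             # H0 difference is 0
  df <- min(n1 - 1, n2 - 1)              # by-hand degrees of freedom
  list(t = t, df = df,
       p_upper = pt(t, df, lower.tail = FALSE))  # P(T > t) for Ha: >
}

res <- t_two_means(135.633, 4.038, 30, 131.367, 5.129, 30)
print(round(res$t, 3))
print(res$df)
print(res$p_upper)
```

For a lower-tail alternative, `pt(t, df)` would be used instead, and a two-tail p-value doubles the tail area beyond `abs(t)`.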

Dependent means

This compares the mean of the difference between two measurements of the same unit or subject. The wording used is the mean difference. This analysis is for comparing measurements on the same subject/unit; once before a treatment and once again after the treatment, to detect if there is a difference due to the treatment.

Examples include weight loss programs, Coke vs. Pepsi, and comparing the GDP of countries at two different dates (time is the treatment)

Formula: CI

$d_i$: individual differences between measurements

$\overline{X}_d=\frac{\sum{d_i}}{n}$ sample mean difference (mean of the differences)

$s_d=\sqrt{\frac{\sum{(d_i-\overline{X}_d)^2}}{n-1}}$: sample standard deviation of the differences

\[\overline{X}_d \pm t^{\star}(se)~~where~se=\frac{s_d}{\sqrt{n}}~~and~df=n-1\]

$t^{\star}$ is found the same as in one-sample methods

Interpretation: “With ___% confidence, the true mean difference of <insert context> is between (lower) and (upper) <units of measurement>.”

Hypotheses

For the mean difference²:

\[H_0: \mu_d=0~~H_a: \mu_d\left(\begin{array} {lll} \neq \\ > \\ < \\ \end{array}\right)0\]

Assumptions

Randomization
Independence (of units/subjects)
Normality of differences
Two measurements per unit/subject

Formula: Test Statistic

\[t=\frac{\overline{X}_d-0}{se}\text{ with }se=\frac{s_d}{\sqrt{n}}\]
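A dependent-means analysis is a one-sample analysis of the differences $d_i$. The sketch below illustrates this in R with made-up before/after values (hypothetical data, for illustration only) and checks the by-hand result against R's built-in `t.test()` with `paired = TRUE`.

```r
# Hypothetical before/after measurements on six subjects (illustration only).
before <- c(200, 185, 172, 210, 195, 188)
after  <- c(192, 180, 170, 199, 190, 185)

d      <- before - after          # individual differences d_i
n      <- length(d)
xbar_d <- mean(d)                 # sample mean difference
s_d    <- sd(d)                   # sample sd of the differences
se     <- s_d / sqrt(n)
t_calc <- (xbar_d - 0) / se
p_two  <- 2 * pt(abs(t_calc), df = n - 1, lower.tail = FALSE)

# Built-in paired test for comparison.
check <- t.test(before, after, paired = TRUE)
print(c(t_by_hand = t_calc, t_builtin = unname(check$statistic)))
print(c(p_by_hand = p_two, p_builtin = check$p.value))
```

The two sets of values agree because the paired test uses exactly the formulas above with $df=n-1$.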

Two Proportions

This compares the proportions of two distinct (separate) groups of units or subjects. The wording used is the difference of two (2) proportions

Formula: CI

CI for the difference of two (independent) proportions:

\[\hat p_1-\hat p_2 \pm z^{\star}(se) \text{ with }se=\sqrt{\frac{\hat p_1\hat q_1}{n_1}+\frac{\hat p_2\hat q_2}{n_2}}\]
with $z^{\star}$ found the same way as in one-sample methods

Interpretation: “With ___% confidence, the true difference of (independent) proportions of <insert context> is between (lower) and (upper) <units of measurement>.”
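A minimal sketch of this CI in R, working from counts of successes. The helper name `ci_two_props` is hypothetical; the example call uses the tracking-device counts from later in this section, with group 1 taken as the right side (87 of 100) and group 2 as the left side (85 of 100).

```r
# CI for the difference of two independent proportions (unpooled se).
ci_two_props <- function(x1, n1, x2, n2, conf = 0.90) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  se    <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  zstar <- qnorm(1 - (1 - conf) / 2)     # critical value z*
  est   <- p1 - p2
  c(lower = est - zstar * se, upper = est + zstar * se)
}

ci <- ci_two_props(x1 = 87, n1 = 100, x2 = 85, n2 = 100, conf = 0.90)
print(round(ci, 4))
```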

Hypotheses

For the difference of two (independent) proportions³:

\[H_0: p_1=p_2~~H_a: p_1\left(\begin{array} {lll} \neq \\ > \\ < \\ \end{array}\right)p_2\]

\[Or\]

\[H_0: p_1-p_2=0~~H_a: p_1-p_2\left(\begin{array} {lll} \neq \\ > \\ < \\ \end{array}\right)0\]

Assumptions

Randomization
Independence (if random met, this is met)
Normality

$n_1\geq60$ AND $n_2\geq60$

Formula: Test Statistic

\[z=\frac{\hat p_1-\hat p_2}{se}\text{ with }se=\sqrt{\frac{\hat p_1\hat q_1}{n_1}+\frac{\hat p_2\hat q_2}{n_2}}\]
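The test statistic can be sketched in R in the same way. The helper `z_two_props` is a hypothetical name, it uses the unpooled $se$ shown above, and the example call uses the tracking-device counts from later in this section (group 1 = right, group 2 = left).

```r
# z statistic for the difference of two independent proportions,
# using the unpooled standard error from the formula above.
z_two_props <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z  <- (p1 - p2) / se
  list(se = se, z = z,
       p_two_sided = 2 * pnorm(abs(z), lower.tail = FALSE))
}

res <- z_two_props(x1 = 87, n1 = 100, x2 = 85, n2 = 100)
print(round(unlist(res), 4))
```

For a one-tailed alternative, only the single tail area in the direction of $H_a$ would be used rather than twice the tail area.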

Analyses

The basic process is the same as for 1-sample methods. Make sure to follow the 5 steps to hypothesis testing:

State hypotheses, check assumptions if requested
Test statistic $z$ or $t$ and $df$ (if applicable)
Rejection region with $pvalue$
State results and conclusion in context from results
State possible error that could have been made and discuss it within the context

2 Independent Means

Some archaeologists theorize that ancient Egyptians interbred with several different immigrant populations over thousands of years. To see if there is any indication of changes in body structure that might have resulted, in a random sample they measured 30 skulls of male Egyptians dated from 4000 BCE and 30 others dated from 200 BCE

Is there sufficient evidence that the mean breadth of males’ skulls increased (as theorized by archaeologists) over this period? Conduct a hypothesis test
Estimate the true difference of means with 95% confidence and interpret

Boxplot of Egyptian male skull breadths (200 BCE and 4000 BCE)

Egypt setup

\[H_0: \mu_1=\mu_2~~H_a: \mu_1>\mu_2\]
(or $H_0: \mu_1-\mu_2=0~~H_a: \mu_1-\mu_2>0$)

Assumptions:
(1) Random: yes
(2) Independence: yes because random
(3) Normality: $n_1=n_2=30\geq30$ so yes

Organization of information:
$n_1=30$
$n_2=30$
$H_a:~>$ (upper tail test)
$\alpha=0.05$ (assumed because not specifically stated otherwise)

       200BCE 4000BCE
xbari 135.633 131.367
sdi     4.038   5.129
ni     30.000  30.000

Hypotheses, assumptions: $H_0: \mu_1=\mu_2$ and $H_a:\mu_1>\mu_2$. Assumptions: randomization (yes), independence (yes because random), normality (yes; $n_1=n_2=30$)
Test statistic, df: $se=\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}=\sqrt{\frac{4.038^2}{30}+\frac{5.129^2}{30}}=1.192$ and $t=\frac{\overline{x}_1-\overline{x}_2}{se}=\frac{135.633-131.367}{1.192}=3.5788591\approx 3.579$. $df=min(n_1-1,n_2-1)=min(30-1,30-1)=29$
Rejection region: $pvalue=P(T>t_{calc,df})=P(T>3.58_{df=29})$, so $0.0005<pvalue<0.001$
Results and conclusion: reject $H_0$ if $pvalue\le\alpha$. With $0.0005<pvalue<0.001$, both endpoints are $\le\alpha~(0.05)\therefore~H_0$ is rejected. There is a significant increase in mean skull breadth between 4000 BCE and 200 BCE
Error: since the null hypothesis was rejected, we could have made a Type $I$ error (rejecting the null when the null is true). We think that skull breadths have increased over the period when in fact they have not
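The t-table bracketing of the p-value in the rejection-region step can be checked in R. This is a small sketch, assuming the table has upper-tail columns for areas 0.001 and 0.0005; `qt()` returns the same critical values the table lists, and `pt()` gives the exact tail area.

```r
t_calc <- 3.579
df     <- 29

# Critical values for upper-tail areas 0.001 and 0.0005 (t-table columns).
tail_areas <- c(0.001, 0.0005)
crit <- qt(1 - tail_areas, df)
print(round(crit, 3))

# t_calc falls between the two critical values, so the p-value
# falls between the two tail areas.
print(crit[1] < t_calc && t_calc < crit[2])

# Exact upper-tail p-value for comparison.
p_exact <- pt(t_calc, df, lower.tail = FALSE)
print(p_exact)
```

A larger critical value corresponds to a smaller tail area, which is why the bracket reads $0.0005<pvalue<0.001$.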

\[\overline{x}_1-\overline{x}_2\pm t^{\star}(se)\]
$t^{\star}=t_{CL=95\%,df=29}=2.0452296$

\[135.633-131.367\pm (2.045)(1.192)=4.266\pm 2.438=(1.83,6.70)\]
With 95% confidence, the true difference in mean skull breadths between 200 BCE and 4000 BCE is between 1.83 and 6.70 mm

2 Independent Proportions

A tracking device, used for enabling robots to home in on a beacon that produces an audio signal, is said to be fine-tuned if the probability of correct identification of the direction of the beacon is the same for each side (left and right) of the tracking device. In a random sample, out of 100 signals from the left, the device identifies the direction correctly 85 times. Of the 100 signals from the right, the device identifies the direction correctly 87 times.

Is there sufficient evidence that the proportion of correctly identified signals differs between the left and right sides of the tracking device? Conduct a hypothesis test and let $\alpha=0.10$
Estimate the true difference of proportions with 90% confidence and interpret

      Correct id Incorrect id Total
Left          85           15   100
Right         87           13   100
Total        172           28   200

Barplot of Correct IDs

Hypotheses, assumptions \[H_0: p_1=p_2~~H_a: p_1\ne p_2\]
Assumptions: randomization (yes), independence (yes because random), normality ($n_1=n_2=100\geq60$)
Test statistic: taking group 1 as the right side and group 2 as the left side, $z=\frac{\hat p_1-\hat p_2}{se}$ with $se=\sqrt{\frac{\hat p_1\hat q_1}{n_1}+\frac{\hat p_2\hat q_2}{n_2}}=\sqrt{\frac{0.87(0.13)}{100}+\frac{0.85(0.15)}{100}}=0.0491$ and $z=\frac{0.87-0.85}{0.0491}=0.4077\approx 0.41$
Rejection region: Reject $H_0$ if $pvalue\le\alpha~(0.10)$ with $pvalue=2P(Z>z_{calc})=2P(Z>0.41)=2(0.3409)=0.6818$
Results and conclusion: $pvalue=0.6818\nleq\alpha(0.10) \therefore~H_0$ cannot be rejected. There is not a significant difference in the proportion of correct identifications of the direction of the beacon between the right and left sides of the tracking device.
Error: since $H_0$ was not rejected, a Type $II$ error (not rejecting null when null is false) could have been made; we think there is no difference between the left and right sides of the tracking device when there is a difference

\[\hat p_1-\hat p_2\pm z^{\star}(se)\] with \[se=\sqrt{\frac{\hat p_1\hat q_1}{n_1}+\frac{\hat p_2\hat q_2}{n_2}}\]
\[0.87-0.85\pm (1.645)(0.0491)=0.02\pm 0.0807=-0.0607,0.1007\] The CI: $(-0.0607,0.1007)=(-6.07\%,10.07\%)$

With 90% confidence, the true difference in proportions of the sides correctly identifying the direction of the signal is between -6.07% and 10.07%. Since the CI includes 0, we say that there is no significant difference between the two sides

Dependent Means

A car dealer decided to compare the mean monthly sales of two salespersons, A and B. Because the strength of sales varies with season and with people’s opinions about the economy, the car dealer decided to take a random sample from Salespersons A and B to make the comparison on a monthly basis. The data given has the monthly sales (to the nearest thousand dollars) for the two salespersons.

Is there sufficient evidence that the true mean difference in sales between Salesperson A and Salesperson B differs from zero? Let $\alpha=0.01$
Estimate the true mean difference with 99% confidence and interpret

Boxplot of Difference of Salespersons A and B

        [,1]
n     12.000
xbard 15.667
sD    10.924

Hypotheses, assumptions: $H_0: \mu_D=0~~H_a: \mu_D\ne0$. Assumptions: randomization (yes), independence of units/subjects (yes because random), differences have an approximately normal distribution (boxplots are ok), two dependent measurements per unit/subject (yes)
Test statistic, df: $t=\frac{\overline{x}_d}{se}$ with $se=\frac{s_d}{\sqrt n}=\frac{10.924}{\sqrt{12}}=3.153$. $t=\frac{15.667}{3.153}=4.968$, $df=n-1=12-1=11$
Rejection region: reject $H_0$ if $pvalue\le\alpha~(0.01)$ with $pvalue=2P(T>|t_{calc}|)=2P(T>4.968)$. From the t-table with $df=11$, the one-tail area is less than $0.001$, so $pvalue<0.002$
Results and conclusion: Since $pvalue<0.002\leq\alpha~(0.01)$, $H_0$ is rejected. The true mean difference in sales between A and B is significantly different from zero.
Error: since $H_0$ was rejected, a Type $I$ error (reject null when null is true) could have been made; we think the mean difference in sales significantly differs between salespersons when it does not

\[\overline{x}_d\pm t^{\star}(se)\] with \[se=\frac{s_d}{\sqrt n}\]
\[15.667\pm (3.106)(3.153)=15.667\pm 9.793=5.874, 25.46\]

The CI: $(5.87,25.46)$

With 99% confidence, the true mean difference in sales between Salespersons A and B is between $6,000 and $25,000.

Alternative way to interpret: With 99% confidence, the mean monthly sales of Salesperson A are between $6,000 and $25,000 higher than those of Salesperson B

Analyses with `R`

Each example will be done again with R output

Egypt Analysis with `R` Output

When a one-tailed test (upper or lower) is run in software, the reported CI is a one-sided bound rather than the two-sided interval we want, so a separate two-sided analysis is run to obtain the proper CI


    Welch Two Sample t-test

data:  breadth by era
t = 3.5797, df = 54.973, p-value = 0.000364
alternative hypothesis: true difference in means between group 200BCE and group 4000BCE is greater than 0
95 percent confidence interval:
 2.27257     Inf
sample estimates:
 mean in group 200BCE mean in group 4000BCE 
             135.6333              131.3667

Test statistic $t=3.5797$, $df=54.973$, $pvalue=0.000364$

Results: $pvalue=0.000364\leq\alpha(0.05) \therefore$ (therefore) $H_0$ is rejected

Conclusion: since the null is rejected, that means that there is evidence that the skull breadths have significantly increased over the period from 4000 BCE to 200 BCE

Error: since $H_0$ was rejected, a Type $I$ error (reject null when null is true) could have been made; we think the skull breadths have increased when they did not


    Welch Two Sample t-test

data:  breadth by era
t = 3.5797, df = 54.973, p-value = 0.000728
alternative hypothesis: true difference in means between group 200BCE and group 4000BCE is not equal to 0
95 percent confidence interval:
 1.878030 6.655303
sample estimates:
 mean in group 200BCE mean in group 4000BCE 
             135.6333              131.3667

The CI: $(1.878030,6.655303)\approx(1.88,6.66)$

With 95% confidence, the true difference in mean skull breadths of Egyptian males from 4000 BCE to 200 BCE is 1.88 to 6.66 mm.

Alternative way to interpret: With 95% confidence, the mean skull breadth of Egyptian males from 200 BCE is 1.88 to 6.66 mm larger than that of males from 4000 BCE, which is consistent with the archaeologists' theory of interbreeding with immigrant populations.
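The two outputs above come from running the same Welch test twice with different alternatives. The sketch below shows the general pattern in R. The raw skull data are not reproduced in these notes, so the data frame `skulls` is simulated here (an assumption, for illustration only) and its numbers will not match the output above; the column names `breadth` and `era` follow the `data: breadth by era` line in the output.

```r
# Simulated stand-in for the skull data (NOT the real measurements).
set.seed(1)
skulls <- data.frame(
  breadth = c(rnorm(30, mean = 135.6, sd = 4.0),
              rnorm(30, mean = 131.4, sd = 5.1)),
  era     = rep(c("200BCE", "4000BCE"), each = 30)
)

# Upper-tail test: the reported "interval" is a one-sided lower bound.
upper_tail <- t.test(breadth ~ era, data = skulls, alternative = "greater")
print(upper_tail$conf.int)

# Two-sided run: same t and df, but a proper two-sided 95% CI.
two_sided <- t.test(breadth ~ era, data = skulls, conf.level = 0.95)
print(two_sided$conf.int)
```

`t.test()` uses the unpooled (Welch) method by default, and the groups are ordered alphabetically, so the difference is 200BCE minus 4000BCE. The p-value is taken from the one-tailed run and the CI from the two-sided run.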

Robot analysis

      Correct id Incorrect id Total
Left          85           15   100
Right         87           13   100
Total        172           28   200

\[H_0: p_1=p_2~~H_a: p_1\ne p_2\]

(or $H_0: p_1-p_2=0~~H_a: p_1-p_2\ne0$)
Assumptions:
(1) Independence: random so yes
(2) Randomization: yes
(3) Normality: $n_1=n_2=100\geq60$

Organization of information:
$n_1=100$
$n_2=100$
$H_a:~\ne$ (two tail test)
$\alpha=0.10$ (because specifically stated as 10%)

Robot Analysis Output

side
 Left Right 
   85    87

[1] 100 100

[1] 0.85 0.87

             [,1]
se     0.04905099
zcalc  0.40773893
pvalue 0.68346535

                  [,1]
diff.pihat  0.02000000
zstar       1.64485363
bound       0.08068171
lower      -0.06068171
upper       0.10068171

Test statistic $z=0.41$, $pvalue=0.6835$

Results: $pvalue=0.6835\nleq\alpha(0.10) \therefore$ (therefore) $H_0$ cannot be rejected

Conclusion: since the null is not rejected, there is not a significant difference in the proportion of correct identifications of the direction of the beacon between the right and left sides of the tracking device.

Error: since $H_0$ was not rejected, a Type $II$ error (not rejecting null when null is false) could have been made; we think there is no difference between the left and right sides of the tracking device when there is a difference

The CI: $(-0.0607,0.1007)=(-6.07\%,10.07\%)$

Car sales

Is there sufficient evidence that the true mean difference in sales between Salesperson A and Salesperson B differs from zero? Let $\alpha=0.01$
Estimate the true mean difference with 99% confidence and interpret

           [,1]
xbar.d 15.66667
s.d    10.92398
n      12.00000

By hand

Organization of information:
$n=12$ (12 months)
$H_a:~\ne$ (two-tail test)
$\alpha=0.01$ (specifically stated)

Hypotheses, assumptions \[H_0: \mu_D=0~~H_a: \mu_D\ne0\]
Assumptions: Randomization: yes, Independence (of units/subjects): random met so yes, Differences have approximate normal distribution (boxplots are ok), Two measurements per unit/subject: yes
Test statistic: $t=\frac{\overline{X}_d}{se}$, $se=\frac{s_d}{\sqrt{n}}=\frac{10.92}{\sqrt{12}}=3.15$ and so $t=\frac{15.67}{3.15}=4.975$
Rejection region (pvalue): Reject $H_0$ if $pvalue\le\alpha~~(0.01)$ with $pvalue=2P(T>t)=2P(T>4.975_{df=11})$; the t-table with $df=n-1=12-1=11$ shows $pvalue<0.001$
Results, conclusion: Since $pvalue<0.001\le\alpha~~(0.01)$, $H_0$ is rejected. There is a significant difference in mean monthly sales between salespersons A and B
Error: A Type $I$ error could have been made: we think there is a significant difference in mean monthly sales between salespersons A and B when there is not

CI:
\[\overline{X}_d \pm t^{\star}(se)\]

Find $t^{\star}$: $df=11$, confidence level$=99\%$ so $t^{\star}=3.106$

\[15.67\pm(3.106)(3.15)=15.67\pm 9.78=5.89, 25.45\]

with Software (R) output


    Paired t-test

data:  A and B
t = 4.9681, df = 11, p-value = 0.0004234
alternative hypothesis: true mean difference is not equal to 0
99 percent confidence interval:
  5.872564 25.460770
sample estimates:
mean difference 
       15.66667

\[H_0: \mu_D=0~~H_a: \mu_D\ne0\]
Assumptions:
(1) Randomization: yes
(2) Independence (of units/subjects): random met so yes
(3) Differences have approximate normal distribution (boxplots are ok)
(4) Two measurements per unit/subject: yes

Organization of information:
$n=12$ (12 months)
$H_a:~\ne$ (two-tail test)
$\alpha=0.01$ (specifically stated)

Test statistic $t=4.9681$, $df=11$, $pvalue=0.0004234$

Results: $pvalue=0.0004234\leq\alpha(0.01) \therefore$ (therefore) $H_0$ is rejected

Conclusion: since the null is rejected, the true mean difference in sales between A and B is significantly different from zero.

Error: since $H_0$ was rejected, a Type I error (reject null when null is true) could have been made; we think the mean difference in sales significantly differs between salespersons when it does not

The CI: $(5.87,25.46)$

With 99% confidence, the true mean difference in sales between Salespersons A and B is between $6,000 and $25,000.

Alternative way to interpret: With 99% confidence, the mean monthly sales of Salesperson A are between $6,000 and $25,000 higher than those of Salesperson B

¹ In practice, the difference of means can be hypothesized to be equal to a value other than zero
² In practice, the mean difference can be hypothesized to be equal to a value other than zero
³ In practice, the difference of proportions can be hypothesized to be equal to a value other than zero

2-sample Methods

Stat 2510: Statistical Methods

2025
