This tutorial introduces a fundamental statistical test: the t-test. You will learn its purpose, how to apply it, and how to interpret its results using R. These skills are crucial for analyzing data in various fields, including environmental and ecological sciences.
You can get all of the resources for this tutorial from this GitHub repository. Clone the repository, or download it as a zip file and unzip it.
It’s completely okay if you’ve never used R before: you can walk through everything step by step with us! Everyone starts as a beginner, and we’re here to guide you through the process and make it as smooth as possible. A great resource for getting started is the Coding Club tutorial Getting Started with R and RStudio. While you’re at it, take a look at their Troubleshooting and How to Find Help tutorial, and the Coding Etiquette guide, which offers excellent tips for navigating the coding community.
Now, let’s get started!
First, open RStudio and create a new script by clicking on File/New File/R Script. A script in R is a file where you can write and save code to run, edit, and reuse later. Name your script appropriately, so that it clearly reflects its purpose and is easy to identify later. It is always a good idea to write a header at the top of your script with your name, the date, and the script’s purpose, as shown below.
# Title: Intro of T-test and Chi-squared test in R
# Script purpose: Use T-test and Chi-squared test to investigate questions on dataset iris
# Author - contact details
# Date
Next, set your working directory to the folder containing the unzipped files on your computer.
# Set the working directory
setwd("your_filepath")
getwd() # Run this to check where your working directory is
Then, load the required packages and dataset, installing them first if they are not already available.
# load packages
library(ggplot2)
library(reshape2)
library(dplyr)
# Load the iris dataset
data(iris)
head(iris) # View the first few rows
summary(iris) # Summary of the dataset
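The library() calls above fail if a package has not been installed yet. A minimal sketch of installing only what is missing could look like this; missing_packages() and install_if_missing() are hypothetical helper names introduced here for illustration, not part of the tutorial's repository.

```r
# Packages loaded in this tutorial
required_packages <- c("ggplot2", "reshape2", "dplyr")

# Hypothetical helper: which of the given packages are not installed?
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only the packages that are missing
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Uncomment to run (needs an internet connection):
# install_if_missing(required_packages)
```

The install call is left commented out so that running the script does not trigger downloads unexpectedly.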
Now that you have R set up, let’s start learning about the t-test.
A t test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another. For instance:
“Do students in online classes score higher than those in traditional classes?”
“Are the differences in exam scores between male and female students significant?”
These tests provide robust frameworks to evaluate hypotheses and are widely applied across various fields, including biology, psychology, and business analytics.
The t test is a parametric test of difference, meaning that it makes the same assumptions about your data as other parametric tests. These assumptions are:
Independence: The observations in one sample are independent of the observations in the other sample.
Normality: Both samples are approximately normally distributed.
Homogeneity of Variances: Both samples have approximately the same variance.
Random Sampling: Both samples were obtained using a random sampling method.
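Two of the assumptions listed above, normality and homogeneity of variances, can be checked roughly in R. The sketch below uses shapiro.test() and var.test() from base R's stats package; the choice of these particular checks, and of the setosa and versicolor sepal lengths as the two samples, is an assumption made for illustration. Independence and random sampling depend on how the data were collected and cannot be tested this way.

```r
data(iris)
setosa <- subset(iris, Species == "setosa")
versicolor <- subset(iris, Species == "versicolor")

# Normality: Shapiro-Wilk test for each sample
# (null hypothesis: the sample comes from a normal distribution)
normal_setosa <- shapiro.test(setosa$Sepal.Length)
normal_versicolor <- shapiro.test(versicolor$Sepal.Length)

# Homogeneity of variances: F test comparing the two sample variances
# (null hypothesis: the two variances are equal)
equal_var <- var.test(setosa$Sepal.Length, versicolor$Sepal.Length)

normal_setosa$p.value
normal_versicolor$p.value
equal_var$p.value
```

A small p-value from either check suggests that the corresponding assumption may not hold for these samples.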
Note that a t test can only be used when comparing the means of two groups (a.k.a. pairwise comparison). If you want to compare more than two groups, or if you want to do multiple pairwise comparisons, use an ANOVA test or a post-hoc test.
t-tests are appropriate in the following scenarios:
Numerical Data: When the data is continuous (e.g., height, weight, or length measurements).
Comparing Means: When the goal is to compare the mean of one group against a fixed value, or to compare the means of two groups.
By understanding these scenarios, it becomes easier to select the appropriate test for the analysis, ensuring valid and interpretable results.
Hypothesis testing is a basic idea in statistics that helps us decide if we have enough evidence to support a claim about a group or population. It has a few important parts:
Null and Alternative Hypotheses
Null Hypothesis (\(H_0\)): This is like the default assumption, saying nothing special is happening. For example, “There’s no difference between two groups.”
Alternative Hypothesis (\(H_a\)): This is the claim we’re testing, suggesting something is happening, like “The two groups are different.”
The p-value tells us how likely it is to see our results (or something even more surprising) if the null hypothesis is true.
The significance level (\(\alpha\)) is a cutoff point we choose (often 0.05). If the p-value is smaller than \(\alpha\), it means the results are unlikely under the null hypothesis, so we reject it.
For a one-tailed test, the p-value is calculated for results in one direction of interest (e.g., greater than or less than a certain value). The entire \(\alpha\) is concentrated in one tail of the distribution.
For a two-tailed test, the p-value accounts for extreme results in both directions, and \(\alpha\) is split equally between the two tails (e.g., \(0.025\) in each tail for \(\alpha = 0.05\)).
Type I Error: This happens when we think something is happening (reject \(H_0\)) but actually, nothing is (false alarm).
Type II Error: This happens when we don’t notice something is happening (fail to reject \(H_0\)) even though it is (missed signal).
Hypothesis testing is useful because it helps us figure out whether the patterns we see in data are real or just random chance.
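The one-tailed and two-tailed p-values described above can be requested in R through the alternative argument of t.test(). The sketch below uses Sepal.Length from the iris dataset and a hypothesized mean of 5.8 purely as an illustration.

```r
data(iris)
x <- iris$Sepal.Length
alpha <- 0.05

two_sided <- t.test(x, mu = 5.8, alternative = "two.sided")  # H_a: mean != 5.8
greater   <- t.test(x, mu = 5.8, alternative = "greater")    # H_a: mean > 5.8
less      <- t.test(x, mu = 5.8, alternative = "less")       # H_a: mean < 5.8

p_values <- c(two_sided = two_sided$p.value,
              greater   = greater$p.value,
              less      = less$p.value)
p_values

# Decision rule: reject H_0 when the p-value is below alpha
p_values < alpha
```

Because the t distribution is symmetric, the two-tailed p-value is twice the smaller of the two one-tailed p-values.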
This might seem like a lot to take in. Don’t worry, we will go through it together step by step in later sections.
Before we dive into the different cases of the t-test, we first need to understand the dataset we are using.
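What is the iris Dataset?
The iris dataset is a collection of flower measurements from three types of iris flowers: setosa, versicolor, and virginica. There are 150 rows in total (50 flowers of each species). The dataset has these columns:
Sepal.Length: Length of the sepal (cm).
Sepal.Width: Width of the sepal (cm).
Petal.Length: Length of the petal (cm).
Petal.Width: Width of the petal (cm).
Species: The species of the flower (setosa, versicolor, or virginica).
There are three types of t-test: One-sample t-test, Two-sample t-test, and Paired t-test. Their formulas are: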
The formula for the one-sample \(t\)-test is:
\[ t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} \]
Where:
\(\bar{x}\) = sample mean
\(\mu_0\) = hypothesized population mean
\(s\) = sample standard deviation
\(n\) = sample size
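To see how the one-sample formula maps onto R, the sketch below computes \(t\) by hand and compares it with t.test(). The data (Sepal.Length) and the value 5.8 are chosen only for illustration, and the p-value step uses the standard t distribution with \(n - 1\) degrees of freedom, which the formula above does not spell out.

```r
data(iris)
x   <- iris$Sepal.Length
mu0 <- 5.8                      # hypothesized population mean

x_bar <- mean(x)                # sample mean
s     <- sd(x)                  # sample standard deviation
n     <- length(x)              # sample size

# t = (x_bar - mu0) / (s / sqrt(n))
t_manual <- (x_bar - mu0) / (s / sqrt(n))

# Two-tailed p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# Compare with R's built-in test
res <- t.test(x, mu = mu0)
all.equal(t_manual, unname(res$statistic))
all.equal(p_manual, res$p.value)
```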
Student’s t-test
The formula for the independent two-sample \(t\)-test is:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]
Where:
\(\bar{x}_1, \bar{x}_2\) = sample means of the two groups
\(s^2\) = pooled variance: \[ s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]
\(n_1, n_2\) = sample sizes of the two groups
\(s_1^2, s_2^2\) = sample variances of the two groups
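As a rough check of the pooled-variance formula, the sketch below computes Student's \(t\) by hand for the sepal lengths of setosa and versicolor (a choice made here for illustration) and compares it with t.test() called with var.equal = TRUE.

```r
data(iris)
x1 <- iris$Sepal.Length[iris$Species == "setosa"]
x2 <- iris$Sepal.Length[iris$Species == "versicolor"]
n1 <- length(x1)
n2 <- length(x2)

# Pooled variance: weighted average of the two sample variances
s2_pooled <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)

# Student's t statistic
t_manual <- (mean(x1) - mean(x2)) / sqrt(s2_pooled * (1 / n1 + 1 / n2))

# var.equal = TRUE asks t.test() for Student's (pooled-variance) version
res <- t.test(x1, x2, var.equal = TRUE)
all.equal(t_manual, unname(res$statistic))
```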
Welch’s \(t\)-test
When the assumption of equal variances is not met, Welch’s \(t\)-test should be used. The formula for Welch’s \(t\)-test is:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
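Where \(\bar{x}_1, \bar{x}_2\), \(s_1^2, s_2^2\), and \(n_1, n_2\) are the sample means, sample variances, and sample sizes of the two groups, as defined for Student’s t-test.
Key Differences:
Student’s \(t\)-test pools the two sample variances into a single estimate \(s^2\), which is only appropriate when the groups have approximately equal variances.
Welch’s \(t\)-test keeps the two variances separate, so it is more robust and is generally preferred when the two groups have unequal variances or substantially different sample sizes.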
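The Welch formula can be checked the same way. In R, t.test() performs Welch's test by default (var.equal = FALSE). The sketch below also computes the Welch-Satterthwaite degrees of freedom, a standard formula that is not given in this tutorial; the two samples are again the setosa and versicolor sepal lengths, chosen for illustration.

```r
data(iris)
x1 <- iris$Sepal.Length[iris$Species == "setosa"]
x2 <- iris$Sepal.Length[iris$Species == "versicolor"]
n1 <- length(x1)
n2 <- length(x2)

# Squared standard error of each group mean: s_i^2 / n_i
se2_1 <- var(x1) / n1
se2_2 <- var(x2) / n2

# Welch's t statistic
t_manual <- (mean(x1) - mean(x2)) / sqrt(se2_1 + se2_2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (se2_1 + se2_2)^2 /
  (se2_1^2 / (n1 - 1) + se2_2^2 / (n2 - 1))

# t.test() uses Welch's test unless var.equal = TRUE is set
res <- t.test(x1, x2)
all.equal(t_manual, unname(res$statistic))
all.equal(df_manual, unname(res$parameter))
```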
The formula for the paired \(t\)-test is:
\[ t = \frac{\bar{d}}{\frac{s_d}{\sqrt{n}}} \]
Where:
\(\bar{d}\) = mean of the differences between paired observations
\(s_d\) = standard deviation of the differences
\(n\) = number of paired observations
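The paired formula works on the differences between paired observations, so a paired t-test is equivalent to a one-sample t-test on those differences against zero. A minimal sketch, assuming the two setosa measurements Sepal.Length and Petal.Length as the paired variables:

```r
data(iris)
setosa <- subset(iris, Species == "setosa")

# Differences between the paired observations
d <- setosa$Sepal.Length - setosa$Petal.Length

# t = mean(d) / (sd(d) / sqrt(n))
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

# Built-in paired test, and the equivalent one-sample test on the differences
res_paired <- t.test(setosa$Sepal.Length, setosa$Petal.Length, paired = TRUE)
res_diff   <- t.test(d, mu = 0)

all.equal(t_manual, unname(res_paired$statistic))
all.equal(unname(res_paired$statistic), unname(res_diff$statistic))
```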
A larger absolute \(t\)-value indicates that the difference between the means is large relative to its standard error, which is stronger evidence of a real difference between the groups.
To determine significance:
Compare your calculated \(t\)-value against a critical value chart.
If the absolute \(t\)-value exceeds the critical value (based on your significance level \(\alpha\) and degrees of freedom), you can reject the null hypothesis and conclude that the two groups are significantly different.
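Instead of a printed chart, the critical value can be looked up in R with qt(). The sketch below assumes a two-tailed test at \(\alpha = 0.05\) and uses the one-sample test of Sepal.Length against 5.8 as an illustration.

```r
data(iris)
alpha <- 0.05

res     <- t.test(iris$Sepal.Length, mu = 5.8)
t_value <- unname(res$statistic)
df      <- unname(res$parameter)

# Two-tailed critical value: alpha is split equally between both tails
t_crit <- qt(1 - alpha / 2, df)

# Reject H_0 when the absolute t-value exceeds the critical value
reject <- abs(t_value) > t_crit
reject

# The same decision follows from comparing the p-value with alpha
res$p.value < alpha
```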
You might be wondering: which test should I use?
Here is how to choose a suitable t-test:
If the groups come from a single population (e.g., measuring before and after an experimental treatment), perform a paired t test. This is a within-subjects design.
If the groups come from two different populations (e.g., two different species, or people from two separate cities), perform a two-sample t test (a.k.a. independent t test). This is a between-subjects design.
If there is one group being compared against a standard value (e.g., comparing the acidity of a liquid to a neutral pH of 7), perform a one-sample t test.
Now, let’s put what we have learned into practice!
The one-sample t-test checks whether the mean of a single group is different from a specific value.
Hypotheses:
Null Hypothesis (\(H_0\)): The mean sepal length is equal to 5.8.
Alternative Hypothesis (\(H_a\)): The mean sepal length is not equal to 5.8.
Let’s check if the average sepal length of all flowers in the iris dataset is significantly different from 5.8 (a hypothetical value).
# Test if the mean Sepal.Length is significantly different from 5.8
t.test(iris$Sepal.Length, mu = 5.8)
Since hypothesis testing primarily relies on the p-value, we will focus on it when reading the output. The output shows that:
p-value: \(p = 0.5226\)
This is much greater than 0.05, so we fail to reject the null hypothesis.
We can therefore conclude:
The average sepal length in the iris dataset is not significantly different from 5.8. Any small difference is likely due to random chance.
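Rather than reading values off the printed output, you can store the result of t.test() and extract its components by name. The sketch below repeats the one-sample test; the variable names res and alpha are chosen here for illustration.

```r
data(iris)
res <- t.test(iris$Sepal.Length, mu = 5.8)

res$statistic   # t-value
res$parameter   # degrees of freedom
res$p.value     # p-value
res$conf.int    # 95% confidence interval for the mean
res$estimate    # sample mean

# Apply the decision rule programmatically
alpha <- 0.05
if (res$p.value < alpha) {
  message("Reject the null hypothesis")
} else {
  message("Fail to reject the null hypothesis")
}
```

The confidence interval gives the same answer from another angle: if it contains the hypothesized value, the two-tailed test at the matching significance level does not reject the null hypothesis.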
Does that give you some confidence? Now, let’s look at the next test.
The two-sample (independent) t-test compares the means of two separate groups to see if they’re different.
In our example, we could write our hypotheses as:
Null Hypothesis (\(H_0\)): The mean sepal lengths of setosa and versicolor are equal.
Alternative Hypothesis (\(H_a\)): The mean sepal lengths of setosa and versicolor are not equal.
# Filter the data for two species
setosa <- subset(iris, Species == "setosa")
versicolor <- subset(iris, Species == "versicolor")
# Perform an independent t-test (t.test() uses Welch's test by default, i.e. var.equal = FALSE)
t.test(setosa$Sepal.Length, versicolor$Sepal.Length)
The output shows that:
p-value: \(p < 2.2e-16\)
This is much smaller than 0.05, so we reject the null hypothesis.
We can therefore conclude that the mean sepal lengths of setosa and versicolor are not equal.
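The same comparison can also be written with the formula interface of t.test(), which avoids creating a separate object per species. This is an alternative sketch rather than the approach used above; two_species is a name introduced here for illustration.

```r
data(iris)

# Keep only the two species and drop the unused factor level
two_species <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

# Formula interface: response ~ grouping variable
res_formula <- t.test(Sepal.Length ~ Species, data = two_species)

# Vector interface, as used in the tutorial
res_vectors <- t.test(
  two_species$Sepal.Length[two_species$Species == "setosa"],
  two_species$Sepal.Length[two_species$Species == "versicolor"]
)

all.equal(unname(res_formula$statistic), unname(res_vectors$statistic))
res_formula$estimate   # the two group means
```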
Now, let’s learn one more method.
The paired t-test is used to compare two related sets of measurements, like “before and after” scenarios.
In our example, we could write our hypotheses as:
Null Hypothesis (\(H_0\)): The mean of Sepal.Length is equal to the mean of Petal.Length for setosa. (Or, we could also say that there is no significant difference between the two measurements.)
Alternative Hypothesis (\(H_a\)): The mean of Sepal.Length is not equal to the mean of Petal.Length for setosa. (Or, we could also say that there is a significant difference between the two measurements.)
# Compare Sepal.Length and Petal.Length within the same species ("setosa")
setosa_data <- subset(iris, Species == "setosa")
t.test(setosa_data$Sepal.Length, setosa_data$Petal.Length, paired = TRUE)
The output shows that:
p-value: \(p < 2.2e-16\)
This is much smaller than 0.05, so we reject the null hypothesis.
Therefore, we can conclude that the mean of Sepal.Length is not equal to the mean of Petal.Length for setosa.
To complement the t-test analysis, we’ll use visualisations to better understand the data and the relationships between groups. While the t-test provides numerical evidence for differences between means, visualising the data helps us grasp the distribution, variability, and potential outliers. In this section, we’ll use different plots to illustrate the results of both one-sample and independent t-tests in an accessible and intuitive way.
Why this plot?
This plot helps us compare the distribution of Sepal.Length with the hypothesized mean (\(\mu_0 = 5.8\)) in a one-sample t-test. It also lets us visually assess whether the data looks approximately normal, which is an important assumption for the t-test.
How should we create it?
We use a histogram to show the distribution of Sepal.Length and overlay a density curve to highlight the shape of the data. A dashed red line marks the hypothesized mean so we can see how it compares to the data.
ggplot(iris, aes(x = Sepal.Length)) +
  # after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "skyblue", alpha = 0.7, color = "black") +
  # linewidth replaces size for line thickness (ggplot2 >= 3.4.0)
  geom_density(color = "blue", linewidth = 1) +
  geom_vline(xintercept = 5.8, linetype = "dashed", color = "red", linewidth = 1) +
  labs(title = "Sepal Length Distribution with Hypothesized Mean",
       x = "Sepal Length",
       y = "Density") +
  theme_minimal()
The output is shown as:
What can we see?
The distribution of Sepal.Length appears roughly normal, with no major skewness or irregularities. This suggests that the data does not violate the assumption of normality, which supports the use of the t-test. The dashed line at \(\mu_0 = 5.8\) lies close to the center of the distribution, hinting that the mean of Sepal.Length might not be significantly different from the hypothesized mean. The t-test confirms this numerically.
Why this plot?
We’re comparing Sepal.Length between two species: setosa and versicolor. A boxplot is great for this because it shows the range, median, and variability of each group. It’s a clear way to check if the groups are different.
How should we make it?
We filter the data for setosa and versicolor, then create a boxplot and add a point for each group mean (the white diamonds with black outlines).
ggplot(iris %>% filter(Species %in% c("setosa", "versicolor")),
       aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot(alpha = 0.7) +
  stat_summary(fun = mean, geom = "point", shape = 23, size = 3, color = "black", fill = "white") +
  theme_minimal() +
  labs(title = "Comparison of Sepal Length between Setosa and Versicolor",
       x = "Species",
       y = "Sepal Length")
What can we see?
If the boxes for setosa and versicolor overlap a lot, their means might not be very different. If they are clearly separated, it suggests the groups have different averages. The graph clearly shows the latter, with a clear gap between the means of setosa and versicolor. This agrees with the result of our two-sample t-test.
Why this plot?
The density plot shows the shape of the Sepal.Length distribution for setosa and versicolor. It helps us see whether the groups are really different or whether their ranges overlap a lot.
How should we make it?
We first filter the data for the two species. We then plot the density of Sepal.Length for each group and add dashed lines to mark the group means.
ggplot(iris %>% filter(Species %in% c("setosa", "versicolor")),
       aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.5) +
  # Dashed vertical lines at each group mean (linewidth needs ggplot2 >= 3.4.0)
  geom_vline(data = iris %>% filter(Species %in% c("setosa", "versicolor")) %>%
               group_by(Species) %>%
               summarise(mean = mean(Sepal.Length)),
             aes(xintercept = mean, color = Species),
             linetype = "dashed", linewidth = 1) +
  theme_minimal() +
  labs(title = "Density Plot of Sepal Length by Species",
       x = "Sepal Length",
       y = "Density")
What can we see?
This density plot compares the Sepal.Length distributions for setosa (red) and versicolor (blue). The dashed lines represent the means of each group, showing that versicolor has a higher mean compared to setosa. The distinct peaks and minimal overlap between the curves suggest a noticeable difference in Sepal.Length between the two species, with setosa having more concentrated values around 5, while versicolor shows a broader distribution around 6. This visualization supports the use of a t-test to assess the statistical significance of the observed mean difference.
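The group means and spreads read off the plots can also be computed directly. The sketch below uses base R only, rather than dplyr, so that it runs without extra packages; two_species, lengths_by_species, and group_summary are names introduced here for illustration.

```r
data(iris)

# Keep only the two species and drop the unused factor level
two_species <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

# Split the sepal lengths into one vector per species
lengths_by_species <- split(two_species$Sepal.Length, two_species$Species)

# Mean, standard deviation, and sample size for each group
group_summary <- data.frame(
  mean = sapply(lengths_by_species, mean),
  sd   = sapply(lengths_by_species, sd),
  n    = sapply(lengths_by_species, length)
)
group_summary
```

The two means correspond to the dashed lines in the density plot, and the standard deviations reflect the narrower setosa curve and the broader versicolor curve.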
If you are interested in exploring more data visualisation methods, or in making your graphs more beautiful, you can find helpful tutorials on the Useful links page.
Throughout this tutorial, we explored the fundamentals of the t-test, its types, and their respective applications in hypothesis testing. By applying the t-test in R, we not only validated statistical differences but also enhanced our interpretation with visualizations. The examples from the iris dataset demonstrate how to seamlessly integrate statistical tests into real-world data analysis workflows.
To further solidify your understanding:
- Experiment with other datasets to practice t-tests.
- Explore more complex statistical methods like ANOVA for comparing multiple groups or post-hoc tests for pairwise comparisons.
- Dive deeper into R’s data visualization capabilities to create compelling and informative plots.
Remember, t-tests are a cornerstone of statistical analysis, and mastering them lays the foundation for more advanced techniques. If you have any questions or feedback about this tutorial, don’t hesitate to reach out to us.
Happy coding and analyzing!
Check out our Useful links page where you can find loads of guides and cheatsheets.
If you have any questions about completing this tutorial, please contact us on ourcodingclub@gmail.com
We would love to hear your feedback on the tutorial, whether you did it in the classroom or online!