Coding 4 conservation week 2: Simple statistics

Welcome to week two! Today we will be using Rmarkdown to run some simple statistics. This module could be an entire class, but we are going to briefly review the topics below in a day:

Overview:

    1. Basic helpful functions in R 
    2. Testing assumptions 
    3. Comparing means 
        3.1 T-tests
        3.2 Non-parametric test
        3.3 Anova

What is Rmarkdown? Rmarkdown is a file type that allows users to save and execute their code as shareable reports. Read more about Rmarkdown with this tutorial

Part 1: Basic helpful functions in R

R basics

variable <- function(argument)
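For example (a toy sketch with made-up numbers), storing the output of `mean()` in a variable follows this pattern:

```r
# variable <- function(argument): store the mean of a toy vector
heights <- c(4.1, 5.3, 6.2)  # hypothetical measurements
avg_height <- mean(heights)
avg_height
## [1] 5.2
```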

List of useful functions to create and explore data structure

Basic mathematical functions for summary statistics

+ `sum()`

+ `mean()`

+ `min()`

+ `max()`

+ `sd()`

+ `median()`

+ `summary()`

Functions to create & explore data structure

+ `c()` - concatenate function
+ `:` - colon operator for sequence generation
+ `seq()` - sequence generation
+ `rep()` - replicate elements of vectors and lists
+ `View()` - invoke a data viewer
+ `length()` - length of an object
+ `class()` - object class
+ `head()` - return the first 6 rows of an object
+ `tail()` - return the last 6 rows of an object
+ `dim()` - returns the dimensions of an object 
+ `nrow()` - number of rows
+ `ncol()` - number of columns
+ `str()` - display the structure of each column of an object
+ `names()` - display the names of an object, typically the column names
+ `nlevels()` - categorical levels within an object
+ `cbind()` - combine objects by columns
+ `rbind()` - combine objects by rows
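A quick sketch of a few of these functions on a toy vector (names and values are made up just to show the pattern):

```r
# Create small objects with c(), seq(), and rep()
x <- c(2, 4, 6, 8)        # concatenate values into a vector
y <- seq(1, 10, by = 3)   # 1, 4, 7, 10
z <- rep(x, times = 2)    # repeat the whole vector twice

# Explore their structure
length(x)   # 4 elements
class(x)    # "numeric"
m <- cbind(x, y)          # combine vectors as columns
dim(m)      # 4 rows, 2 columns
```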

Exercises

Let’s learn about our dataframe using some of the functions we learned last week

#view the top 6 rows 
head(c4c)
##     Id Sex Forearm Weight   Age       Date Prev Para   Size
## 1 fb_1   f   64.02  38.58 1.000 2015-01-11    0    0 Medium
## 2 fb_2   f   65.12  39.59 1.478 2015-01-12    1   38 Medium
## 3 fb_3   f   65.41  40.42 1.570 2015-01-13    0    0 Medium
## 4 fb_4   f   66.08  42.23 1.578 2015-01-14    0    0 Medium
## 5 fb_5   f   66.09  42.63 1.598 2015-01-15    0    0 Medium
## 6 fb_6   f   66.32  43.67 1.610 2015-01-16    0    0 Medium
#structure
str(c4c)
## 'data.frame':    1200 obs. of  9 variables:
##  $ Id     : chr  "fb_1" "fb_2" "fb_3" "fb_4" ...
##  $ Sex    : chr  "f" "f" "f" "f" ...
##  $ Forearm: num  64 65.1 65.4 66.1 66.1 ...
##  $ Weight : num  38.6 39.6 40.4 42.2 42.6 ...
##  $ Age    : num  1 1.48 1.57 1.58 1.6 ...
##  $ Date   : chr  "2015-01-11" "2015-01-12" "2015-01-13" "2015-01-14" ...
##  $ Prev   : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ Para   : int  0 38 0 0 0 0 0 0 0 0 ...
##  $ Size   : chr  "Medium" "Medium" "Medium" "Medium" ...
#lets correct the variables (review from last week)

c4c$Sex <- as.factor(c4c$Sex)
c4c$Size<- as.factor(c4c$Size)
c4c$Weight<- as.numeric(c4c$Weight)

It looks like there are NAs in our Weight column in our dataset. Let’s remove the NA values from that column

#use is.na() method
c4c <- c4c[!is.na(c4c$Weight),]

#use subset() method
c4c <- subset(c4c, !is.na(Weight))

#use tidyverse method

c4c <- c4c %>% 
  drop_na(Weight)
# basic stats - lets get basic stats for the weight variable 

var(c4c$Weight)
## [1] 287.5528
mean(c4c$Weight)
## [1] 57.03703
max(c4c$Weight)
## [1] 87.52
min(c4c$Weight)
## [1] 14.2
sd(c4c$Weight)
## [1] 16.95738
median(c4c$Weight)
## [1] 58.27
range(c4c$Weight)
## [1] 14.20 87.52
quantile(c4c$Weight)
##    0%   25%   50%   75%  100% 
## 14.20 48.99 58.27 63.52 87.52
# or just run summary()
summary(c4c$Weight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   14.20   48.99   58.27   57.04   63.52   87.52

Useful functions for plotting in base R. Here is a list of useful functions for making plots in R. By using the ? in R you can figure out the details of these different plotting functions.

  1. plot() : Plot x and y against each other
  2. hist() : Make a histogram of your data
  3. scatterplot() : Make a scatter plot of x and y (from the car package)
  4. abline() : Add a straight line to your current plot
  5. lines() : Add more lines/data to your current plot
  6. curve() : Plot a function over some range
  7. boxplot() : Make a box plot to compare different groups of your data
  8. densityPlot() : Fit a density kernel to your data (from the car package)
  9. pie() : Make a pie chart
  10. barplot() : Make a bar graph in R
  11. mosaicplot() : Make a mosaic plot
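A minimal sketch of a few of these calls on simulated data (variable names are made up; run each line to see the plot it produces):

```r
# Simulated data for a quick tour of base plotting
set.seed(1)
x <- 1:50
y <- 2 * x + rnorm(50, sd = 5)  # y roughly follows the line y = 2x
g <- rep(c("A", "B"), 25)       # a fake grouping variable

plot(x, y, main = "y against x")        # scatter plot
abline(a = 0, b = 2, col = "red")       # overlay the line y = 2x
hist(y, breaks = 10, main = "Histogram of y")
boxplot(y ~ g, main = "y by group")     # compare groups
```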

Part 2: Testing assumptions

1. Violations of Assumptions

One of our critical assumptions for one-sample and two-sample t-tests is that the distribution of the sample means is normal. We have learned that if the distribution of our sample is normal we are guaranteed that the distribution of our sample means will be normal. Additionally, we have learned that the Central Limit Theorem tells us that if our sample size is big enough, the distribution of our sample means will be normal even if the distribution of our sample (or the population from which we drew this sample) is non-normal. However, it is often the case in ecology that our samples are very non-normal and the Central Limit Theorem does not apply. This may be because the data are strongly skewed or because there are outliers in the data. In this lab you will learn how to deal with non-normal data and data with extreme outliers.

2. Testing for Normality

We first have to check whether our data are normal using the following qualitative and quantitative methods.

  • Histogram: If the histogram is not “bell-shaped” this is your first indication that your data may be non-normal.

  • QQplot: If all data do not stay within the confidence band your data may be non-normal.

  • Shapiro-Wilk Test: A quantitative test that tells you whether your data are normal or not. \(H_0\): Data are normal, \(H_A\): Data are non-normal. If p < 0.05, reject your \(H_0\) and conclude that the data are not normal; if p > 0.05, “conclude” that your data are normal. Remember not to count on the results from this test alone, and always to visually examine a QQplot or histogram.

Checking normality with residuals

This allows us to check normality of more complicated relationships rather than just checking one variable at a time.
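For example (a sketch with simulated data and hypothetical variable names), instead of testing the raw response, we fit a model and test its residuals:

```r
# Simulate a response that depends linearly on a predictor,
# with normally distributed errors by construction
set.seed(42)
pred <- runif(100, min = 0, max = 10)
resp <- 3 + 2 * pred + rnorm(100)

fit <- lm(resp ~ pred)   # fit the relationship
res <- resid(fit)        # extract the residuals

hist(res)                # should look bell-shaped
shapiro.test(res)        # H0: residuals are normal
```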

3. Checking for outliers

Outliers can drastically influence the results of t-tests and other parametric tests. It is always important to identify outliers before you begin your analysis. While there are formal methods for determining what constitutes an outlier, a simple method is to make a boxplot of your data. A boxplot will identify any points that do not lie within a given range above or below the median of your data. This range can vary depending on how you build your boxplots, but typically it is based on the first and third quartiles of your data. In R, the default range is 1.5 times the distance between the first and third quartiles (the interquartile range).

While sometimes outliers can be erroneous values (i.e. you measured something incorrectly) they often tell you something important about the population you are measuring. Therefore, you cannot drop an outlier from your analysis without having good reason to do so. Good reasons could include measurement error or an underlying difference between that individual and the rest of the population. Dropping an outlier because it makes your life and the analysis easier is not a good reason…though we all wish it was. However, it is good practice to run your analysis with and without your outlier to determine if/how the outlier changes your results.
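In R you can pull out the points a boxplot would flag with `boxplot.stats()`, which uses the same 1.5 × IQR rule. A sketch on toy data with one planted extreme value:

```r
# 50 ordinary values plus one planted extreme value
set.seed(7)
vals <- c(rnorm(50, mean = 20, sd = 2), 45)

boxplot(vals)                       # the outlier shows up as a lone point
flagged <- boxplot.stats(vals)$out  # values beyond 1.5 * IQR from the box
flagged                             # includes the planted value 45
```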

4. Transformations

When you transform your data you apply some function (often non-linear) to your data. Wait… why the heck can you do this? If you think about it, the scale that we use to measure things is completely arbitrary (think about Celsius vs Fahrenheit). We could just as easily measure things in a base 2 system and there would, in principle, be nothing wrong with that. If this helps, you can imagine that by transforming your data you are moving to an alien world where they measure everything on a different scale (e.g. on a log scale). Here are some of the reasons why we transform our data:

  • To make the data more normal
  • To make our variances more similar between groups
  • To reduce the influence of outliers
  • To make our data linear (we will see this in linear regression)

4.1 Choosing a transformation

Here are some common transformations and the scenarios in which you may use them. The variable Y represents your original data and Y’ represents your transformed data.

  • Consider log transformation: \(Y' = \ln (Y)\) if…
    • The measurements are ratios or products of variables
    • The frequency distribution is right skewed
    • The group having the larger mean also has the higher standard deviation
    • The data span several orders of magnitude
    • Note: If you have 0’s in your data, you can do \(Y' = \log{(Y + 1)}\)
    • In R, you can use the expression log(Y) to transform the data Y (remember that log(Y) gives the natural logarithm of Y)
  • Consider Square-root transformation: \(Y' = \sqrt{Y}\) if…
    • The measurements are count data
    • The frequency distribution is right skewed
    • Note: if you have negative numbers, you can’t take the square root. You should add a constant to each number to make them all positive.
    • In R, you can use the expression sqrt(Y) to transform the data Y
  • Square transformations and antilog transformations: \(Y' = Y^2\) and \(Y' = \exp Y\)
    • Use when your data are left skewed
    • In R, you can use the expression Y^2 or exp(Y) to square transform or antilog transform the data Y, respectively.
  • Arcsine transformation: \(Y' = \arcsin \sqrt Y\)
    • Useful when your data are proportions
    • Note: Make sure your proportion data is represented as decimals (i.e. divide by 100)
    • In R, you can use the expression asin(sqrt(Y)) to square-root arcsine transform the proportion data Y
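All of the transformations above are one-liners in R. A sketch on simulated data (variable names made up):

```r
# Right-skewed count data (hypothetical)
set.seed(3)
counts <- rpois(200, lambda = 4)

log_y  <- log(counts + 1)   # log transform; + 1 handles zero counts
sqrt_y <- sqrt(counts)      # square-root transform for counts
sq_y   <- counts^2          # square transform (for left-skewed data)

# Arcsine square-root transform for proportions (must lie in [0, 1])
props   <- counts / max(counts)
arcsine <- asin(sqrt(props))
```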

4.2 After you transform your data

Transformations are not magic. After you have transformed your data you must retest all of your assumptions. Moreover, transforming your data inherently changes your null hypothesis. For example, if I log transformed my number of flowers variable and then tested whether the means were different between the edge and the interior of a habitat patch, my \(H_0\) becomes that there is no difference in the log-mean.

Any conclusions you make about your data have to be in terms of the transformation. For example, you would conclude that the log number of flowers on the edge was significantly more than the log number of flowers in the interior.

If a transformation does not fix your problem (e.g. make your data more normal, reduce the heterogeneity of variance, etc.) you cannot proceed with your parametric analysis. Fortunately, there are some non-parametric options that you can consider. Remember that non-parametric tests do not assume that the data follow any particular distribution.

Determine normality of data

To know what normal (Gaussian) data should look like, we will generate a sample from a known normal distribution, plus an exponential sample for contrast

# Generate randomly sampled data to visualize
norm_samp = rnorm(1000, mean=21 , sd=3) #normal distribution 
exp_samp = rexp(1000) #exponential distribution 

# Visualization with a Histogram
hist(norm_samp,
     breaks=20, #20 bins is practical for this sample size: you get a sense of how many values fall within bins of a reasonable width. This normal variable could be "age", for example
     col='darkgray',
     ylab="Frequency",
     main="Histogram of Normally Distributed Data") 

hist(exp_samp,
     breaks=20, #same 20-bin setup as above, for comparison
     col='darkgray',
     ylab="Frequency",
     main="Histogram of Non-Normally Distributed Data") 

The norm_samp data look normally distributed. The exp_samp data do not look normally distributed.

Visualize with qqPlots

# Visualization with a qqPlot

# Now we’ll generate a qqplot using the qqPlot() function from the car package.
# Normal data
qqPlot(norm_samp, main = "qqPlot of Normally Distributed Data") #qqplot function

## [1] 464 313
# Exponential data
qqPlot(exp_samp, main = "qqPlot of Non-Normally Distributed Data") #qqplot function

## [1] 802 977

For a qqplot, if all of your data points fall in between the dashed confidence bands you can be pretty confident that your data follow a normal distribution.

For norm_samp, most data points (circles) fall within the confidence bands, so the data are normally distributed. qqPlot() labels the two most extreme points; here, data points 464 and 313 come closest to falling outside the dashed confidence bands.

For exp_samp, many data points (circles) fall outside the confidence bands, so the data are not normally distributed. There are clear outlier points, such as data points 802 and 977 (the two labeled by qqPlot()).

Quantitatively with the Shapiro-Wilk Test for Normality

The null hypothesis of this test is that your data are normally distributed. The alternative hypothesis of this test is that your data are not normally distributed. Therefore, if this test gives us a p-value that is non-significant (i.e. p > 0.05) we can be reasonably confident that our data are normal.

If p < 0.05, you reject the null: your variable is not normally distributed. If p > 0.05, you fail to reject the null: your variable is plausibly normally distributed.

Example 1. Let’s use the Shapiro-Wilk test to see if norm_samp is a normal distribution:

# EXAMPLE 1

shapiro.test(norm_samp) #run a shapiro-wilk test on norm_samp
## 
##  Shapiro-Wilk normality test
## 
## data:  norm_samp
## W = 0.99833, p-value = 0.4472

How to interpret shapiro test results

  1. state your p-value: p-value = 0.4472 (your p-value may differ, but it should be above 0.05)
  2. compare your p-value to the significance level: 0.4472 > 0.05
  3. look at your null hypothesis: data are normally distributed
  4. contextualize: My p-value from the Shapiro-Wilk test is 0.45, which is above 0.05, hence I can be reasonably confident that my data are normal

Remember p > 0.05 means you fail to reject the null hypothesis based on your data and tests

This Shapiro-Wilk result makes sense, since my histogram and qqPlot looked normally distributed

Example 2 Let’s use the Shapiro-Wilk test to see if exp_samp is a normal distribution:

# EXAMPLE 2
shapiro.test(exp_samp) #run a shapiro-wilk test on exp_samp
## 
##  Shapiro-Wilk normality test
## 
## data:  exp_samp
## W = 0.80627, p-value < 2.2e-16

How to interpret shapiro test results

  1. state your p-value: p-value < 2.2e-16 (your p-value may differ, but it should be below 0.05)
  2. compare your p-value to the significance level: 2.2e-16 < 0.05
  3. look at your null hypothesis: data are normally distributed
  4. contextualize: My p-value from the Shapiro-Wilk test is below 2.2e-16, which is below 0.05, hence I can be reasonably confident that my data are not normal

Remember p < 0.05 means you reject the null hypothesis based on your data and tests

This Shapiro-Wilk result makes sense, since my histogram and qqPlot did not look normally distributed

Once we have determined whether our data are normal, we can choose an appropriate statistical test to compare the means of our data

Part 3: Comparing means

3.1.1 One sample t-test

When we want to ask a question about the mean of a sample, but we do not know the true variance of the population (i.e. we have to estimate the variance from the sample) we need to use a one sample t-test. The one-sample t-test has the following assumptions:

  1. Assumes the test variable (e.g. weight of lemurs) has a normal distribution
  2. Assumes that the observations represent a random sample from the population

The hypotheses for a two-sided one sample t-test are:

\(H_0 : \mu = a\)
\(H_A : \mu \neq a\)

where \(a\) is your hypothesized mean.

To test these hypotheses, calculate a t statistic using the following formula:

\(t = \frac{\bar Y - a}{s/\sqrt{n}}\)

where \(\bar Y\) is the sample mean, \(s\) is the standard deviation of your sample, and n is your sample size.

The resulting t-statistic has \(n-1\) degrees of freedom (df). You can test your null hypothesis in three different ways:

  1. Get the critical t-value \((t_{\alpha(2)})\) with \(n-1\) df for \(\alpha = 0.05\) and compare it to your calculated t-value. Reject your null hypothesis if the absolute value of your t-value is greater than the absolute value of the critical t-value.
  2. Calculate the 95% confidence interval with the equation \(\bar Y \pm t_{\alpha(2),df}SE_{\bar Y}\) and determine if \(a\) (your hypothesized mean) is in the interval. Reject your null hypothesis if \(a\) is not in the 95% CI.
  3. Find the p-value associated with your critical t-value (via R or a t-table with \(n-1\) df) and determine if it is less than your predetermined \(\alpha\). Reject \(H_0\) if \(p<\alpha\)

R code:

# t.test(DATASET$VARIABLE, mu=MEAN) 
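A sketch on simulated data (hypothetical numbers) showing that the formula above and t.test() give the same t statistic, here with a hypothesized mean of a = 50:

```r
set.seed(11)
y <- rnorm(30, mean = 52, sd = 6)  # sample of size n = 30
a <- 50                            # hypothesized mean

# t = (ybar - a) / (s / sqrt(n)), computed by hand
t_manual <- (mean(y) - a) / (sd(y) / sqrt(length(y)))

# the built-in one-sample t-test, with n - 1 = 29 df
res <- t.test(y, mu = a)
c(t_manual, unname(res$statistic))  # the two t values match
```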

3.1.2 Paired t-test

If you want to compare the means between two groups and each data point in one group is uniquely paired with a data point in another group you can use the paired t-test. Take the difference between each pair of data points and then perform a one sample t-test with \(H_0 : \mu_d = 0\). Assumptions:

  1. Assumes that the observations from each group represent a random sample from the population.
  2. Assumes that the difference of the two observations follow a normal distribution.

R code

#first check for normality 
#dataset$difference <- (dataset$variable1 - dataset$variable2)
#hist(dataset$difference)
#shapiro.test(dataset$difference)


#t.test(dataset$variable1 - dataset$variable2, paired = TRUE)
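The commented template above can be fleshed out on simulated before/after measurements (all values hypothetical); note that the paired t-test is exactly a one-sample t-test on the differences:

```r
set.seed(5)
before <- rnorm(20, mean = 100, sd = 10)
after  <- before + rnorm(20, mean = 3, sd = 4)  # small paired shift

diffs <- after - before
shapiro.test(diffs)   # first check normality of the differences

# paired t-test and its one-sample equivalent give identical results
p_paired <- t.test(after, before, paired = TRUE)$p.value
p_onesample <- t.test(diffs, mu = 0)$p.value
c(p_paired, p_onesample)
```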

3.1.3 Two-sample t-test

The two-sample t-test compares whether the means of two groups are significantly different from each other. This test has the following assumptions:

  1. Assumes that the observations from each group represent a random sample from the population.
  2. Assumes that the observations follow a normal distribution.
  3. Assumes that the observations from the two groups have the same variance.

If we have two groups called group 1 and group 2, the two-sided hypotheses for the two-sample t-test are:

\(H_0 : \mu_1 = \mu_2\)

\(H_A : \mu_1 \neq \mu_2\)

Before we can test these hypotheses we need to ensure that our assumptions are met. You can test the normality assumption using any of the procedures that you already know (e.g. qqplots or the Shapiro-Wilk statistic). To test the homogeneity of variance (equal variance) assumption we will use something called Levene’s Test. The null hypothesis of Levene’s test is that the variances of the two groups are equal. The alternative hypothesis is that they are not equal.
If we fail to reject the null hypothesis of Levene’s test we can pool our variances and use a standard two-sample t-test. We would compute this t statistic using the following equations:

\(SE_{\bar Y_1 - \bar Y_2} = \sqrt{s^2_p(\frac{1}{n_1}+\frac{1}{n_2})}\)

where

\(s^2_p = \frac{df_1s^2_1+df_2s^2_2}{df_1+df_2}\)

is the pooled sample variance (described in your book). With these equations we can calculate the t statistic:

\(t = \frac{\bar Y_1 - \bar Y_2}{SE_{\bar Y_1 - \bar Y_2}}\)

where this t statistic has a df of \(n_1 + n_2 - 2\).

If we reject the null hypothesis of Levene’s test we use something called Welch’s t-test. We calculate the t statistic with an equation similar to the one above, but the denominator is changed to account for the different variances between the two groups. Moreover, the df is also calculated a bit differently. For the most part, you can ignore these technicalities. You just need to understand that a different test is used when between-group variances are unequal. If possible, we would prefer our two groups to have equal variance because our statistical test will have more power.
Similar to the one-sample t-test, you can draw conclusions using one of the equivalent methods:

  1. Get the critical t-value \((t_{\alpha(2)})\) with \(n_1 + n_2 - 2\) df (assuming equal variance) for \(\alpha = 0.05\) and compare it to your calculated t-value. Reject your null hypothesis if your absolute value of your t-value is greater than the absolute value of the critical t-value.
  2. Calculate the 95% confidence interval via the equation \(\bar Y_1 - \bar Y_2 \pm SE_{\bar Y_1 - \bar Y_2}t_{\alpha(2),df}\) and determine if 0 is in the interval. Reject your null hypothesis if 0 is not in the 95% CI.
  3. Calculate the p-value associated with your calculated t-value (via R) and determine if it is less than your predetermined \(\alpha\). Reject \(H_0\) if \(p<\alpha\).
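A sketch verifying the pooled-variance formulas above against t.test(var.equal = TRUE) on two simulated groups (hypothetical data):

```r
set.seed(8)
g1 <- rnorm(25, mean = 10, sd = 3)
g2 <- rnorm(30, mean = 12, sd = 3)

n1 <- length(g1); n2 <- length(g2)
df1 <- n1 - 1;    df2 <- n2 - 1

# pooled sample variance and standard error of the difference
sp2 <- (df1 * var(g1) + df2 * var(g2)) / (df1 + df2)
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))
t_manual <- (mean(g1) - mean(g2)) / se

res <- t.test(g1, g2, var.equal = TRUE)
c(t_manual, unname(res$statistic))  # identical; df = n1 + n2 - 2 = 53
```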

R code

# Two-sample t-test with unequal variances using Welch's t-test
#t.test(GROUP1$VARIABLE, GROUP2$VARIABLE, var.equal = FALSE)

# One-sided test: is the mean of Group 1 significantly less than the mean of Group 2?
# t.test(GROUP1$VARIABLE, GROUP2$VARIABLE, alternative = "less", var.equal = FALSE)

Resources for more organized and visually appealing Rmarkdown documents

Part 3.1: Exercise

Our parasite data are not normal, so we will not be using a t-test to compare mean parasite loads. Instead we will use data from the lterdatasampler package in R.

Dataset description

hbr_maples: Sugar maple seedlings at Hubbard Brook Experimental Forest (New Hampshire) in calcium-treated and reference watersheds in August 2003 and June 2004

More info found [here](https://lter.github.io/lterdatasampler/articles/hbr_maples_vignette.html)

Citation: Horst A, Brun J (2022). lterdatasampler: Educational dataset examples from the Long Term Ecological Research program. R package version 0.1.0, https://github.com/lter/lterdatasampler.

# The data we will use for this exercise is preloaded into R! Let's take a look at it using glimpse

#using glimpse function to explore the data 
glimpse(hbr_maples)
## Rows: 359
## Columns: 11
## $ year                <dbl> 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 20…
## $ watershed           <fct> Reference, Reference, Reference, Reference, Refere…
## $ elevation           <fct> Low, Low, Low, Low, Low, Low, Low, Low, Low, Low, …
## $ transect            <fct> R1, R1, R1, R1, R1, R1, R1, R1, R1, R1, R1, R1, R1…
## $ sample              <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,…
## $ stem_length         <dbl> 86.9, 114.0, 83.5, 68.1, 72.1, 77.7, 85.5, 81.6, 9…
## $ leaf1area           <dbl> 13.837, 14.572, 12.451, 9.974, 6.838, 9.660, 8.823…
## $ leaf2area           <dbl> 12.130, 15.267, 9.731, 10.068, 5.480, 7.643, 9.233…
## $ leaf_dry_mass       <dbl> 0.0453, 0.0476, 0.0423, 0.0397, 0.0204, 0.0317, 0.…
## $ stem_dry_mass       <dbl> 0.0300, 0.0338, 0.0248, 0.0194, 0.0180, 0.0246, 0.…
## $ corrected_leaf_area <dbl> 29.104, 32.976, 25.319, 23.179, 15.455, 20.440, 21…
str(hbr_maples)
## tibble [359 × 11] (S3: tbl_df/tbl/data.frame)
##  $ year               : num [1:359] 2003 2003 2003 2003 2003 ...
##  $ watershed          : Factor w/ 2 levels "Reference","W1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ elevation          : Factor w/ 2 levels "Low","Mid": 1 1 1 1 1 1 1 1 1 1 ...
##  $ transect           : Factor w/ 12 levels "R1","R2","R3",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ sample             : Factor w/ 20 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ stem_length        : num [1:359] 86.9 114 83.5 68.1 72.1 77.7 85.5 81.6 92.9 59.6 ...
##  $ leaf1area          : num [1:359] 13.84 14.57 12.45 9.97 6.84 ...
##  $ leaf2area          : num [1:359] 12.13 15.27 9.73 10.07 5.48 ...
##  $ leaf_dry_mass      : num [1:359] 0.0453 0.0476 0.0423 0.0397 0.0204 0.0317 0.0382 0.0179 0.0286 0.0125 ...
##  $ stem_dry_mass      : num [1:359] 0.03 0.0338 0.0248 0.0194 0.018 0.0246 0.0316 0.015 0.0291 0.0149 ...
##  $ corrected_leaf_area: num [1:359] 29.1 33 25.3 23.2 15.5 ...

First, let’s create an exploratory visualization of the data using ggplot. Is there evidence for differences in maple seedling height (millimeters) between the calcium-treated (W1) and untreated (reference) watersheds? We can visualize our data using a boxplot

 ggplot(data = hbr_maples, aes(x = watershed, y = stem_length)) +
  geom_boxplot(aes(color = watershed, shape = watershed), #adds color based on watershed
               alpha = 0.8, #makes color transparent
               width = 0.5) +
  geom_jitter(         
    aes(color = watershed), #geom_jitter adds data points to the boxplots 
    alpha = 0.5,
    show.legend = FALSE,
    position = position_jitter(width = 0.2, seed = 0)
  ) +
  labs(
    x = "Watershed",
    y = "Stem length (millimeters)",
    title = "Stem Lengths of Sugar Maple Seedlings",
    subtitle = "Hubbard Brook LTER"
  ) +
  facet_wrap(~year) + #facet_wrap creates two side-by-side plots to visualize differences between years
  theme_minimal()

In both years of the study (2003 and 2004) it appears that seedling heights shift toward longer stem lengths in the calcium-treated watershed (W1), compared to the untreated watershed.

Checking our assumptions for a two-sample t-test

Are the data normal? Let’s look at the histogram and qqplot and then run a Shapiro-Wilk test

#simple histogram to view the distribution of our Y variable using geom_histogram()
ggplot(hbr_maples, aes(x = stem_length)) + 
  geom_histogram()

#let's make our histogram a little bit fancier, adding titles and color coding by watershed

ggplot(hbr_maples, aes(x = stem_length, color = watershed, fill = watershed)) + 
  geom_histogram(alpha = 0.5, position = "identity") +
  geom_density(aes(color = watershed), size = 1) +
   labs(
    x = "Stem Length (mm)",
    y = "Frequency",
    title = "Distribution of Sugar Maple seedling stem lengths",
    subtitle = "Hubbard Brook LTER"
  ) +
  theme_minimal()

Next assumption: The two samples have equal variance

Having explored distributions in the histograms above, we can conclude that the data are approximately normal. As a next step, we will use var.test() to test for equal variances of seedling height (stem_length, in millimeters) observations between the watersheds, in each year.

Null and alternative hypotheses:

\(H_0\): Ratio of variances is equal to 1 (equal variances); \(H_A\): Ratio of variances is not equal to 1 (unequal variances)

Test this using the tidyverse method

hbr_maples %>% 
  filter(year == 2003) %>% #only selects observations from 2003
  var.test(stem_length ~ watershed, data = .) #var.test tests for equal variance 
## 
##  F test to compare two variances
## 
## data:  stem_length by watershed
## F = 0.94458, num df = 119, denom df = 119, p-value = 0.7563
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.6583073 1.3553360
## sample estimates:
## ratio of variances 
##           0.944578
hbr_maples %>% 
  filter(year == 2004) %>% #only selects observations from 2004
  var.test(stem_length ~ watershed, data = .) #var.test tests for equal variance 
## 
##  F test to compare two variances
## 
## data:  stem_length by watershed
## F = 1.2701, num df = 58, denom df = 59, p-value = 0.3626
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.7573898 2.1323658
## sample estimates:
## ratio of variances 
##           1.270104

With p-values > 0.05 in both years, we fail to reject the null hypothesis that the ratio of the variances of the two watersheds is 1, and continue with the assumption of equal variances.

Now, we’ll do a two-sided two-sample t-test using t.test() to determine if there is a statistically significant difference in seedling height between the watersheds in each year.

Compare means

Null and alternative hypotheses:

\(H_0:𝜇1=𝜇2\)

\(H_A:𝜇1≠𝜇2\)

By default, t.test() will use a Welch t-test that does not assume equal variances (default argument is var.equal = FALSE). We set var.equal = TRUE based on the outcome of our F-test above:

hbr_maples %>% 
  filter(year == 2003) %>% 
  t.test(stem_length ~ watershed,
         var.equal = TRUE,
         data = .)
## 
##  Two Sample t-test
## 
## data:  stem_length by watershed
## t = -3.7797, df = 238, p-value = 0.0001985
## alternative hypothesis: true difference in means between group Reference and group W1 is not equal to 0
## 95 percent confidence interval:
##  -10.497532  -3.304134
## sample estimates:
## mean in group Reference        mean in group W1 
##                80.98500                87.88583
hbr_maples %>% 
  filter(year == 2004) %>% 
  t.test(stem_length ~ watershed,
         var.equal = TRUE,
         data = .)
## 
##  Two Sample t-test
## 
## data:  stem_length by watershed
## t = -4.3092, df = 117, p-value = 3.432e-05
## alternative hypothesis: true difference in means between group Reference and group W1 is not equal to 0
## 95 percent confidence interval:
##  -16.982759  -6.287862
## sample estimates:
## mean in group Reference        mean in group W1 
##                85.88136                97.51667

Based on the outcome of these t-tests, we can reject the null hypothesis and conclude that mean seedling heights in the reference watershed (no calcium treatment) differ significantly from those in the calcium-treated watershed (W1) in both 2003 and 2004.

3.2 Non-parametric tests

Non-parametric tests make no assumptions about the distributions of the data and thus are robust to violations of normality. When our data are not normal and we can’t fix them with a transformation we typically turn to non-parametric tests. You typically do non-parametric tests on your untransformed data. Transforming your data will not change the results of a non-parametric test, but it will change your interpretation of the results. Who wants to talk about medians of square-root transformed data when you don’t have to? Keep in mind that non-parametric tests can be used with data that are normally distributed. However, parametric tests that assume normality have more power, so you should use those if your data are normal.

We are only going to consider one non-parametric test in this lab: the Mann-Whitney U-test, aka Wilcoxon Test. These tests can be used in place of a two-sample t-test when your assumptions of normality are not met. However, there are still some assumptions for the Mann-Whitney U-test:

  • Assumes both samples are random samples from their populations
  • Assumes that the distributions of the two groups have similar shapes (i.e. similar variance and similar skew!)

You have already learned about these tests in class. They rank your data (e.g. smallest value has a rank of 1 and the largest value has a rank of n) and then look at the distributions of these ranks. Note that while you can make similar conclusions with non-parametric tests, the \(H_0\) and \(H_A\) are not specified in terms of the mean, but rather whether the distributions of the two groups differ in central tendency (i.e. the median).

The null hypothesis for a Mann-Whitney U-test is:

\(H_0\) : Median of group 1 equals the median of group 2
\(H_A\) : Median of group 1 does not equal the median of group 2

3.2.1 Mann-Whitney U-test/Wilcoxon Test in R

Now let’s walk through an example of how to do the Mann-Whitney U-test in R. We are going to use our parasite data! For this analysis, we are going to see if there is a significant difference in male and female parasite load in infected individuals

First, data wrangling!

c4c_pos <- c4c %>%
  filter(Para > 0) %>% #creating a subset with only infected individuals 
   mutate(log_para = log(Para), # adding a column with a log transformation
          sq_para = sqrt(Para)) # adding a column with a square root transformation

Visualizing the data with histograms and boxplots

Boxplot first

#boxplot with untransformed data

boxplot(c4c_pos$Para~c4c_pos$Sex,
     main="Parasite load in Males and Females",
     xlab="Sex",
    ylab ="Parasite load")

#boxplot with log transformed parasite burden 
boxplot(c4c_pos$log_para~c4c_pos$Sex,
    main="Parasite load in Males and Females",
     xlab="Sex",
    ylab ="Log Parasite load")

Histograms: let’s see if the data look normally distributed

par(mfrow=c(3,1)) #sets up your plot (3 x 1)

#histogram with untransformed Y
hist(c4c_pos$Para, probability = T,
     col = "plum1",
     main = "Parasite load distribution",
     xlab = "Parasite load")

#histogram with square root transformed Y
hist(c4c_pos$sq_para, probability = T,
     col = "azure1",
     main = "Square root transformed parasite distribution",
     xlab = "Square root parasite load")

#histogram with log transformed Y
hist(c4c_pos$log_para, probability = T,
     col = "pink",
     main = "Log transformed parasite load distribution",
     xlab = "Log parasite load")

#qqPlot
qqPlot(c4c_pos$log_para)

## [1] 63  8
#shapiro test
shapiro.test(c4c_pos$log_para) 
## 
##  Shapiro-Wilk normality test
## 
## data:  c4c_pos$log_para
## W = 0.89433, p-value = 7.727e-07
#p-value = 7.727e-07; we reject the null that our data are normally distributed.

Since our data are not normal even after transformation, we will go back to using the untransformed data for a non-parametric test. But we also need to check for equal variance, which we can do with a Levene’s test!

#Run Levene's Test; (y~x) or (y,x)
leveneTest(Para~Sex, data = c4c_pos)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  0.0244 0.8763
##       98

Levene’s Test gives p = 0.876, so we cannot reject the null hypothesis; the data are consistent with equal variances. We can therefore use a Mann-Whitney U-test, which assumes that the two distributions have the same shape.

# Run a Mann-Whitney U-test/Wilcoxon Test 
wilcox.test(Para ~ Sex, data=c4c_pos)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Para by Sex
## W = 1243.5, p-value = 0.8142
## alternative hypothesis: true location shift is not equal to 0

Based on the above results, we can conclude that males and females do not have significantly different median parasite loads. Note that the W statistic (or the U statistic) is analogous to a t or Z statistic: it comes from a distribution from which you can determine its associated p-value.
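If you want to see where W comes from, here is a sketch with made-up values: R reports the rank sum of the first group minus the smallest possible rank sum, n1(n1+1)/2.

```r
x <- c(1.1, 2.3, 3.8)        # made-up group 1
y <- c(0.5, 2.9, 4.4, 5.0)   # made-up group 2
r <- rank(c(x, y))           # ranks in the pooled sample

# W = (rank sum of group 1) - n1*(n1 + 1)/2
sum(r[1:3]) - 3 * (3 + 1) / 2        # 4
unname(wilcox.test(x, y)$statistic)  # 4
```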

We can also find the medians of each group:

median(c4c_pos$Para[c4c_pos$Sex == "m"], na.rm=T) # na.rm is removing NAs
## [1] 36
median(c4c_pos$Para[c4c_pos$Sex == "f"], na.rm=T) # na.rm is removing NAs
## [1] 23

From this, we can see that males actually have a higher median parasite burden than females do, but the difference is not significant.
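If you prefer one call over two, base R’s `aggregate()` gives the per-group medians at once (shown here on a small hypothetical stand-in for `c4c_pos`):

```r
toy <- data.frame(Sex  = c("f", "f", "m", "m"),
                  Para = c(10, 36, 20, 52))   # made-up parasite loads

medians <- aggregate(Para ~ Sex, data = toy, FUN = median)
medians
##   Sex Para
## 1   f   23
## 2   m   36
```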

3.3 Anova

3.3.1 Analysis of Variance (ANOVA)

We have now seen many different ways to compare the means of two groups. However, what if we have more than two groups? Unfortunately, we can’t just use a bunch of pairwise t-tests, because this would inflate our Type I (α) error. ANOVA is the answer: it gives us a robust and powerful way to determine whether the means of two or more groups differ from each other. There are a couple of terms that we need to familiarize ourselves with before we start using ANOVAs.

  • Factor: Our variable of interest that we are changing. For example, if we were comparing the mean hearing frequency between different felines, the factor would be feline and it would have different levels (i.e. lion, tiger, and puma).

  • Level/group/treatment: The levels of a factor are the groups that we are comparing. For example, the levels of the factor feline are lion, tiger, and puma.
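A tiny sketch of this vocabulary in R (hypothetical data): `factor()` stores the variable, and `levels()` lists its groups.

```r
feline <- factor(c("lion", "tiger", "puma", "lion", "puma"))  # made-up observations
levels(feline)    # "lion" "puma" "tiger" (alphabetical by default)
nlevels(feline)   # 3
```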

3.3.2 Assumptions of the ANOVA

Here are the assumptions of the ANOVA and how we test them:

  1. Measurements from each group (level) represent a random sample from the corresponding population
  • Ensure that this assumption is met with your experimental design
  2. Our response variable is normally distributed for each population
  • Test this assumption with a Q-Q plot and a Shapiro-Wilk test of the residuals
  3. The variance is the same in all k populations (homogeneity of variance)
  • Test this assumption with a residual plot or a Levene’s Test. We will walk through testing these assumptions below.

3.3.3 Hypotheses of the ANOVA

If we are comparing the means of k populations, the \(H_0\) and \(H_A\) of a one-way ANOVA are:

  • \(H_0\): The mean of group 1 = mean of group 2 = … = mean of group k (μ_1 = μ_2 = … = μ_k)

  • \(H_A\): At least one group mean (μ_j) is not equal to the others

These hypotheses assert that any observation i from group (level) j, y_ij, can be written as: y_ij = μ_j + ϵ_ij

Where ϵ_ij is the error associated with the i-th measurement from group j and is normally distributed with mean 0 and some variance σ²_ϵ. The variance σ²_ϵ is constant for all levels. This is just fancy notation for restating our two assumptions:

  1. Normality
  2. Equality of Variance

We could equivalently state the hypotheses for the ANOVA as

\(H_0\): α_1 = α_2 = … = α_j = … = α_k = 0

\(H_A\): At least one α_j is not equal to 0

In this case, α_j is just the deviation of group j from the grand mean μ. The grand mean is the mean of all observations pooled across groups. If all of these deviations are 0, there is no difference between our means. These hypotheses assert that any observation i from group (level) j, y_ij, can be written as y_ij = μ + α_j + ϵ_ij

The means-model and effects-model formulations above are equivalent.

If we reject our \(H_0\), we have no information about which means are different, just that at least one is different.
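One way to build intuition for this model is to simulate from it (all numbers below are made up) and confirm that `aov()` detects the group deviations:

```r
set.seed(1)
k <- 3; n <- 10
mu    <- 50                     # grand mean
alpha <- c(-5, 0, 5)            # group deviations alpha_j (they sum to zero)
group <- factor(rep(1:k, each = n))
eps   <- rnorm(k * n, mean = 0, sd = 2)  # epsilon_ij: mean 0, constant variance
y     <- mu + alpha[group] + eps         # y_ij = mu + alpha_j + epsilon_ij

summary(aov(y ~ group))  # large F, tiny p: at least one alpha_j is nonzero
```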

Resources for more R ANOVA tutorials

Part 3 EXERCISES

Is there a significant difference in adult salamander weight among the years 2011-2019?

We are going to use another dataset provided by the lterdatasampler package in R.

Dataset description

The and_vertebrates dataset contains length and weight observations for Coastal Cutthroat Trout and two salamander species (Coastal Giant Salamander, and Cascade Torrent Salamander) in previously clear cut (c. 1963) and old growth coniferous forest sections of Mack Creek in HJ Andrews Experimental Forest, Willamette National Forest, Oregon.

Citation: Gregory, S.V. and I. Arismendi. 2020. Aquatic Vertebrate Population Study in Mack Creek, Andrews Experimental Forest, 1987 to present ver 14. Environmental Data Initiative. https://doi.org/10.6073/pasta/7c78d662e847cdbe33584add8f809165

Hypotheses

  • \(H_0\): α_2011 = α_2012 = … = α_2019 = 0

  • \(H_A\): At least one α_j is not equal to 0

Always make sure you know what your factor and its levels are before beginning an ANOVA. Otherwise, things can get really confusing. Okay, let’s begin the analysis.

First, let’s load in the dataset and_vertebrates and look at it to get a sense of what’s going on.

##read in your data and check out how it's structured

str(and_vertebrates)
## tibble [32,209 × 16] (S3: tbl_df/tbl/data.frame)
##  $ year       : num [1:32209] 1987 1987 1987 1987 1987 ...
##  $ sitecode   : chr [1:32209] "MACKCC-L" "MACKCC-L" "MACKCC-L" "MACKCC-L" ...
##  $ section    : chr [1:32209] "CC" "CC" "CC" "CC" ...
##  $ reach      : chr [1:32209] "L" "L" "L" "L" ...
##  $ pass       : num [1:32209] 1 1 1 1 1 1 1 1 1 1 ...
##  $ unitnum    : num [1:32209] 1 1 1 1 1 1 1 1 1 1 ...
##  $ unittype   : chr [1:32209] "R" "R" "R" "R" ...
##  $ vert_index : num [1:32209] 1 2 3 4 5 6 7 8 9 10 ...
##  $ pitnumber  : num [1:32209] NA NA NA NA NA NA NA NA NA NA ...
##  $ species    : chr [1:32209] "Cutthroat trout" "Cutthroat trout" "Cutthroat trout" "Cutthroat trout" ...
##  $ length_1_mm: num [1:32209] 58 61 89 58 93 86 107 131 103 117 ...
##  $ length_2_mm: num [1:32209] NA NA NA NA NA NA NA NA NA NA ...
##  $ weight_g   : num [1:32209] 1.75 1.95 5.6 2.15 6.9 5.9 10.5 20.6 9.55 13 ...
##  $ clip       : chr [1:32209] "NONE" "NONE" "NONE" "NONE" ...
##  $ sampledate : Date[1:32209], format: "1987-10-07" "1987-10-07" ...
##  $ notes      : chr [1:32209] NA NA NA NA ...
#let's create a subset with only coastal giant salamanders and 3 levels of unit type (cascade, pool, and side channel). We also want to include only adult salamanders in this analysis; adult mass ranges from 22 to 114 g.

new_vert <- and_vertebrates %>% 
  filter(species == "Coastal giant salamander",  #selects only salamanders
         unittype %in% c("C", "P", "SC"),
         year > 2010,
         weight_g > 22) %>%
  drop_na(unittype) #drops NAs 

new_vert$year <- as.factor(new_vert$year)
#let's look at the structure of our subset
str(new_vert)
## tibble [184 × 16] (S3: tbl_df/tbl/data.frame)
##  $ year       : Factor w/ 9 levels "2011","2012",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ sitecode   : chr [1:184] "MACKCC-L" "MACKCC-L" "MACKCC-L" "MACKCC-L" ...
##  $ section    : chr [1:184] "CC" "CC" "CC" "CC" ...
##  $ reach      : chr [1:184] "L" "L" "L" "L" ...
##  $ pass       : num [1:184] 1 1 1 2 2 1 1 1 1 1 ...
##  $ unitnum    : num [1:184] 1 2 3 2 3 5 5 5 6 7 ...
##  $ unittype   : chr [1:184] "C" "SC" "C" "SC" ...
##  $ vert_index : num [1:184] 48 48 42 38 36 18 31 40 9 18 ...
##  $ pitnumber  : num [1:184] 16396930 26173229 26200280 26197920 26178692 ...
##  $ species    : chr [1:184] "Coastal giant salamander" "Coastal giant salamander" "Coastal giant salamander" "Coastal giant salamander" ...
##  $ length_1_mm: num [1:184] 117 91 99 120 118 108 97 89 119 120 ...
##  $ length_2_mm: num [1:184] 204 165 177 187 202 187 168 153 196 204 ...
##  $ weight_g   : num [1:184] 48.7 28 30.7 55.9 50.6 ...
##  $ clip       : chr [1:184] "NONE" "NONE" "NONE" "NONE" ...
##  $ sampledate : Date[1:184], format: "2011-09-12" "2011-09-12" ...
##  $ notes      : chr [1:184] "RECAP" NA "BLEEDER" "CUTTAIL" ...

Now lets visualize the data with a boxplot:

ggplot(data = new_vert, aes(x = year, y = weight_g, fill = year)) +
  geom_boxplot() +
  labs(
    x = "Year",
    y = "Weight (grams)",
    title = "Adult coastal giant salamander weight across years",
    subtitle = "Mack Creek in HJ Andrews Experimental Forest, Willamette National Forest, Oregon"
  ) +
  theme_minimal()

Notice that in general, the variances of each level seem pretty comparable in the boxplot. Moreover, the data don’t look like they violate normality too severely, though there is one outlier.

Checking our assumptions

To test our assumptions we need to look at the residuals of our ANOVA model. Residuals are defined as: \(e_{ij} = y_{ij} - \bar{y}_j\)

In words, this says the residual \(e_{ij}\) is equal to the observed value (\(y_{ij}\)) minus the value predicted by the model, which for an ANOVA is the group mean (\(\bar{y}_j\)).

Residuals are useful for testing our assumptions because they are estimates of our error terms ϵ_ij. Therefore we expect our residuals to be normally distributed and to have approximately equal variances across groups.

For more info on residuals: http://www.statsmakemecry.com/smmctheblog/confusing-stats-terms-explained-residual.html
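To make the link concrete, here is a toy sketch (made-up numbers) confirming that `aov()` residuals are just each observation minus its group mean:

```r
df  <- data.frame(y = c(2, 4, 6, 1, 3, 5),
                  g = factor(rep(c("a", "b"), each = 3)))  # hypothetical data
fit <- aov(y ~ g, data = df)

by_hand <- df$y - ave(df$y, df$g)           # y_ij minus its group mean
all.equal(unname(residuals(fit)), by_hand)  # TRUE
```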

To get our residuals, we first need to run our ANOVA. We will use the function aov(), which is very similar in output and structure to lm(), but calculates ANOVA results instead of regression results!

# HINT: aov(y ~ x, data = dataset)
salamander <- aov(weight_g~as.factor(year), data=new_vert) #run the ANOVA
par(mfrow=c(2,2)) # places 2 plots on 1 output view
plot(salamander) #look at the diagnostic plots; focus on the first two plots (fitted vs residuals & QQ plot)

While all the plots on this graph are useful, the two that we will focus on are the plots in the first row.

  1. The first plot is called a residual plot and it is used to test our homogeneity of variance (homoskedasticity) assumption (i.e. the plot titled “Residuals vs. Fitted”). It plots the values predicted by our ANOVA model (the group means) against the residuals of the model. If the residuals fall along the zero line for each treatment and don’t have any distinct pattern, then we can be pretty confident that our assumption of equal variances is met. If this plot shows a distinct pattern, such as a wedge shape, there is good evidence that our assumption of homogeneity of variance is not met and we might want to try a transformation.

  2. The other plot that is useful is the Q-Q plot of the residuals (i.e. the plot titled “Normal Q-Q”). You have already seen a lot of qqPlot, so you know that if the points deviate from a straight line the residuals likely do not follow a normal distribution. This would violate the normality assumption of the ANOVA.

We have run the ANOVA model! Be warned: we cannot interpret the results until we check the residuals! R has been incredibly helpful and has already calculated the residuals for us; below, we save them in the variable res. With this done, let’s check some of our assumptions.

Normality: Run a Shapiro-Wilk test on res.

par(mfrow=c(1,1)) # resets output view

res <- salamander$residuals #retrieve the residuals

# Tool 1 for checking normality
hist(res)

# Tool 2 for checking normality
qqPlot(res) 

## [1]  99 180
# Tool 3 for checking normality

shapiro.test(res) ## W = 0.97056, p-value = 0.0006311, not normal! 
## 
##  Shapiro-Wilk normality test
## 
## data:  res
## W = 0.97056, p-value = 0.0006311

Time to transform and retest all our assumptions

new_vert$log_weight <- log(new_vert$weight_g)


#retest all the assumptions 
salamander_2 <- aov(log_weight~as.factor(year), data=new_vert) #run the ANOVA
par(mfrow=c(2,2)) # places 2 plots on 1 output view
plot(salamander_2) #look at the diagnostic plots; focus on the first two plots (fitted vs residuals & QQ plot)

par(mfrow=c(1,1)) # resets output view
res2 <- salamander_2$residuals #retrieve the residuals

# Tool 1 for checking normality
hist(res2)

# Tool 2 for checking normality
qqPlot(res2) 

## [1] 165  23
# Tool 3 for checking normality
shapiro.test(res2) ## W = 0.99125, p-value = 0.328, p > 0.05, our data is  normal!
## 
##  Shapiro-Wilk normality test
## 
## data:  res2
## W = 0.99125, p-value = 0.328

P-value (0.328) > 0.05, so we cannot reject the null; this suggests that the data are normal. Now let’s check our next assumption.

Homogeneity of Variance: Run a Levene’s Test on the stacked data just like you have been doing for the last few weeks. I get a p-value of 0.08664.

leveneTest(log_weight~as.factor(year), data = new_vert) # F value = 1.7661 , 0.08664
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value  Pr(>F)  
## group   8  1.7661 0.08664 .
##       175                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

P-value (0.08664) > 0.05, so we cannot reject the null; we have no evidence against equal variances.

Analyzing the ANOVA

Once we have confirmed that our assumptions are met, we can analyze the ANOVA. For the salamander data, we met both our assumptions of normality and homogeneity of variance, so we can analyze the model.

# we actually already ran the anova when we created the residuals as a reminder we did it with:
# HINT: aov(y ~ x, data = dataset)
salamander_2 <- aov(log_weight~year, data=new_vert) 
#So now we just need to get the results with:
summary(salamander_2)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## year          8  4.144  0.5180   4.049 0.000193 ***
## Residuals   175 22.387  0.1279                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As described in class, this table contains all of the information required for the ANOVA. We obtained an F-value of 4.049 and a p-value (0.000193) less than 𝛼=0.05, allowing us to reject the null hypothesis and conclude that mean adult salamander weight differed significantly in at least one year between 2011 and 2019.

Post-hoc Comparisons

We just concluded from the above ANOVA that at least one of the means differ, but we often want to see which groups actually differ. We know we can’t do a bunch of t-tests because that will inflate our Type I error rate. We are instead going to use a method called a Tukey-Kramer test, which controls our Type I error rate. Here are some important notes about the Tukey-Kramer test:

  • The Tukey-Kramer test assumes that you have already run an ANOVA and rejected your null hypothesis

  • The Tukey-Kramer test does pairwise comparisons for all the groups we are testing and controls the family-wise Type I error rate so it is at most 𝛼=0.05 (it could be less).

  • When your ANOVA is unbalanced (unequal sample sizes in each group), the Tukey-Kramer test is conservative, meaning that the family-wise Type I error is less than 𝛼 and it is harder for us to reject the null.

  • The Tukey-Kramer test makes all of the same assumptions as the ANOVA
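To see why controlling the family-wise error rate matters, here is a quick sketch with made-up numbers (the second part uses base R’s pairwise.t.test() with a Holm correction, a different adjustment than Tukey-Kramer, purely to illustrate the idea):

```r
# With m independent tests at alpha = 0.05, the chance of at least one
# false positive is 1 - (1 - alpha)^m
m <- choose(9, 2)   # 36 pairwise comparisons among 9 years
1 - (1 - 0.05)^m    # roughly 0.84: almost guaranteed to get a false positive

# Adjusted pairwise p-values are never smaller than the raw ones
set.seed(7)
y <- c(rnorm(8, 10), rnorm(8, 10.5), rnorm(8, 12))   # made-up measurements
g <- factor(rep(c("a", "b", "c"), each = 8))
raw <- pairwise.t.test(y, g, p.adjust.method = "none")$p.value
adj <- pairwise.t.test(y, g, p.adjust.method = "holm")$p.value
all(adj >= raw, na.rm = TRUE)   # TRUE
```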

To run a Tukey-Kramer test for our salamander ANOVA:

# HINT: TukeyHSD(aov_model)

TukeyHSD(salamander_2)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = log_weight ~ year, data = new_vert)
## 
## $year
##                  diff          lwr       upr     p adj
## 2012-2011  0.08258972 -0.193983105 0.3591626 0.9904323
## 2013-2011  0.05499478 -0.268538811 0.3785284 0.9998288
## 2014-2011  0.20922045 -0.133020962 0.5514619 0.6007748
## 2015-2011  0.15881195 -0.164721642 0.4823455 0.8340924
## 2016-2011  0.30456963 -0.074143310 0.6832826 0.2268798
## 2017-2011  0.23647394 -0.077130174 0.5500780 0.3080292
## 2018-2011  0.42911525  0.093721430 0.7645091 0.0027494
## 2019-2011  0.46485788  0.106530971 0.8231848 0.0022273
## 2013-2012 -0.02759494 -0.351128534 0.2959387 0.9999992
## 2014-2012  0.12663073 -0.215610684 0.4688721 0.9631990
## 2015-2012  0.07622223 -0.247311365 0.3997558 0.9981232
## 2016-2012  0.22197991 -0.156733033 0.6006928 0.6547741
## 2017-2012  0.15388421 -0.159719897 0.4674883 0.8343610
## 2018-2012  0.34652553  0.011131707 0.6819193 0.0369532
## 2019-2012  0.38226816  0.023941248 0.7405951 0.0268674
## 2014-2013  0.15422567 -0.226970663 0.5354220 0.9384482
## 2015-2013  0.10381717 -0.260676240 0.4683106 0.9930173
## 2016-2013  0.24957485 -0.164677431 0.6638271 0.6199053
## 2017-2013  0.18147915 -0.174229993 0.5371883 0.8023314
## 2018-2013  0.37412046 -0.000940159 0.7491811 0.0511291
## 2019-2013  0.40986310  0.014161653 0.8055645 0.0360698
## 2015-2014 -0.05040850 -0.431604831 0.3307878 0.9999748
## 2016-2014  0.09534918 -0.333673137 0.5243715 0.9987634
## 2017-2014  0.02725348 -0.345552355 0.4000593 0.9999998
## 2018-2014  0.21989480 -0.171417955 0.6112075 0.7049357
## 2019-2014  0.25563743 -0.155501028 0.6667759 0.5780618
## 2016-2015  0.14575768 -0.268494600 0.5600100 0.9728124
## 2017-2015  0.07766198 -0.278047162 0.4333711 0.9989123
## 2018-2015  0.27030330 -0.104757328 0.6453639 0.3699983
## 2019-2015  0.30604593 -0.089655516 0.7017474 0.2748831
## 2017-2016 -0.06809570 -0.474640281 0.3384489 0.9998468
## 2018-2016  0.12454562 -0.299034335 0.5481256 0.9913643
## 2019-2016  0.16028825 -0.281672311 0.6022488 0.9673397
## 2018-2017  0.19264131 -0.173888388 0.5591710 0.7750893
## 2019-2017  0.22838394 -0.159241109 0.6160090 0.6483222
## 2019-2018  0.03574263 -0.369713480 0.4411987 0.9999989

A fancier way of doing this:

# Use the function glht in the multcomp package

# HINT: glht(ANOVA_MODEL, linfct = mcp(FACTOR = "Tukey"))
?glht # you'll see you need to supply a factor to the mcp() argument


Post_hoc <- glht(salamander_2, linfct = mcp(year = "Tukey")) # if year is not a factor when you run the salamander_2 ANOVA, glht() will not work

summary(Post_hoc) 
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: aov(formula = log_weight ~ year, data = new_vert)
## 
## Linear Hypotheses:
##                  Estimate Std. Error t value Pr(>|t|)   
## 2012 - 2011 == 0  0.08259    0.08805   0.938  0.99009   
## 2013 - 2011 == 0  0.05499    0.10300   0.534  0.99982   
## 2014 - 2011 == 0  0.20922    0.10896   1.920  0.59421   
## 2015 - 2011 == 0  0.15881    0.10300   1.542  0.83021   
## 2016 - 2011 == 0  0.30457    0.12057   2.526  0.22245   
## 2017 - 2011 == 0  0.23647    0.09984   2.369  0.30225   
## 2018 - 2011 == 0  0.42912    0.10678   4.019  0.00281 **
## 2019 - 2011 == 0  0.46486    0.11408   4.075  0.00226 **
## 2013 - 2012 == 0 -0.02759    0.10300  -0.268  1.00000   
## 2014 - 2012 == 0  0.12663    0.10896   1.162  0.96205   
## 2015 - 2012 == 0  0.07622    0.10300   0.740  0.99805   
## 2016 - 2012 == 0  0.22198    0.12057   1.841  0.64896   
## 2017 - 2012 == 0  0.15388    0.09984   1.541  0.83048   
## 2018 - 2012 == 0  0.34653    0.10678   3.245  0.03598 * 
## 2019 - 2012 == 0  0.38227    0.11408   3.351  0.02625 * 
## 2014 - 2013 == 0  0.15423    0.12136   1.271  0.93665   
## 2015 - 2013 == 0  0.10382    0.11604   0.895  0.99276   
## 2016 - 2013 == 0  0.24957    0.13188   1.892  0.61432   
## 2017 - 2013 == 0  0.18148    0.11325   1.603  0.79770   
## 2018 - 2013 == 0  0.37412    0.11941   3.133  0.04970 * 
## 2019 - 2013 == 0  0.40986    0.12598   3.253  0.03484 * 
## 2015 - 2014 == 0 -0.05041    0.12136  -0.415  0.99997   
## 2016 - 2014 == 0  0.09535    0.13659   0.698  0.99872   
## 2017 - 2014 == 0  0.02725    0.11869   0.230  1.00000   
## 2018 - 2014 == 0  0.21989    0.12458   1.765  0.69919   
## 2019 - 2014 == 0  0.25564    0.13089   1.953  0.57166   
## 2016 - 2015 == 0  0.14576    0.13188   1.105  0.97189   
## 2017 - 2015 == 0  0.07766    0.11325   0.686  0.99887   
## 2018 - 2015 == 0  0.27030    0.11941   2.264  0.36400   
## 2019 - 2015 == 0  0.30605    0.12598   2.429  0.26980   
## 2017 - 2016 == 0 -0.06810    0.12943  -0.526  0.99984   
## 2018 - 2016 == 0  0.12455    0.13485   0.924  0.99103   
## 2019 - 2016 == 0  0.16029    0.14071   1.139  0.96629   
## 2018 - 2017 == 0  0.19264    0.11669   1.651  0.77052   
## 2019 - 2017 == 0  0.22838    0.12341   1.851  0.64273   
## 2019 - 2018 == 0  0.03574    0.12908   0.277  1.00000   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
confint(Post_hoc) # shows your lower and upper CI bounds 
## 
##   Simultaneous Confidence Intervals
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: aov(formula = log_weight ~ year, data = new_vert)
## 
## Quantile = 3.1302
## 95% family-wise confidence level
##  
## 
## Linear Hypotheses:
##                  Estimate   lwr        upr       
## 2012 - 2011 == 0  0.0825897 -0.1930279  0.3582074
## 2013 - 2011 == 0  0.0549948 -0.2674214  0.3774110
## 2014 - 2011 == 0  0.2092205 -0.1318390  0.5502799
## 2015 - 2011 == 0  0.1588120 -0.1636043  0.4812282
## 2016 - 2011 == 0  0.3045696 -0.0728354  0.6819746
## 2017 - 2011 == 0  0.2364739 -0.0760471  0.5489950
## 2018 - 2011 == 0  0.4291152  0.0948798  0.7633507
## 2019 - 2011 == 0  0.4648579  0.1077685  0.8219473
## 2013 - 2012 == 0 -0.0275949 -0.3500112  0.2948213
## 2014 - 2012 == 0  0.1266307 -0.2144287  0.4676902
## 2015 - 2012 == 0  0.0762222 -0.2461940  0.3986385
## 2016 - 2012 == 0  0.2219799 -0.1554251  0.5993849
## 2017 - 2012 == 0  0.1538842 -0.1586368  0.4664053
## 2018 - 2012 == 0  0.3465255  0.0122900  0.6807610
## 2019 - 2012 == 0  0.3822682  0.0251788  0.7393575
## 2014 - 2013 == 0  0.1542257 -0.2256542  0.5341055
## 2015 - 2013 == 0  0.1038172 -0.2594174  0.4670518
## 2016 - 2013 == 0  0.2495748 -0.1632468  0.6623965
## 2017 - 2013 == 0  0.1814792 -0.1730015  0.5359598
## 2018 - 2013 == 0  0.3741205  0.0003552  0.7478858
## 2019 - 2013 == 0  0.4098631  0.0155283  0.8041979
## 2015 - 2014 == 0 -0.0504085 -0.4302883  0.3294713
## 2016 - 2014 == 0  0.0953492 -0.3321915  0.5228898
## 2017 - 2014 == 0  0.0272535 -0.3442648  0.3987718
## 2018 - 2014 == 0  0.2198948 -0.1700665  0.6098561
## 2019 - 2014 == 0  0.2556374 -0.1540811  0.6653560
## 2016 - 2015 == 0  0.1457577 -0.2670639  0.5585793
## 2017 - 2015 == 0  0.0776620 -0.2768187  0.4321426
## 2018 - 2015 == 0  0.2703033 -0.1034620  0.6440686
## 2019 - 2015 == 0  0.3060459 -0.0882889  0.7003808
## 2017 - 2016 == 0 -0.0680957 -0.4732362  0.3370448
## 2018 - 2016 == 0  0.1245456 -0.2975714  0.5466627
## 2019 - 2016 == 0  0.1602882 -0.2801459  0.6007224
## 2018 - 2017 == 0  0.1926413 -0.1726225  0.5579052
## 2019 - 2017 == 0  0.2283839 -0.1579024  0.6146703
## 2019 - 2018 == 0  0.0357426 -0.3683132  0.4397984
cld(Post_hoc) #compact letter display of Tukey groups
## 2011 2012 2013 2014 2015 2016 2017 2018 2019 
##  "a"  "a"  "a" "ab" "ab" "ab" "ab"  "b"  "b"
plot(Post_hoc, main = "Fig 1: 95% confidence level", cex.axis = .5) #confidence intervals plotted

# (has a different letter and CIs don't overlap with 0 and p-values significant)

Above we have the table and the plot of confidence intervals generated from the Tukey-Kramer test for every pairwise combination of the groups. To interpret the table, notice that we see every pairwise combination of the levels of our factor. For each of these pairs, the Tukey-Kramer test has run a two-sample t-test while controlling the family-wise Type I error.

It is also really useful to look at the letters at the bottom of the output. If two levels within your factor do not share a letter, they are significantly different.

Similarly, Fig 1 displays the 95% family-wise CIs for each pairwise comparison. If an interval does not overlap 0, we can conclude that those two levels differ.