A brief R tutorial about how to perform a data analysis to compare four treatments data. This is achieved by fitting a analysis of variance model, summarizing and making some nice graphs for the data.

# 1 Problem

You want to investigate the effect of light on plant growth. For this you designed an experiment that includes four different light sources. For each source you placed six plants (you replicate each treatment six times) in chambers with a specific light bulb. You kept the plants in their chambers for a fixed time and then you quantified the chlorophyll of each plant.

At the end your experimental data may look like this:

SL | CLQ |
---|---|

SL1 | 21.0 |

SL1 | 27.7 |

SL1 | 19.7 |

SL1 | 23.7 |

SL1 | 28.6 |

SL1 | 21.7 |

SL2 | 34.3 |

SL2 | 33.8 |

SL2 | 34.8 |

SL2 | 36.1 |

I’m only showing you the first ten data points, but remember that in total we have 24 chlorophyll measurements.

The next step is to make a summary table with means and standard deviations, some plots and of course perform and inferential statistic analysis to establish significant differences between light source treatments.

# 2 Solution

First, as always, I must show you my file organization:

- The analysis folder contains all my R scripts.

- The data folder contains the experimental data and all CSV and TXT files products of my analysis scripts.

- The graph folder contains all the plot results of my analysis scripts.

These three folders are contained in a main folder with a representative name.

## 2.1 Data Summary

For make a summary table with means and standard deviations I used the following R script:

```
# Data summary ------------------------------------------------------------
if(!"dplyr" %in% .packages()) library(dplyr)
# 1 Import data -----------------------------------------------------------
main_data <- read.csv("data/main_data.csv")
# 2 Data summary ---------------------------------------------------------
sum_data <- main_data %>%
group_by(SL) %>%
summarise(
MEAN = mean(CLQ),
SD = sd(CLQ),
CV = SD * 100 / MEAN
)
# 3 Save summary ---------------------------------------------------------
write.csv(sum_data, file = "data/summary_data.csv", row.names = FALSE)
```

I followed the usual steps that I used in all my analysis: import data > make analysis > export results. The line of code `if(!"dplyr" %in% .packages()) library(dplyr)`

checks if the necessary packages are already loaded, you can omit the if statement while working on your own.

As can you see in the second step, it is very easy to make this kind of tables using the `dplyr`

package:

With `group_by()`

I specified to perform the next operations, using `summarise()`

, grouping the data by light source treatment, so this way I obtained four means, four standard deviations and four coefficients of variance.

You can see the results in the data folder or by typing `sum_data`

:

`sum_data`

SL | MEAN | SD | CV |
---|---|---|---|

SL1 | 23.73 | 3.67 | 15.46 |

SL2 | 35.28 | 2.23 | 6.33 |

SL3 | 21.38 | 2.31 | 10.79 |

SL4 | 18.88 | 1.81 | 9.60 |

## 2.2 One Way Analysis of Variance Analysis

Now it is time to establish if there are significant differences between treatments. For this I performed an one-way analysis of variance.

I used the following R script:

```
# Analysis of Variance ----------------------------------------------------
# 1 Import data -----------------------------------------------------------
main_data <- read.csv("data/main_data.csv")
# 2 AOV analysis ----------------------------------------------------------
# 2.1 Fit AOV model
data_aov <- aov(CLQ ~ SL, data = main_data)
# 2.2 ANOVA table
anova_table <- anova(data_aov)
# 3 Save results ----------------------------------------------------------
write.csv(
anova_table, file = "data/ANOVA_table.csv",
row.names = FALSE, na = ""
)
```

The code `aov(CLQ ~ SL, data = main_data)`

fits an analysis of variance model using the experimental data and the relation pointed by `CLQ ~ SL`

.

To make the classic ANOVA table you need the function `anova()`

. I made this with the code `anova_table <- anova(data_aov)`

and finally exported it using the function `write.csv()`

.

You can see the result typing or looking in the data folder.

`anova_table`

Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
---|---|---|---|---|---|

SL | 3 | 946.3 | 315.4 | 46.6 | 0 |

Residuals | 20 | 135.3 | 6.8 |

At this case the *p-value* (in the `Pr(>F)`

column) is so small that is rounded to zero (strictly, there is no p-values equal to zero).

So, now that you’ve established that there are significant differences, which treatment means are different? The ANOVA table doesn’t give any information about this question.

## 2.3 Multiple Mean Comparisons (Tukey HSD test)

To establish differences between treatments I performed a Tukey HSD test.

I used the next R script:

```
# Multiple comparisons: Tukey --------------------------------------------
if(!"agricolae" %in% .packages()) library(agricolae)
# 1 Import data -----------------------------------------------------------
main_data <- read.csv(file = "data/main_data.csv")
# 2 HSD test --------------------------------------------------------------
# 2.1 Fit AOV model
data_aov <- aov(CLQ ~ SL, data = main_data)
# 2.2 HSD test on aov model
hsd_test <- HSD.test(data_aov, trt = "SL")
# 3 Save results
# 3.1 Groups from hsd_test
hsd_groups <- data.frame(
SL = row.names(hsd_test$groups),
groups = hsd_test$groups[,2]
)
# 3.2 Save as a data frame
write.csv(hsd_groups, file = "data/hsd_groups.csv", row.names = FALSE)
```

The package `agricolae`

contains the useful function `HSD.test()`

. You can see the result of this test by typing `hsd_test`

:

`hsd_test`

```
## $statistics
## MSerror Df Mean CV MSD
## 6.763917 20 24.82083 10.47811 4.20273
##
## $parameters
## test name.t ntr StudentizedRange alpha
## Tukey SL 4 3.958293 0.05
##
## $means
## CLQ std r Min Max Q25 Q50 Q75
## SL1 23.73333 3.668606 6 19.7 28.6 21.175 22.70 26.700
## SL2 35.28333 2.233756 6 33.3 39.4 33.925 34.55 35.775
## SL3 21.38333 2.306874 6 17.4 23.6 20.800 21.50 23.175
## SL4 18.88333 1.812641 6 16.1 21.3 17.950 19.30 19.675
##
## $comparison
## NULL
##
## $groups
## CLQ groups
## SL2 35.28333 a
## SL1 23.73333 b
## SL3 21.38333 bc
## SL4 18.88333 c
##
## attr(,"class")
## [1] "group"
```

The test is performed using an alpha = 0.05 (you can change this within the function). There are other results that maybe could be useful, but at this case I just save the information related to the groups:

```
# 3 Save results
# 3.1 Groups from hsd_test
hsd_groups <- data.frame(
SL = row.names(hsd_test$groups),
groups = hsd_test$groups[,2]
)
# 3.2 Save as a data frame
write.csv(hsd_groups, file = "data/hsd_groups.csv", row.names = FALSE)
```

The `hsd_table`

contains information about the groups treatments. In this column a different letter indicates a significant difference between treatment means at the specified alpha.

`hsd_groups`

SL | groups |
---|---|

SL2 | a |

SL1 | b |

SL3 | bc |

SL4 | c |

Following alphabetical order the letter “a” indicates that SL2 mean is greater than SL1, SL3 and SL4 mean treatments. The interpretation of the remain letters can be made the same way. The combination “bc” in the treatment SL3 indicates that its mean cannot be considered different from SL1 and SL4 means. That’s weird! Remember that we are talking about statistical differences, so you can expect this kind of results in many of your experiments.

## 2.4 Plots

Finally, plots time. For this kind of experiments it is quite common to represent the magnitude of each treatment mean as a bar with its standard deviation as intervals. Anyway, I present other alternatives to represent the data that could be useful.

### 2.4.1 Bar Plot

To make the bar chart I used the next R script:

```
# Means bar plot --------------------------------------------------------
if(!"ggplot2" %in% .packages()) library(ggplot2)
# Global theme
source("analysis/theme.R")
# 1 Import data ----------------------------------------------------------
# 1.1 Summary data
sum_data <- read.csv("data/summary_data.csv")
# 1.2 HSD groups
groups <- read.csv("data/hsd_groups.csv")
# 1.3 Join summary and groups data frames
sum_data <- merge(sum_data, groups, by = "SL")
# 2 Bar chart
bar_plot <- ggplot(sum_data, aes(x = SL, y = MEAN, label = groups)) +
geom_col(width = 0.5, color = "black", fill = "royalblue1") +
geom_errorbar(aes(ymin = MEAN - SD, ymax = MEAN + SD), width = 0.1) +
geom_text(nudge_y = 5, size = 5) +
xlab("Light source type") +
ylab("Chlorophyll (mg/g)")
# 3 Save plot
ggsave(filename = "graphs/bar_plot.jpeg", bar_plot)
```

I just used the summary table made it in the previous section. I joined the groups information to the summary using the function `merge()`

: `sum_data <- merge(sum_data, groups, by = "SL")`

.

You can notice that I used the `ggplot2`

package and their functions. `geom_col()`

is in charge of representing the means as bars, while `geom_errorbar()`

represents the standard deviations as intervals. `geom_text()`

uses the groups information and places it above each bar.

The code `source("analysis/theme.R")`

runs the next script, that sets a global theme used in all my graphs:

```
# Set theme for all graphs ------------------------------------------------
theme_set(
theme_classic() +
theme(axis.text.x = element_text(color = "black", size = 13),
axis.text.y = element_text(color = "black", size = 13),
axis.title = element_text(color = "black", size = 15))
)
```

The bar chart can be seen typing `bar_plot`

and is saved in the graphs folder.

`bar_plot`

### 2.4.2 Scatter Plot

With a scatter graph we can represent each measurement as a point.

```
# Scatter plot -------------------------------------------------------------
if (!"ggplot2" %in% .packages()) library(ggplot2)
# Global theme
source("analysis/theme.R")
# 1 Import data -------------------------------------------------------------
# 1.1 Experimental data
main_data <- read.csv("data/main_data.csv")
# 1.2 Summary data
sum_data <- read.csv("data/summary_data.csv")
# 1.3 HSD groups
groups <- read.csv("data/hsd_groups.csv")
# 1.4 Join groups and summary
groups <- merge(groups, sum_data, by = "SL")
# 2 Point plot -------------------------------------------------------------
scatter_plot <- ggplot(main_data, aes(x = SL, y = CLQ)) +
geom_point() +
stat_summary(fun = mean, geom = "crossbar", width = 0.2, col = "blue") +
geom_text(data = groups,
aes(x = SL, y = MEAN, label = groups), nudge_x = 0.2,
size = 5) +
xlab("Light source type") +
ylab("Chlorophyll (mg/g)")
# 3 Save plot -------------------------------------------------------------
ggsave(filename = "graphs/scatter_plot.jpeg", scatter_plot)
```

I made this plot in a similar way than the bar chart. I used the `ggplot2`

functions `geom_point()`

to represent each measurement as a point, `stat_summary()`

to add a blue bar representing each treatment mean, and `geom_text()`

to put the letter group a little to the right of each mean bar. The plot looks like this:

`scatter_plot`

### 2.4.3 Bar and Scatter Plot

Using `ggplot2`

you can even combine the both previous plots:

```
# Bar with points plot ----------------------------------------------------
if (!"ggplot2" %in% .packages()) library(ggplot2)
# Global theme
source("analysis/theme.R")
# 1 Import data -------------------------------------------------------------
# 1.1 Experimental data
main_data <- read.csv("data/main_data.csv")
# 1.2 Summary data
sum_data <- read.csv("data/summary_data.csv")
# 1.3 HSD groups
groups <- read.csv("data/hsd_groups.csv")
# 1.4 Join groups and summary
sum_data <- merge(groups, sum_data, by = "SL")
# 2 Bar point plot --------------------------------------------------------
bar_scatter_plot <- ggplot(sum_data, aes(x = SL, y = MEAN)) +
geom_col(
width = 0.5, color = "black", fill = "royalblue1"
) +
geom_errorbar(
aes(ymin = MEAN - SD, ymax = MEAN + SD), width = 0.1
) +
geom_text(
aes(label = groups), nudge_y = 6.5, size = 5
) +
geom_point(
data = main_data, aes(x = SL, y = CLQ), color = "red", size = 1.5
) +
xlab("Light source type") +
ylab("Chlorophyll (mg/g)")
# 3 Save plot ---------------------------------------------------------------
ggsave(filename = "graphs/bar_scatter_plot.jpeg", bar_scatter_plot)
```

This code produces the next plot:

`bar_scatter_plot`

Wow! That’s it! If you wish, you are free to clone the repository with the code and results of this R tutorial, including the experimental data simulation:

More than Two Treatments Comparison

Try to reproduce the analysis step by step, modify or improve the code. It’s all yours!

This work is licensed under a Creative Commons Attribution 4.0 International License.