Using R for design and analysis of results for experiments with mixtures.

“Mixtures are absolutely everywhere you look. Anything you can combine is a mixture.”
(chem4kids.com)

All the code and data in this post are available in the repository: Design of Experiments with Mixtures and their Analysis with R

# What are experimental designs with mixtures?

These are designs aimed at determining the effect of the proportion of different components of a mixture on one or more response variables.

We must emphasize that we are referring to the proportions of the different components in the mixture and not to their absolute amount. That is, it is the proportion that determines the effect.

This type of design has application in the formulation of many products such as beverages, foods, fuels, paints, etc.

In an experimental design of mixtures, the sum of the proportions of each component is equal to 1: And the limits of the proportion of each component must be between 0 and 1: In a practical problem, calculating the proportions of each component is straightforward. For example, suppose that the sum of three components in a soda equals 2.5 g, and the respective amounts of each component are 1, 1, and 0.5 g. The ratio for the first and second ingredient is 1/2.5 = 0.4, and the ratio for the third ingredient is 0.5/2.5 = 0.2. Thus 0.4 + 0.4 + 0.2 = 1.

An experimental design of mixtures will help us determine the proportions of each component to produce the best flavor or to reduce some undesirable physical property in the liquid, for example.

# Types of mixture designs and their generation in R

In this post I will focus on two types of mix designs: simplex-lattice and simplex-centroid. The generation of these designs is simple with the mixexp package.

## Simplex-lattice design

The simplex-lattice design considers q components and allows fitting a model of order m to the experimental data. To generate a design with 3 components of order 3 we use the function `SLD()` as follows:

``````library(mixexp)

# Make a simplex-lattice design
simplex_lattice_desing <- SLD(
fac = 3, # Number of components (factors)
lev = 2  # Number of levels besides 0
)

# Display the design
simplex_lattice_desing``````
``````##    x1  x2  x3
## 1 1.0 0.0 0.0
## 2 0.5 0.5 0.0
## 3 0.0 1.0 0.0
## 4 0.5 0.0 0.5
## 5 0.0 0.5 0.5
## 6 0.0 0.0 1.0``````

It should be noted that it is not necessary to specify the levels (proportions) of each component, as these are automatically determined by the ratio: If the proportions of each line are added together, the result will be equal to 1.
In addition, for three components it is possible to use the `DesignPoints()` function to visualize the experimental region of the experiment:

``DesignPoints(simplex_lattice_desing)`` In this figure, the three vertices correspond to pure mixtures (formed by a single ingredient), the three sides or edges represent binary mixtures that have only two of the three components. The interior points of the triangle represent the ternary mixtures in which the three ingredients are different from zero.

Finally, the design can be exported to our working folder with the function `write.csv()`:

``````write.csv(
simplex_lattice_desing, file = "data/simplex_lattice_desing.csv",
row.names = FALSE
)``````

## Simplex-centroid design

If predictions are to be made within the experimental region, it is important to include centroid points within the experimental region. The simplex-centroid design includes all intermediate mixtures between components. The `SCD()` function is used to generate it:

``````simplex_centroid_design <- SCD(
fac = 3  # Number of components (factors)
)

# Export the design
write.csv(
simplex_centroid_design, file = "data/simplex_centroid_design.csv",
row.names = FALSE
)

# Display the design
simplex_centroid_design``````
``````##          x1        x2        x3
## 1 1.0000000 0.0000000 0.0000000
## 2 0.0000000 1.0000000 0.0000000
## 3 0.0000000 0.0000000 1.0000000
## 4 0.5000000 0.5000000 0.0000000
## 5 0.5000000 0.0000000 0.5000000
## 6 0.0000000 0.5000000 0.5000000
## 7 0.3333333 0.3333333 0.3333333``````

By visualizing the experimental region with three components, it becomes much clearer what we mean by intermediate mixtures:

``DesignPoints(simplex_centroid_design)`` ## Mixing designs with component constraints

It is normal that due to technical or economic constraints, for example, the proportion of one or more components is restricted to a shape limit: It is possible to generate designs considering the constraints for each component with the `Xvert()` function:

``````simplex_restrictions <- Xvert(
nfac = 3,                       # Number of components (maximum 12)
uc   = c(0.612, 0.765, 0.109),  # Upper constrains
lc   = c(0.153, 0.306, 0.053),  # Lower constrains
plot = FALSE,                   # Automatic plot when TRUE and nfac = 3
ndm  = 1                        # Order of centroids included
)

# Export the design
write.csv(
simplex_restrictions, file = "data/simplex_restrictions.csv",
row.names = FALSE
)

# Display the design
simplex_restrictions``````
``````##           x1        x2         x3 dimen
## 1  0.6120000 0.3060000 0.08200002     0
## 2  0.1530000 0.7649999 0.08200002     0
## 3  0.6120000 0.3350000 0.05300000     0
## 4  0.1820000 0.7649999 0.05300000     0
## 5  0.5850000 0.3060000 0.10899998     0
## 6  0.1530000 0.7380000 0.10899998     0
## 7  0.1530000 0.7515000 0.09550000     1
## 8  0.6120000 0.3205000 0.06750001     1
## 9  0.5985000 0.3060000 0.09550000     1
## 10 0.1675000 0.7649999 0.06750001     1
## 11 0.3970000 0.5500000 0.05300000     1
## 12 0.3690000 0.5220000 0.10899998     1
## 13 0.3828333 0.5358333 0.08133333     2``````

It is also possible to visualize the experimental subregion:

``DesignPoints(simplex_restrictions[, 1:3])`` # Analysis of the results of a mix design

For the example analysis, I will use the data published in Performance of reduced fat-reduced salt fermented sausage with added microcrystalline cellulose, resistant starch and oat fiber using the simplex design. In this study, the effect of the proportion of three ingredients on different characteristics of fermented sausages was determined. In this case I will only focus the analysis on one of the response variables (hardness).

## Data import

As usual, the first step in the analysis is to import our data into R:

``````data_sausage <- read.csv("data/data_sausage.csv")

# Display the data
data_sausage``````
``````##     MCC   RS   OF Hardness
## 1  1.00 0.00 0.00    52.49
## 2  0.00 1.00 0.00    48.17
## 3  0.00 0.00 1.00    64.02
## 4  0.50 0.50 0.00    41.82
## 5  0.50 0.00 0.50    39.65
## 6  0.00 0.50 0.50    56.34
## 7  0.33 0.33 0.34    42.26
## 8  0.33 0.33 0.34    42.80
## 9  0.66 0.17 0.17    50.79
## 10 0.17 0.66 0.17    50.48
## 11 0.17 0.17 0.66    48.48``````

You can use the `lm()` function to adjust a complete model or, for the same purpose, the `MixModel()` function of the `mixexp` package:

``````# with lm() function
# -1 indicates to skip the intercept
hardness_model_lm <- lm(
Hardness ~ -1 + MCC + RS + OF + MCC:RS + RS:OF + MCC:OF + MCC:RS:OF, # Model
data = data_sausage
)

# with MixModel() function
hardness_model_mm <- MixModel(
frame = data_sausage,                # Data
response = "Hardness",               # Response name
mixcomps = c("MCC", "RS", "OF"),     # Factor names
model = 4                            # Model to be fit, type ?MixModel
)``````

Note that these models did not include the mean or intercept. Due to the restriction of the sum of components is equal to 1, the parameters in the model are not unique. Basically, the model without mean eliminates the problem of dependence between the coefficients. As we will see later, the interpretation of each coefficient and the hypothesis testing related to each must be done in a special way for this type of design.

## Model coefficients, their interpretation and determination coefficients

The `summary()` function will display a complete report with the coefficients of the model we previously selected, as well as the coefficients of determination:

``summary(hardness_model_lm)``
``````##
## Call:
## lm(formula = Hardness ~ -1 + MCC + RS + OF + MCC:RS + RS:OF +
##     MCC:OF + MCC:RS:OF, data = data_sausage)
##
## Residuals:
##       1       2       3       4       5       6       7       8       9      10
## -1.8143 -0.6241  1.0879 -2.5404 -0.7340  0.4923 -2.4654 -1.9254  7.0439  3.3634
##      11
## -1.8839
##
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)
## MCC        54.30432    4.49807  12.073 0.000270 ***
## RS         48.79405    4.49807  10.848 0.000410 ***
## OF         62.93214    4.49856  13.989 0.000151 ***
## MCC:RS    -28.75529   22.58741  -1.273 0.271951
## RS:OF      -0.06174   22.59683  -0.003 0.997951
## MCC:OF    -72.93694   22.59683  -3.228 0.032044 *
## MCC:RS:OF  16.95888  131.40193   0.129 0.903539
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.647 on 4 degrees of freedom
## Multiple R-squared:  0.9968, Adjusted R-squared:  0.9911
## F-statistic: 176.5 on 7 and 4 DF,  p-value: 8.155e-05``````

In general, the coefficients in this type of model are interpreted as follows:

• Coefficients of individual components. They do not measure the overall effect of component xi, but only estimate the value of the response at the vertex of the simplex. If these coefficients are not significant, it does not mean that the effect of the individual component is not important, so hypothesis tests on them are usually ignored.
• Coefficients of double interactions. If the sign of this coefficient is positive, there is synergy between the components; if it is negative, there is antagonism between them.
• Triple interaction coefficients. Quantifies the effect of the ternary mixture within the simplex.

The result report can be easily saved with the `capture.output()` function:

``````capture.output(
summary(hardness_model_lm),
file = "data/sum_hardness_model.txt"
)``````

## step() function to improve the model

In order to improve the coefficients of determination or simplify the model, sometimes non-significant terms are eliminated. This can be done somewhat subjectively by trial and error by eliminating one or more terms and then comparing with the full model. R also offers a systematic way to do the above using the `step()` function, this function uses the Akaike information criterion iteratively to simplify and/or improve the coefficients of determination of the model.

The function can display a large number of results depending on the number of iterations it makes to simplify the model, so in this example I will directly save the results in a text file:

``````capture.output(
step(hardness_model_lm),
file = "data/step_results.txt"
)``````

Subsequently, we only need to adjust the simplified model:

``````hardness_model_simp <- lm(
Hardness ~ -1 + MCC + RS + OF + MCC:RS + MCC:OF,
data = data_sausage
)``````

By displaying a summary of results it can be seen that there is not a big difference between the coefficients of determination of this model and the full model. However, the smaller number of terms may have some practical advantage depending on the problem:

``````# Export summary
capture.output(
summary(hardness_model_simp),
file = "data/sum_hardness_simplified_model.txt"
)

summary(hardness_model_simp)``````
``````##
## Call:
## lm(formula = Hardness ~ -1 + MCC + RS + OF + MCC:RS + MCC:OF,
##     data = data_sausage)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.8704 -1.7706 -1.0711  0.7132  7.0898
##
## Coefficients:
##        Estimate Std. Error t value Pr(>|t|)
## MCC      54.228      3.655  14.836 5.90e-06 ***
## RS       48.862      3.275  14.921 5.70e-06 ***
## OF       63.002      3.275  19.240 1.28e-06 ***
## MCC:RS  -27.419     16.567  -1.655  0.14899
## MCC:OF  -71.575     16.506  -4.336  0.00489 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.804 on 6 degrees of freedom
## Multiple R-squared:  0.9968, Adjusted R-squared:  0.9941
## F-statistic: 368.8 on 5 and 6 DF,  p-value: 2.231e-07``````

## Lack-of-fit test

Another way to evaluate the quality of the model fit, if there is more than one repetition for any of the treatments, is by means of a lack-of-fit test. This can be done directly with the `pureErrorAnova()` function of the `alr3` package:

``````library(alr3)

lof_complete_hardness_model <- pureErrorAnova(hardness_model_lm)

# Export results
write.csv(
lof_complete_hardness_model,
file = "data/lof_complete_hardness_model.csv", na = ""
)

# Display ANOVA
lof_complete_hardness_model``````
``````## Analysis of Variance Table
##
## Response: Hardness
##              Df  Sum Sq Mean Sq    F value   Pr(>F)
## MCC           1 13323.1 13323.1 91379.4260 0.002106 **
## RS            1  7231.4  7231.4 49597.9877 0.002859 **
## OF            1  5804.0  5804.0 39807.9118 0.003191 **
## MCC:RS        1    47.9    47.9   328.4924 0.035090 *
## RS:OF         1     0.1     0.1     0.9393 0.509959
## MCC:OF        1   272.0   272.0  1865.5601 0.014737 *
## MCC:RS:OF     1     0.4     0.4     2.4666 0.360954
## Residuals     4    86.4    21.6
##  Lack of fit  3    86.2    28.7   197.1118 0.052300 .
##  Pure Error   1     0.1     0.1
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1``````

For the simplified model:

``````lof_simp_hardness_model <- pureErrorAnova(hardness_model_simp)

# Export results
write.csv(
lof_simp_hardness_model,
file = "data/lof_simp_hardness_model.csv", na = ""
)

# Display ANOVA
lof_simp_hardness_model``````
``````## Analysis of Variance Table
##
## Response: Hardness
##              Df  Sum Sq Mean Sq  F value   Pr(>F)
## MCC           1 13323.1 13323.1 91379.43 0.002106 **
## RS            1  7231.4  7231.4 49597.99 0.002859 **
## OF            1  5804.0  5804.0 39807.91 0.003191 **
## MCC:RS        1    47.9    47.9   328.49 0.035090 *
## MCC:OF        1   272.1   272.1  1865.93 0.014735 *
## Residuals     6    86.8    14.5
##  Lack of fit  5    86.7    17.3   118.87 0.069517 .
##  Pure Error   1     0.1     0.1
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1``````

In this test, if the p-value obtained for Lack of fit is greater than 0.05, or at the significance level established by the experimenter, it can be concluded that the model fits the data adequately. Note how with the full model we came close to rejecting the hypothesis of lack of fit, while with the simplified model the situation improved somewhat.

## Visualization of the simplified model in two dimensions

It is possible to make a contour plot with the fitted model, only for the case of three components in the mixture:

``````## For simplified model
ModelPlot(
hardness_model_simp,                                  # Our model
dimensions = list(x1 = "MCC", x2 = "RS", x3 = "OF"),  # Variable names
contour = TRUE,                                       # Add contour lines
fill = TRUE,                                          # Add color
axislabs = c("MCC", "RS", "OF"),                      # Axis labels
cornerlabs = c("MCC", "RS", "OF")                     # Corner labels
)`````` The `ModelPlot()` function is also included in the `mixexp` package.

The graph can be exported in png format, for example, as follows:

``````png("graphs/contour_hardnees_model.png",
units = "cm", width = 15, height = 10, res = 150)
ModelPlot(
hardness_model_simp,                                  # Our model
dimensions = list(x1 = "MCC", x2 = "RS", x3 = "OF"),  # Variable names
contour = TRUE,                                       # Add contour lines
fill = TRUE,                                          # Add color
axislabs = c("MCC", "RS", "OF"),                      # Axis labels
cornerlabs = c("MCC", "RS", "OF")                     # Corner labels
)
dev.off()``````

## Mixture effect plot

Another way to plot the results is by using an effect plot for the components of the mixture. This two-dimensional plot can be useful if you have more than three components in the mixture. To do this we can use the `ModelEff()` function included in `mixexp`:

``````## For complete model
ModelEff(
nfac = 3,                   # Number of components (factors)
mod  = 4,                   # Type of model, type ?MixModel for more information
ufunc = hardness_model_mm   # Model fitted by MixModel() function
)`````` `ModelEff()` displays the components in the same order as specified in the fitted model, so x1 corresponds to MCC, x2 corresponds to RS and x3 corresponds to OF. This plot starts with a reference mixture (usually the center of the experimental region) and shows how the response changes as one of the components increases or decreases in the mixture; when one of the components changes, the rest increase or decrease proportionally. The disadvantage of `ModelEff()` is that only complete, not simplified, models can be used to make the plot.

The effect graph can be exported in the same way as the contour graph:

``````png("graphs/effects_hardnees_model.png",
units = "cm", width = 15, height = 10, res = 150)
ModelEff(
nfac = 3,                   # Number of components (factors)
mod  = 4,                   # Type of model, type ?MixModel for more information
ufunc = hardness_model_mm   # Model fitted by MixModel() function
)
dev.off()``````

If the reader would like to consult more examples of analysis with the `mixexp` package, please check the document at the following link: Mixture Experiments in R Using mixexp.

Very good! That’s all for this post, thank you very much for visiting this blog.

Juan Pablo Carreón Hidalgo 🤓  