A basic R tutorial for carrying out the analysis of results of response surface designs. It also discusses how to generate Box-Behnken and Central Composite designs.

Any model is only an approximation

 

All code and data used in this post are available on GitHub: Response Surface Designs and their Analysis with R

What is a response surface design?

Response surface designs, and the analysis of their results, are aimed at determining the optimal combination of factors that yields the best response within the experimental region. Depending on our objectives, the best response may be a maximum or a minimum.

It is recommended to run a response surface design only after we have determined which factors have a significant effect on the response. This screening can be done, for example, by means of a fractional factorial design.

Types of response surface designs

First-order designs

These are designs that only consider the main (linear) effects of each factor. They can be used in the initial stages, when looking for the experimental region that contains the curvature and, possibly, a maximum or a minimum.
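The first-order model can be written as:

```latex
Y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \varepsilon
```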

In this equation Y corresponds to the response, β0 is the global mean or intercept, βi is the coefficient obtained by linear regression for a given factor xi and ε corresponds to the random error. In the summation, k corresponds to the number of factors considered by the experimenter.

Second-order designs

Designs that allow estimation of linear, interaction and quadratic or curvature effects.
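The complete second-order model extends the first-order one with pure quadratic and two-way interaction terms:

```latex
Y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i
  + \sum_{i=1}^{k} \beta_{ii} x_i^2
  + \sum_{i<j} \beta_{ij} x_i x_j
  + \varepsilon
```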

Among the designs that allow fitting this type of polynomial to the experimental data are the Box-Behnken and the Central Composite designs.

Generation of surface designs with the rsm package

Generating this type of design is quite straightforward with the rsm package. If you want to see the full functionality of this package click on the following link: rsm package. There is also a document with several examples at the following link: examples with rsm.

Box-Behnken Design

To generate a Box-Behnken design we use the bbd() function:

library(rsm)
## Warning: package 'rsm' was built under R version 4.2.1
bb_design_1 <- bbd(
  k  = 4,            # Number of factors
  n0 = 3,            # Number of center points
  block = FALSE,     # Whether to divide the design into blocks
  randomize = FALSE  # Whether to randomize the run order
)

head(as.data.frame(bb_design_1))
##   run.order std.order x1 x2 x3 x4
## 1         1         1 -1 -1  0  0
## 2         2         2  1 -1  0  0
## 3         3         3 -1  1  0  0
## 4         4         4  1  1  0  0
## 5         5         5  0  0 -1 -1
## 6         6         6  0  0  1 -1
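As a quick sanity check, we can compute the expected number of runs by hand: a Box-Behnken design (without blocks) runs a 2² factorial for each pair of factors, plus the center points.

```r
# Runs in a Box-Behnken design: choose(k, 2) factor pairs, each
# contributing a 2^2 factorial, plus n0 center points
k  <- 4
n0 <- 3
n_runs <- choose(k, 2) * 2^2 + n0
n_runs
## [1] 27
```

This matches what `nrow(as.data.frame(bb_design_1))` returns for the design above.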

You can also export your design in CSV format:

readr::write_csv(bb_design_1, file = "data/bbd_1.csv")

In the previous design, factor levels are coded in the standard way (-1, 0 and +1). You can also include the names and units of each factor:

bb_design_2 <- bbd(
  k  = 4,            # Four factors 
  n0 = 3,            # Three central points 
  block = FALSE,     # No blocks 
  randomize = FALSE, # Not randomized
  coding = list(
    x1 ~ (Temperature - 50) / 25,
    x2 ~ (Power - 210)/ 30,
    x3 ~ (time - 30) / 15,
    x4 ~ (Ethanol - 80) / 20
  )
)

head(bb_design_2)
##   run.order std.order Temperature Power time Ethanol
## 1         1         1          25   180   30      80
## 2         2         2          75   180   30      80
## 3         3         3          25   240   30      80
## 4         4         4          75   240   30      80
## 5         5         5          50   210   15      60
## 6         6         6          50   210   45      60
## 
## Data are stored in coded form using these coding formulas ...
## x1 ~ (Temperature - 50)/25
## x2 ~ (Power - 210)/30
## x3 ~ (time - 30)/15
## x4 ~ (Ethanol - 80)/20

In this example, each factor level can be expressed in standard coding (-1, 0 and +1) with the general formula CodedLevel = (Level - CenterPoint) / StepSize. For example, for Temperature the level 25 corresponds to -1, since CodedLevel = (25 - 50) / 25 = -1. This convention is commonly used in textbooks dealing with response surfaces, and it is important if you want to determine the steepest-ascent path, which consists of moving the experimental region in search of the optimal combination of factors. For the analysis example below, I will assume that the experimental region is located near, or contains, the experimental optimum.
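The coding arithmetic is easy to reproduce in base R. Here is a minimal sketch; the helper names `code_level()` and `decode_level()` are my own, for illustration only, since rsm already provides `val2code()` and `code2val()` for this job:

```r
# Hand-rolled coding helpers (illustration only; rsm provides
# val2code()/code2val() for the same purpose)
code_level   <- function(level, center, step) (level - center) / step
decode_level <- function(coded, center, step) center + coded * step

code_level(25, 50, 25)    # Temperature 25 -> coded -1
decode_level(1, 210, 30)  # Power coded +1 -> 240
```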

Central composite design

Like the Box-Behnken design, the central composite design allows us to estimate main, interaction and quadratic effects for each factor. To generate it we use the ccd() function:

cc_design_1 <- ccd(
  basis = 3,         # Number of factors
  n0    = 3,         # Number of central points
  randomize = FALSE # Not randomized
)

readr::write_csv(cc_design_1, file = "data/ccd_1.csv") # Export as CSV

head(as.data.frame(cc_design_1))
##   run.order std.order x1 x2 x3 Block
## 1         1         1 -1 -1 -1     1
## 2         2         2  1 -1 -1     1
## 3         3         3 -1  1 -1     1
## 4         4         4  1  1 -1     1
## 5         5         5 -1 -1  1     1
## 6         6         6  1 -1  1     1

By default ccd() creates two blocks: one for the regular factorial runs and another for the “star” points. This can be advantageous, since we can first confirm with the first block that there are curvature effects, and then estimate the quadratic terms of the model with the “star” points of the second block.
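The distance of the star points from the center is controlled by ccd()'s alpha argument ("orthogonal" by default). For a rotatable design, the standard formula places the star points at alpha = (number of factorial runs)^(1/4):

```r
# Star-point distance for a rotatable CCD with k factors:
# alpha = (2^k)^(1/4)
k <- 3
alpha <- (2^k)^(1/4)
round(alpha, 4)
## [1] 1.6818
```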

Analysis of results of a response surface design with the rsm package

For this example, I used the data reported in the academic paper Effects of Ultrasonic-Assisted Extraction on the Yield and the Antioxidative Potential of Bergenia emeiensis Triterpenes. This study sought the combination of the factors shown in the table below that gives the highest yield of triterpenes from dried roots (rhizomes) of B. emeiensis.


Each factor was coded with a letter (A, B, C and D), and its levels with the numbers -1, 0 and +1. In original units, the levels of each factor (recoverable from the coding formulas used below) are:

Factor   -1    0   +1
A       180  210  240
B        70   80   90
C        15   30   45
D        20   30   40

If you are interested, the data for this example can be found in the “data” folder in the GitHub repository of this post.

Import data

The first step is to import your data into R. In this case the experimental data are saved in CSV format:

yield_data <- readr::read_csv("data/yield_data.csv")

head(yield_data)
## # A tibble: 6 × 5
##       A     B     C     D Yield
##   <dbl> <dbl> <dbl> <dbl> <dbl>
## 1   180    70    30    30  224.
## 2   210    80    30    30  227.
## 3   210    90    30    20  172.
## 4   180    80    15    30  209.
## 5   210    80    15    40  201.
## 6   240    70    30    30  221.

Just in case, I'm going to code the factor levels using the coded.data() function. This will facilitate working with coded data in the response-surface analysis:

yield_data <- coded.data(
  yield_data,                 # Experimental data
  formulas = list(            # List of coding formulas for each factor
    x1 ~ (A - 210)/30, 
    x2 ~ (B - 80)/10,
    x3 ~ (C - 30)/15,
    x4 ~ (D - 30)/10
  ))

head(yield_data)
##     A  B  C  D Yield
## 1 180 70 30 30 223.7
## 2 210 80 30 30 227.2
## 3 210 90 30 20 171.9
## 4 180 80 15 30 208.8
## 5 210 80 15 40 201.3
## 6 240 70 30 30 220.7
## 
## Data are stored in coded form using these coding formulas ...
## x1 ~ (A - 210)/30
## x2 ~ (B - 80)/10
## x3 ~ (C - 30)/15
## x4 ~ (D - 30)/10

Model fitting

To fit the second order model to our experimental data we use the rsm() function:

yield_model <- rsm(Yield ~ SO(x1, x2, x3, x4), data = yield_data)

As you can see, fitting this model requires the special function SO() (SO for Second Order), which tells rsm() to fit a complete second-order model to our data. SO(x1, x2, x3, x4) is shorthand for FO() + TWI() + PQ(), that is, the first-order, two-way-interaction and pure-quadratic terms, which is why the ANOVA table in the model summary is split into those three groups.

Summary of model results

The summary() function will display a complete summary of the fitted model, including the coefficients, t-tests for each coefficient, coefficients of determination, the lack-of-fit test and the stationary (optimal) point, among other useful information:

summary(yield_model)
## 
## Call:
## rsm(formula = Yield ~ SO(x1, x2, x3, x4), data = yield_data)
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 220.8800     4.6340 47.6652 < 2.2e-16 ***
## x1           -1.4917     2.9912 -0.4987  0.625748    
## x2          -27.0333     2.9912 -9.0375 3.222e-07 ***
## x3           10.0333     2.9912  3.3543  0.004724 ** 
## x4           -6.0917     2.9912 -2.0365  0.061073 .  
## x1:x2         5.8500     5.1810  1.1291  0.277818    
## x1:x3         2.3250     5.1810  0.4488  0.660474    
## x1:x4       -10.3000     5.1810 -1.9880  0.066725 .  
## x2:x3         0.9500     5.1810  0.1834  0.857142    
## x2:x4        -5.4500     5.1810 -1.0519  0.310651    
## x3:x4        -8.0250     5.1810 -1.5489  0.143701    
## x1^2         -5.1275     4.0685 -1.2603  0.228167    
## x2^2        -24.0150     4.0685 -5.9026 3.850e-05 ***
## x3^2         -9.9650     4.0685 -2.4493  0.028082 *  
## x4^2        -11.9525     4.0685 -2.9378  0.010803 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.9124, Adjusted R-squared:  0.8248 
## F-statistic: 10.42 on 14 and 14 DF,  p-value: 4.216e-05
## 
## Analysis of Variance Table
## 
## Response: Yield
##                     Df  Sum Sq Mean Sq F value    Pr(>F)
## FO(x1, x2, x3, x4)   4 10449.6 2612.41 24.3310 3.542e-06
## TWI(x1, x2, x3, x4)  6   962.9  160.48  1.4947 0.2501645
## PQ(x1, x2, x3, x4)   4  4244.2 1061.04  9.8822 0.0005164
## Residuals           14  1503.2  107.37                  
## Lack of fit         10   957.9   95.79  0.7028 0.7042543
## Pure error           4   545.2  136.31                  
## 
## Stationary point of response surface:
##         x1         x2         x3         x4 
##  0.2484136 -0.4624262  0.7095327 -0.4946289 
## 
## Stationary point in original units:
##         A         B         C         D 
## 217.45241  75.37574  40.64299  25.05371 
## 
## Eigenanalysis:
## eigen() decomposition
## $values
## [1]  -0.6545221  -9.4572076 -16.1561933 -24.7920770
## 
## $vectors
##          [,1]        [,2]       [,3]        [,4]
## x1  0.7790649  0.50504706  0.3573793 -0.10131842
## x2  0.1636823  0.02705505 -0.1174668  0.97912088
## x3  0.3255063 -0.82446880  0.4623083  0.02382985
## x4 -0.5102074  0.25386352  0.8029649  0.17461103

The above summary can be easily exported in TXT format with the capture.output() function:

capture.output(
  summary(yield_model),           # Object to be exported
  file = "data/model_summary.txt" # File name 
)

To evaluate the quality of the model, or rather how well it fits the experimental data, we usually look at the adjusted coefficient of determination and at the lack-of-fit test. The criteria can be the following:

  • Adjusted coefficient of determination. Values close to 1 are preferred.
  • Lack-of-fit test. A p-value greater than the significance level established by the experimenter lets us conclude that the model is “adequate”.

Both criteria give us confidence that the model will make good predictions of the response within the experimental region. In other words, we can have more confidence in the optimal response predicted by the model.

Graphical representation of the model

Representing the model in pseudo-3D or with contour plots is simple when there is a response and one or two factors, but with more than two factors we face the problem of having to plot more than three dimensions. In this example, we can work around this by keeping two factors fixed at one of their levels and plotting the response as a function of the remaining two. The contour() function can be used to make contour plots following this reasoning:

par(mfrow = c(2,3))       # 2 x 3 pictures on one plot
contour(
  yield_model,            # Our model
  ~ x1 + x2 + x3 + x4,    # A formula to obtain the 6 possible graphs 
  image = TRUE,           # If image = TRUE, apply color to each contour
  )

If you are using RStudio you can easily export any type of graph or figure, but it is also possible to do this directly with the following commands, which in this case save the above graphs in JPEG format:

jpeg(
  filename = "graphs/model_contours.jpeg", width = 30, height = 20, 
  units = "cm", res = 400, quality = 100
  )
par(mfrow = c(2,3))       # 2 x 3 pictures on one plot
contour(
  yield_model,            # Our model
  ~ x1 + x2 + x3 + x4,    # A formula to obtain the 6 possible graphs 
  image = TRUE,           # If image = TRUE, apply color to each contour
  )
dev.off()

It is also possible to make “3D” graphs with the persp() function:

par(mfrow = c(2,3))       # 2 x 3 pictures on one plot
persp(
  yield_model,            # Our model 
  ~ x1 + x2 + x3 + x4,    # A formula to obtain the 6 possible graphs
  col = topo.colors(100), # Color palette
  contours = "colors"     # Include contours with the same color palette
  ) 

This figure can also be exported in JPEG format:

jpeg(
  filename = "graphs/model_3D.jpeg", width = 30, height = 20, 
  units = "cm", res = 400, quality = 100
  )
par(mfrow = c(2,3))       # 2 x 3 pictures on one plot
persp(
  yield_model,            # Our model 
  ~ x1 + x2 + x3 + x4,    # A formula to obtain the 6 possible graphs
  col = topo.colors(100), # Color palette
  contours = "colors"     # Include contours with the same color palette
  ) 
dev.off()

Both graphs are stored in the graphs folder in the repository of this post. For more examples and customization options, please click on the following link: Surface Plots in the rsm Package.

Optimal experimental point

The optimal experimental point, the combination of factors that will result in the best response, can be found in the summary of our model. Since all four eigenvalues in the eigenanalysis above are negative, the stationary point is indeed a maximum:

opt_point <- summary(yield_model)$canonical$xs
opt_point
##         x1         x2         x3         x4 
##  0.2484136 -0.4624262  0.7095327 -0.4946289

In the original units of each factor this corresponds to:

op_point_ru <- code2val(
  opt_point,                     # Optimal point in coded units
  codings = codings(yield_data)  # Formulas to convert to factor units
)
op_point_ru
##         A         B         C         D 
## 217.45241  75.37574  40.64299  25.05371
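We can verify this decoding by hand with the formula Level = Center + Step × CodedLevel and the coding formulas defined earlier:

```r
# Decoding the stationary point manually: Level = Center + Step * Coded
A <- 210 + 30 * 0.2484136     # 217.45241
B <- 80  + 10 * (-0.4624262)  #  75.37574
C <- 30  + 15 * 0.7095327     #  40.64299
D <- 30  + 10 * (-0.4946289)  #  25.05371
round(c(A = A, B = B, C = C, D = D), 5)
```

The results match the stationary point in original units reported by summary() and by code2val().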

The optimal point in coded units can be used directly to predict the best response with the predict() function:

opt_point_df <- data.frame(  # predict() needs a data frame with the points 
  x1 = opt_point[1],         # to be predicted 
  x2 = opt_point[2],
  x3 = opt_point[3],
  x4 = opt_point[4]
  )

best_response <- predict(
  yield_model,             # Our model
  opt_point_df             # Data frame with points to be predicted 
  )
names(best_response) <- "Best yield" # A nice name for our best point
best_response
## Best yield 
##   232.0112

Finally, as the good experimenters that we are, we only have to run the optimal combination of factors and verify that the experimental response is close to the one predicted by the model.

Great, that’s it for this post. I hope you find it useful. Thank you very much for visiting this blog!

Juan Pablo Carreón Hidalgo 🤓


https://github.com/jpch26

 

CC BY 4.0

This work is licensed under a Creative Commons Attribution 4.0 International License.
