# Experimental Design and Process Optimization with R

*2020-02-10*

# 1 Introduction

The present document is a short and elementary course on the Design of Experiments (DoE) and empirical process optimization with the open-source software **R**. The course is self-contained and does not assume any prior knowledge of statistics or mathematics beyond high school level. Statistical concepts will be introduced at an elementary level and made tangible with R code and R graphics based on simulated and real-world data.

So, then, what is DoE and why should the reader become familiar with the concepts of DoE?

Very briefly, DoE is the science of varying **many** experimental parameters in a systematic way to gain insight into how to further improve and optimize these parameters. Chapter 2 will show how and why multidimensional DoE techniques are superior to the classical “one-dimensional” optimization approach. Chapter 6 will demonstrate why and how DoE can be combined with optimization. Finally, the use of DoE and optimization will be practically demonstrated in chapter 7 for improving the performance of a catalytic system.
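As a first impression of what "varying many parameters in a systematic way" can look like, the following minimal sketch builds a two-level full factorial design in base R. The factor names, levels and units are hypothetical and chosen for illustration only; they are not taken from the examples of the later chapters.

```r
# Hypothetical factors and levels, for illustration only
design <- expand.grid(
  temperature = c(60, 80),   # degrees Celsius
  pressure    = c(1, 5),     # bar
  catalyst    = c(0.5, 1.0)  # mol %
)

# Randomize the run order to guard against time-dependent disturbances
set.seed(1)
design <- design[sample(nrow(design)), ]
rownames(design) <- NULL

print(design)
nrow(design)  # 2^3 = 8 runs, one per combination of factor levels
```

Each row is one experimental run, and every combination of the factor levels appears exactly once, in contrast to changing one parameter at a time.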

Historically, Experimental Design started as a branch of statistics in the early years of the 20th century and has since grown into a mature method with a plethora of applications in the experimental sciences. Consequently, there are many good and comprehensive books available about DoE, some of which we will refer to frequently in the present text, namely (George E.P. Box, Norman R. Draper 1987), (D.C. Montgomery 2013) and (G.E.P. Box, W.G. Hunter, J.S. Hunter 2005). A more recent text with emphasis on the use of R in conjunction with DoE is (John Lawson 2015). Linear models are comprehensively covered, e.g., by the textbook (A. Sen, M. Srivastava 1990). A general, though fairly technical, text on linear and nonlinear statistical model building is the excellent book (T. Hastie, R. Tibshirani, J. Friedman 2009). (J.G. Kalbfleisch 1985) is a gentle introduction to statistics, probability and statistical inference.

The present text draws on these books and on many years of experience as a statistical consultant in the chemical industry. Most examples in this course are therefore taken from applications and optimization projects in the chemical sciences.

The primary intended readers of this document are chemists and engineers entrusted with empirical optimization in research and development. However, the presented methods and concepts are fairly generic, and scientists working in other areas such as biology or the medical sciences might also benefit from the text.

As to software, R, probably together with Python, is the only open-source software that combines the whole spectrum of DoE and optimization with the flexibility of a powerful scripting language that allows any kind of data pre- and postprocessing within one software environment. That makes, in my opinion, R superior to many commercial GUI-based tools, which often buy user-friendliness at the expense of flexibility.

## 1.1 How to install R

The R software can be downloaded free of charge from the R repository CRAN (Comprehensive R Archive Network).

An IDE (**I**ntegrated **D**evelopment **E**nvironment) is required for working smoothly with R. An IDE allows editing, running and debugging of R code and managing program input and output.

In principle, any IDE can be used, but we recommend RStudio as the de facto standard.
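Once R and an IDE are installed, a quick check that the installation works might look like the following minimal sketch. It uses only base R functions; the vector `x` is an arbitrary example and carries no meaning beyond this test.

```r
# Print the version string of the running R installation
print(R.version.string)

# A small calculation to confirm that code is evaluated
x <- c(1.2, 3.4, 5.6)
mean_x <- mean(x)
print(mean_x)

# Show the names of the first few installed packages
pkgs <- rownames(installed.packages())
print(head(pkgs))
```

If these lines run without an error message in the console of the IDE, R is ready for the examples in this course.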

The R-introduction at CRAN is a concise introduction to the R language.

A short R-introduction

## 1.2 Some remarks on how to read the present text

This document is not an introduction to the R language; rather, it follows the philosophy of “learning by doing”. In this spirit, the above-mentioned R-introduction is recommended as a first reference together with the R examples on DoE and optimization presented here. As it is usually easier to modify existing code than to write code from scratch, it is hoped that the R examples in this course will help readers learn both R and DoE more rapidly.

The course is divided into seven chapters. One of them, chapter 5, is a stand-alone chapter that can be skipped by readers not explicitly dealing with mixture problems. The final chapter 7 presents a published real-world example (Siebert M., Krennrich G., Seibicke M., Siegle A.F., Trapp O. 2019) that combines many elements of DoE and optimization for improving the performance of a catalytic system. This application should encourage readers to use these powerful methods in their own projects.

### References

A. Sen, M. Srivastava. 1990. *Regression Analysis, Theory, Methods and Applications*. 1st ed. Springer-Verlag, New York.

D.C. Montgomery. 2013. *Design and Analysis of Experiments*. 8th ed. John Wiley & Sons Inc.

G.E.P. Box, W.G. Hunter, J.S. Hunter. 2005. *Statistics for Experimenters: Design, Innovation, and Discovery*. 2nd ed. John Wiley & Sons, Hoboken.

George E.P. Box, Norman R. Draper. 1987. *Empirical Model-Building and Response Surfaces*. 1st ed. John Wiley & Sons.

J.G. Kalbfleisch. 1985. *Probability and Statistical Inference, Vol 1&2*. 2nd ed. Springer.

John Lawson. 2015. *Design and Analysis of Experiments with R*. 1st ed. Chapman & Hall.

Siebert M., Krennrich G., Seibicke M., Siegle A.F., Trapp O. 2019. “Identifying High-Performance Catalytic Conditions for Carbon Dioxide Reduction to Dimethoxymethane by Multivariate Modelling.” *Chemical Science* 10:45. https://pubs.rsc.org/en/content/articlelanding/2019/sc/c9sc04591k#!divAbstract.

T. Hastie, R. Tibshirani, J. Friedman. 2009. *The Elements of Statistical Learning*. 2nd ed. Springer-Verlag.