5.2 Notes by case study

5.2.1 Case 1

A simple comparative experiment

  • This section is “introductory” in the sense that it introduces Goos and Jones’ approach to experimental design. It mentions a lot of concepts that may not be familiar; don’t worry about this, just add them to your “what is that?” list.
  • Also take note of things that do look familiar – from earlier in this course, or from your previous stats courses. If you make a habit of doing this, it may help you feel more on top of things when GJ start throwing around a bunch of weird technical terms.

5.2.2 Case 2

An optimal screening experiment

  • As in chapter 1, there may be terms in here that are not familiar to you. Add them to your “what?” list! If you’re not sure whether you’ve seen a term or concept before, that’s a great topic for a Topic Conversation discussion! Or you can ask about it in pre-class questions.
  • 2.2: Your first consulting skit! Pay attention to the technical stuff here (what sounds familiar? what should go on the “what?” list?), but also to the way Peter, Brad, Bas, and Dr. Zheng go back and forth, explaining the research question and the design’s features to each other.
    • 2.2.2: Do not panic about completely understanding this calculation of bias; we’ll talk about this much more as we go on. For now, see what you can get out of it. What is the basic idea here, leaving aside the calculations?
  • 2.3.1: Introduces the matrix form of the regression equation. Assuming you feel good about regression and ANOVA, the regression equations for \(Y_i\) (a single observation) should look familiar. What may or may not be new is the idea of using matrices to represent multiple observations.
    • In this format, each row of \(\mathbf{Y}\) and \(\mathbf{X}\) corresponds to a different case/observation/run. What does each column correspond to? What goes in each cell?
    • Note that in this context, the factors under discussion are considered to be quantitative with linear effects. These \(\beta_i\) coefficients do not indicate the effect of a particular level of a (categorical) factor, but the effect of a factor overall (analogous to its slope).
    • I don’t intend to do Serious Business with matrices in this course, but we will be adding and subtracting them, and talking about whether they are invertible. I can post supplementary material on this if you like :)
  • 2.3.2: Shows how to add two-way interactions to the equation. Hopefully review, except for the matrix bit.
  • 2.3.3: Previously, when working with quantitative factors measured at two levels, we’ve coded the levels as “+” and “-”… which we could really think of as “\(\pm 1\)”. This is a bit more detail/justification on that process.
  • 2.3.4: You are welcome to take GJ’s word for it about the matrix form of the least-squares estimator (and indeed all the formulas here). What you should pay most attention to here is the variance-covariance matrix of the vector of coefficient estimates: \(\widehat{\boldsymbol{\beta}}\). What does each element of this matrix tell you?
  • 2.3.5: This should also be something you have seen before. Revisit our earlier discussions of inference (as well as previous-course notes on inference for regression \(\beta\)’s) if necessary.
  • 2.3.6: Focus here on what the VIF means/describes, not on how it is calculated.
  • 2.3.7: Introduces partial and complete aliasing – a common price you pay for running a smaller experiment. Again, do not freak out too much about the derivation/calculation of the alias matrix; focus on what aliasing means. (The second R sketch at the end of this chapter’s notes builds an alias matrix for a toy half fraction.)
  • 2.3.8: The heart of the matter! The D-optimality criterion (and its cousins) will be the guiding principle for most of GJ’s design choices. (You might have worked this out from the book title.)
    • Think about this criterion both in terms of the model matrix and conceptually. The second-to-last paragraph on p. 34 is useful here. (The first R sketch at the end of this chapter’s notes computes the D-criterion and the relative variances for a toy design.)
  • 2.3.9: Focus on what the design matrix looks like: we have actually seen this before! Skim the subsection on the coordinate exchange algorithm if you feel like it, just to see how it works, but you’ll never do this by hand.
  • 2.3.10: Returns to the example. Again, don’t get too hung up on how they calculated these exact numbers (R does that). The key point here is designing the experiment to match the model you’ll need.
  • 2.3.11: These are important design principles that apply to both classical and optimal approaches. We’ve given these the occasional shout-out previously, but here they are all nicely laid out with official names.
  • 2.4: The “Background reading” sections of each case study don’t cover material in detail, but they give you a sense of other related work that people are using. I recommend keeping notes on any extensions/variations/related topics from these sections that interest you – you may want to study them in your “new topic” project later!
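
To tie together 2.3.1, 2.3.4, and 2.3.8, here is a minimal R sketch. The tiny \(2^3\) design and the factor names are made up for illustration (this is not one of GJ’s examples); it builds the \(\pm 1\)-coded model matrix, computes \((X'X)^{-1}\) (whose entries are the relative variances and covariances of \(\widehat{\boldsymbol{\beta}}\)), and evaluates the determinant that the D-criterion cares about.

```r
# A made-up 2^3 full factorial in three quantitative factors, coded -1/+1
design <- expand.grid(x1 = c(-1, 1), x2 = c(-1, 1), x3 = c(-1, 1))

# Model matrix for a main-effects-plus-two-way-interactions model
X <- model.matrix(~ (x1 + x2 + x3)^2, data = design)

# Information matrix X'X and its inverse
XtX     <- t(X) %*% X
XtX_inv <- solve(XtX)

# Var(beta-hat) = sigma^2_eps * (X'X)^{-1}, so the entries of XtX_inv are the
# relative variances (diagonal) and covariances (off-diagonal) of the estimates.
round(XtX_inv, 3)

# The D-criterion compares designs by the determinant of X'X: bigger is better
# (equivalently, it makes (X'X)^{-1} "small").
det(XtX)
```

Because this toy design is orthogonal, \((X'X)^{-1}\) comes out diagonal with 1/8 in every diagonal cell: each coefficient estimate has relative variance 1/8 and zero covariance with all the others.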
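
A second sketch, for 2.3.7: the usual alias matrix is \(A = (X_1'X_1)^{-1}X_1'X_2\), where \(X_1\) holds the columns of the model you actually fit and \(X_2\) the columns of the terms you left out, and \(E(\hat{\boldsymbol{\beta}}_1) = \boldsymbol{\beta}_1 + A\boldsymbol{\beta}_2\). The half fraction below is a made-up toy, not the chapter’s design.

```r
# A made-up 2^(3-1) half fraction: x3 = x1 * x2 (defining relation I = x1x2x3)
frac <- expand.grid(x1 = c(-1, 1), x2 = c(-1, 1))
frac$x3 <- frac$x1 * frac$x2

# X1: model matrix for the model we actually fit (main effects only)
X1 <- model.matrix(~ x1 + x2 + x3, data = frac)

# X2: columns for the terms we leave out (the two-way interactions)
X2 <- model.matrix(~ x1:x2 + x1:x3 + x2:x3 - 1, data = frac)

# Alias matrix: E(beta1-hat) = beta1 + A %*% beta2, so a 1 in row "x3",
# column "x1:x2" means the x3 main effect is completely aliased with x1:x2.
A <- solve(t(X1) %*% X1) %*% t(X1) %*% X2
round(A, 3)
```

Each main-effect row picks up exactly one interaction column with a 1 (complete aliasing), and the intercept row is all zeros.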

5.2.3 Case 3

Adding runs to a screening experiment

  • The basic process showcased here (how to extend an existing experiment optimally) is not too involved, and the black box section is short. But the case study also demonstrates or revisits some other important ideas, like sequential experimentation, model selection criteria, VIFs, the interpretation of factor coding (p. 58), and so on. If these concepts are unfamiliar or unclear, take a few minutes to look back over your earlier notes, or discuss with a classmate. (If nobody’s encountered it before, don’t forget to add it to your “what?” list!)
  • 3.2: Case study time! Exciting bonus: this is a sequel with the same cast of characters as chapter 2. I really feel like we’re getting to know Dr. Zheng. Content-wise, some points of particular interest include:
    • Blocking shows up in an interesting way here: accounting for the fact that you’re doing some runs at a different time. You don’t know what may have changed between the initial runs and the follow-up phase, but you can throw in a factor for it and see if there is indeed a difference.
    • Peter’s point at the bottom of p. 49 is worth noting: although there were many possible models, they mostly agreed on which factor settings would maximize yield. This is an example of keeping the research goal in mind – Dr. Zheng’s goal wasn’t really to understand the relationship between the factors and the response, but rather, to find the best factor settings to maximize the response.
    • You may or may not have encountered AICc before. As Peter notes, lower values are better. It’s a way of comparing models – sort of like \(R^2\), except that \(R^2\) can only improve as you add terms, so it always favors larger models (which isn’t always a good thing); AICc penalizes extra parameters instead.
      • If you have encountered AIC but not AICc: AICc is “AIC corrected,” a version of AIC with an extra penalty term, \(2k(k+1)/(n-k-1)\) for \(k\) estimated parameters and \(n\) runs, that makes it behave better with small sample sizes.
      • If you have encountered BIC: AICc is a little less “harsh” than BIC – it’s more likely to suggest models with slightly more terms than BIC is.
    • The “foldover” approach is briefly described in the “black box” section; it’s also discussed (with more detail) in BHH section 6.8.
      • The idea is to repeat the experiment, except that first you pick a factor and switch its setting in each run – whenever it was low before, now it’s high, and so on. This has the effect that your chosen factor is no longer aliased with anything – can you see why? (The first R sketch at the end of this chapter’s notes checks this on a tiny half fraction.)
      • Of course, the problem with foldovers is that they’re expensive! And, as Peter points out, they can “de-alias” main effects and some two-way interactions, but won’t necessarily de-alias all the two-way interactions from each other – and separating those interactions is exactly what Peter and Brad need here.
    • Peter’s line on p. 53 (“That is an interesting issue…”) revisits the idea of D-optimality.
    • p. 54 has a rather nice explanation of statistical power – focus in here if that concept has gotten dusty!
    • Subsection 3.2.2 is mostly (apart from the chitchat) an explanation of how to look at fitted models – interpreting signs of coefficients, checking significance, dropping things that aren’t significant, etc. This process should not be unfamiliar to you, but this is a nice example of how to talk through it.
  • 3.3.1: Stays focused on the chapter’s example, but shows the mathematical notation.
    • You should be able to explain the model matrix \(X^*\) – what is each row, column, and cell?
      • How do we create the matrix \(X_1\) from \(X^*\)? (Why are there so many more columns now? How do we know what to put in each cell?)
    • Return to pp. 24-25 if the paragraph at the bottom of p. 60 doesn’t make sense. Remember that estimating parameters independently means you need their variance-covariance matrix to be diagonal – no covariance between different parameters’ estimates. Because that matrix is based on \((X'X)^{-1}\), it will be diagonal if \((X'X)\) is diagonal.
    • Don’t freak out too much about the matrix computations here. Focus on what is happening conceptually.
      • There’s one set of runs in the original experiment (\(X_1\)) and now a set of additional runs (\(X_2\)). (Find their design matrices in the textbook!)
      • Each set of runs provides information, but our goal is to choose \(X_2\) to maximize the total information we’ll get when we look at both sets of runs together.
      • Because we already have certain information in \(X_1\), maximizing the total information isn’t the same as just maximizing the information contained in \(X_2\). Information in \(X_2\) that’s redundant with information in \(X_1\) wouldn’t help us – we want to focus particularly on things/effects that \(X_1\) did not tell us about. That’s the point of the fancy matrix algebra on p. 63. (The second R sketch at the end of this chapter’s notes illustrates this with a toy design.)
    • There’s a typo on p. 64 (the information matrix in equation 3.14 isn’t \(X\), it’s \(X'X\)).
    • Table 3.10 returns to the idea of VIFs. Remember that these are relative variances. \(Var(b_{Butanol})\) isn’t 0.056 – it’s \(0.056\sigma_{\varepsilon}^2\).
  • 3.3.2: Skim this if you want to; but, as before, you’ll never do this algorithm yourself.
  • 3.3.3: This little section discusses foldover designs, which are the classical way of augmenting experiments (i.e., adding runs to get more information or de-alias effects from each other). BHH section 6.8 gives more detail if you need it. The takeaway from GJ’s discussion specifically is why a foldover design wouldn’t have worked in this situation.
  • 3.4: Some extra reading suggestions – don’t do them now, but keep this section in mind if you decide to do a related topic for your project 3 later on.
  • 3.5: This subsection is six sentences long and basically recaps everything in chapter 1 of BHH, so if you never bothered to read that, here you go.
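
For the foldover question above (and BHH 6.8), here is a tiny made-up check that switching the sign of one factor in a second round of runs breaks its alias:

```r
# Toy 2^(3-1) half fraction with x3 = x1 * x2, so x3 is aliased with x1:x2
orig <- expand.grid(x1 = c(-1, 1), x2 = c(-1, 1))
orig$x3 <- orig$x1 * orig$x2

# Foldover on x3: repeat the runs with the sign of x3 switched
fold <- orig
fold$x3 <- -fold$x3
combined <- rbind(orig, fold)

# Compare the x3 column with the x1*x2 product before and after:
with(orig,     cor(x3, x1 * x2))   # +1: completely aliased
with(combined, cor(x3, x1 * x2))   #  0: the foldover breaks the alias
```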
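
And for the augmentation idea in 3.3.1: a toy illustration (again, not the chapter’s actual design) of why we choose the follow-up runs \(X_2\) to make the total information large – here measured by \(\det(X_1'X_1 + X_2'X_2)\) – rather than judging \(X_2\) on its own.

```r
# The model we eventually want to fit: main effects + all two-way interactions
f <- ~ (x1 + x2 + x3)^2

old <- expand.grid(x1 = c(-1, 1), x2 = c(-1, 1))
old$x3 <- old$x1 * old$x2           # the original half fraction (4 runs)
X1 <- model.matrix(f, old)

# Candidate A: just repeat the original four runs
candA <- old
# Candidate B: the mirror-image runs (all signs flipped), i.e. the other half
candB <- -old

total_info <- function(new_runs) {
  X2 <- model.matrix(f, as.data.frame(new_runs))
  det(t(X1) %*% X1 + t(X2) %*% X2)
}

total_info(candA)  # 0: the repeats duplicate what X1 already told us, so the
                   #    aliased interactions still can't be separated
total_info(candB)  # 8^7: the eight runs together form the full 2^3
```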

5.2.4 Case 4

A response surface design with a categorical factor

  • In this case study, we actually start treating quantitative factors as quantitative (excitement!). You’ll see that the notation here is much closer to regression equations than the ANOVA group-means equations we’ve seen in BHH.
  • This chapter combines new ideas with revisits of things we’ve seen before. This is a particularly good moment to try to tie things that sound familiar back to previous lectures/material, either in GJ or elsewhere.
  • 4.1: Reminder that these intro sections can be really helpful, especially when GJ introduces a whole bunch of vocab or topics that are new to you; this list helps you organize the main points of the reading. You might also try reading the “Summary” section (4.5) first, and then rereading it after you go through the rest of the chapter.
  • 4.2: A new case study! Note that this one has substantially different goals from the experiment in chapter 2 (and in many of the BHH/lecture examples as well). In those “screening experiments,” the goal was to get a sense of which factors were important out of a large set of possibilities. Here, we already know which factors matter (all of these); the question is just how they interact and what exactly they do to the response.
    • 4.2.1: Introduces the parameters of the problem.
      • Note the double goal here: we want the response to be somewhat robust to supplier (not too different for different suppliers), but we also have a desired value for the response. Our job is to use the experiment to try and figure out what factor settings can help accomplish this dual goal.
      • Peter’s question about hard-to-change factor levels will come up later….
      • Box-Behnken and central composite designs are classical techniques for dealing with quantitative predictors that may have a nonlinear relationship to the response. For now, add them to your “what is that?” list and move on.
      • Note the exchange at the top of p. 72. Sometimes you do have some flexibility in budget or what you can do… if you can present arguments for it!
      • Take some time with equation 4.1 and make sure you see what all the terms are.
      • Brad’s comment between equations 4.2 and 4.3 is really interesting! In this context, you are hoping there is an interaction effect between supplier and some other predictor(s). If there is no such interaction, then the effect of supplier is constant, and no matter what you do, you can’t get rid of the difference between suppliers. See also the interaction plots on p. 80/81 to help visualize this.
      • Follow along with the degrees-of-freedom calculation on p. 73 – this should be familiar reasoning!
      • The bottom of p. 73 introduces a new idea: a different criterion for what constitutes an “optimal” design. Pay special attention to how D-optimality and I-optimality differ – they have different mathematical goals, and the goals of the research problem help suggest which optimality criterion makes more sense! (The R sketch at the end of this chapter’s notes computes the ingredients of both criteria for a toy design.)
      • Note Brad’s comment at the end of p. 75. Even Goos and Jones, who are professional optimal design fanatics, have to be able to talk in terms of classical designs.
      • Table 4.4: Review what these “relative variances” are relative to. We multiply these by \(\sigma^2_{\varepsilon}\) to get the actual variances of the coefficient estimates, \(Var(\hat{\beta}_{whatever})\) (see p. 25). Without having done the experiment, we don’t have an estimate of \(\sigma^2_{\varepsilon}\) so we can’t calculate the actual value of \(Var(\hat{\beta}_{whatever})\), but we can talk about how it would compare to \(\sigma^2_{\varepsilon}\).
      • Fraction of Design Space plots are another new thing, and very helpful – practice reading this plot!
    • 4.2.2: Equation time!
      • Note Brad’s first step here, which isn’t even shown fully: throwing out effects that aren’t statistically significant or practically significant. Being able to do this initial significance testing is a benefit of having had a few extra degrees of freedom to estimate error!
      • The bottom of p. 80 discusses extrapolation. Because we’re now treating the factors as “really” quantitative, we can talk about what we think would happen for values of those predictors that we didn’t observe in the experiment…but we have to be cautious about it.
      • You’ve seen interaction plots like these before – make sure you can interpret them.
      • Peter makes an interesting throwaway comment at the top of p. 81. Sometimes it’s a good idea not to use the maximum possible number of runs in an experiment, so that you have budget left to do any necessary follow-ups.
      • Match up figure 4.4 with the original research goals: average peel strength around 4.5, and minimal difference in peel strength between suppliers.
  • 4.3: As usual, more details and technicalities on the concepts that came up in the case study.
    • 4.3.1: Authors sure do love showing you how many different shapes quadratic relationships can capture.
    • 4.3.2: The “collinearity” issue described here is actually something that’s been in play for a while now, although we have been semi-ignoring it. When you have an indicator column for every level of a categorical factor plus an intercept column, the indicator columns add up to the intercept column, so the model matrix isn’t full rank and the coefficients can’t all be estimated. In our ANOVA work, we’ve often used fix #3: have a term for the intercept and for each factor level, but constrain the factor-level-adjustment coefficients to sum to zero. Here, they go with fix #1 (designate one level of the factor as the “default”), which you probably did back in previous stats courses.
      • There’s a similar thing in play when we look at the interaction between supplier and a quantitative variable. Again, they pick one level of the factor (one supplier) to be the default/baseline, and only have “adjustment” coefficients for the other suppliers’ interactions.
    • 4.3.3: More matrix excitement, but the key concepts are the same as before.
      • Note especially the line about D-efficiency being useful for comparing designs.
    • 4.3.4: More matrix/vector stuff here. If you’ve taken Stat 340 this probably looks familiar; if you haven’t, it probably doesn’t. Don’t worry too much if you are in the latter group :)
      • The idea that the relative variance of prediction is \(Var(\hat{Y} \mid \mathbf{x})/\sigma^2_{\varepsilon}\) should make sense even if you don’t know the matrix stuff.
      • The takeaway from the matrix definition is that the prediction variance depends on (1) the overall error variance \(\sigma^2_{\varepsilon}\); (2) where in the design space you’re trying to predict; and (3) what information you already have (that’s where the information matrix comes in). If you are trying to predict at some combination of factor settings that’s a long way from what you observed in the experiment, your prediction will be very uncertain!
    • 4.3.5: Read the first paragraph. Then take GJ’s word for it and skip ahead to the part after the “Attachment” box.
      • We won’t really work with G-optimal designs in this course, but it’s interesting to see an example of yet another optimality criterion.
    • 4.3.6: In which a term means exactly what you’d think it means.
    • 4.3.7: Skim this section as a preview of things we’ll think about in the future. This ties into our ongoing assumption of independent errors.
      • The last paragraph mentions using a blocking factor to account for possible changes in conditions over time. GJ have already done a limited version of this with the augmentation experiment in chapter 3; we’ll work more with blocking factors later.
    • 4.3.8: “Design region” is really useful vocab.
      • Note the mention of setting factors at “their extreme levels.” Because each factor gets its own version of “extreme”, which is then scaled down to \(\pm1\), we can think of all the factors as operating on the same scale – that’s how we get away with spotting the important factors just by comparing effect sizes, without formal inference.
  • 4.4: As usual, refer to this if you want to explore one of the topics more fully, but don’t worry about it otherwise.
  • 4.5: A nice recap of the highlights of the chapter.
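
To make 4.3.4 (and the D-versus-I contrast) concrete, here is a sketch with a made-up two-factor design and a full quadratic model. The relative prediction variance at a point \(\mathbf{x}\) is \(f(\mathbf{x})'(X'X)^{-1}f(\mathbf{x})\); its values over the design region are what an FDS plot summarizes, its average is what I-optimality tries to minimize, and \(\det(X'X)\) is what D-optimality tries to maximize.

```r
# Relative prediction variance: Var(Y-hat | x) / sigma^2_eps = f(x)' (X'X)^{-1} f(x)
f <- ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2   # full quadratic in two factors

# A small made-up design: 2^2 factorial + face centers + 2 center runs
design <- data.frame(
  x1 = c(-1,  1, -1, 1, -1, 1,  0, 0, 0, 0),
  x2 = c(-1, -1,  1, 1,  0, 0, -1, 1, 0, 0)
)
X <- model.matrix(f, design)
M <- solve(t(X) %*% X)                        # (X'X)^{-1}

rel_pred_var <- function(x1, x2) {
  fx <- model.matrix(f, data.frame(x1 = x1, x2 = x2))
  as.numeric(fx %*% M %*% t(fx))
}

# Evaluate over a grid covering the design region [-1, 1]^2
grid <- expand.grid(x1 = seq(-1, 1, by = 0.1), x2 = seq(-1, 1, by = 0.1))
rv   <- mapply(rel_pred_var, grid$x1, grid$x2)

summary(rv)       # the distribution an FDS plot displays (sorted rv vs. fraction)
mean(rv)          # a grid approximation of the average an I-optimal design minimizes
det(t(X) %*% X)   # the determinant a D-optimal design maximizes
```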

5.2.5 Case 5

A response surface design in an irregularly shaped design region

  • This case study is a combination of review and new stuff (this is one of the things I like about GJ). It gives another example of a response surface design – treating quantitative predictors as actually quantitative instead of just setting “high” and “low” levels and calling it a day. But it also introduces some higher-order effects, and this interesting twist of limiting the design region (i.e., the possible combinations of settings for the factors).
  • 5.2: This one is particularly big on showing real-world consulting interaction, in GJ’s trademark slightly-awkward way.
    • 5.2.1: Ah, the classic “previous consultant did something wrong” approach.
      • You should be able to recognize this previous consultant’s design as a full factorial (although the “central composite design” and “axial points” parts should probably go on your “what?” list). Why is it called a 3-by-3?
      • There’s an interesting point here about how to decide where “high” and “low” levels of the factors are: start with what you usually do!
      • Watch out for the extremely complicated diagram at the bottom of p. 97.
      • On p. 98: the previous consultant’s choices weren’t good because he didn’t take the specific context constraints into account, but, as Brad points out, they’re not bad as general rules of thumb.
      • There’s a really interesting point at the start of p. 99. Generally, we have had to use the MSE of the residuals as our estimate of the natural variance between individual runs, \(\sigma^2_{\varepsilon}\). But if you’re lucky, you may have other information on which to base that estimate – as you do here, where the company has lots of runs with the same specific combination of factor settings (i.e., their usual method). Based on these runs, they know that the “unavoidable” run-to-run variation is on the order of 0.5%. So if the MSE of the residuals is an order of magnitude larger, that indicates that there is some effect happening that is not being accounted for in the model!
      • Peter ‘n’ Brad spend a lot of time and dialogue working out the actual boundaries of the design region of interest, but this is kind of how it works – you can’t always just tell a client “give me your linear constraints,” you have to explain what that means.
      • Note that there are six “sides” in the design region’s boundary. The two diagonal ones are defined by the two inequalities on p. 100-101. What defines the vertical and horizontal sides?
      • For practice, confirm Peter and Brad’s statements about how many terms of each order there are. Why do you need 10 runs minimum to fit the model?
      • Figure 5.4 is another example of the map of the design space we saw long ago: again, the dots are the “locations” (i.e., factor settings) of each run. Note that a couple of the locations are replicated (there’s more than one run with those factor settings). Sometimes you’ll show this on the plot by using a different kind of point; I just circled those two points in my copy of the book.
    • 5.2.2: Now we shift from thinking about the design to thinking about the model (although, of course, the design choices were informed by the model they expected to want to fit).
      • Note the familiar technique of dropping insignificant terms to improve the estimates/precision of the other terms.
      • “Pure error degrees of freedom” is a semi-new term – see 5.3.2 for the details on this.
      • Figures 5.5, 5.6, and 5.7 are sort of like a version of interaction plots when both predictors involved are quantitative. What is the relationship between Time and Yield when Temperature = 535? What is the relationship between Temperature and Yield when Time = 540?
      • Typo toward the bottom of p. 106: the lowest time Gray wanted to study was 360 seconds.
      • p. 108: Another new plot! Contour plots are great when you don’t have a computer on hand to do color-coded versions.
      • For a fun activity, try drawing Original Consultant’s run locations onto the design space, either in figure 5.8 or figure 5.4. This really brings home how inappropriate those factor settings were given the research context!
  • 5.3: Black box time!
    • 5.3.1: If you don’t feel super confident in GJ’s count of how many terms there are of each type, write them out for \(k=2\) or \(k=3\).
      • They note that in practice you usually can’t do a full cubic model with 5 or more factors – how many runs would that take?
    • 5.3.2: Details on the lack-of-fit test mentioned in the skit. This digs into why it can be good to have replicated runs/factor settings even if you don’t have the budget to replicate the entire experiment (or replicate every factor setting). (There’s a small R illustration of this test at the end of this chapter’s notes.)
      • You don’t have to prove that SSPE and SSLOF are statistically independent, but what does it mean for them to be independent?
      • Notice the interesting calculation of df for pure error, SSPE. It doesn’t depend on the total number of runs \(n\), the way SSE does: it depends on how many runs you have that are exact replicates, as defined by \(m\) and \(n_i\). You can have a gazillion runs and it won’t help you estimate pure error unless some of them use the exact same factor settings.
      • Hooray, another F test!
    • 5.3.3: Skim this if you want to. But again, you won’t do the coordinate-exchange algorithm in practice; you’ll ask R to deal with it for you.
  • 5.4: Probably not of great interest?
  • 5.5: Chapter summary. Note the key contrast between this case study and #4, as pointed out here: the “allowable region of experimentation,” or design space of interest, used to be a cube (well, a hypercube) but in this case is an irregular polygon.
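
For 5.3.2, a hedged sketch of the pure-error / lack-of-fit split, using made-up data rather than Gray’s experiment. The second fit puts a separate mean on every distinct combination of factor settings, so its residual sum of squares is pure error; comparing the two fits gives the lack-of-fit F test.

```r
# Made-up data: a 2^2 factorial plus three replicated center runs
set.seed(1)
dat <- data.frame(
  x1 = c(-1,  1, -1, 1, 0, 0, 0),
  x2 = c(-1, -1,  1, 1, 0, 0, 0)
)
dat$y <- 5 + 2 * dat$x1 - dat$x2 + rnorm(nrow(dat), sd = 0.5)

fit_model <- lm(y ~ x1 + x2, data = dat)               # the model we propose
fit_means <- lm(y ~ interaction(x1, x2), data = dat)   # one mean per distinct setting

# SSE for fit_model splits into lack of fit + pure error; only the replicated
# center runs contribute pure-error degrees of freedom (here, 2 of them).
anova(fit_model, fit_means)
# A large F (MS lack-of-fit vs. MS pure error) says the proposed model is
# missing something real, not just run-to-run noise.
```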

5.2.6 Case 7

A response surface design in blocks

  • This case study introduces a fairly common situation: we have some blocking factor (day, person, dog, whatever), and we need to account for differences between days/people/dogs. But we don’t actually care about the effect of each individual day/person/dog because we’re trying to generalize to days/people/dogs we haven’t seen before. In this situation, we treat the block as a random effect – we estimate how much variation it’s responsible for, even though we don’t know exactly what the effect will be for any new day/person/dog.
  • 7.1: The item to star here is #3 (and #5, I suppose). This is the key difference between this kind of blocking factor and a treatment factor: you’re not going to have the same blocks later!
  • 7.2.1: Welcome to another episode of Awkward Consulting Theater!
    • We haven’t really talked about central composite designs, but the short version is, they’re sort of like factorial designs with some extra runs in helpful locations. (This could be a good project topic, just putting that out there.)
    • Why does the possible presence of curvature mean Maroussa needs three levels for each factor?
    • Although we haven’t read about central composite designs or axial points, you should be able to follow other features of Maroussa’s initial design – the fractional factorial and center runs.
    • Likewise, you don’t need to be able to predict (as Peter does at the top of p. 138) which effects would and wouldn’t be confounded, but you should know what it means for them to be confounded.
    • You can skim over the extended discussion on p. 138-139 about how to reconcile a CCD with the blocking constraints because, as Brad points out, they’re not going to use a CCD anyway.
    • Peter really loves this question at the bottom of p. 139 about resetting the factor levels between each run; he asks it basically every time. Eventually we will find out why.
    • GJ’s design is not orthogonally blocked (oh snap, etc.). Be sure you know what it means to be “orthogonally blocked,” but don’t worry about how Peter instantly knows this design isn’t.
    • On p. 143: We’ve (briefly) seen VIFs before; I find them more intuitive than efficiency factors, myself, but whatever works for you.
  • 7.2.2: Analyzing the data.
    • Brad’s comment at the top of p. 145 is absolutely the best line in this entire book.
    • Interpret each of the terms in the model (7.1) for yourself.
      • Note the oddity of the \(\gamma_i\) (block effect) term! There’s no indicator variable for each specific day with its own little \(\beta\)…just this general \(\gamma_i\). It turns out that, just like \(\varepsilon\), this is considered a random value: each day has some effect that’s assumed to be random.
    • Notice that everyone is making an assumption here: although there may be two-way interactions between the treatment factors, there is no interaction between the blocks and any treatment factor. Some days the pastry is overall fluffier than other days (I think that’s what they mean by “expansion index”), but the differences between pastries made to each recipe are consistent across days.
    • You may have encountered stepwise model-selection methods before (if you haven’t, this is great material for Topic Conversations!). Short version of the backward flavor: keep taking terms out of the model until everything that’s still there looks useful.
    • This “desirability” thing is very fancy – it’s one way of dealing with a situation where you have multiple response variables that you care about. Skip this for now and consider it as a project topic :)
    • Peter’s short email on p. 149 is a key summary of a lot of math that’s about to happen. Remember that we’ve always had this assumption of independent errors in all our regression adventures. But now, we don’t have independent errors: runs on the same day are similar in ways that aren’t accounted for by our factors, so their errors are not independent. (The R sketch at the end of this chapter’s notes shows one way to fit a model that accounts for this.)
    • Skim Attachment 7.1. The key point here is in the last paragraph: REML estimates take into account the degrees of freedom you already used for estimating other things.
  • 7.3: MATH HAPPENS
    • 7.3.1: Don’t get too hung up on the vector notation at the top of p. 152. Instead focus on the bottom half of p. 152:
      • The variance of the \(Y\) values is partly due to block-to-block variation, and partly due to “within” block individual variation.
      • Observations within the same block are correlated (nonzero covariance) because they share the same block effect. But observations from different blocks aren’t correlated at all (this is how life used to be!).
      • We can summarize this with the variance-covariance matrix for the actual \(y\) values for the observations within any given block; we call this matrix \(\boldsymbol \Lambda\) (equation 7.6, p. 153). (That’s a capital lambda, if you’re wondering.)
      • The last paragraph of this section hides the real kicker: because you are not estimating an actual effect for each block, but just the variation between blocks (\(\sigma^2_{\gamma}\)), you only have to estimate one thing!! Which saves you a looot of degrees of freedom.
    • 7.3.2: Skim this section if you have taken Stat 340; we’re not going to use this theory directly, but it’s good to see what’s happening. (Does this remind you of weighted least squares?)
      • If you haven’t taken 340, skip everything in this section except the definition of the matrix \(V\). This is the variance-covariance matrix for all the response values across all of the blocks. If you “zoom in” on the diagonal entries, they’re actually the \(\Lambda\) matrices defined back at (7.6). Everything else is 0 – what does that mean?
    • 7.3.3: Skim or skip. The takeaway here is that there are multiple ways you can actually go about calculating these estimates, and one nice thing about REML is that you can use it with unbalanced designs.
    • 7.3.4: Read the first paragraph and skim or skip the rest. As GJ points out in the last sentence, R is already worrying about this for you.
    • 7.3.5: This should all sound rather familiar, just with different notation for the covariance matrix/information matrix.
      • The last paragraph of this section is quite interesting, I think, but you don’t need to know the technical details of why this is true.
    • 7.3.6: Don’t worry too much about the vector notation, again: look at the example instead.
      • The key here is that if your block effects are not orthogonal to your treatment factor effects (so the two can’t be estimated independently), then your factor-effect estimates are less certain/precise, with higher variance. Why does this make sense?
    • 7.3.7: An interesting little side note.
  • 7.4: Optional, as usual. Designs with two blocking factors (optimal or otherwise) could be a nice project topic…
  • 7.5: Quick recap, with some nice examples of situations where you’d need random-effects-style blocking.
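
Here is a sketch of fitting a model like (7.1) with a random block (day) effect, on made-up data. I’m assuming the lme4 package and its lmer() function, which is one common way to do this in R; the course’s own code may differ. REML = TRUE is the estimation approach from Attachment 7.1.

```r
library(lme4)

# Made-up data: 4 days (blocks), 6 runs per day, two coded factors
set.seed(1)
n_days <- 4; runs_per_day <- 6
dat <- data.frame(
  day = factor(rep(1:n_days, each = runs_per_day)),
  x1  = runif(n_days * runs_per_day, -1, 1),
  x2  = runif(n_days * runs_per_day, -1, 1)
)
day_effect <- rnorm(n_days, sd = 2)          # true sigma_gamma = 2 (made up)
dat$y <- 10 + 3 * dat$x1 - 2 * dat$x2 +
         day_effect[dat$day] + rnorm(nrow(dat), sd = 1)

# (1 | day) adds one random intercept per day; we estimate only its variance
# sigma^2_gamma, not a separate fixed coefficient for every day.
fit <- lmer(y ~ x1 + x2 + (1 | day), data = dat, REML = TRUE)
summary(fit)   # the "Random effects" section reports sigma^2_gamma and sigma^2_eps

# Within one day, the responses have variance sigma^2_gamma + sigma^2_eps and
# covariance sigma^2_gamma: that's the Lambda matrix of equation (7.6).
# With the true simulated values, that matrix is:
Lambda <- 2^2 * matrix(1, runs_per_day, runs_per_day) + 1^2 * diag(runs_per_day)
```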

5.2.7 Case 8

A screening experiment in blocks

  • In some ways this material will be more familiar to you than the last chapter, since they talk about using fixed effects for blocks – which is what we used to do in the old days. For some reason GJ decided to put this later, though, and they reference the previous chapter so you have to read that one first. Go figure.
  • 8.1: There are two new concepts here, mostly:
    • The idea that fixed-effect block models require more runs to fit than random-effect block models (the R sketch at the end of this chapter’s notes counts up the parameters for a small example)
    • Orthogonality between the block effect and the treatment effects (okay, this isn’t exactly new, but there’s more detail about it)
  • 8.2.1: In which Peter and Brad count the cars on the New Jersey Turnpike.
    • Make sure you agree with Dr. Xu’s count of the number of terms in the model on p. 164-165.
    • The \(2^{6-1}\) idea should be familiar, so use this as a chance to review :)
    • We haven’t really talked about “contrast columns,” but you can confirm that these are indeed equal to \(x_1 x_3 x_5\) etc. Although these three-way interactions won’t be estimated in the actual model, you can see their purpose here: all the runs within each block have the same value for each of these columns, while each block has a different combination of settings for the three columns.
    • How can you tell that \(x_1 x_2\) and the others are confounded with block effects?
    • We’ve seen VIF before but this is a good review/more detail. There’s an interesting extension on p. 167 where they break down the “blame” for variance inflation between “nonorthogonal treatment factor estimates” and “treatments not orthogonal to blocks.” You don’t have to know how to do this, but it’s cool that it’s possible.
    • There’s a nice exchange of sick burns (well, by statistician standards) on p. 167 for your entertainment.
  • 8.2.2: Skip this, unless you find it interesting (arguably it is, in a Sudoku-ish kind of way). You may also want to look through it for terms you recognize (like generators, VIF, orthogonal, D-efficient, etc.) – it can serve to check your understanding of these terms and provide another example.
    • Peter makes an interesting point when comparing his “clever” design with the D-optimal one: although the overall variance in estimating the factor effects is lower for the D-optimal design, the variance of some effect estimates is lower in Peter’s design. So if you were in a situation where for some reason you wanted more precision about some estimates than others, you might actually go for Peter’s version.
  • 8.2.3: In which Dr. Xu plays it cool.
    • At the bottom of p. 175, Brad says what we kind of glossed over when we were doing all that sum-of-squares ANOVA stuff: it relies on orthogonal designs. Fortunately, R steps in where our nice analytical methods falter.
    • I recommend looking at the results and drawing your own conclusions before you see what Brad and Dr. Xu have to say about it – this is good practice.
    • Brad’s line at the top of p. 178 summarizes the fixed-vs.-random debate in a nutshell. Think back to the last chapter and note how that situation lines up with what Brad is saying about random blocks.
    • Holy cats, non-integer degrees of freedom! The good news is that you do not have to figure these out yourself; Peter’s summary is what you need to know.
      • Incidentally, if you did certain kinds of two-sample \(t\) tests back in intro stats, you may have seen non-integer df before. Usually, though, we stick that part in a footnote and hope nobody notices.
  • 8.3.1: The major content here is GJ’s argument for using random effects for blocks instead of fixed. Do you agree with their arguments? Can you think of situations where you would want to use fixed blocks instead?
    • Reminder: the bolding in equation 8.1 indicates that these are vectors – this is a bunch of different \(\beta_i x_i\) terms added together.
    • New vocab: \(f\) being the “model expansion” of \(\mathbf{x}\) just means it includes all the other terms we “create” based on the actual factor levels, like interactions, quadratic terms, and higher-order stuff.
    • The idea that random block effects means you can use “inter-block” (between-block) information – but you can’t with fixed block effects – is a little mindbending. Take some time with this.
      • A point to note here is that using a random effect to represent blocks means that you can have a general estimate of how different all the blocks tend to be, based on your estimate \(\hat{\sigma}^2_{\gamma}\).
      • This means that if you see two runs in different blocks and one \(y\) is way higher than the other, you have a sense of whether that could reasonably just be due to block-to-block variation, or whether it has to be due to an actual treatment factor effect.
      • But with fixed effects for blocks, there is no limit to how different two blocks could be – you don’t have any kind of a distribution for the effect of each block. So if you have two runs in different blocks, even if the \(y\)’s are suuuuuper different, for all you know that’s all due to the difference between the blocks; it might have nothing to do with the treatment factors.
  • 8.3.2: Lots of matrices in here! If you have taken Stat 340, then read this as an extension of the matrix-form linear regression/estimation equation you’ve seen before. If not, don’t sweat it.
    • The thing everyone should note here is the part about dealing with the terms for each level of the factor – you need to designate a baseline group, drop the intercept, or constrain all the effects to add to 0. This isn’t completely new but it’s nice to see it in this new context.
  • 8.4: More reading if you like this kind of thing. The fact that fixed block effects are a special case of random block effects is kind of neat.
  • 8.5: A recap, as usual.
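
Since 8.3.1 is largely about fixed versus random block effects, here is a small made-up comparison of what each approach spends (this again assumes lme4::lmer() for the random-block fit, as in the chapter 7 sketch).

```r
library(lme4)

# Made-up data: 8 blocks, 3 runs each, one coded factor
set.seed(2)
b <- 8; r <- 3
dat <- data.frame(
  block = factor(rep(1:b, each = r)),
  x1    = rep(c(-1, 0, 1), times = b)
)
dat$y <- 5 + 1.5 * dat$x1 + rnorm(b, sd = 2)[dat$block] + rnorm(b * r)

fit_fixed  <- lm(y ~ x1 + block, data = dat)          # intercept + x1 + (b - 1) block dummies
fit_random <- lmer(y ~ x1 + (1 | block), data = dat)  # intercept + x1 + one block variance

length(coef(fit_fixed))   # 9 fixed coefficients
fixef(fit_random)         # just the intercept and x1...
VarCorr(fit_random)       # ...plus estimates of sigma_gamma and sigma_eps
```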

5.2.8 Case 10

A split-plot design

  • This chapter centers around a basic concept: what to do when there are groups (or blocks) of runs, where some factor settings change at the block level, and other factors can change for individual runs within each block. There are Mathematical Consequences to this, which we won’t go into very far; the most important thing is to have a conceptual understanding of this structure of experiment, and what kinds of analysis you can do with it.
  • GJ introduce split-plot designs in a particular context: sometimes, it’s too much work/money/effort to reset all the factors for each experimental run, so you do a bunch of runs in a row with some factor at the same setting. This creates dependence among that group of runs – they essentially act as a block, or plot. This happens a lot in industrial experiments and people sometimes forget to take it into account, which is why it’s Peter’s personal life goal to mention it as often as possible. But do bear in mind that split-plot designs can arise for other reasons as well :)
  • 10.1: Point #2 in this intro briefly mentions something that I think can be very helpful in understanding split-plot designs: it’s sort of like you have two different experiments going at once. In the “higher level” experiment, the units are blocks, and each block gets randomized to some level of the whole-plot factor. Then there’s a lower level, where the units are individual runs within each block; these runs get randomized to different levels of the split-plot factor. On p. 222, Peter talks about this as a “master design” with a smaller design embedded in it.
  • 10.2.1: Hey, did you want to learn some stuff about wind tunnels?
    • Apparently GJ did not want to pay for the rights to use the phrase “post-it note” on p. 220.
    • GJ never miss an opportunity to insult one-factor-at-a-time experiments, as on p. 221. It can be a handy review exercise to think about what such an experiment would look like here – how many runs you would need, what you would be able to estimate, etc.
    • Not the focus of this chapter, but note the mention of NASCAR’s regulations at the bottom of p. 221: this is restricting the design region of interest!
    • Dr. Cavendish, as usual for GJ’s case study clients (not necessarily for actual real-world clients), has come up with his own designs. You should be able to follow his descriptions, more or less. Spend some time with the diagrams (figure 10.1 etc.), working out how this graphically represents each of the runs. It’s sort of the same principle as the design region plots we’ve seen before, but a little more abstract.
    • Don’t worry too much about the exact mathematics of Dr. Cavendish’s second design, but think about Peter’s argument: by “spending” so many runs on identical center-point replicates, you get yourself a great estimate of variance components, but can’t get as good an estimate for each actual factor effect. In this particular context, that’s not a good tradeoff – but it might be desirable in another context.
    • How can Brad tell that his design (table 10.2 – note that this goes onto the next page) has a “\(3^2\) factorial design plus one additional center point for the hard-to-change factors”? (Remember that the hard-to-change factors are front and rear ride height.)
    • Despite the vaguely distasteful analogy, it is worth thinking about the back-and-forth on p.226 about Dr. Cavendish’s design vs. the I-optimal design. Clients really do like symmetry, and there are reasons for that; symmetric designs are mathematically simpler, and more likely to be orthogonal. But if your goal is precise prediction, then I-optimality is the way to go.
    • Peter’s point on p. 229 is worth noting: if your goal is to get a particular value for the response, then you do need to know the intercept. If you’re just trying to maximize or minimize the response, then you don’t care about the intercept.
    • We haven’t seen A-optimality before; you, like Dr. Cavendish, can ignore it for now.
    • Brad’s line on p. 229 is key: “The larger variance for the estimates of the hard-to-change factors’ effects is the price you pay for not resetting these factors every time.” Those hard-to-change factors change block to block, not run to run, so you have fewer “real” observations for those factors.
      • And yet, as Peter points out on p. 230, the interactions between the hard-to-change and easy-to-change factors can be estimated more precisely, because you get a new observation each time you change one of the easy-to-change factors.
    • Another FDS plot! Note the interesting feature of this one: one design is actually better (lower prediction variance) for predictions on part of the design region, but the other design is better for most of the design region.
    • Don’t patronize your clients, Brad.
  • 10.2.2: There’s a bunch of extra stuff in this section, but skim most of it, noting the following:
    • Dr. Cavendish now mentions that there are actually four possible response variables (surprise!). The fourth of these is another approach to optimizing multiple response variables (aka desirability), by mathematically combining them into a single score that you want to optimize.
    • Dr. Cavendish, bowing under the weight of narrative necessity, goes ahead and does the data analysis incorrectly – he ignores the dependence among runs in the same block. Don’t spend a bunch of time trying to understand what he did (since, after all, it’s wrong). The key point is this: doing it his way doesn’t bias your actual factor effect estimates, but it does mess up the standard errors and df, and therefore your test statistics and significance results are wrong. (The R sketch at the end of this chapter’s notes contrasts the two analyses on made-up data.)
      • The last paragraph on p. 236 is a good summary of this.
    • Brad mentions at the end of p. 238 that certain effects are practically insignificant. This is a good thing to remember! Just because you are statistically confident that an effect isn’t zero, doesn’t mean you actually care about it, if it’s really tiny in practical terms.
  • 10.3.1: Recaps the reasoning for doing split-plot experiments (well, one possible reason).
    • Check out the “Attachment” if you want to; it may help make the whole “plot” terminology make more sense.
  • 10.3.2: Math time!
    • The model equation here is quite similar to the equation for a mixed-effects model (fixed treatment factor effects, random block effect).
    • Note the terminology definitions at the bottom of p. 242.
    • Note the difference between the “whole-plot effects \(\gamma_i\)” (the block effect, basically, which we’re treating as a random effect) and the effects of the hard-to-change treatment factors, which are set at the whole-plot level. We do want to estimate fixed effects for the hard-to-change factors!
    • These matrices are, again, very similar to the variance-covariance matrices for the responses (not the factor effects!) that we saw with random block effects.
    • Don’t worry about the GLS/REML stuff, that’s what grad school is for. Or at least post-340.
  • 10.3.3: Skim these giant walls of text, noting the following:
    • The second paragraph on p. 244 reiterates a key point: because you don’t reset the hard-to-change factors very often (because, you know, it’s hard), you don’t have as many “actual” observations for each level of those factors, so you can’t estimate those factors as well.
    • Item 1 talks about why you can get more precise estimates of the interactions between hard-to-change and easy-to-change factors than you can for the hard-to-change factor effects. This is cool but you do not need to follow the reasoning, unless it interests you.
    • Item 2 discusses interactions between two easy-to-change factors. Again, not super important unless it interests you. It boils down to: if you don’t actually have different levels of the interaction effect within each whole plot, then you can’t estimate it as well.
    • Item 3 points out a similar issue: because the quadratic effects only take a couple of values within each whole plot, you don’t get very precise estimates of those either. Alas.
  • 10.3.4: Quite a long list of situations where you can end up doing a split-plot design and not knowing it. (For another example, think waaaay back to my notes section 1.3, on replication and pseudo-replication, where we talked about baking bread at different temperatures using common batches of dough!)
  • 10.3.5: This is not our main focus, but it’s nice to follow the reasoning here as a review/refresher on how many degrees of freedom a model takes up, and what that means for the required number of runs.
  • 10.3.6: The last paragraph in this subsection is the only new thing. You don’t need to understand why this is true, but it’s kind of fascinating that it is true.
  • 10.3.7: Skip, unless you’re out of crossword puzzles and need something soothing to do.
  • 10.3.8: This is partly some technical stuff to skim, but an important takeaway is to remember that your hard-to-change factor effects are only getting estimated based on the number of whole plots/blocks you have – which may be quite small. This means you might have “only just enough” whole plots to estimate each hard-to-change factor plus intercept and \(\sigma^2_{\gamma}\) before running out of degrees of freedom. As we’ve seen in some past examples, in this situation, your estimate of the error variance is pretty bad, so your statistical power is terrible. It’s hard to trust significance testing that uses only one or two df for error!
  • 10.4: Extra reading, as usual.
  • 10.5: People really like this Cuthbert Daniel quotation.
    • This summary contains one last warning about “stealth” split-plot designs. If you do not reset the factor levels independently for each run, you are doing a split-plot design, whether you know it or not! So always make sure the people actually performing the experiment know that they have to reset the factors every time…or just plan to do a split-plot design in the first place.
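
Finally, a made-up split-plot sketch contrasting an analysis that ignores the whole-plot grouping (Dr. Cavendish’s mistake, in spirit) with a mixed-model analysis that includes a random whole-plot effect. As before, lme4::lmer() is just one way to fit the correct model.

```r
library(lme4)

# Made-up data: w is hard to change (set once per whole plot of 4 runs),
# s is easy to change (reset for every run within the plot)
set.seed(3)
n_wp <- 6; runs_per_wp <- 4
dat <- data.frame(
  wp = factor(rep(1:n_wp, each = runs_per_wp)),
  w  = rep(c(-1, 1), each = runs_per_wp, length.out = n_wp * runs_per_wp),
  s  = rep(c(-1, -1/3, 1/3, 1), times = n_wp)
)
wp_error <- rnorm(n_wp, sd = 1.5)            # whole-plot errors
dat$y <- 4 + 2 * dat$w + dat$s + 0.5 * dat$w * dat$s +
         wp_error[dat$wp] + rnorm(nrow(dat), sd = 0.5)

wrong <- lm(y ~ w * s, data = dat)               # pretends all 24 runs are independent
right <- lmer(y ~ w * s + (1 | wp), data = dat)  # random effect for each whole plot

summary(wrong)$coefficients  # point estimates are fine, but the SEs are misleading
summary(right)               # typically a larger SE for w (few whole plots) and a
                             # smaller SE for w:s than the naive fit reports
```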