12 Lab 6 (Stata)

12.1 Lab Goals & Instructions

Today we are using a new dataset. See the script file for the lab to see the explanation of the variables we will be using.

Research Question: What characteristics of campus climate are associated with student satisfaction?

Goals

Use component plus residuals plots to evaluate linearity in multivariate regressions.
Add polynomial terms to your regression to address nonlinearity
Turn a continuous variable into a categorical variable to address nonlinearity
Add interaction terms to your regression and evaluate them with margins plots.

Instructions

Download the data and the script file from the lab files below.
Run through the script file and reference the explanations on this page if you get stuck.
No challenge activity!

Jump Links to Commands in this Lab:

cprplot

This is the main new command in today’s lab. Otherwise we will be playing with margins plots quite a bit.

Lab Files

Download UM_DEIStudentSurvey.dta

Download 401-1-Lab6.do

12.2 Components Plus Residuals Plot

This week we’re returning to the question of nonlinearity in a multivariate regression. First we’re going to discuss a new plot to detect nonlinearity –specifically in regressions with more than one independent variable: the component plus residuals plot.

Sometimes we want to examine the relationship between one independent variable and the outcome variable, accounting for all other independent variables in the model. They take the residual and subtract the parts of the residual that come from the other independent variables.

Let’s run through an example.

STEP 1: First, run the regression:

regress satisfaction climate_gen climate_dei instcom ///
    fairtreat female ib3.race_5

>         fairtreat female ib3.race_5

      Source |       SS           df       MS      Number of obs   =     1,416
-------------+----------------------------------   F(9, 1406)      =    134.18
       Model |  692.700417         9   76.966713   Prob > F        =    0.0000
    Residual |  806.483199     1,406  .573601137   R-squared       =    0.4621
-------------+----------------------------------   Adj R-squared   =    0.4586
       Total |  1499.18362     1,415  1.05949372   Root MSE        =    .75736

-------------------------------------------------------------------------------
 satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
  climate_gen |   .4549723   .0402608    11.30   0.000     .3759946      .53395
  climate_dei |   .0914773    .039135     2.34   0.020     .0147081    .1682466
      instcom |    .291927   .0325804     8.96   0.000     .2280156    .3558384
    fairtreat |   .1642801   .0384167     4.28   0.000     .0889198    .2396404
       female |  -.0647484   .0419125    -1.54   0.123    -.1469663    .0174694
              |
       race_5 |
       White  |   .3724681   .0683034     5.45   0.000     .2384805    .5064557
        AAPI  |   .3445667   .0754328     4.57   0.000     .1965937    .4925396
Hispanic/L~o  |   .2711118   .0763919     3.55   0.000     .1212575    .4209662
       Other  |   .2489596   .0872887     2.85   0.004     .0777294    .4201897
              |
        _cons |  -.2139648   .1334109    -1.60   0.109    -.4756706    .0477411
-------------------------------------------------------------------------------

STEP 2: Run the cprplot command specifying the independent variable you want to examine.

Basic Command:

cprplot climate_dei, lowess

Command with clearer line colors:
I changed the regression line to be dashed and the lowess line to be red. This makes the lines and patterns easier to distinguish.

cprplot climate_dei, rlopts(lpattern(dash)) ///
    lowess lsopts(lcolor(red))

INTERPRETATION:
If the independent variable being examined and the outcome variable have a linear relationship, then the lowess line will be relatively straight and line up with the regression line. If there is a pattern to the scatter plot or clear curves in the lowess line, that is evidence of nonlinearity that needs to be addressed.

Now we’ll move on to addressing nonlinearity when we find it.

12.3 Approach 1: Polynomials

One way we can account for non-linearity in a linear regression is through polynomials. This method operates off the basic idea that $x^2$ and $x^3$ have pre-determined shapes when plotted (to see what these plots look like, refer to the explanation of this lab on the lab wepage. By including a polynomial term we can essentially account account for some curved relationships, which allows it to become a linear function in the model.

Squared Polynomial

Here’s what a $y = x^2$ looks like when plotted over the range -10 to 10. It’s u-shaped and can be flipped depending on the sign.

This occurs when an effect appears in the middle of our range or when the effect diminishes at the beginning or end of our range. Let’s look at an example:

STEP 1: Evaluate non-linearity and possible squared relationship

scatter satisfaction instcom || lowess satisfaction instcom

This is flipped and less exagerated, but it’s still an upside down u-shape.

STEP 2: Generate a squared variable for key variable

gen instcom_sq = instcom * instcom

STEP 3: Run regression with the squared expression to check significance
NOTE: You must always put both the original and the squared variables in the model! Otherwise, you aren’t telling STATA to model both an initial and cubic change to the line.

regress satisfaction climate_gen climate_dei fairtreat female ib3. ///
    race_5 instcom instcom_sq

>         race_5 instcom instcom_sq

      Source |       SS           df       MS      Number of obs   =     1,416
-------------+----------------------------------   F(10, 1405)     =    122.56
       Model |  698.460566        10  69.8460566   Prob > F        =    0.0000
    Residual |   800.72305     1,405  .569909644   R-squared       =    0.4659
-------------+----------------------------------   Adj R-squared   =    0.4621
       Total |  1499.18362     1,415  1.05949372   Root MSE        =    .75492

-------------------------------------------------------------------------------
 satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
  climate_gen |   .4335635   .0406921    10.65   0.000     .3537397    .5133874
  climate_dei |   .0965462   .0390414     2.47   0.014     .0199605     .173132
    fairtreat |   .1597253   .0383197     4.17   0.000     .0845553    .2348954
       female |  -.0721038   .0418415    -1.72   0.085    -.1541823    .0099747
              |
       race_5 |
       White  |   .3629737   .0681488     5.33   0.000     .2292894     .496658
        AAPI  |   .3254566   .0754296     4.31   0.000     .1774898    .4734233
Hispanic/L~o  |   .2634913   .0761834     3.46   0.001     .1140459    .4129368
       Other  |   .2499186   .0870079     2.87   0.004     .0792393     .420598
              |
      instcom |    .741871   .1452069     5.11   0.000     .4570254    1.026717
   instcom_sq |  -.0718686   .0226061    -3.18   0.002    -.1162139   -.0275233
        _cons |  -.7750299   .2209744    -3.51   0.000    -1.208505   -.3415547
-------------------------------------------------------------------------------

STEP 4: Generate margins graph if significant NOTE: We use ## interact variables in a model. When you interact a variable with itself, it acts as a squared term. This is called ‘factor notation’ and we must use it instead of the squared variable we created in order to get margins.

regress satisfaction climate_gen climate_dei fairtreat female ib3. ///
    race_5 c.instcom##c.instcom
margins, at(instcom = (0(1)5))
marginsplot, noci

Cubed Polynomial

Here’s what a $y = x^3$ looks like when plotted over the range -10 to 10. It’s slightly s-shaped.

This occurs when the effect is perhaps less impactful in the middle of the range. Let’s go through the example. The steps are the same, so we’re going to skip the generating a new variable step.

STEP 1: Evaluate non-linearity and possible cubic relationship

scatter satisfaction fairtreat || lowess satisfaction fairtreat

You can see our slight characteristic s-shape to the data.

STEP 2: Run Regression with cubic using interaction factor ( ## )
NOTE: We interact the variable “fairtreat” with itself twice to make a cubed term. Again, we need to do this in order to generate margins. If you find the regression output harder to read with factor notation you can manually create new cubed variable.

regress satisfaction climate_gen climate_dei instcom female ///
    ib3.race_5 c.fairtreat##c.fairtreat##c.fairtreat

>         ib3.race_5 c.fairtreat##c.fairtreat##c.fairtreat

      Source |       SS           df       MS      Number of obs   =     1,416
-------------+----------------------------------   F(11, 1404)     =    110.48
       Model |  695.586279        11  63.2351163   Prob > F        =    0.0000
    Residual |  803.597337     1,404  .572362776   R-squared       =    0.4640
-------------+----------------------------------   Adj R-squared   =    0.4598
       Total |  1499.18362     1,415  1.05949372   Root MSE        =    .75655

-------------------------------------------------------------------------------
 satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
  climate_gen |   .4434539   .0405782    10.93   0.000     .3638534    .5230544
  climate_dei |   .0917208   .0391412     2.34   0.019     .0149392    .1685024
      instcom |   .2880597   .0325941     8.84   0.000     .2241213     .351998
       female |  -.0688182   .0419067    -1.64   0.101    -.1510246    .0133882
              |
       race_5 |
       White  |   .3710566   .0684485     5.42   0.000     .2367842     .505329
        AAPI  |   .3416881   .0761053     4.49   0.000     .1923958    .4909803
Hispanic/L~o  |   .2759983   .0765125     3.61   0.000     .1259071    .4260894
       Other  |    .254486   .0872918     2.92   0.004     .0832496    .4257224
              |
    fairtreat |  -1.405129   .7489441    -1.88   0.061    -2.874299    .0640413
              |
  c.fairtreat#|
  c.fairtreat |    .493122   .2257115     2.18   0.029      .050354      .93589
              |
  c.fairtreat#|
  c.fairtreat#|
  c.fairtreat |  -.0479178   .0215204    -2.23   0.026    -.0901333   -.0057022
              |
        _cons |   1.324045   .8063997     1.64   0.101    -.2578329    2.905923
-------------------------------------------------------------------------------

Margins plot:

margins, at(fairtreat = (1(1)5))
marginsplot

12.4 Approach 3: Creating a Categorical Variable

A second way we can account for non-linearity in a lienar regression is through transforming our continuous variable into categories. Age is a very common variable to see as categorical in models. We can capture some aspects of nonlinearity with ordered categories, but it may not be as precise as working with squared or cubed terms.

Let’s run through an example:

STEP 1: Evaluate what categories I want to create

sum climate_gen, d

                 Composite: General climate
-------------------------------------------------------------
      Percentiles      Smallest
 1%     1.571429              1
 5%     2.285714              1
10%     2.714286              1       Obs               1,797
25%     3.142857       1.142857       Sum of wgt.       1,797

50%     3.714286                      Mean           3.607732
                        Largest       Std. dev.      .7253975
75%     4.142857              5
90%     4.571429              5       Variance       .5262015
95%     4.714286              5       Skewness       -.500229
99%            5              5       Kurtosis       3.205013

It looks pretty evenly spread across the range, so I’m going to create five categories.

STEP 2: Create the Category

gen climategen_cat =.
    replace climategen_cat =1 if climate_gen >=1 & climate_gen<2
    replace climategen_cat =2 if climate_gen >=2 & climate_gen<3
    replace climategen_cat =3 if climate_gen >=3 & climate_gen<4
    replace climategen_cat =4 if climate_gen >=4 & climate_gen<5
    replace climategen_cat =5 if climate_gen >=5

STEP 3: Run regression with indicator

regress satisfaction climate_dei instcom fairtreat female ib3.race_5 ///
        i.climategen_cat

>                 i.climategen_cat

      Source |       SS           df       MS      Number of obs   =     1,416
-------------+----------------------------------   F(12, 1403)     =     96.90
       Model |  679.419147        12  56.6182622   Prob > F        =    0.0000
    Residual |  819.764469     1,403  .584293991   R-squared       =    0.4532
-------------+----------------------------------   Adj R-squared   =    0.4485
       Total |  1499.18362     1,415  1.05949372   Root MSE        =    .76439

-------------------------------------------------------------------------------
 satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
  climate_dei |   .1449277   .0390442     3.71   0.000     .0683363    .2215191
      instcom |   .2859731   .0331552     8.63   0.000      .220934    .3510122
    fairtreat |   .1982309   .0381992     5.19   0.000     .1232972    .2731647
       female |   -.060501   .0423081    -1.43   0.153    -.1434949    .0224929
              |
       race_5 |
       White  |   .3467763   .0691179     5.02   0.000     .2111907    .4823618
        AAPI  |   .3305859   .0764519     4.32   0.000     .1806136    .4805582
Hispanic/L~o  |   .2366182   .0770686     3.07   0.002      .085436    .3878003
       Other  |   .2477479   .0882183     2.81   0.005      .074694    .4208018
              |
climategen_~t |
           2  |    .589944   .1564542     3.77   0.000     .2830347    .8968534
           3  |    1.04883   .1580689     6.64   0.000     .7387531    1.358907
           4  |   1.297643   .1671818     7.76   0.000     .9696895    1.625596
           5  |   1.501673   .2290519     6.56   0.000     1.052352    1.950994
              |
        _cons |   .0892198   .1855177     0.48   0.631    -.2747021    .4531418
-------------------------------------------------------------------------------

STEP 4: Double-Check linearity with margins

margins climategen_cat
marginsplot, noci

12.5 Interactions

We have finally arrived at interactions. It is finally time for ‘margins’ to TRULY shine. Wrapping your head around interactions might be difficult at first but here is the simple interpretation for ALL interactions:

The effect of ‘var1’ on ‘var2’ varies by ‘var3’

The association of ‘var1’ and ‘var2’ significantly
differs for each value of ’var3’s

Interactions are wonderful because for any combination of variable types. The key thing to be aware of is how you display/interpret it. Let’s see some options.

Continous variable x continuous variable

The first thing we are going to look at is the interaction between two continuous variables. Let’s run a simple regression interacting climate_dei & instcom. The question I’m asking here then is: Does the effect of people’s overall sense of DEI climate on their satisfaction differ based on a person’s perception of institutional commitment to DEI?

First we run the regression with the interaction term:

regress satisfaction climate_gen undergrad female ib3.race_5 ///
    c.climate_dei##c.instcom

>         c.climate_dei##c.instcom

      Source |       SS           df       MS      Number of obs   =     1,428
-------------+----------------------------------   F(10, 1417)     =    116.36
       Model |  689.376078        10  68.9376078   Prob > F        =    0.0000
    Residual |  839.539188     1,417  .592476491   R-squared       =    0.4509
-------------+----------------------------------   Adj R-squared   =    0.4470
       Total |  1528.91527     1,427  1.07141925   Root MSE        =    .76972

-------------------------------------------------------------------------------
 satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
  climate_gen |   .5038651   .0380054    13.26   0.000     .4293123     .578418
    undergrad |  -.0216865   .0429315    -0.51   0.614    -.1059026    .0625296
       female |  -.0738629   .0425678    -1.74   0.083    -.1573655    .0096397
              |
       race_5 |
       White  |   .4226195   .0679349     6.22   0.000     .2893557    .5558834
        AAPI  |   .3526495   .0766823     4.60   0.000     .2022265    .5030726
Hispanic/L~o  |   .3064282    .076708     3.99   0.000     .1559547    .4569016
       Other  |    .303079   .0877973     3.45   0.001     .1308523    .4753057
              |
  climate_dei |   .4501156   .0965718     4.66   0.000     .2606766    .6395546
      instcom |   .6256633     .09919     6.31   0.000     .4310882    .8202385
              |
           c. |
  climate_dei#|
    c.instcom |  -.0978223   .0272883    -3.58   0.000    -.1513522   -.0442924
              |
        _cons |  -.9367096   .2943789    -3.18   0.001    -1.514175   -.3592444
-------------------------------------------------------------------------------

Then we look at the margins plot. Because I’m mostly interested in what the graph looks like, I’ve added quietly to the front of the margins command. This tells Stata to run the margins command in the background without displaying the results in the console or in your log.

quietly margins, at(climate_dei=(1(1)5) instcom=(1(1)5))
marginsplot

When creating a margins plot with a continuous x continuous interaction:

You need to specify the (min(interval)max) to tell STATA which predicted values to calculate for the plot.
Because both variables are continuous and you want STATA to calculate for each combination of two numbers, you have to put both in the same_at(xxx) bracket so STATA knows to interact them.

Interpretation:

The association between rating of DEI climate and satisfaction is MODERATED by perception of the institution’s commitment to DEI.
The association between rating of DEI climate and satisfaction varies based on perception of the institution’s commitment to DEI.
For students with low perception of the institution’s commitment to DEI, increased DEI climate ratings are associated with an significant increase in satisfaction. As perception of the institution’s commitment to DEI increases, the effect of DEI climate on satisfaction dampens (the slope gets less steep).

Sometimes, you may decide that interpreting this relationship in this direction is difficult to interpret/doesn’t make sense. In situations like that, you might want to change what is your key ‘x’ and your ‘moderator’ variable. Essentially, you are switching your x and y axis.

One way to do this is to switch which variable comes first in the _at() bracket:

quietly margins, at(instcom=(1(1)5) climate_dei=(1(1)5))
marginsplot

The other way is to tell marginsplot which variable to ‘plot’ (present as *moderator on the graph essentially:

quietly margins, at(climate_dei=(1(1)5) instcom=(1(1)5))
marginsplot, plot(climate_dei)

And both graphs come out the same.

Updated Interpretation:
Because we switched which variable is the moderator, our interpration of the relationship changes.

The association between perception of institutional commitment to DEI and satisfaction is MODERATED by the rating of DEI climate.
The association between perception of institutional commitment to DEI and satisfaction varies based on rating of DEI climate.
For students who rate the DEI climate lower, increased perception of institutional commitment to DEI is associated with higher satisfaction. For more positive ratings of DEI climate, the positive effect of perception of institutional commitment to DEI on satisfaction is dampened.

One last thing you can change is the number of lines that appear on the graph.
Approach 1: change the intervals

quietly margins, at(instcom=(1(1)5) climate_dei=(1(2)5))
marginsplot

Approach 2: specify the values that should be predicted

quietly margins, at(instcom=(1(1)5) climate_dei=(1 3 5))
marginsplot

Continuous variable x dummy variable

Once you get a handle on continuous variables, the continuous dummy variable is extremely straightforward.

First run the regression.

regress satisfaction climate_gen instcom ib3.race_5 i.female##c.climate_dei

      Source |       SS           df       MS      Number of obs   =     1,428
-------------+----------------------------------   F(9, 1418)      =    128.65
       Model |    687.2481         9     76.3609   Prob > F        =    0.0000
    Residual |  841.667166     1,418  .593559356   R-squared       =    0.4495
-------------+----------------------------------   Adj R-squared   =    0.4460
       Total |  1528.91527     1,427  1.07141925   Root MSE        =    .77043

-------------------------------------------------------------------------------
 satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
  climate_gen |   .5208932   .0368994    14.12   0.000     .4485099    .5932765
      instcom |   .2856222   .0326096     8.76   0.000     .2216539    .3495904
              |
       race_5 |
       White  |   .4265764   .0679328     6.28   0.000     .2933168     .559836
        AAPI  |   .3730469    .076419     4.88   0.000     .2231404    .5229533
Hispanic/L~o  |   .3190265   .0767491     4.16   0.000     .1684726    .4695805
       Other  |   .3101714   .0877853     3.53   0.000     .1379685    .4823743
              |
       female |
      Female  |   -.650237   .1951897    -3.33   0.001    -1.033129   -.2673455
  climate_dei |   .0498149   .0471972     1.06   0.291    -.0427689    .1423987
              |
       female#|
c.climate_dei |
      Female  |   .1588592   .0519094     3.06   0.002     .0570318    .2606866
              |
        _cons |    .355042   .1677356     2.12   0.034     .0260054    .6840787
-------------------------------------------------------------------------------

Then look at the margins plot:

quietly margins female, at(climate_dei=(1(1)5))
marginsplot

Interpretation:

The association between rating of DEI climate and satisfaction is MODERATED by gender
The association between rating of DEI climate and satisfaction varies based on a student’s gender identity
The positive effect/association of rating of DEI climate on/with satisfaction is stronger for females than males.

Continuous variable x Categorical variable

Categorical variables are often feel most confusing for interactions.

Let’s say I’m interested in how climate_dei is moderated by race. Let’s look at the regression results:

regress satisfaction climate_gen instcom female i.race_5##c.climate_dei

      Source |       SS           df       MS      Number of obs   =     1,428
-------------+----------------------------------   F(12, 1415)     =     98.65
       Model |  696.431535        12  58.0359612   Prob > F        =    0.0000
    Residual |  832.483731     1,415  .588327725   R-squared       =    0.4555
-------------+----------------------------------   Adj R-squared   =    0.4509
       Total |  1528.91527     1,427  1.07141925   Root MSE        =    .76703

-------------------------------------------------------------------------------
 satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
  climate_gen |   .5163975   .0370079    13.95   0.000     .4438013    .5889937
      instcom |   .2820706   .0325448     8.67   0.000     .2182293    .3459118
       female |  -.0669802   .0421579    -1.59   0.112    -.1496789    .0157186
              |
       race_5 |
        AAPI  |   .5523891   .3057571     1.81   0.071    -.0473968    1.152175
       Black  |  -1.043364   .2810924    -3.71   0.000    -1.594766   -.4919609
Hispanic/L~o  |  -.9812627   .2657842    -3.69   0.000    -1.502636   -.4598894
       Other  |  -.2390071   .3184738    -0.75   0.453    -.8637387    .3857245
              |
  climate_dei |   .0902564   .0489236     1.84   0.065    -.0057142     .186227
              |
       race_5#|
c.climate_dei |
        AAPI  |  -.1571012   .0788798    -1.99   0.047     -.311835   -.0023673
       Black  |   .1855982   .0830573     2.23   0.026     .0226696    .3485268
Hispanic/L~o  |   .2401252   .0712755     3.37   0.001     .1003082    .3799421
       Other  |   .0307093   .0874273     0.35   0.725    -.1407918    .2022105
              |
        _cons |   .6527546   .1785275     3.66   0.000     .3025476    1.002962
-------------------------------------------------------------------------------

And then the margins plot:

quietly margins race_5, at(climate_dei=(1(1)5))
marginsplot, noci

When creating a margins plot with a continuous x categorical interaction:

Plot your variable of interest, that you think is a moderator, on the graph by putting it before the comma in the margins command. In this case we’re interested in the effect of race.

Interpretation:

What we see then is how the effect of DEI climate rating on satisfaction varies by racial identity.

Let’s say, though, that you’re only interested in comparing how DEI and satisfaction differs. You might want to specify which racial groups to plot.

quietly margins, at(climate_dei=(1(1)5) race_5=(2 3 4))
marginsplot

Categorical variable x dummy variable

We’ll now look at the categorical and dummy variables interaction.

First the regression:

regress satisfaction climate_gen climate_dei instcom undergrad ///
    i.race_5##i.female

>         i.race_5##i.female

      Source |       SS           df       MS      Number of obs   =     1,428
-------------+----------------------------------   F(13, 1414)     =     89.58
       Model |  690.509811        13  53.1161393   Prob > F        =    0.0000
    Residual |  838.405456     1,414  .592931722   R-squared       =    0.4516
-------------+----------------------------------   Adj R-squared   =    0.4466
       Total |  1528.91527     1,427  1.07141925   Root MSE        =    .77002

-------------------------------------------------------------------------------
 satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
  climate_gen |   .5146673   .0378538    13.60   0.000     .4404117     .588923
  climate_dei |   .1383065   .0389207     3.55   0.000      .061958     .214655
      instcom |   .2899488   .0328652     8.82   0.000      .225479    .3544186
    undergrad |  -.0102032   .0430071    -0.24   0.813    -.0945678    .0741613
              |
       race_5 |
        AAPI  |  -.1500545   .0789974    -1.90   0.058    -.3050192    .0049102
       Black  |  -.2799878    .106818    -2.62   0.009    -.4895265   -.0704491
Hispanic/L~o  |  -.1299022   .0869353    -1.49   0.135    -.3004383    .0406339
       Other  |   .0592291   .1093019     0.54   0.588    -.1551821    .2736403
              |
       female |
      Female  |  -.0515413   .0643228    -0.80   0.423    -.1777197    .0746371
              |
race_5#female |
 AAPI#Female  |     .20961   .1132215     1.85   0.064    -.0124902    .4317103
Black#Female  |  -.2356153   .1338683    -1.76   0.079    -.4982172    .0269866
Hispanic/L~o #|
      Female  |   .0289424   .1185801     0.24   0.807    -.2036695    .2615543
Other#Female  |  -.3206363   .1473326    -2.18   0.030    -.6096503   -.0316223
              |
        _cons |    .449525    .134609     3.34   0.001     .1854701    .7135798
-------------------------------------------------------------------------------

And then the margins plot:

quietly margins female, at(race=(1(1)5))
marginsplot

The first thing to notice is how ENTIRELY unhelpful this graph is because of how many things are happening. The way to do it is to break it down:

FOCUS ON TWO DOTS EACH COLUMN TO SEE GENDER DIFFERENCES IN EACH RACIAL GROUP. We can see the difference between female and male satisfaction for each racial group. We can see, for example, that there is a major difference in satisfaction by gender for black students and students whose identity was grouped into other. Interestingly, the confidence intervals tell us that while the ‘other’ category’s difference is statistically significant, we can’t be sure for black students given the overlap.
FOCUS ON LINES TO SEE RACIAL DIFFERENCES IN EACH GENDER CATEGORY. We can see the difference between the races for each gender. We can see for example, that black female students have lower satisfaction than all other female students, and that gap is statistically different with all the groups except women in the ‘other’ category.

What if we wanted to see these differences more clearly?

APPROACH 1: Change the type of graph we see

marginsplot, recast(bar) by(female)

The ‘recast’ function allows you to use a different type of graph The ‘by’ creates a new graph for each value in the specified variable

APPROACH 2: Create margins that show the coefficient differences

quietly margins, dydx(female) at(race=(1(1)5))
marginsplot, recast(bar)

The ‘dydx’ command calculates the marginal effects of the variable specified. This shows how much more or less satisfaction is for women compared to men for each race. The unit of ‘dydx’ here: the change in outcome units.

quietly margins female, dydx(race)
marginsplot, recast(bar) by(female)

Here, we see how much more or less satisfaction is for each racial group compared to white students in their shared gender. Here, we care about whether or not the confidence interval crosses over 0. If it does, then we can see that this is likely not statistically significant.

There is no challenge activity in today’s lab. Interactions can be challenging to wrap your mind around, but the better you can understand an interaction on a graph the more you will grasp interactions.