Chapter 4 Method

4.1 Context

The setting for the present study was nine out-of-school STEM programs during 2015 in the Northeast United States. Descriptions of the programs are provided in Appendix A. Two intermediary organizations were contracted by the local school districts to administer the summer programs. The two intermediaries were responsible for soliciting and enrolling youth; establishing guidelines for the design and goals of the programs; and providing training and professional development for the staff, hereafter referred to as youth activity leaders.

The two intermediary organizations differed in one respect: one separated academic and enrichment-related activities, whereas the other integrated the academic and enrichment components more closely. This difference may have program-related effects on youths’ engagement. Many of the programs aimed to involve youth in work with data. These learning environments brought together youth activity leaders, educators, and those with technical expertise in STEM domains. Youth spent around three hours per day, four days per week, in the approximately four-week programs, which were taught by youth activity leaders and by scientists, engineers, and other community members with technical expertise.

4.2 Participants

Participants were 203 youth from diverse racial and ethnic backgrounds. The mean age of participants was around 13 years (among youth whose age was available: M = 12.71, SD = 1.70, min. = 10.75, max. = 16.36). Detailed demographic characteristics of youth are presented in Table 4.1.

Table 4.1: Demographic characteristics of youth
Youth	Percentage
Sex
Male	50
Female	50
Race/Ethnicity
Hispanic	48
White	6
Black	36
Multi-racial	3
Asian/Pacific Islander	7
Parent Education
High School or Below	79
Graduated from College (B.A. or B.S.)	21

4.3 Procedure

Before the start of the programs, youth completed a pre-survey that included questions about their experience in STEM, intention to pursue a STEM major or career, and other motivation and engagement-related measures; items about youths’ interest in STEM were the only items used from this survey in this study.

At the beginning of the programs, youth were introduced to the study and the phones used for data collection related to the ESM. As indicated in the literature review, ESM is a method of data collection that involves asking youth to respond to short questions on phones (that were provided as part of the study). In particular, youth were signaled at random times (within intervals, so that the signals were not too near or far apart) in order to obtain a sample of their experience throughout the program. ESM data were collected two days each week, for three weeks (weeks 2-4 of the program). In all of the programs, about equal video-recording time was dedicated to classroom and field experiences. This detail is noteworthy because programs associated with one of the intermediaries rotated between classroom and field experience days, while the other used the first half of each day for one and the second for the other. Each day, youth were signaled four times. These signals were at the same time for all of the youth within their program, but at different times between programs and between days within programs (with the constraint that no two signals could occur less than ten minutes apart).
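The signaling scheme described above can be illustrated with a minimal sketch. It assumes a 180-minute session (the "around three hours per day" noted earlier), four signals per day, and the ten-minute minimum gap; the equal-width intervals, the re-draw strategy, and the function name `draw_signal_times` are illustrative assumptions, not a description of the software actually used in the study.

```python
import random


def draw_signal_times(session_minutes=180, n_signals=4, min_gap=10, rng=None):
    """Draw one signal time (whole minutes from session start) per interval.

    The session is split into `n_signals` equal-width intervals (an assumption),
    one time is drawn uniformly within each, and the draw is repeated until all
    consecutive signals are at least `min_gap` minutes apart.
    """
    rng = rng or random.Random()
    width = session_minutes // n_signals
    while True:
        times = [rng.randrange(i * width, (i + 1) * width) for i in range(n_signals)]
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        if all(gap >= min_gap for gap in gaps):
            return times


# One schedule per program and day; all youth in a program share the same times.
schedule = draw_signal_times(rng=random.Random(2015))
```

Drawing within intervals keeps the signals spread across the session, while the gap check enforces the constraint that no two signals fall too close together.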

The programs were video-recorded by research team members on the days during which ESM data were collected. So that the measures from the video-recordings and the ESM data could be matched, each video included a marker from the person doing the video-recording that identified the ESM signal to which youth were asked to respond.

4.4 Data Sources and Measures

Data sources consist of ESM measures of engagement and youths’ perceptions of themselves and the activity, pre-survey measures of youths’ interest, youths’ demographic information, and the video-recordings of programs.

4.4.1 ESM measures of engagement for the profiles

Measures for engagement were created from five ESM questions, three serving as indicators for the experience of engagement and two for the conditions of engagement. The three indicators for engagement were for learning (for the cognitive engagement construct), working hard (for behavioral engagement), and enjoying (for affective engagement). The variables for the conditions were perceived challenge and perceived competence.

All five items were ultimately used to construct the profiles of engagement. Each of the ESM items consisted of the item text and the following four response options, of which youth were directed to select one: Not at all (associated with the number 1 on the survey), A little (2), Somewhat (3), and Very Much (4). The items are presented in Table 4.2. Note that because these constructs were measured using single-item indicators (which is common in studies using ESM; Hektner et al., 2007), reliability and validity information for these measures is not included.

Table 4.2: ESM measures for profiles
Construct	Item
Cognitive engagement	As you were signaled, were you learning anything or getting better at something?
Behavioral engagement	As you were signaled, how hard were you working?
Affective engagement	As you were signaled, did you enjoy what you are doing?
Perceived challenge	As you were signaled, how challenging was the main activity?
Perceived competence	As you were signaled, were you good at the main activity?

4.4.2 The five aspects of work with data

Different aspects of work with data were identified from the video-recordings. Specifically, codes for work with data were generated on the basis of the activity that the youth activity leaders were facilitating. These activity codes came from the STEM-Program Quality Assessment (STEM-PQA; Forum for Youth Investment, 2012), an assessment of quality programming in after-school programs. I then identified the specific activities that corresponded to the five aspects of work with data, as defined here. I matched each of the five aspects of work with data to the STEM-PQA code(s) that I interpreted as aligning most closely (using two STEM-PQA items as codes for generating data and for interpreting and communicating findings), though there are other ways that these could be matched. For example, in the NGSS (NGSS Lead States, 2013), asking questions emphasizes coming up with questions that can be answered through an investigation, whereas the STEM-PQA code used to indicate asking questions emphasizes exploring possible solutions to problems and testing hypotheses. The aspects of work with data and the STEM-PQA code(s) to which I matched them are as follows.

Asking questions: Predict, conjecture, or hypothesize (Staff support youth in using a simulation, experiment, or model to answer questions, explore solutions, or test hypotheses [e.g., Youth run a robotics program to determine whether it does what they expect it to; Youth try an alternate way to solve an equation and test their results against another example, etc.])

Making observations: Classify or abstract (Staff support youth in using classification and abstraction, linking concrete examples to principles, laws, categories, and formulas [e.g., Mice, porcupines, and squirrels are all rodents, rodents are all mammals; The pool ball moved because for every action, there is an equal and opposite reaction; etc.])

Generating data: Collect data or measure (Staff support youth in collecting data or measuring [e.g., Youth use rulers or yardsticks to measure length; Youth count the number of different species of birds observed in a specific location, etc.]) and Highlight precision and accuracy (Staff highlight value of precision and accuracy in measuring, observing, recording, or calculating [e.g., measurement error can impact an experiment or conclusion; measure twice, cut once; scientists always need to double-check their calculations before drawing conclusions; you must observe carefully to see the difference between various species of sparrows, etc.])

Data modeling: Simulate, experiment, or model (Staff support youth in using a simulation, experiment, or model to answer questions, explore solutions, or test hypotheses [e.g., Youth run a robotics program to determine whether it does what they expect it to; Youth try an alternate way to solve an equation and test their results against another example, etc.])

Interpreting and communicating findings: Analyze (Staff support youth in analyzing data to draw conclusions [e.g., after an experiment, youth are asked to use results to make a generalization like “Your heartbeat increases when you exercise”, etc.]) and Use symbols or models (Staff support youth in conveying STEM concepts through symbols, models, or other nonverbal language [e.g., youth use diagrams, equations, flowcharts, outlines, mock-ups, design software, dioramas, physical models, prototypes, graphs, charts, tables, equations, etc.])

I then used these codes as part of the following coding frame, with the code names, possible values, code descriptions, and examples from this study presented in Table 4.3. Note that this coding frame was not developed to assess work with data; rather, it was adapted for this purpose by aligning dimensions of the STEM-PQA with the categories of the coding frame for work with data in this table.

Table 4.3: Coding Frame for the Aspects of Work with Data
Code Name	Values	Description	Example
Asking questions	1: Present; 0: Not Present	Discussing and exploring topics to investigate and pose questions.	Youth generated questions they investigated related to tide ponds in an estuary ecosystem.
Making observations	1: Present; 0: Not Present	Watching and noticing what is happening with respect to the phenomena or problem being investigated.	Youth observed the projectile motion of an object launched with a catapult.
Generating data	1: Present; 0: Not Present	Figuring out how or why to inscribe an observation as data and generating coding frames or measurement tools.	Youth wrote in a table the number of pieces of recyclables they collected in parts of local waterways.
Data modeling	1: Present; 0: Not Present	Understanding and explaining phenomena using models of the data that account for variability or uncertainty.	Youth calculated the average number of plant species found across a number of sites in the field.
Interpreting and communicating findings	1: Present; 0: Not Present	Discussing and sharing findings.	Youth presented the outcomes of an investigation or engineered design in light of a research question or problem.

Raters contracted by the American Institutes for Research (AIR) were trained in the use of the Program Quality Assessment tool (PQA), the broader assessment tool for which the STEM-PQA is a supplement. Raters completed a four-hour online training module on the overall PQA tool and then attended an in-person two-day training led by a trainer from the David P. Weikart Center for Youth Program Quality, the tool’s publisher, where they learned about the instrument, trained on its use, and then established inter-rater reliability with a master coder. For the STEM-PQA, three of the same raters contracted by AIR to code the (overall) PQA measure used the STEM-PQA supplement to score one video segment, for which there were no disagreements on scoring for any of the items. The programs were divided up among all of the raters, so raters coded some of the videos for all of the programs. When the raters encountered a situation that was difficult to score, they discussed the issue by telephone or, more often, by email after viewing the video in question and reached a consensus on how to score the specific item. Note that these codes were unique to each signal to which youth responded (but were not unique to each youth, as the youth in the same program were signaled at the same time).

Out of the 248 instructional episodes, 236 were code-able for work with data; for the 12 that were not code-able, issues with the video-recordings were the primary source of the missing data. These 236 episodes were used for all of the analyses.

4.4.3 Survey measures of pre-interest in STEM

Measures of youths’ pre-interest were used as youth-level characteristics that predict the profiles of engagement. In particular, three items adapted from Vandell, Hall, O’Cadiz, and Karsh (2012) were used, with directions for youth to rate their agreement with the items’ text using the same scale as the ESM items: Not at all (associated with the number 1 on the survey), A little (2), Somewhat (3), and Very Much (4). Reliability and validity information on this scale is presented in Vandell et al. (2008).

This measure was constructed by taking the maximum value of the scales for the different content areas (science, mathematics, and engineering), so that the value for a youth whose score on the science scale was 2.5 and on the mathematics scale was 2.75 would be 2.75. See Beymer, Rosenberg, and Schmidt (2018) for more details on this measurement approach (use of the maximum value). The items are presented in Table 4.4. Overall levels of this measure were high (M = 3.044, SD = 0.901). The individual interest measure represented the mean of interest items across all relevant domains. Thus, for some students, the mean was based on three items, while for others it was based on as many as nine items representing all three domains (with Cronbach \(\alpha\) values ranging from .77 to .86 for each domain-specific interest scale).
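As a small illustration of the maximum-value rule, the sketch below takes a youth's domain-level scale scores and returns the largest one. The function name `pre_program_interest` and the use of `None` for a domain a youth did not answer are assumptions made for the example; the scores are the ones used in the example in the text.

```python
def pre_program_interest(domain_scores):
    """Return the highest available domain-level interest score.

    `domain_scores` maps a content area to that youth's scale score
    (1-4 scale), or to None when the youth did not answer that domain.
    """
    available = [score for score in domain_scores.values() if score is not None]
    if not available:
        return None  # no interest data for this youth
    return max(available)


# The example from the text: science = 2.5, mathematics = 2.75.
youth = {"science": 2.5, "mathematics": 2.75, "engineering": None}
interest = pre_program_interest(youth)  # 2.75
```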

Table 4.4: Measure for pre-program interest in STEM
Construct	Item text
Pre-program interest in STEM	I am interested in science / mathematics / engineering.
	At school, science / mathematics / engineering is fun.
	I have always been fascinated by science / mathematics / engineering.

4.4.4 Other youth characteristics

In addition to the measures described in this section, demographic information on youths’ gender and racial and ethnic group was used to construct demographic variables for gender and for membership in a group under-represented in STEM; membership in an under-represented group was identified on the basis of youths’ racial and ethnic group being Hispanic, African American, Asian or Pacific Islander, or Native American.

4.5 Data Analysis

4.5.1 Preliminary analyses

Correlations (first-order Pearson) and the frequency, range, mean (M), and standard deviation (SD) are first presented for all variables. In addition, the frequency of the codes for aspects of work with data and the numbers of responses by youth, program, and instructional episode are presented.

4.5.2 Analysis for Research Question #1 (on the frequency and nature of work with data)

There were two primary steps taken to answer this question, one more quantitative in nature and one more qualitative. The quantitative aspect focused on the frequency of work with data, whereas the qualitative aspect focused on the specific nature of work with data.

For the quantitative aspect, the codes for the aspects of work with data (described above in the section on the measures) were counted up and presented as a proportion of the number of code-able instructional episodes. As the signals represent a sample of youths’ experiences in the programs, results from this analysis provide insight into how often each of the aspects took place during the programs. Note that this coding frame focused on the degree of instructional support the activity leaders provided for youth to work with data, thus results from this analysis will show how often support for the different aspects of work with data was provided, though youth may engage in the aspects of work with data to varying degrees.
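The counting step can be sketched as follows. The episode records here are hypothetical toy data, and the function name `aspect_proportions` is an assumption; the sketch only shows how 0/1 codes for each aspect would be turned into proportions of the code-able episodes.

```python
ASPECTS = [
    "asking_questions",
    "making_observations",
    "generating_data",
    "data_modeling",
    "interpreting_communicating",
]


def aspect_proportions(episodes):
    """Proportion of code-able episodes in which each aspect was coded as present.

    `episodes` is a list of dicts, one per code-able instructional episode,
    mapping each aspect name to 1 (present) or 0 (not present).
    """
    n = len(episodes)
    return {aspect: sum(ep[aspect] for ep in episodes) / n for aspect in ASPECTS}


# Hypothetical codes for four episodes (not data from the study).
toy_episodes = [
    dict(zip(ASPECTS, [1, 0, 1, 0, 0])),
    dict(zip(ASPECTS, [0, 1, 1, 0, 1])),
    dict(zip(ASPECTS, [0, 0, 0, 0, 0])),
    dict(zip(ASPECTS, [1, 1, 0, 1, 1])),
]
proportions = aspect_proportions(toy_episodes)
```

Because more than one aspect can be present in an episode, the proportions need not sum to 1.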

The frequency of work with data, the focus of the quantitative analysis for this research question, provides insight into how regular the aspects of work with data were, but not into the ways in which work with data was enacted. For example, qualitative differences in how youth were asking questions will not be evident from the codes as they were used. In order to provide more detail about the nature of work with data in summer STEM segments, the data were coded with an open-ended, qualitative approach.

Specifically, three research assistants were trained for approximately eight hours over the course of four meetings. Then, each research assistant coded all of the segments associated with the videos for a particular program. Two coders coded every segment, except for the segments for which the quantitative coding indicated that no aspects of work with data were present; for these segments, only one coder coded each segment. For all of the guiding questions, the coders also took note of who (the youth, the youth activity leader, or someone else) was the focus of the aspect of work with data. For example, with respect to interpreting and communicating findings, coders denoted whether youth were sharing the results from a hands-on investigation or whether the youth activity leader was doing so as a summary of the work youth had recently completed. Table 4.5 summarizes the aims of the open-ended, qualitative coding, as well as example codes from this study. Note that these examples are different in nature from those for the coding frame for work with data (see Table 4.3), as these codes were written in an open-ended manner, whereas the codes for work with data were applied based on the coding frame (with only 0s and 1s as possible values for codes).

Table 4.5: Coding frame for the open-ended, qualitative coding of instructional episodes
Topics	Description	Example of Codes
Nature of activity	Note the nature of what was happening in terms of the activity or activities youth were involved in.	Youth coming up with ideas for their final project, activity leader walking around giving them encouragement, helping students think of ideas. While youth are working, activity leader pulls up a website for a franchise, tells youth that’s the first step to working on their project, describes where to find the numbers they need, talks about what information a company’s website will include, encourages youth to look further into a company using other websites. Youth then take survey.
Youth or youth activity leader	Note the extent to which youth or the youth activity leader led the activity (or whether it was led by both the youth and youth activity leader).	Activity leader shows youth containers they will put specimens in so they can observe it and tools they can use to capture bugs. Activity leader describes chart that students were given outlining different symbols for different kinds of animals (e.g., leaf for producer), describing different methods to obtain specimens.
Work with data	For each aspect of work with data determined to be present, note how youth were involved in working with data.	Youth are counting in their districts how much plastic they gathered (from one of their field trip sites at the bay). They are asked to write down the amount of plastic each district gathered (generating data). They are then sharing their findings with each other based on calculations they did with the data. They were calculating how many pieces of plastic they collected together over four days (134 pieces)

After coding all of the segments for each program, the coders and I met to discuss potential issues that emerged throughout the coding. The goal of the meetings was to address any problems encountered when using the guiding questions and to clarify how the coders applied the coding frame. After the coding was complete, I read through all of the codes for all of the segments and made notes associated with each of the five aspects of work with data. I used these notes to write descriptions of the nature of work with data for each of the five aspects. After reading through the qualitative codes and my descriptions of the nature of work with data during each segment, I grouped the descriptions into themes, which I present in the results for this research question. I also used these themes to calculate proportions, which are also presented in the findings for this section. In summary, an open-ended, qualitative coding approach was used to create descriptions of the ways in which each of the aspects of work with data was enacted. This analysis is used to provide insights into the nature of work with data in summer STEM programs.

4.5.3 Analysis for Research Question #2 (what profiles of engagement emerge)

Latent Profile Analysis (LPA; Harring & Hodis, 2016; Muthen, 2004) was used to identify profiles of engagement. LPA captures the multidimensional nature of engagement by identifying groups of responses that share a pattern across the indicators, that is, the ways in which youth experience the dimensions of engagement together and at once. A key benefit of LPA, in addition to likelihood-based fit indices, is that it provides the probability that an observation is a member of each profile (unlike cluster analysis). These profiles make it possible to analyze the multivariate data collected on engagement in a way that balances the parsimony of a single model.

For these analyses, five variables were included: the three indicators for the experience of engagement (cognitive, behavioral, and affective) and the two necessary conditions for it (perceptions of challenge and competence). In addition, solutions with between two and ten profiles were considered. As part of LPA, selecting the model type (where the type refers to which parameters are estimated) is a crucial topic. For the present study, six model types were considered:

Varying means, equal variances, and covariances fixed to 0
Varying means, equal variances, and equal covariances
Varying means, varying variances, and covariances fixed to 0
Varying means, varying variances, and equal covariances
Varying means, equal variances, and varying covariances
Varying means, varying variances, and varying covariances
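The six model types listed above can be read as constraints on a finite Gaussian mixture, which is a common way of writing the model underlying LPA. The notation below is not from the original text and is offered only as a reference sketch: \(\mathbf{y}_i\) is the vector of five indicators for response \(i\), \(K\) is the number of profiles, \(\pi_k\) is the proportion of responses in profile \(k\), and \(\boldsymbol{\mu}_k\) and \(\boldsymbol{\Sigma}_k\) are the profile-specific mean vector and covariance matrix.

```latex
% Mixture density for the five-indicator response vector y_i with K profiles
\[
f(\mathbf{y}_i) = \sum_{k=1}^{K} \pi_k \,
  \mathcal{N}\!\left(\mathbf{y}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right),
\qquad \sum_{k=1}^{K} \pi_k = 1
\]

% Posterior probability that response i belongs to profile k
\[
P(c_i = k \mid \mathbf{y}_i) =
  \frac{\pi_k \, \mathcal{N}\!\left(\mathbf{y}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right)}
       {\sum_{j=1}^{K} \pi_j \, \mathcal{N}\!\left(\mathbf{y}_i \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j\right)}
\]
```

In this notation, the means \(\boldsymbol{\mu}_k\) vary across profiles in every model type. "Equal" variances or covariances means that the corresponding elements of \(\boldsymbol{\Sigma}_k\) are constrained to be the same for all \(k\), "varying" means that they are estimated separately for each profile, and "covariances fixed to 0" means that \(\boldsymbol{\Sigma}_k\) is diagonal.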

LPA was carried out with the Mplus software (Muthen & Muthen, 1998-2017), accessed through tidyLPA (Rosenberg, Schmidt, & Beymer, 2018), open-source statistical software that I developed and published to the Comprehensive R Archive Network. More specific details on LPA are included in Appendix B.

To select a solution in terms of the model type and the number of profiles to be interpreted and used in subsequent analyses, a number of fit statistics and other considerations were taken into account. These include a range of information criteria (AIC, BIC, and sample adjusted BIC [SABIC]), statistics about the quality of the profile assignments (entropy, which represents the mean posterior probability), a statistical test (the bootstrapped LRT [BLRT]), and concerns of interpretability and parsimony. On the basis of these criteria, a particular solution was selected and used as part of subsequent analyses; as the model selection process is an integral part of providing an answer to this question, the model and number of profiles selected are described in the section for the results for this research question.
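For readers unfamiliar with the information criteria named above, the sketch below computes them from a model's log-likelihood using their standard definitions. The function name and the numeric inputs are hypothetical; the values reported in the study come from the estimation software, not from this code.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Standard AIC, BIC, and sample-size-adjusted BIC (lower is better).

    log_lik  : maximized log-likelihood of the fitted model
    n_params : number of freely estimated parameters
    n_obs    : number of observations used in estimation
    """
    deviance = -2 * log_lik
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        # SABIC replaces n with (n + 2) / 24 in the penalty term.
        "SABIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


# Hypothetical values, for illustration only.
fit = information_criteria(log_lik=-100.0, n_params=10, n_obs=200)
```

All three criteria penalize the number of estimated parameters, which matters here because the six model types differ considerably in how many variances and covariances they estimate.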

4.5.4 Analysis for Research Question #3 (sources of variability for the profiles)

How youth engage is a function of who they are as individuals, what they happen to be doing during a particular instructional episode, and which youth program they are enrolled in, as well as random variation. This analysis seeks to identify how much of the variation was at each of these levels by using null models, or models with only the indicators for the three levels (youth, instructional episode, and program). These models can show how much variability in the profiles was systematic at these different levels and was potentially attributable to each of these types of factors. These null models may also suggest where to look to explain sources of youths’ engagement.

Sources of variability in these profiles can be used as additional information in their own right for interpreting the profiles and in order to anticipate the effects of predictor variables at the youth, instructional episode, and program levels. First, the proportion of the variability at each of these levels was explored through the use of null, or variance components, models. These are models that include only grouping factors (i.e., the variables identifying which youth a response was from, which signal the response was associated with, and which program the youth and signal were from). These models provide insight into the “levels” at which predictors may be able to explain the outcome.
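One conventional way to write such a null model, and the proportion of variance at each level, is sketched below. The notation is an assumption introduced for illustration: \(y_{ijk}\) is the outcome for the response from youth \(i\) to signal \(j\) in program \(k\).

```latex
% Null (variance components) model with youth, episode, and program effects
\[
y_{ijk} = \beta_0 + u_i + v_j + w_k + \varepsilon_{ijk},
\]
\[
u_i \sim \mathcal{N}(0, \sigma^2_{\text{youth}}), \quad
v_j \sim \mathcal{N}(0, \sigma^2_{\text{episode}}), \quad
w_k \sim \mathcal{N}(0, \sigma^2_{\text{program}}), \quad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2_{\varepsilon})
\]

% Proportion of the total variance at the youth level (analogous for the other levels)
\[
\text{ICC}_{\text{youth}} =
  \frac{\sigma^2_{\text{youth}}}
       {\sigma^2_{\text{youth}} + \sigma^2_{\text{episode}} + \sigma^2_{\text{program}} + \sigma^2_{\varepsilon}}
\]
```

The proportions for the instructional episode and program levels follow by placing the corresponding variance in the numerator.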

Variability in terms of the number (and proportion) of profiles each youth reports can also be considered. The breakdown of responses in each of the six profiles by youth was used to show the extent to which youth report their most reported profile. In addition, apart from this overall mean proportion of youths’ responses, the mean proportion for specific profiles (i.e., when youth report a particular profile the most, how often, on average, do they report it?) is also considered.

The intraclass correlation coefficients (ICCs) from the null models provide information about sources of variability in the profiles of engagement with respect to the same profile. One way to better understand the nature of variability across profiles is by examining how often youth reported the same profile: Whether youth exhibit stable or highly variable modes of engagement (i.e., are some youth always Fully engaged?) can provide a descriptive portrait of youths’ experiences across the many instructional episodes they were involved in. To determine how stable youths’ engagement was, for each youth, the profile that youth reported most was identified, and then the proportion of their responses in that profile was calculated. These proportions are also presented in the results for this question.
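The stability calculation described above can be sketched in a few lines. The profile labels and responses below are hypothetical placeholders, and `modal_profile_share` is an illustrative name rather than a function from the study's code.

```python
from collections import Counter
from statistics import mean


def modal_profile_share(profiles):
    """Return (most-reported profile, proportion of responses in that profile).

    `profiles` is the sequence of profile assignments for one youth's responses.
    When two profiles tie, the one reported first is returned.
    """
    counts = Counter(profiles)
    profile, n = counts.most_common(1)[0]
    return profile, n / len(profiles)


# Hypothetical assignments for two youth (not data from the study).
responses_by_youth = {
    "youth_01": ["A", "A", "B", "A"],
    "youth_02": ["C", "B", "C", "C", "C"],
}
shares = {y: modal_profile_share(p) for y, p in responses_by_youth.items()}
overall_mean_share = mean(share for _, share in shares.values())
```

A share near 1 indicates a youth who almost always reports the same profile, while a share near 1/6 (with six profiles) indicates highly variable engagement.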

4.5.5 Analysis for Research Question #4 (how work with data relates to engagement)

This question is focused on how work with data relates to the profiles of engagement. For the primary results for this question, mixed effects models were used that account for the cross-classification of the instructional episode (because of the dependencies of the responses associated with each of the 248 distinct ESM signals) and youth, and for the “nesting” of both within each of the nine programs. The lme4 R package (Bates, Mächler, Bolker, & Walker, 2015) was used. All of the models for this and the subsequent research question use random effects for youth, instructional episode, and program. Youth and the instructional episode can be considered to be crossed, with both nested within the program.

The probability of a response belonging to a profile was the dependent variable, and the aspects of work with data were the independent variables. There were six models, one for each of the six profiles. Because the outcome from LPA is not a hard classification (i.e., an observation either is or is not in a profile) but a probability, the dependent variable was treated as a continuous variable.

First, null models with only the random parts (i.e., random youth, instructional episode, and program effects) were specified. Then, the five aspects of work with data were added as predictors to the model. The results are interpreted on the basis of the statistical significance and the magnitude and direction of the coefficients associated with these five predictors. For example, if the coefficient for the effect of the asking questions aspect of work with data upon one of the profiles was 0.10 and was determined to be statistically significant, this would indicate that when youth are engaged in this aspect of work with data, they are ten percentage points more likely to report a response in that particular profile.
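A model of this form can be written compactly as below. The notation is an illustrative assumption: \(p_{ijk}\) is the probability that the response from youth \(i\) to signal \(j\) in program \(k\) belongs to a given profile, \(x_{mj}\) is the 0/1 code for aspect \(m\) of work with data during the episode associated with signal \(j\), and \(u_i\), \(v_j\), and \(w_k\) are random effects for youth, instructional episode, and program.

```latex
% Linear mixed effects model for the probability of membership in one profile
\[
p_{ijk} = \beta_0 + \sum_{m=1}^{5} \beta_m \, x_{mj} + u_i + v_j + w_k + \varepsilon_{ijk}
\]
```

Because the outcome is on the probability scale and each \(x_{mj}\) is a 0/1 indicator, a coefficient such as \(\beta_m = 0.10\) corresponds to a difference of ten percentage points between episodes in which aspect \(m\) was present and episodes in which it was not, holding the other aspects constant.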

For this question, models with the aspects of work with data both separate from and together with the youth characteristics were fit. The models with both together were also used as part of research question #5, though they are presented here (and interpreted in the sections for both results). Specifically, mixed effects models predicting the probability of membership in each of the six profiles as the dependent variable, using the work with data codes as predictors, were specified.

Because the results were very similar whether the aspects of work with data and the youth characteristics were entered in separate models or in the same model, the results from the models including both sets of variables were used to answer both this and the next research question. Note that a composite for work with data (the sum of the individual aspects of work with data) was also considered, but as it yielded only one (small) statistically significant result, the results for this analysis are not presented.

4.5.6 Analysis for Research Question #5 (how youth characteristics relate to engagement)

This question is focused on how the relationships of work with data differ on the basis of youth characteristics. In particular, youths’ pre-program interest, gender, and under-represented minority (URM) status are used as predictor variables. The same mixed effects models (and statistical software) used for the previous research question are used for this research question. The dependent variable was again the probability of a response being in the profile.

The three youth characteristics (pre-program interest in STEM; gender, entered as a dummy code with the value of “1” indicating female; and URM status, also entered as a dummy code, with “1” indicating a youth from a URM group) are added as predictors. As for the previous research question, the statistical significance and the magnitude and direction of the coefficients associated with each predictor are interpreted to answer this question. For example, and similar to the interpretation of the predictors associated with RQ #4, if the coefficient for the relationship between pre-program interest and a profile is 0.05, then for each one-unit increase in pre-program interest, youth are five percentage points more likely to report a response in that profile.

Models with the youth characteristics separate from and together with the aspects of work with data were fit. As for the previous question, the models with only the youth characteristics yielded very similar results to the models with both sets of predictors. Thus, the models presented in the previous section with both youth characteristics and the aspects of work with data are interpreted as part of answering this question.

As described in the previous sub-section, because the results were very similar when the aspects of work with data and the youth characteristics were added in separate models compared to when they were included in the same model, the results for both sets of predictors in the same model are presented and interpreted. In addition, interactions between statistically significant aspects of work with data and all of the youth characteristics were examined; because none of these interactions was found to be statistically significant, they are not included with the results.

4.6 Sensitivity Analysis

For observational studies, such as the present study, it can be important to determine how robust an inference is to alternative explanations. One approach to addressing this is sensitivity analysis, which involves quantifying the amount of bias that would be needed to invalidate an inference. Using the approach described in Frank, Maroulis, Duong, and Kelcey (2013), I carried out a sensitivity analysis for inferences made relative to significant relations. I used the R package konfound (Rosenberg, Xu, & Frank, 2018).

The result of the sensitivity analysis, which was used to interpret and contextualize findings, is a numeric value between 0 and 1 for each effect that indicates the proportion of the estimate that would have to be due to bias in order to invalidate the inference. A value close to 0 (such as .05) indicates that a tiny change in the size of the effect would change the inference made (i.e., a statistically significant result that is interpreted would no longer be interpreted as an effect). Larger values, such as values around .50, indicate that a substantial amount of an effect could be due to bias (i.e., anything less than 50% of the effect could be due to bias in the sample) and the same inference about a statistically significant effect could still be made, suggesting that such an effect is more robust than one with a smaller value.
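The quantity described above can be sketched directly from the definition in Frank et al. (2013): the proportion of an estimate that would have to be due to bias is one minus the ratio of the significance threshold to the estimate. The sketch below is not the konfound implementation. It assumes a large-sample critical value of 1.96 (the appropriate value depends on the degrees of freedom), handles only coefficients that are already statistically significant, and uses hypothetical inputs.

```python
def proportion_bias_to_invalidate(estimate, std_error, critical_value=1.96):
    """Proportion of a significant estimate that must be due to bias
    for the inference to no longer hold.

    The threshold is the smallest estimate that would still be statistically
    significant: critical_value * std_error. Returns None when the estimate
    is not significant to begin with.
    """
    threshold = critical_value * std_error
    if abs(estimate) <= threshold:
        return None  # nothing to invalidate; inference was not significant
    return 1 - threshold / abs(estimate)


# Hypothetical coefficient of 0.10 with a standard error of 0.025.
robustness = proportion_bias_to_invalidate(0.10, 0.025)  # about 0.51
```

In this hypothetical case, roughly half of the estimate would have to be due to bias before the coefficient dropped below the threshold for statistical significance.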

I used sensitivity analysis to interpret and contextualize hypotheses about key relationships for research questions #4 and #5 for this study, that is, for the relationships between the aspects of work with data and youth characteristics and the profiles of engagement. In particular, I carried out a sensitivity analysis for the coefficients that were statistically significant in order to provide some insight into how robust the results are. In addition, I carried out a sensitivity analysis for coefficients that were close to, but did not reach, statistical significance, in order to better understand how little would need to change for an effect to be determined to be significant. Higher values from the analysis (i.e., values closer to 1) indicated more robust estimates, in that the inferences would still hold even if there were substantial bias in the estimate, and these were interpreted as robust findings; lower values, when present, indicated less robust findings that I interpreted with more caution.