Chapter 5 Journal Entries

These journal entries are hilarious to look back on. My first journal entry confessed zero experience with R programming, and I’m writing this in a bookdown right now.

The journal entry assignment did teach/remind me a few things:

  • Spelling (and, by extension, writing) is the bane of my existence.

  • Weekly assignments do not mesh well with my habit of never keeping a proper calendar.

  • Writing prompts, no matter how simple, are problematic for me to stay on track with.

  • Free-flow thoughts out of my mind are never elegant.

Journal Entry 1

  1. It’s astounding to me that statistics is a relatively newly explored field. I recognize an amount of bias on my end considering my undergraduate education was handled by K-State, but I’ve seen statistics as a field that everyone sees as relevant and imagined most universities studied the topic. My time in the corporate world did clue me in to the idea that data science isn’t necessarily valued by all companies, but I drew the conclusion that it was generalized ignorance, not a newer field lacking a long enough history to acquire ubiquitous respect.

  2. I’m struggling to conceptualize the possible depth of how data can organize itself. I’ve only ever used data as a tool and analyzed it using computer programs and mild human intervention with a sort of “shotgun” approach. I’m aware that the data I’ve looked at in the past had differences in their distribution, how they were read at the point of visualization, and their interpretation when context was applied. What I will struggle with, not just now but moving forward, is the idea that there are names for what I saw. Those names may affect my analytical approach, and I can only hope I don’t develop any biases in the process.

  3. I’m learning R from scratch but have a background in multiple languages and experience with technology ranging from fixing hardware to building an operating system. I use a lot of tools for learning programming languages because I’m generally slow at getting a grip on them. One of the tools I use is AI for trivial code review; it’s not writing any code directly for me, it just searches for minor mistakes I don’t want to spend hours finding myself. If I ever submit a screenshot and you see one or more AI tabs, that’s why.

Journal Entry 2

  1. Data isn’t necessary to make forecasts; however, proper utilization of data can yield higher quality ones. That said, data on its own is completely useless. This is why we make use of statistical models, to take our mass of data, generate new points using the data for regression/forecasting, and make inferences using visualization. This idea is seemingly lost on a surprising amount of the population, and it is completely indiscriminate of academics versus industry specialists. Anecdotally, it appears fallacies surrounding data collection and analytics are abundant.

  2. Spatio-temporal statistics makes sense from a practical standpoint. It’s more valuable to frame data against its real presentation in the world (everything exists at a place and time); that much is clear. That said, why do we care to solve these problems with time and space conjoined? Why can’t we solve the questions we have within the context of space, then the context of time, then join them afterwards? Why is spatio-temporal a field of statistics rather than a method of visualization? What do we achieve by complicating our puzzles trying to solve them as one unit, rather than piecewise? Isn’t the entire purpose of developing calculation methods for mathematics to pull apart a complex problem and make it possible?

  3. I got the .kml to render properly for my data set. GPS is clearly inconsistent crossing from outside to inside. I’m interested to know how you’d tackle this problem in the context of an animal being tracked going from traveling across a field to hiding in a cave.
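For my own future reference, this is roughly how I got the .kml rendering, sketched with a placeholder file name (“tracks.kml”) instead of my actual data set, and assuming the sf package (and its GDAL KML support) is installed:

```r
# Minimal sketch of reading GPS points out of a .kml with sf.
# "tracks.kml" is a placeholder name, not the actual file from my data set.
library(sf)

gps <- st_read("tracks.kml")     # KML is handled through sf's GDAL drivers
gps <- st_zm(gps, drop = TRUE)   # drop the altitude dimension, keep plain x/y
plot(st_geometry(gps), axes = TRUE)
```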

Journal Entry 3

The mathematics and theories behind applied statistics are important to understand practically and conceptually. Beyond that though, being able to hand calculate distribution equations has diminishing returns. The practice of manually computing these base concepts is a more involved way to understand what they measure and how to manipulate them. A workable understanding can still be achieved without rigorous practice of these hand calculations, since an in-depth knowledge of distribution theory and experimental design carries so much weight towards success.

  1. How is it that a model or machine learning program can be successful at prediction but not at forecasting or hindcasting? It logically tracks that these have different timelines attached to them, forecasting or hindcasting being far further down the line than prediction. It just isn’t intuitive how a model can succeed short term but fail long term. There are lines of fit that can prove this concept simply: the longer the line, the more its angle in relation to the data affects the end points. But wouldn’t the predictions have higher error anyway due to that angle? Would it not be best to only stick to models that can accurately forecast and hindcast when trying to do short-term predictions? Is the error associated with these short-term prediction models acceptable enough to proceed, or resource efficient enough to justify their use?

  2. The format of instruction for the class drifts away from heavily teaching and practicing R, which is understandable. But if it’s possible to include a glossary of examples for utilizing the distribution models available in the PDF handout in R, that would be extremely helpful. The models are slowly clicking together conceptually, so being able to read through a pseudo reference guide on how to use them would make that process smoother.
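In the meantime I’ve started a rough version of that glossary for myself, leaning on base R’s d/p/q/r naming convention; the specific numbers below are arbitrary:

```r
# Base R names distribution functions with a one-letter prefix:
#   d = density/mass, p = CDF, q = quantile, r = random draws
dnorm(0, mean = 0, sd = 1)        # Normal density at 0
pnorm(1.96)                       # P(Z <= 1.96) for a standard Normal
qnorm(0.975)                      # 97.5th percentile of a standard Normal
rnorm(5, mean = 10, sd = 2)       # five random Normal draws

dpois(3, lambda = 2)              # Poisson probability of exactly 3 counts
rbinom(10, size = 1, prob = 0.5)  # ten Bernoulli trials (Binomial with size = 1)
```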

Journal Entry 4

  1. PDFs and PMFs follow a fairly set architecture, an expected value and a variance. They allow us to look at some relative spread of probabilities tied to data and make use of that information. Building them out to perform predictions is a core component of spatio-temporal statistics, but the primary skill involved is not the more “manual-labor” end of mathematical statistics. It’s more important to understand how and why a function is built the way it is, for at least three reasons. The first is the immense value behind knowing what function to apply to what scenario based on the parameters of the situation. The second is that there is a level of mathematical maturity needed to intuitively recognize when a function is not working correctly when plugged into a program like RStudio. The third is that R is built on individual developers producing and maintaining packages for certain use-cases, which can bar someone from being able to use a specific function if the package is no longer supported or poorly constructed. Knowledge of these functions is fundamental to effectively and efficiently performing good statistical science.

  2. Distributions have a very clear problem the more abstract the scenario is made. A single coin toss with two possible outcomes is very simple to understand: the coin has 50% odds to land on either heads or tails. This can be warped by accounting for other, unexpected parameters and by increasing the trials. The first situation sounds perfect for a Bernoulli distribution, but if the coin is being tossed hundreds of times, the distribution function shifts. If a third outcome is added where the coin isn’t caught and instead falls into cracks in the floorboard below, the function has to shift again. If the coin is tossed by a different person every 3rd trial, there’s a level of uncertainty behind whether the way each person tosses the coin affects the probability of each of the possible outcomes. (A sketch of how the distribution shifts with each change is at the end of this entry.)

  • This is a ridiculous example, but at some point, these problems have to have occurred in a study. A study on flying squirrels fails to account for their camera traps being located on the grounds of someone who frequently hunts at night. Or the camera trap itself has hardware issues that cause it to delete every 10th picture, which leads to a location being considered uninhabited. Is there a point where uncertain parameters’ effects become so undeniably strong that they just become their own pieces in the function?
  3. Statistics research as a whole is still vaguely confusing. How much does it rely on other fields performing studies to provide data? Is it crossing a boundary past doing statistical research to design a study around gathering biological data, and if so, does that imply that this field is entirely co-dependent on collaboration across departments?
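The sketch mentioned in point 2, purely made up to keep the coin example straight in my head, showing how the distribution shifts as the scenario grows:

```r
set.seed(1)

# One fair toss: a single Bernoulli trial.
rbinom(1, size = 1, prob = 0.5)

# Hundreds of tosses: the number of heads follows a Binomial(300, 0.5).
rbinom(1, size = 300, prob = 0.5)

# Add a third outcome (the coin falls into the floorboards): each toss becomes
# categorical, and the counts over many tosses become Multinomial.
rmultinom(1, size = 300, prob = c(0.49, 0.49, 0.02))
```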

Journal Entry 5

  1. Building a hierarchical model is extremely multi-faceted. It involves working from an end goal and addressing every question between the start of the project and that goal. A workable understanding of higher mathematics can be instrumental in making sure that this construction goes smoothly, more in the sense of knowing enough about fixing toilets to know when to call a plumber. There comes a point where the knowledge and skill with the math goes beyond the reach of someone specializing in applied statistics. The format of hierarchical models takes pieces of different distributions and mathematical models, and stitches them together to address far more factors than achievable with standard practices.

  2. Bayes’ theorem makes conceptual sense; making inferences about an unknown probability based on parameters potentially linked to that probability is semi-intuitive (the theorem itself is written out at the end of this entry). It’s difficult to recognize how those pieces of the puzzle, the potentially linked events or parameters, are found. To recognize that a series of seemingly random car crashes in an area are potentially linked is not a massive leap of imagination, but the step beyond that, where their link is conceptualized, quantified, and placed into a model, is confusing. Making a link between ice cream sales and crime rate is a fun party trick and can be explained with intuitive thoughts about heat affecting crime. But it’s only fun and simple because it’s already been done; it took imagination to get there in the first place.

  3. If you have spare time in the schedule for the class, would it be possible to show a brief example detailing how difference equations and differential equations can yield similar effects in spatio-temporal research? I understood what you were saying with regard to difference equations being more workable for those who aren’t specializing in intensive mathematics, but it’s hard to understand why we wouldn’t always consult someone capable with differential equations if they have such a stronger presence in other sciences.
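For my own reference on point 2, the theorem itself is compact even if finding the linked parameters isn’t:

$$
p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{p(y)} \propto p(y \mid \theta)\,p(\theta)
$$

The hard, imaginative part is deciding what counts as $\theta$ and $y$ in the first place; combining the prior and the likelihood is the easy half.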

Journal Entry 6

  1. In building a hierarchical model, it’s rational to make simplifying assumptions in order to start progress towards a more refined solution. In class we looked at the construction of a model to fit the growth rate of whooping cranes, which created increasingly complicated problems due to the simplicity of the model. It didn’t matter, however, because by starting with that simplistic model we were able to build a rough idea of what the final product might look like, and arm ourselves with questions to make the final product better. This was actually very helpful in developing my final project. Instead of trying to pose a question that needed to be answered and wondering if it was feasible, I could instead pose the question and try to fit a model to it. If I could get the model to make partial sense, then the project was worth digging further into.

  2. Parameter models are still a bit of a mystery. We defined parameter models briefly, but there’s still a lot of ambiguity around what can be defined as a parameter model. It’s likely just a problem of my lack of maturity in the field, but I’m wondering what the bounds are on parameter models, since the data and process models seem to just fit into distributions. For instance, if I’m fitting a hierarchical model to give a Binomial response from a Poisson value that’s distributed based on another function, is that a parameter model or just a difference/differential equation I’ve fit to my process model?

  3. I don’t currently have anything else I’d want you to know about. I cast my vote as yes for the questions-and-answers lecture.

Journal Entry 7

  1. Before working with the totality of the data provided or intended, it’s important to write out the model to rationalize its use and test its efficacy. This process of prior predictive analysis, while useful for checking if the model itself works for the purpose it was built, can be used to perform statistical analysis without very much data to begin with. Using limited data at best, we can make projections as long as the model is constructed correctly. A digestible analogy would be to think of a line of fit on a standard scatter plot: the scatter plot sets the slope of the line, and the line could (to an extent) be drawn past the boundaries of the plot. While it’s a simplification, the analogy captures the spirit of what’s being done with the Bayesian model. This prior predictive analysis is best done with limited information to ensure that the model being built is put together for the question or goal in mind rather than for the data available. (A rough sketch of what this looks like in code is at the end of this entry.)

  2. In the R demonstration we saw a process of generating simulated data, but we also mentioned the concept of intentionally cherry-picking the simulated data that fits our line closely or exactly. I’m still confused as to what the purpose of this is besides validating the model itself and quantifying the accuracy of its projections. Any purpose beyond building a good estimate of how trustworthy our model is, or restructuring our model as we discover more effective boundaries through the simulated data, is hard to understand. This also seems to contradict the previous statements made in class about how more data fixes very little. If millions of simulations are run, that doesn’t present as much different from gathering more data in the first place. The analysis and assumptions just lose error at what I’d assume is a high degree of diminishing returns.

  3. I’m wondering what the limitations of Bayesian models are, since they appear to be able to make any projection if framed well enough. It’s a little ridiculous to ask what can’t be done in a field that works with infinities, but what are the scenarios where Bayesian models stop being the right answer and something else comes in?
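To make point 1 concrete for myself, a bare-bones prior predictive check for a simple linear model, with priors and ranges I made up on the spot:

```r
set.seed(42)
x <- seq(0, 10, length.out = 50)
n_sims <- 100

plot(NA, xlim = range(x), ylim = c(-20, 40), xlab = "x", ylab = "simulated y",
     main = "Draws from the prior predictive")

for (i in 1:n_sims) {
  beta0 <- rnorm(1, mean = 0, sd = 5)        # prior on the intercept
  beta1 <- rnorm(1, mean = 1, sd = 1)        # prior on the slope
  sigma <- abs(rnorm(1, mean = 0, sd = 2))   # half-Normal prior on the residual sd
  y_sim <- rnorm(length(x), mean = beta0 + beta1 * x, sd = sigma)
  lines(x, y_sim, col = rgb(0, 0, 0, 0.05))  # one simulated data set per faint line
}
```

If the simulated responses look absurd before any real data is touched, that’s the signal the priors or the model structure need rethinking.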

Journal Entry 8

  1. Recognizing the purpose behind a study or experiment, and applying an appropriate model to fit that purpose, is more than just proper statistical science practice or a source of job security. Technology, data acquisition, and other logistics can source major setbacks or errors throughout the process of trying to work on a study or experiment. These can’t always be, or shouldn’t always be, resolved by the statistician, and time efficiency occasionally plays out better through waiting for an issue to be resolved rather than working around it. The act of setting appropriate goals and working through a proper model for those goals isn’t generally set back by the previously stated sources. Resource efficiency is everything in the end, and rarely do statisticians have enough resources to consider them infinite. We have to do what we can when we can, and know when it’s time to either resort to previously understood, simpler methods, or step back and let something resolve itself before we continue.

  2. How does spatial statistics stand on its own? From everything we’ve observed in class so far, it appears that space can be excluded and still yield appropriate statistical science, but when we look at just spatial data, where does it become anything more than topography? Plotting percentage expected yield of crops across America based on soil types seems more like a chemical problem fit to a map. Does it just become statistics the moment that we quantify uncertainty? It presents as inefficient to not consider time and other variables in a study or experiment.

  3. Considering R is not currently the most fantastic tool for use as a GIS, would it still be viable for proof-of-concept work? Would you recommend something else to conduct this?

Journal Entry 9

  1. The Gaussian process is a display of the ability to take data from a study or experiment and make inferences by quantifying the level of correlation between data points. This presents as a more mathematical and sound approach to the general idea that comparing apples to oranges isn’t appropriate. Assigning weights to our data based on how relevant the points are to one another and to our overall inference is intuitive, but the math itself is a lot more intimidating than the concept. We were able to see in class how these correlations can be quantified and then displayed on a graph, which seems to be a method of avoiding the bias of assuming certain points are or are not related. (A small sketch of this correlation-versus-distance picture is at the end of this entry.)

  2. The lack of rigorous linear algebra experience is definitely creating a significant amount of grief in class for me. I understand enough of the fundamentals to recognize notation and perform computations; it’s the conceptual understanding of what is going on with the mathematics that’s problematic. For instance, it’s logical how a linear model works numerically, and I can explain what it’s doing for us from a statistical sciences standpoint. What is actually happening in the vectors and matrices we’re working with is headache-inducing to visualize. I’m struggling to recognize if I need more experience with linear algebra and linear modeling, or if there’s a link that’s going to be connected later on in my education.

  3. I’m not sure what distribution theory contains beyond just recognizing which distributions make sense where. If that’s all the topic contains then I feel confident, if there’s more to it than that I definitely need literature recommendations.
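The correlation-versus-distance picture from point 1, reduced to the simplest version I can write down: an exponential correlation function with a made-up range parameter.

```r
d   <- seq(0, 10, length.out = 200)  # distances between pairs of locations
phi <- 2                             # range parameter, chosen arbitrarily here
correlation <- exp(-d / phi)         # exponential correlation: near points ~1, far points ~0

plot(d, correlation, type = "l",
     xlab = "distance between locations", ylab = "correlation")
```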

Journal Entry 10

  1. Linear modeling primarily refers to the format of the equation behind the statistical predictions being made, with no express rule that the results have to be a straight line. We saw in class how starting with an intercept-only model is actually a good baseline with spatial data, creating almost a “wish list” of what you’re looking to improve in your inferences, despite the model itself not being effective at spatial statistics. Originally, I was struggling with the concept of intentionally using a model that is known to not be effective for a study, but looking back with this perspective, it presents as an act perfectly in line with the science we study. Making baseline assumptions is important, and starting with something that at least fits one of those assumptions is like an author free-flowing thoughts in order to push past writer’s block. It’s more important that we attack these questions piecewise than to let perfect be the enemy of good enough. (A small sketch of an intercept-only baseline is at the end of this entry.)

  2. Linear models as a whole are still a massive pain point for me. I have yet to grasp, after two months of daily study, what is going on in most of the equations I encounter. If anything, the notation is what is causing the most damage. Looking at the equations themselves and noting what the underlying functions are measuring is working out, but keeping track of the substance of any individual vector or coefficient is not. The standard practice seems to be avoiding defining variables, which is the opposite of what I got very used to in Biology. To that extent some of this science feels artificially difficult, but it may be some degree of arrogance that causes me to believe a written explanation would be sufficient to keep me on pace. Regardless, I need to continue studying as much as I can to pull up to the front with everyone else.

  3. I was wondering, at this point, between distribution theory, linear models, and missing data techniques, what other subjects are important to add to my study routine in reference to my career rather than my degree program?
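The intercept-only baseline from point 1, sketched with simulated data so I can see what “fitting the mean” actually buys me before a covariate goes in; the covariate and response here are entirely made up.

```r
set.seed(7)
n <- 100
dat <- data.frame(elev = runif(n, 0, 1000))            # made-up covariate
dat$yield <- 50 + 0.02 * dat$elev + rnorm(n, sd = 3)   # made-up response

m0 <- lm(yield ~ 1, data = dat)     # intercept-only: the fitted value is just the mean
m1 <- lm(yield ~ elev, data = dat)  # the first item off the "wish list"

coef(m0)        # identical to mean(dat$yield)
AIC(m0, m1)     # a quick check of whether the covariate earned its place
```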

Journal Entry 11

  1. I learned two things in regard to working on a project in R. The first is that the documentation for R is overall far less reliable than I’m used to with any other language. It takes only a few searches and some reading to grasp a concept in Java, Python, or C#. R, on the other hand, has most of its documentation tied up behind a strong knowledge base in statistical sciences. While the language is simple and intuitive when you have that foundational understanding, it’s extremely problematic when you don’t. The second thing I learned is that the skill differential with my peers is strange. I think this is a product of that first point I made: many of my peers are far more worried about troubleshooting syntax errors and issues with visualization. I’m personally losing sleep over what models to work with and how they function, since all I ever worked with professionally could be summarized as linear regression.

  2. Models. I’m not lost, I’m in a different zip code. I know I’m going to keep struggling for a while, but it’s difficult to stay calm about that when I feel unable to do research/class projects without at least knowing how to do something besides fit a basic linear model to a set of data and call it a day. At this point my anxiety with models is based more in an excitement to start actually doing statistics than in some level of worry for the future or imposter syndrome.

  3. I have nothing to add at this point in time.

Journal Entry 12

  1. I went back through everything that I tried with my code and learned a decent amount about what was going on. The code I was using, which was supposedly creating new data, was actually splitting my data into randomly selected segments for the purpose of training a machine learning model. The only other action it performed was occasionally changing some of the data to similar but “false” data in the new strings, with the same mission statement of training a machine learning program. The documentation I found this code in was very non-descriptive about what it was accomplishing, so I associated its actions with what the documentation was about as a whole, logistic regression. I went back through and re-wrote my code to use this split data at the end as a form of model checking; whether or not that’s a correct model check is something I assume I’ll learn later. The original documentation on this machine learning split led me down a further rabbit hole that may end up being very useful for the final project. (A rough sketch of the split itself is at the end of this entry.)

  2. My struggle point is very clearly identified at this point. I was staying away from spending a lot of time on programming per the advice of the lectures, but everything makes more sense when I put it in the context of code strings. I’m not going to suddenly understand the majority of the underlying math or statistics that’s used to develop these models. I just need to stick to what I’m good at while I reach a workable understanding of statistics, and what I’m good at is programming.

  3. I’m still not certain how I’m going to make a final project using hierarchical models when I have no real capability of building them, but I’ll make use of the resources around me until I can’t anymore.
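The split from point 1, re-written as a self-contained sketch with fake data so I remember what the code was actually doing; the variable names and formula are placeholders, not my project’s.

```r
set.seed(123)
dat <- data.frame(predictor = rnorm(200))
dat$outcome <- rbinom(200, size = 1, prob = plogis(0.5 + 1.2 * dat$predictor))

train_rows <- sample(nrow(dat), size = floor(0.8 * nrow(dat)))  # random 80% of rows
train <- dat[train_rows, ]   # used to fit the model
test  <- dat[-train_rows, ]  # held back to check predictions afterwards

fit  <- glm(outcome ~ predictor, family = binomial, data = train)
pred <- predict(fit, newdata = test, type = "response")
mean((pred > 0.5) == test$outcome)   # crude hold-out accuracy as a model check
```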

Journal Entry 13

  1. It was indirect, but the major thing I learned in class was the value of reading R documentation to verify whether the person who created the package actually understands the science they’re making use of. I’m used to documentation being as valuable as the person’s programming expertise, and I’m still going to proof-check using that metric, but the addition of checking to see if the person writing the documentation understands statistics is something I’ll be using heavily. Everything else of significance came from office hours; I was humbled by the extent of distribution theory’s significance, since I went down a slight rabbit hole with my project trying to solve a problem I had designed myself.

  2. I hesitate to say that I’m not struggling with anything new now. Still battling to get a better grip on distribution theory and math stats, but nothing is added to the roster of headaches right now.

  3. Twitter bots are very easy to make as it turns out.

Journal Entry 14

  1. I learned how important it is to be able to explain things without using jargon. Especially given the amount of confusion people experience when they look at statistics, it’s vital to be able to make everything make sense to someone who hasn’t put in the years of education you may have. Giving people a proper education like K-State does is still a good strategy, but not everyone is going to have that opportunity and it’s best I practice explaining things in a simpler format early.

  2. I had an idea of what my model was doing when I was originally working with it; now that it’s been edited by two professional statisticians, I have no clue what it’s doing under the hood, and I’m happy with that for the time being.

  • But I have no idea what this thing is doing.
  3. I’m just going to take Aidan’s advice and make this my Master’s project. The more I work with it and consult other epidemiology researchers the better I feel about making it a full-time effort. It won’t end up in a model for human movement, but it might end up in a model for disease transmission across transient human populations and shifting networks.

Journal Entry 15

  1. Not everything is necessarily significant to a model, and after fitting your model it’s always a possibility that one or more parameters aren’t creating any relevant effect. It’s intuitive but it still hurts to realize that adding parameters can just have zero effect, especially when the implications behind that go against a logical hypothesis.

  2. Measuring the relevance of a model’s parameters can be done with mathematical methods that provide a numerical value to help see an actual break point of when it crosses from having an effect to having no effect. I recognize that, and I understand it’s something I’ll learn in STAT 705, but for now the only thing I know how to do is look at a visualization of data/results and make inferences from my experience working with big data in a less statistical manner.

  3. I have nothing to add right now.

Journal Entry 16

  1. I’ve been reading a lot on disease modeling as part of my project and the lecture reinforced the findings from that reading. Ecological models and disease models parallel each other since they’re both looking at abundance and migration of some biological entity. Vector-borne diseases weren’t something I was considering when reading on this subject, but it logically tracks. When the pathogen requires a vector for its early lifecycle and transmission it’s not an issue of modeling disease spread as much as it’s an ecological model with an extra parameter.

  2. I’m learning SAS at the same time as R. I know there’s been discussion in the class that R isn’t the end-all-be-all of programming languages for statistics; it just tends to provide more functional freedom on top of being a free program. I’m wondering if there’s any inherent academic handicap to resolving to conduct all my analyses in SAS over R, though, because as I work with the program more, I’m finding it to be a lot more reliable than R in terms of its capabilities. Pair that with the issues surrounding R package upkeep and the potential of having my early documentation for R completely invalidated by its constantly changing environment. If my plan is to spend as little time as possible dedicated to being an excellent programmer, SAS seems like the better long-term option, since hypothetically I only learn its features once.

  3. I have nothing extra to add.

Journal Entry 17

  1. If anything, I just learned how good the mgcv package is and how awful it is to troubleshoot. Getting the parameters of the study on English grain aphid to function well is a less-than-ideal experience, and with my limited experience in mathematical statistics the gam function is a nightmare to try to fix. I’m just throwing things at the function to see if they line up with what I’m expecting it to do, which is about as computer-science-student as it gets. (A stripped-down example of a gam fit is at the end of this entry.)

  2. After finding out why zero-inflated negative binomials are hard to work with (as it was explained to me, there isn’t a consistent link function?), I have a lot of questions on why any review board would ask why one wasn’t used over a zero-inflated Poisson.

  3. I need a degree of clarity on how I should be spending the time I use for my studies outside of coursework and research. I understand that I don’t need an extreme level of mathematical experience, but I can feel the handicap of not having learned anything beyond basic linear algebra. I recognize that my programming capabilities and research experience provide me with an advantage that I should use, but I continuously hear conflicting information on whether I should lean into those advantages or just let them exist as they are. I’m used to logical inconsistencies (I worked in corporate offices for two years), but usually there was a more direct line in the sand about when to ignore contradicting feedback. Everyone here has legitimate education and credentials, so it’s a lot harder to categorize the input I get.
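The stripped-down gam example mentioned in point 1; this is simulated count data, not the aphid study, just enough to see the moving parts of mgcv without the troubleshooting noise.

```r
library(mgcv)
set.seed(11)

dat <- data.frame(x = runif(300, 0, 10))
dat$count <- rpois(300, lambda = exp(0.5 + sin(dat$x)))  # made-up nonlinear count process

fit <- gam(count ~ s(x), family = poisson, data = dat)   # one smooth term, Poisson counts

summary(fit)     # effective degrees of freedom, deviance explained
plot(fit)        # the estimated smooth with confidence bands
gam.check(fit)   # residual and basis-dimension diagnostics
```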

Journal Entry 18

  1. Regarding activity 3, I was considering changing the model from a relationship between grassland and aphid populations to some form of cultivated land. After the most recent lecture, I don’t think it’s relevant to change the raster layer. The observations we’re trying to make don’t carry any weight towards learning more about the aphids’ prevalence in cultivated vegetation; they’re instead about figuring out what factors beyond the obvious influence aphid populations. We already know that these aphids crowd around wheat, so the only thing I could do is prop up a plot with cultivated land as a studied metric to highlight the relationship between grassland and aphid populations. I’m going to do that, but it’s useful for future reference in consulting to see that it’s not necessary.

  2. Models like disease tracking get high yield out of just location and time; factors beyond those two sometimes have a negligibly small effect on the model. With that statement made, is most applied statistical research aiming to ensure that similar cases like disease don’t have other major predictive variables, or is it an effort to fine-tune a model and make its predictive accuracy higher for its niche use-case?

  3. I have nothing extra to add.

Journal Entry 19

  1. I learned three things yesterday in relatively short succession. The Bayesian Hierarchical Framework is a format for declaring assumptions we’re using for the construction of our model, and I’ve been declaring research assumptions for almost a decade just never with mathematical rigor. Bookdown is poorly supported, and the developers do not understand a large amount of their own error codes. The internet is stuffed with a large amount of useful, highly specific data, with not nearly enough analysis being done on that data.

  2. Vocabulary is an issue for me. I have a lot of incorrect labels for concepts and techniques in my head and it’s creating persistent strife in communicating thoughts and goals with individuals possessing more formal experience with this science.

  3. I found a data reporting website with every departure port, end destination, route, and inspection report for every cruise voyage that’s had a norovirus outbreak since the construction of the VSP.

Journal Entry 20

  1. I didn’t necessarily learn much outside of how to defend making choices on final goals in research, but I was reminded of the most important component of the first step in troubleshooting: when trying to replicate a failure, have someone else replicate it too. It’s the entire reason we power-cycle machines when we first start working on an error; we start at nothing and try to get back to where the other person was having difficulties.

  2. No egregious struggles within the class, at least not now that the bookdown website is letting me use it.

  3. I have nothing additional to add.

Journal Entry 21

  1. Covariates come in a lot more unexpected forms than I originally anticipated. Throughout the lecture I held a lot of concern over the fact that we couldn’t pin down fracking well locations, only general oil and gas wells. Adding time into the model and then looking at the predictions one decade apart was a much easier method of showing a strong correlation between fracking and earthquake occurrence than anything else I was trying to come up with.

  2. This is the first time I haven’t had any amount of authority on a topic we’re pulling data from to build a model. I attribute a lot of why I was coming up with very roundabout ways to confirm the hypothesis of fracking wells being a strong correlative agent for earthquakes to my lack of deeper understanding about the topic. I understand we as statisticians work with the subject matter experts reaching out to us to develop our model, but it’s concerning to imagine that we either become very familiar with a multitude of other scientific topics over our career or we risk developing poorly performing models.

  3. I have nothing extra to add.

Journal Entry 22

  1. Statistical research, despite being founded in stronger logic than any other science I’ve worked with, is just as prone to having absolutely no real results from a study. I spent months on the cruise project and learned a lot personally in the process, but I don’t think it added much to human knowledge beyond confirming that population density does very little for disease prevalence.

  2. I’m struggling with GAM functions. I can tell that what they produce is a much more reliable and interpretable result than the GLM functions I used in the cruise study, but getting them to work in the first place is an uphill battle, underwater, hands tied, blindfolded. It’s a major boon to see comprehensive documentation like what “mgcv” has, but the documentation is almost too high-quality, because interpreting what it’s saying seems like something I’ll need 2 or 3 semesters of graduate statistics just to attempt semi-consistently.

  • I’m in no rush, it’s just disheartening to see a shiny toy and know that you’re not smart enough yet to play with it.
  3. Bookdown.org is down, someone probably didn’t submit a tax form, so that’s funny.

Journal Entry 23

  1. R packages for models/distributions are ideally just a method of saving time, while R packages that add functionality, like sf letting you use RStudio as GIS software, are a more reasonable use-case for downloading a variety of packages.
  • I’ve been struggling with working on models beyond just basic linear models because I don’t have a strong enough background to write out the full extent of the model I’m trying to work with. Recently though I was able to write out a model for a problem I was trying to solve, and because I knew how to write it out, I spent a lot more of my time working with base R before I started looking for a package to solve a few tedious parts of my problem.

  • I only spent so much time in base R because of the recent earthquake example, where we saw the value of using fewer packages to achieve our goal, building not only more reliable reproducibility of the results at a later date but also more understandable code for a variety of audiences.

  2. I’m struggling to see what expansion beyond machine learning models and changing the distribution family in glm, lm, and gam I can really make to my repertoire of models in R. I don’t have much beyond that, and it’s confusing to think that I can’t do much more aside from just custom-coding the parameters to trick the functions into doing more than their initial purpose.

  3. I have nothing extra to add.