1 Stlats: The guts of our players (Mostly done. Minor additions planned.)

1.1 Before we get started.

This chapter aims to provide some insight into how a player’s stLats are generated. A stLat is an underlying characteristic of the player and, in all honesty, we can only guess what each one means.

StLats are as follows:

StLats
anticapitalism
baseThirst
buoyancy
chasiness
coldness
continuation
divinity
groundFriction
indulgence
laserlikeness
martyrdom
moxie
musclitude
bat
omniscience
overpowerment
patheticism
ruthlessness
shakespearianism
suppression
tenaciousness
thwackability
tragicness
unthwackability
watchfulness
pressurization
totalFingers
soul
peanutAllergy
cinnamon
fate
batStars
pitchStars

At the bottom, you’ll see two variables called batStars and pitchStars. Those are aggregates of certain stLats that are important for batting and pitching players. We’ll get to those formulas in the batting and pitching chapters, or as other relevancies become more relevant.

The general objective of this chapter is to show how the stLats relate, or don’t relate, to each other. This will be done by looking at how stLats correlate.

1.2 Players and their stlats

You can use the search function to search for teams or player names. You can use the arrows at the top of each column to sort by least to greatest or vice versa. Use the horizontal bar at the bottom to scroll left and right to see other stlats.

By itself, there’s nothing much to see here unless you are interested in a particular player.

1.3 StLat Correlations

Now… There may be a lot of interest (or not) in how the stLats correlate with each other. Here is a list of the correlations between pairs of stLats.

1.3.1 Correlations Table

You may notice the column on the right labeled p.BY. For those of you with some knowledge of statistics, this is for all intents and purposes a p-value, just a modified one. Details of that modification can be found in another appendix. For those of you less in the know: the smaller p.BY is, the stronger the case that the two variables in a given row are related in some manner.

Click the arrows at the top of the p.BY column to sort the p-values from lowest to highest. Any p-value smaller than 0.001 gives us very strong evidence that the two variables are related, if not directly, then by some underlying mechanism.
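
For the curious, here is a minimal Python sketch of what that modification could look like. It assumes the "BY" in p.BY stands for the Benjamini-Yekutieli adjustment (the method R's p.adjust calls "BY"); that is a guess from the column name, not something confirmed here. The function name by_adjust and the raw p-values are made up for illustration.

```python
def by_adjust(p_values):
    """Benjamini-Yekutieli adjusted p-values.

    Sketch only: assumes this is the adjustment behind the p.BY column.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic-sum penalty: 1 + 1/2 + ... + 1/m.
    c_m = sum(1.0 / k for k in range(1, m + 1))
    # Indices ordered from the largest raw p-value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for position, idx in enumerate(order):
        rank = m - position  # rank m = largest p-value, rank 1 = smallest
        candidate = p_values[idx] * m * c_m / rank
        # The running minimum keeps adjusted values in the same order as raw ones.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


raw = [0.0001, 0.004, 0.03, 0.2, 0.6]  # made-up raw p-values
for p, q in zip(raw, by_adjust(raw)):
    print(f"raw={p:<8} adjusted={q:.4f}")
```

The adjusted value is never smaller than the raw one, and the more correlations you test at once, the harsher the penalty gets. That is why a cutoff like 0.001 on an adjusted p-value is fairly conservative.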

There’s some stuff here that is uninteresting in an interesting way. If you sort by the p-value (lowest to highest), you’ll see that the top stLat correlations are between a stLat and one of the stars formulas. This really isn’t all that surprising given that the stLats are used to calculate the stars. We have the formulas!

Divinity and ruthlessness. Thwackability and unthwackability. These are the nobility of the pitching and batting stLats. After those 4, values for the kTau correlation plummet from 0.43 to 0.2. (Note to self… discussion about levels of relative strength using kTau/tau/\(\tau\)).

The p-values skyrocket once you get past coldness versus pitchStars and musclitude versus batStars in the sorted list. At this point in the list, you should probably ignore everything else. Nothing further down should be treated as a meaningful relationship unless we have some other reason to believe it is one. Many variables in the stars formulas don’t show up as relevant if we make a cutoff here. Their contribution to the stars formulas is apparently pretty low. For example, shakespearianism is in the formula for pitchStars, but it doesn’t make this conservative cutoff for worthwhile correlations.

Basically, no stLats seem to be meaningfully correlated with any other stLats, excluding pitchStars and batStars, but those are meta-stLats and not baseline ones. The best correlation we get is indulgence with laserlikeness, with a \(\tau\) value of approximately 0.15 (not that high) and a BY p-value of 0.075 (too high).

1.3.2 Visualization of stLat relationships/correlations (or lack thereof)

I’ll take the highest 4 correlations of stLats with stLats that do not include bat/pitchStars and make scatterplots of them:

  1. indulgence and laserlikeness
  2. baseThirst and continuation
  3. patheticism and tenaciousness
  4. chasiness and cinnamon (that word is stupid, it makes my “fingers” cross)

Remember, these are scatterplots of the most highly correlated stLats. Yeah… that’s just white noise. It turns out, any pair of baseline stLats should be considered completely uncorrelated.

If you want me to waste time making an interactive widget that lets you look at all stLats against all other stLats, by all means, let me know. You can find me in The Void.

1.4 Individual StLat Distributions

All the stLats have the same distribution. Each stLat is created in the exact same way. Let’s take 4 stLats from the scatterplots in the previous section. I’ll make univariate distribution plots of the four stLats and overlay them on top of each other.

  • indulgence
  • patheticism
  • baseThirst
  • chasiness

These are violin plots, which are some fancy version of a boxplot. Don’t know when they got so trendy but I guess they can look nice.

You can click in the legend to remove one stLat or double click to isolate a stLat. Anyway, there really isn’t much to distinguish the distributions. There is something going on with patheticism on the right hand side but to my knowledge that’s because of peanuts or some other bullshit that messes with things. However, when stLats for players are originally generated, there really is no distinction between any stLats except for the name. Mostly… tragicness and a few others are kind of… weird.

If you want to see the distributions of the other stLats, you’re basically looking at them. Substitute any one name for another and not much changes. I might add the ability to see ALL the stLats but… that seems like time poorly spent at this point. Sorry!

TODO: TALK ABOUT FINGERS AND OTHER WEIRD ONES AT SOME POINT?

1.5 General Summary

Basically, stLats are hidden player attributes that make each player unique. They are not correlated with each other. They are all generated by the same number generation mechanism. This may sound boring or whatever. But that is fucking fantastic for modeling player performance. The ideal scenario is that the variables involved are independent of each other, which is what all of this indicates.

Usefulness in uselessness. ♥

1.6 Chapter Appendix: Some talk about correlations

There will be a lot more intense maths stuff in the appendix in the last chapter of the book that has to do with correlations. I’ll try to summarise.

1.6.1 Correlation Coefficients in general

Throughout this writeup, you will see that a “correlation” is calculated across many numerical variables.

What does this mean:

  • An objective formula was used to calculate the strength of the relationship between two variables; the resulting number is called the correlation coefficient.
  • Values for this correlation range from -1 to 1.
  • Values close to -1 or 1 imply a strong relationship between the two variables.
  • If the correlation is 1 or -1, the variables are essentially the same variable.
  • Values close to zero imply little to no relationship (USUALLY).

Correlation coefficients cannot, by themselves, prove that a causation based relationship exists. However, they do not disprove that either (in most traditional cases.)

The phrase “Correlation does not equal causation” has kind of fucked things over a bit. It’s great that it got learned and that people know to be wary of correlation. However, it has also led to people increasingly dismissing the idea that correlations mean anything at all.

Correlations do mean something, whether it is simply that two variables relate or that there really is something going on directly between two things. Sometimes two completely unrelated things will be strongly correlated but have seemingly no valid connection.

Someone found a “strong” correlation (stupid low sample size though) between ice cream production and a specific type of crime. Ridiculous right? Of course ice cream doesn’t cause crime. But you should still wonder just what the hell is going on! I mean why the hell would that happen. Turns out, there’s a fair amount of research that indicates that crime rates increase in the summer. Well so does ice cream eating (I think). Of course, then you get deeper questions, well heat doesn’t cause crime but WHY are those two related? And dig deeper.

Don’t just dismiss information. Don’t just dismiss the absence of information. Read up on something called the survivorship bias. You supposedly have something resembling a brain in there. Use it. Don’t just throw some dumb rule you memorized at a piece of information and dismiss something. That’s just lazy. Also, do use a lazy rule sometimes too. Because sometimes prudence is the better option. You’ve got a mind in there. Use it. Decide when to use mental tools AND WHY.

However, all that aside…

Correlation can tell you when to dig deeper. With StLats that’s a bit of a problem. The stLats are thrown into an abyss and some output appears on your ticker on the website. That’s about the best we can say.

1.6.2 Many Correlation Coefficients: Kendall’s Tau (\(\tau\))

There are many measures of correlation out there. And you can find information about them in the appendix. The basics of it are:

  1. The most common correlation coefficient is called the linear correlation coefficient or Pearson’s correlation.
  2. This only measures the strength of a linear relationship.
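
To make point 2 concrete, here is a minimal Python sketch. It is not code from this book, and the function name pearson_r and the toy data are made up for illustration. It computes Pearson’s correlation for a straight line and for a curve that always goes up but isn’t straight.

```python
import math


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = list(range(10))
linear = [2 * x + 1 for x in xs]      # a perfectly straight line
curved = [math.exp(x) for x in xs]    # always increasing, but not a line

print("linear:", pearson_r(xs, linear))
print("curved:", pearson_r(xs, curved))
```

The straight line scores (essentially) 1. The curve scores noticeably lower, even though y goes up every single time x does. Pearson is penalizing the shape of the relationship, not its consistency.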

When citing “correlations”, this document uses what is called Kendall’s \(\tau\). \(\tau\) is just a measure of how well two variables relate to each other.

It follows the same general rules:

  • Values for this correlation range from -1 to 1.
  • Values close to -1 or 1 imply a strong relationship between the two variables.
  • If the correlation is 1 or -1, the variables are essentially the same variable.

I am using Kendall’s \(\tau\) because it can measure the strength of a relationship that isn’t linear. Technically, it works accurately on “monotonic” relationships, of which I found some in the data.

\(\tau\) does have something that approaches an intuitive interpretation:

  • Concordance: An increase in the \(x\) variable occurs with an increase of the \(y\) variable.
  • Discordance: An increase in the \(x\) variable occurs with a decrease of the \(y\) variable.
  • If \(\tau\) is zero, there were an equal number of concordances and discordances, so basically there isn’t a distinctive pattern in the increases and decreases of \(y\).
  • If \(\tau\) is positive, then \(\tau\) tells us how much more often concordances happened than discordances (the proportion of concordances minus the proportion of discordances), i.e., in general, increases in \(x\) are associated with increases in \(y\).
  • If \(\tau\) is negative, then \(|\tau|\) (absolute value) tells us how much more often discordances happened than concordances, i.e., in general, increases in \(x\) are associated with decreases in \(y\).
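
The counting behind that interpretation can be sketched in a few lines of Python. This is the simplest version of the statistic, usually called tau-a: \(\tau = (C - D) / \binom{n}{2}\), where \(C\) and \(D\) are the numbers of concordant and discordant pairs of observations. It ignores ties, so treat it as an illustration rather than the exact variant used for the tables in this chapter (that is an assumption on my part). The function names and toy data are made up.

```python
from itertools import combinations


def concordance_counts(xs, ys):
    """Count concordant and discordant pairs of observations."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x2 - x1) * (y2 - y1)
        if direction > 0:      # x and y moved the same way
            concordant += 1
        elif direction < 0:    # x and y moved opposite ways
            discordant += 1
        # direction == 0 is a tie and counts for neither
    return concordant, discordant


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    concordant, discordant = concordance_counts(xs, ys)
    return (concordant - discordant) / (n * (n - 1) / 2)


xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 5, 4]  # mostly increasing, with two swaps
print(concordance_counts(xs, ys))               # (8, 2)
print(kendall_tau_a(xs, ys))                    # 0.6
print(kendall_tau_a(xs, [x ** 3 for x in xs]))  # 1.0
```

In the toy data, 8 of the 10 pairs are concordant and 2 are discordant, so \(\tau = 0.8 - 0.2 = 0.6\). The cubed data gives exactly 1 because every pair is concordant, even though the relationship is nowhere near a straight line.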