Topic 3 Z-scores
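Z-scores are transformations of the data that place every value on a common, standardized scale: each Z-score tells us how many standard deviations a value lies above or below the mean.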
Here is how we calculate Z-scores:
\[\begin{equation} z_i=\frac{x_i-\overline{x}}{sd} \end{equation}\]
It is important to note that Z-scores always have:
- Mean = 0
- SD = 1
The shape of the Z-score distribution equals the shape of the distribution of the original values. We can check this with SPSS using the commands from Labs 1 and 2.
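The two properties above can also be checked outside SPSS. The following is a minimal sketch in Python, not part of the lab itself; the values in `x` are hypothetical, and it assumes the sample standard deviation (the \(n-1\) version) is the one used in the formula.

```python
import statistics

# Hypothetical sample values, invented purely for illustration.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]

mean_x = statistics.mean(x)
sd_x = statistics.stdev(x)  # sample SD (n - 1 denominator); an assumption

# Apply the Z-score formula: (value - mean) / SD, for every value.
z = [(xi - mean_x) / sd_x for xi in x]

# The Z-scores should have mean 0 and SD 1, up to floating-point rounding.
print("mean of z:", round(statistics.mean(z), 10))
print("sd of z:  ", round(statistics.stdev(z), 10))
```

Because every value is shifted and rescaled by the same two constants, the ordering and relative spacing of the values do not change, which is why the shape of the distribution is preserved.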
3.1 The Z-distribution
What if our original distribution was a normal distribution?
Because Z-scores standardize the values of any distribution, we can use them to generalize the properties of ANY normal distribution.
This idea is fundamental for inferential statistics.
The Z-distribution, therefore, is a distribution of Z-scores created from the values of a perfectly normal distribution.
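To see why standardizing lets us generalize across normal distributions, here is a small Python sketch using the standard library's `statistics.NormalDist`. The two distributions (`heights` and `scores`) and their parameters are hypothetical; the point is only that a value 1.5 standard deviations above the mean sits at the same percentile in both, and at the same percentile in the Z-distribution.

```python
from statistics import NormalDist

# Two hypothetical normal distributions with different means and SDs.
heights = NormalDist(mu=170, sigma=10)
scores = NormalDist(mu=500, sigma=100)
standard = NormalDist(mu=0, sigma=1)  # the Z-distribution

z = 1.5  # a value 1.5 standard deviations above the mean

# Proportion of values falling below that point in each distribution.
p_heights = heights.cdf(170 + z * 10)
p_scores = scores.cdf(500 + z * 100)
p_standard = standard.cdf(z)

print(round(p_heights, 4), round(p_scores, 4), round(p_standard, 4))
```

All three printed proportions agree, so a single table for the Z-distribution is enough to answer percentile questions about any normal distribution.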
3.2 Important concepts
3.2.1 Percentiles
Remember what percentile meant when you took the SATs or ACTs?
If the \(95^{th}\) percentile is 750, then 95% of all values fall below 750.
Notation: The \(X^{th}\) percentile of a given variable is often referred to as \(C_{X}\).
- E.g. \(C_{95}\) is the \(95^{th}\) percentile of a given variable.
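As an illustration of the definition, the sketch below simulates hypothetical test scores in Python, computes \(C_{95}\), and then checks what share of the scores falls below it. The mean of 500, the SD of 100, and the seed are arbitrary assumptions, and `statistics.quantiles` uses its own interpolation rule, so other software may report a slightly different cut point.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulated data are reproducible

# Hypothetical test scores, simulated only for illustration.
scores = [random.gauss(500, 100) for _ in range(10_000)]

# quantiles(..., n=100) returns the 99 cut points C_1 ... C_99,
# so index 94 holds the 95th percentile.
c95 = statistics.quantiles(scores, n=100)[94]

# Share of all scores that fall below C_95.
share_below = sum(s < c95 for s in scores) / len(scores)

print("C_95:", round(c95, 1))
print("share below C_95:", share_below)
```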
3.2.2 Quartiles
Quartiles divide the ordered data into 4 equal parts. The cut points are the \(25^{th}\) percentile, the \(50^{th}\) percentile, the \(75^{th}\) percentile and the \(100^{th}\) percentile.
Notation: There are only 4 quartiles. We write the \(X^{th}\) quartile as \(Q_{X}\).
- 1st quartile = \(25^{th}\) percentile = \(Q_{1}\)
- 2nd quartile = \(50^{th}\) percentile = \(Q_{2}\)
- 3rd quartile = \(75^{th}\) percentile = \(Q_{3}\)
- 4th quartile = \(100^{th}\) percentile = \(Q_{4}\)
Note: Since the 4th quartile is the highest value, we are often more concerned about \(Q_{1}\), \(Q_{2}\) and \(Q_{3}\).
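A short Python sketch of the quartiles, using hypothetical data. `statistics.quantiles` with `n=4` returns \(Q_{1}\), \(Q_{2}\) and \(Q_{3}\); \(Q_{4}\) is taken here as the maximum. Statistical packages use slightly different interpolation rules, so SPSS may not reproduce these exact numbers for small samples.

```python
import statistics

# Hypothetical data, invented purely for illustration.
data = [3, 7, 8, 5, 12, 14, 21, 13, 18]

# n=4 gives the three interior cut points: Q1, Q2 and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
q4 = max(data)  # the 4th quartile is simply the highest value

print("Q1:", q1, "Q2:", q2, "Q3:", q3, "Q4:", q4)

# Q2 is the 50th percentile, i.e. the median.
print("median:", statistics.median(data))
```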
3.3 The empirical rules of the normal curve
Recall the rules of the normal curve from lecture: approximately 68%, 95% and 99.7% of values fall within 1, 2 and 3 standard deviations of the mean, respectively. Keep them in mind as we discuss Z-scores.
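These rules can be verified directly from the standard normal distribution. The sketch below uses Python's `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Share of a normal distribution within k standard deviations of the mean:
# the area below +k minus the area below -k.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} SD: {share:.4f}")
```

The printed shares come out near 0.68, 0.95 and 0.997, which is where the rounded figures in the rule come from.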
3.4 The Z-table
Because the standard normal distribution is used very often, it is useful to have a table that summarizes its percentiles.
This is what Z-tables do. They give you the percentile for a given z value in a perfectly normal distribution. See the z-table at NYU classes.
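The lookup a Z-table performs can be sketched in Python with `statistics.NormalDist`: `cdf` goes from a z value to its percentile, and `inv_cdf` goes from a percentile back to the z value. The specific z values and the 80% target below are arbitrary choices for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # the Z-distribution: mean 0, SD 1

# Forward lookup: z value -> proportion of values below it.
for z in (0.0, 1.0, 1.96):
    print(f"z = {z:4.2f} -> proportion below = {std_normal.cdf(z):.4f}")

# Reverse lookup: proportion below -> z value.
z_for_80 = std_normal.inv_cdf(0.80)
print(f"80th percentile -> z = {z_for_80:.4f}")
```

Printed Z-tables differ in layout (some report the area below z, others the area between 0 and z), so it is worth checking which convention a given table uses before comparing numbers.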
3.5 Exercises
Part 1
Open the data set “Standard normal_N_10000.sav” from NYU classes.
This is a standard normal distribution with \(n=10000\). We will use it to illustrate the properties of the standard normal and to understand the Z-table.
Using SPSS, perform the following commands:
- Calculate measures of central tendency and measures of dispersion for the \(x\) variable.
- Check the histogram for the \(x\) variable. Be sure to overlay a normal curve on it.
- Create Z-scores for the values of \(x\). SPSS will create the variable \(Zx\).
- Calculate measures of central tendency and measures of dispersion for the \(Zx\) variable.
- Check the histogram for the \(Zx\) variable. Be sure to overlay a normal curve on it.
- Calculate the 90th, 95th, 97.5th and 99th percentiles of the \(Zx\) distribution. Find those values in the Z-table.
Part 2
Open the data set “earnings_data.sav” from NYU classes.
Perform the calculations in Part 1 for the variable \(wages\).