Topic 3 Z-scores

Z-scores are transformations of the data that put the values on a standard scale.

Here is how we calculate Z-scores:

$$z=\frac{x_i-\overline{x}}{sd}$$

It is important to note that Z-scores always have:

Mean = 0
SD = 1

Standardizing only shifts and rescales the values, so the shape of the Z-score distribution is the same as the shape of the distribution of the original values. We can check this with SPSS using the commands from Labs 1 and 2.
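The lab itself uses SPSS, but the same check can be sketched in a few lines of Python. This is a minimal illustration, assuming a small made-up sample and the sample SD (the $n-1$ formula). The `skewness` helper is a hypothetical function written for this example, not part of any library.

```python
import statistics

# Hypothetical right-skewed sample (made up for illustration).
values = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]

mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample SD (n - 1 denominator)

# Apply the Z-score formula to every value.
z_scores = [(x - mean) / sd for x in values]

z_mean = statistics.mean(z_scores)   # 0, up to rounding error
z_sd = statistics.stdev(z_scores)    # 1, up to rounding error


def skewness(data):
    """Moment-based skewness; a simple numeric summary of shape."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)


# The skewness is the same before and after standardizing,
# which is one way to see that the shape did not change.
print(z_mean, z_sd)
print(skewness(values), skewness(z_scores))
```

The sample stays just as skewed after standardizing; only the center and the spread change.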

3.1 The Z-distribution

What if our original distribution was a normal distribution?

Because Z-scores standardize the values of any distribution, we can use them to generalize the properties of ANY normal distribution.

This idea is fundamental for inferential statistics.

The Z-distribution, therefore, is a distribution of Z-scores created from the values of a perfectly normal distribution.
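As a small illustration of why this matters, the sketch below takes a hypothetical normal variable (mean 100, SD 15; these numbers are assumptions chosen for the example) and shows that the share of values below a given point is the same whether we work on the original scale or on the Z scale. It uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later).

```python
from statistics import NormalDist

# Hypothetical normal variable: mean 100, SD 15 (illustrative numbers only).
mu, sigma = 100, 15
x = 130

# Z-score of x: how many SDs above the mean it sits.
z = (x - mu) / sigma

# Share of values below x on the original scale ...
p_original = NormalDist(mu, sigma).cdf(x)

# ... and share of values below z in the standard normal (Z) distribution.
p_standard = NormalDist(0, 1).cdf(z)

print(z, p_original, p_standard)
```

Both probabilities agree, so a single table for the Z-distribution is enough to answer questions about any normal distribution.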

3.2 Important concepts

3.2.1 Percentiles

Remember what percentile meant when you took the SATs or ACTs?

If the $95^{th}$ percentile is 750, then 95% of all values fall below 750.

Notation: The $X^{th}$ percentile of a given variable is often referred to as $C_{X}$.

E.g. $C_{95}$ is the $95^{th}$ percentile of a given variable.
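The definition can be checked numerically. The sketch below uses made-up scores (the integers 1 to 100) and Python's `statistics.quantiles`. Note that software packages interpolate percentiles in slightly different ways, so SPSS may report a slightly different value for the same data.

```python
import statistics

# Hypothetical scores: the integers 1 to 100 (made up for illustration).
scores = list(range(1, 101))

# With n=100, quantiles() returns the 99 cut points C_1 ... C_99.
cut_points = statistics.quantiles(scores, n=100)
c95 = cut_points[94]  # index 94 holds the 95th percentile

# Share of values that fall below C_95.
share_below = sum(1 for s in scores if s < c95) / len(scores)

print(c95, share_below)
```

The share of scores below $C_{95}$ comes out at about 95%, as the definition says it should.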

3.2.2 Quartiles

Quartiles divide the ordered data into 4 equal parts. The cut points are the $25^{th}$ percentile, the $50^{th}$ percentile, the $75^{th}$ percentile and the $100^{th}$ percentile.

Notation: There are only 4 quartiles. We refer to them as $Q_{X}$.

1st quartile = $25^{th}$ percentile = $Q_{1}$
2nd quartile = $50^{th}$ percentile = $Q_{2}$
3rd quartile = $75^{th}$ percentile = $Q_{3}$
4th quartile = $100^{th}$ percentile = $Q_{4}$

Note: Since the 4th quartile is the highest value, we are often more concerned about $Q_{1}$, $Q_{2}$ and $Q_{3}$.
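A quick sketch of the quartiles for a small made-up sample, again using Python's `statistics.quantiles`. The data are hypothetical, and the interpolation rule is the library default, which may differ slightly from the one SPSS uses.

```python
import statistics

# Hypothetical sample (made up for illustration).
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]

# With n=4, quantiles() returns the three inner cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# The 4th quartile is simply the highest value.
q4 = max(data)

# Q2 is the 50th percentile, i.e. the median.
median = statistics.median(data)

print(q1, q2, q3, q4, median)
```

$Q_{2}$ and the median are the same number, which is why the median is sometimes called the second quartile.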

3.3 The empirical rules of the normal curve

Recall the rules of the normal curve from lecture: about 68%, 95% and 99.7% of the values fall within 1, 2 and 3 standard deviations of the mean. Keep them in mind as we discuss Z-scores.
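These percentages can be reproduced from the standard normal distribution itself. The sketch below is only an illustration and uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). The name `shares` is chosen for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

# Share of values within k standard deviations of the mean, for k = 1, 2, 3.
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```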

3.4 The Z-table

Because the standard normal distribution is used very often, it is useful to have a table that summarizes its percentiles.

This is what Z-tables do. They give you the percentile for a given z value in a perfectly normal distribution. See the z-table at NYU classes.
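A Z-table lookup can be mimicked in code, which is a handy way to check that you are reading the table correctly. The sketch below is a minimal illustration using `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). The z value and the percentile are arbitrary examples.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

# z value -> percentile (the usual way to read a Z-table).
p = std_normal.cdf(1.0)           # about 0.8413

# percentile -> z value (reading the table in reverse).
z_75 = std_normal.inv_cdf(0.75)   # about 0.6745

print(round(p, 4), round(z_75, 4))
```

So a z value of 1.0 sits at roughly the 84th percentile, and the 75th percentile of the Z-distribution is a z value of roughly 0.67.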

3.5 Exercises

Part 1

Open the data set “Standard normal_N_10000.sav” from NYU classes.

This is a standard normal distribution with $n=10000$. We will use it to illustrate the properties of the standard normal and to understand the Z-table.

Using SPSS, perform the following commands:

Calculate measures of central tendency and measures of dispersion for the $x$ variable.
Check the histogram for the $x$ variable. Be sure to include a normal curve above it.
Create Z-scores for the values of $x$. SPSS will create the variable $Zx$.
Calculate measures of central tendency and measures of dispersion for the $Zx$ variable.
Check the histogram for the $Zx$ variable. Be sure to include a normal curve above it.
Calculate the 90th, 95th, 97.5th and 99th percentiles of the $Zx$ distribution. Find those values in the Z-table.

Part 2

Open the data set “earnings_data.sav” from NYU classes.

Perform the calculations in Part 1 for the variable $wages$.