1.7 Describing data numerically
- Different measures can be computed to describe how data are distributed. These measures fall into three categories:
  - Measures of location
  - Measures of variation
  - Measures of shape
A measure of location describes the “center” of the data, e.g. the average price of a new product, the middle price of a new product, or the most frequent price of a new product.
The most widely used measures of the data “center” are the mean (average), the median, and the mode.
The arithmetic mean (average) is the sum of all values divided by the number of observations. We distinguish between the sample mean x̄ (“x bar”), computed from the n values in a sample, and the population mean μ (Greek letter “mu”), computed from all N values in the population:
\[\begin{equation} \bar{x}=\displaystyle \frac{\sum_{i=1}^n x_i}{n}~~~~~~~~\mu=\frac{\sum_{i=1}^N x_i}{N} \tag{1.1} \end{equation}\]
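As a minimal sketch of Eq. (1.1), the function below sums the values and divides by their count; the same code gives x̄ for a sample or μ for a full population, since only the interpretation of the data changes. The prices are hypothetical illustration data, and `arithmetic_mean` is a name chosen here, not part of any library. The result is cross-checked against Python's standard `statistics.mean`.

```python
from statistics import mean


def arithmetic_mean(values):
    """Sum of all values divided by the number of observations (Eq. 1.1)."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical prices of a new product (illustrative data only)
prices = [12.0, 15.0, 15.0, 18.0, 20.0]

x_bar = arithmetic_mean(prices)
print(x_bar)  # 16.0

# The standard library gives the same result
assert x_bar == mean(prices)
```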
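The median is another measure of location that represents the “center” of the data. It is the number that separates the ordered data into two halves: the middle value when the number of observations is odd, or the average of the two middle values when it is even. You can think of the median as the “middle point”, but, as the even case shows, it does not have to be one of the observed values.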
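The odd and even cases can be sketched in a few lines of Python. The function name `median` and the data are hypothetical, chosen for illustration; the standard library's `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: a single middle observation
        return ordered[mid]
    # Even n: average of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([20.0, 12.0, 15.0, 18.0, 15.0]))  # 15.0
print(median([12.0, 15.0, 18.0, 20.0]))        # 16.5, not one of the observed values
```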
Another measure of the center is the mode, the most frequent value in the data. A data set can have more than one mode when several values share the same frequency and that frequency is the highest.
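A minimal sketch of the mode, assuming the rule stated above (every value tied for the highest frequency is a mode), counts occurrences with `collections.Counter`. The helper name `modes` and the data are hypothetical. When every value occurs exactly once, this rule returns all of them; conventions differ on whether such data have a mode at all. The standard library's `statistics.multimode` behaves the same way.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Counter keeps first-seen order, so ties are listed in order of appearance
    return [value for value, count in counts.items() if count == top]


print(modes([12, 15, 15, 18, 20]))  # [15]
print(modes([12, 12, 15, 15, 18]))  # [12, 15], two modes
```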
The median is generally a better measure of the center when there are extreme values or outliers because it is not affected by the precise numerical values of the outliers.
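To see the effect of an outlier, the sketch below adds one hypothetical extreme price to the illustration data and compares `statistics.mean` with `statistics.median`. The mean jumps by about 30, while the median shifts only from 15 to 16.5.

```python
from statistics import mean, median

prices = [12.0, 15.0, 15.0, 18.0, 20.0]
with_outlier = prices + [200.0]  # one hypothetical extreme price

for label, data in [("original", prices), ("with outlier", with_outlier)]:
    print(f"{label}: mean={mean(data):.2f}, median={median(data):.2f}")
# original: mean=16.00, median=15.00
# with outlier: mean=46.67, median=16.50
```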
An important characteristic of any data set is its variation. In some data sets, the values are concentrated close to the mean; in others, they are spread more widely around it. The most common measure of variation, or spread, is the standard deviation.
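The text does not give a formula here, so the sketch below assumes the usual definitions: the square root of the average squared deviation from the mean, dividing by n − 1 for a sample (s) and by N for a population (σ), mirroring the x̄/μ distinction in Eq. (1.1). The two hypothetical data sets have the same mean (16) but very different spreads; the results are checked against `statistics.stdev` and `statistics.pstdev`.

```python
import math
from statistics import pstdev, stdev


def sample_sd(values):
    """Sample standard deviation s: squared deviations from x-bar divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


def population_sd(values):
    """Population standard deviation sigma: squared deviations from mu divided by N."""
    N = len(values)
    if N == 0:
        raise ValueError("need at least one observation")
    mu = sum(values) / N
    return math.sqrt(sum((x - mu) ** 2 for x in values) / N)


tight = [15.0, 16.0, 16.0, 17.0]   # values close to the mean of 16
spread = [5.0, 12.0, 20.0, 27.0]   # same mean, values far from it

print(f"{sample_sd(tight):.2f}")   # 0.82
print(f"{sample_sd(spread):.2f}")  # 9.56

assert math.isclose(sample_sd(tight), stdev(tight))
assert math.isclose(population_sd(spread), pstdev(spread))
```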