# Chapter 1 Introductory Statistics

The following packages and functions are used in this chapter:

## basic packages
library(knitr)
library(kableExtra)
library(tidyverse)
library(conflicted)
library(magrittr)
library(broom)
## particular packages for this project
library(lmtest)
library(corrr)
library(tseries)
library(corrplot)
library(car)
library(perturb)
library(modelr)
source("../src/funcs.R")
source("../src/tests.R")
## Data and PSMs
source("../docs/census.R")
source("../docs/recs.R")
source("../docs/part.R")

### 1.0.1 Dataset

Example 1.1 (part) We now look at the 0.01% subsample of the 1980 US census sample introduced in §3.1, with a view to explaining labor force participation. Here, we simply select all women aged 18 to 65, including both those who work and those who do not. This gives a sample of 7184 women.

Example 1.2 (census) We will investigate a 0.01% subsample of the 1980 US census. From this subsample, we select all women aged 18 to 65 who are in the labor force and who gave positive values in response to questions about weeks worked, usual hours worked per week, and wage income for the year 1979. This reduces the sample to 3877 observations. For the moment, we will focus on just one variable: wage income. For data protection reasons, wage income has been top-coded at US\$ 75000 annually, meaning that any income higher than US\$ 75000 is recorded as that amount. As this top-coding affects only two women in the data, we will ignore it. We will look at the weekly wage, defined as the top-coded annual wage income divided by the number of weeks worked in 1979.
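
The sample selections and the weekly wage definition in Examples 1.1 and 1.2 can be illustrated on a small made-up data frame. This is only a sketch in base R: the data frame `toy`, its column names (`age`, `weeks`, `hours`, `wage_income`) and all values are hypothetical and are not the variables loaded by the sourced scripts. Every row is assumed to describe a woman, and labor force membership is approximated by the three positive-value conditions.

```r
# Toy data: column names and values are hypothetical, not the census variables.
# Every row is assumed to describe a woman.
toy <- data.frame(
  age         = c(17, 25, 40, 66, 30, 52),
  weeks       = c(0, 52, 40, 10, 0, 50),        # weeks worked in the year
  hours       = c(0, 40, 35, 20, 0, 45),        # usual hours per week
  wage_income = c(0, 26000, 90000, 3000, 0, 0)  # annual wage income
)

top_code <- 75000  # annual top-coding threshold stated in Example 1.2

# Selection in the style of Example 1.1: ages 18 to 65, working or not.
part_toy <- subset(toy, age >= 18 & age <= 65)

# Selection in the style of Example 1.2: additionally require positive
# weeks worked, usual hours, and wage income.
census_toy <- subset(part_toy, weeks > 0 & hours > 0 & wage_income > 0)

# Top-code annual wage income, then divide by the number of weeks worked.
census_toy$wage_top    <- pmin(census_toy$wage_income, top_code)
census_toy$weekly_wage <- census_toy$wage_top / census_toy$weeks

census_toy
```

In this toy data, the income of 90000 is replaced by 75000 before the division, which mirrors how a top-coded value enters the weekly wage.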

Example 1.3 (RECS) The U.S. Energy Information Administration (EIA) administers the Residential Energy Consumption Survey (RECS) to a nationally representative sample of housing units. Traditionally, specially trained interviewers collect data on the energy characteristics of the housing unit, usage patterns, and household demographics. For the 2015 survey cycle, EIA used Web and mail forms, in addition to in-person interviews, to collect detailed information on household energy characteristics. This information is combined with data from the energy suppliers to these homes to estimate energy costs and usage for heating, cooling, appliances and other end uses. These estimates are critical to meeting future energy demand and improving efficiency and building design.