Our data consist of four databases:
electricity_state_info contains the information about electricity in each US state, covering 2002 to 2018.
generation_state contains the type of producer, the energy source and the quantity of electricity generated (in megawatt-hours) from 1990 to 2018.
state_info gives the economic aspects and other characteristics of each US state.
all_breakdown provides information about power production from various sources such as geothermal, biomass, biogas, hydro, wind and solar energy.
First, we need to clean these databases.
In state_info and in electricity_state_info, we remove the dot at the beginning of each observation of the variable “state”. Then, we create a new variable in electricity_state_info that tells us whether the state is energy self-sufficient.
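The two cleaning steps above can be sketched in plain Python, with a list of dictionaries standing in for the dataframe rows. This is only an illustration: the column names generation_mwh and consumption_mwh, the toy values, and the rule "self-sufficient means generation is at least consumption" are assumptions, since the text does not give the actual definition.

```python
# Toy rows standing in for electricity_state_info.
# Only the "state" column is named in the text; the other columns are hypothetical.
electricity_state_info = [
    {"state": ".Texas", "year": 2018, "generation_mwh": 120.0, "consumption_mwh": 100.0},
    {"state": ".Ohio", "year": 2018, "generation_mwh": 80.0, "consumption_mwh": 100.0},
]


def clean_state(name: str) -> str:
    """Strip surrounding whitespace and any leading dots from a state name."""
    return name.strip().lstrip(".").strip()


def clean_and_flag(rows):
    """Return new rows with a cleaned "state" and a "self_sufficient" flag."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row["state"] = clean_state(row["state"])
        # Assumed definition: the state generates at least what it consumes.
        new_row["self_sufficient"] = row["generation_mwh"] >= row["consumption_mwh"]
        cleaned.append(new_row)
    return cleaned


cleaned_rows = clean_and_flag(electricity_state_info)
```

The same clean_state helper could be applied to the “state” variable of state_info.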
In the table generation_state, there are only state abbreviations, so we convert them into state names and add the region each state belongs to.
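A minimal sketch of this conversion uses two lookup tables. The tables below are deliberately partial, the column names are hypothetical, and the regions follow the US Census Bureau's four-region grouping, which is an assumption because the text does not say which regional scheme is used.

```python
# Partial lookup tables for illustration; a full version would list every state.
STATE_NAMES = {"CA": "California", "TX": "Texas", "NY": "New York"}
# Assumed scheme: US Census Bureau regions.
STATE_REGIONS = {"California": "West", "Texas": "South", "New York": "Northeast"}

# Toy rows standing in for generation_state (column names are hypothetical).
generation_state = [
    {"state": "TX", "year": 1990, "energy_source": "Coal", "generation_mwh": 100.0},
    {"state": "NY", "year": 1990, "energy_source": "Hydro", "generation_mwh": 50.0},
]


def add_name_and_region(rows):
    """Replace the abbreviation in "state" by the full name and add "region"."""
    enriched = []
    for row in rows:
        abbreviation = row["state"]
        # Fall back to the abbreviation itself if it is missing from the table.
        name = STATE_NAMES.get(abbreviation, abbreviation)
        new_row = dict(row)
        new_row["state"] = name
        new_row["region"] = STATE_REGIONS.get(name)  # None when the region is unknown
        enriched.append(new_row)
    return enriched


generation_with_region = add_name_and_region(generation_state)
```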
We also add the region relative to each state in the
Again, we do the same operation to the
We create a new dataframe which is the join between
electricity_state_info in order to have all economic, social and energy information per state.
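A join of this kind can be sketched as follows. The sketch assumes that the second table is state_info, that the join key is the state name, and that it is an inner join; none of these is confirmed by the text, and the columns and values are toy placeholders.

```python
# Toy rows; all columns other than "state" are hypothetical.
state_info = [
    {"state": "Texas", "population": 1000},
    {"state": "Ohio", "population": 500},
]
electricity_state_info = [
    {"state": "Texas", "year": 2017, "consumption_mwh": 90.0},
    {"state": "Texas", "year": 2018, "consumption_mwh": 100.0},
]


def inner_join(left, right, key):
    """Inner-join two lists of dicts on `key`; left values win on name clashes."""
    # Index the right-hand rows by key so each left row is matched in O(1).
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**right_row, **left_row})
    return joined


# One row per (state, year), carrying the state-level columns along.
state_energy = inner_join(electricity_state_info, state_info, "state")
```

With an inner join, states that appear in only one of the two tables are dropped, so it is worth checking the row count after the join.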
In all_breakdown, we add new columns such as the year, the month, the day and the hour of the observations.
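Deriving these columns from a timestamp can be sketched as below. The column name timestamp, its string format, and the measurement columns are assumptions made for the example; the actual layout of all_breakdown is not described in the text.

```python
from datetime import datetime

# Toy rows standing in for all_breakdown (column names and values are hypothetical).
all_breakdown = [
    {"timestamp": "2018-07-04 13:00:00", "solar": 12.5, "wind": 30.0},
    {"timestamp": "2018-12-31 23:00:00", "solar": 0.0, "wind": 42.0},
]


def add_time_parts(rows, column="timestamp", fmt="%Y-%m-%d %H:%M:%S"):
    """Return new rows with year, month, day and hour parsed from `column`."""
    enriched = []
    for row in rows:
        moment = datetime.strptime(row[column], fmt)
        enriched.append(
            {
                **row,
                "year": moment.year,
                "month": moment.month,
                "day": moment.day,
                "hour": moment.hour,
            }
        )
    return enriched


breakdown_with_time = add_time_parts(all_breakdown)
```

Having the parts as separate columns makes it straightforward to group production by hour of the day or by month later on.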