2 Data/Operation Abstraction Design

Five datasets were used in the data visualization design project. The “champions” dataset was created from researching the winning countries for each year of the Women’s World Cup (Credit: Wikipedia). The other data sets were gathered from github. Three files, wwc_outcomes, squads, and codes, were used in a TidyTuesday submission by Amanda Peterson Plunkett, originating from www.data.world. The fourth github file was obtained through a web search for country longitudes and latitudes.

Due to the nature of the data visualization designs used, there were transformations and manipulations, using dplyr, and joins necessary to be able to use the data productively for the data tables, line graph, and bar graph:

  • transformed and manipulated the squads data (Appendix E.1 and E.2)
  • transformed wwc_outcomes data to include country(Appendix E.3 and E.4)
  • transformed critique_wwc_outcomes_wcodes data to show countries and the summation of all their goals combined across all years of participation (Appendix E.5)
  • joined critique_wwc_outcomes_wcodes data with critique_top_countries data and changed the names of China PR and Chinese Taipei (Appendix E.6)
  • transformed critique_joined_wwc data to reflect total tournament goals by country and year (Appendix E.7)
  • transformed critique_plot data to reflect total goals over all years per country (Appendix E.8)
  • merged critique_plot data and critique_plot1 data, change a few of the column names, and arrange countries in alphabetical order (Appendix E.9)
  • manipulated critique_wwc_outcomes_wcodes data yet again and joined it with the champions data to reflect how many wins a country had a particular tournament and whether they were the champion (Appendix 10)

To develop the dataset needed for our map visualization, a dataset was filtered down and grouped to show the total number of goals that a country scored for a given year. The resulting data frame called “scores_by_team” was right joined to a mutated version of our “wwc_outcomes_wcodes” data frame to create the “scores_by_year” data frame. To fix an NA issue in the original data for Nigeria in the “Country” column, we replaced the NA value with “Nigeria” in records where the country’s abbreviation was “NGA”. Our “scores_by_year” data frame was then left joined with the “country_lats_longs” dataset pulled from Github to include the latitudes and longitudes to build the borders of the countries on the map. Certain country names were recoded to fall in line with the values present in the latitude and longitude dataset to complete the join properly. This resulting dataframe called “wwc_outcomes_wlatsandlongs” was finally left joined with the “world” data from the rgeos package to complete the accurate longitude and latitude data to map out our graph. A similar example generated by Sharp Sight labs in regard to oil production served as an effective springboard in the generation of our map.