# 4 Data Representation

Data representation is the process of presenting data in a tabular or graphical format that makes it easier to understand and interpret. Representing data effectively is essential for communicating research findings, identifying trends, and exploring relationships between variables.

Examples of data representation techniques in educational psychology might include creating line graphs to show changes in student performance over time, using pie charts to show how a student sample is divided among demographic groups, or using scatter plots to explore the relationship between two variables in a study.

ggplot2 is a powerful data visualization package in R, created by Hadley Wickham. It is based on the Grammar of Graphics, a framework that allows you to build complex and customizable plots by layering components. ggplot2 enables the creation of a wide variety of visually appealing and informative graphics with a relatively concise and consistent syntax.

You can import or load your own data as discussed in Chapter 3. The examples in this chapter use the gapminder data set, which is provided by the gapminder package in R.

```r
# Load required packages
library(ggplot2)
library(gapminder)

# Preview the dataset
head(gapminder)
```
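The layering idea behind the Grammar of Graphics can be illustrated by building a plot one component at a time. The following is a minimal sketch rather than a prescribed workflow; the object names `gap_2007` and `p`, the choice of the year 2007, and the 5-year bin width are assumptions made for illustration.

```r
library(ggplot2)
library(gapminder)

# Keep a single year so each country contributes one value
gap_2007 <- subset(gapminder, year == 2007)

# Start with the data and the aesthetic mapping (no geometry yet)
p <- ggplot(gap_2007, aes(x = lifeExp))

# Add a geometry layer: a histogram with 5-year bins
p <- p + geom_histogram(binwidth = 5)

# Add labels and a theme; each `+` adds one more component
p <- p +
  labs(x = "Life expectancy (years)", y = "Number of countries") +
  theme_minimal()

# Printing the object draws the plot
p
```

Because every component is added with `+`, a layer can be swapped or removed without rewriting the rest of the plot.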

## 4.2 Frequency Tables

A frequency table displays the number of occurrences (frequencies) for each category or value in a data set. It is particularly useful for summarizing categorical data or discrete numerical data.

```r
# Load required packages
library(ggplot2)
library(gapminder)

# Create a frequency table for continent
table(gapminder$continent)
```
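Because gapminder holds one row per country per survey year, a frequency table built on the full data counts country-year observations rather than countries. The sketch below assumes the goal is to count each country once, so it first keeps a single year; the names `gap_2007`, `continent_counts`, and `continent_props` are illustrative.

```r
library(gapminder)

# Keep one year so each country is counted once
gap_2007 <- subset(gapminder, year == 2007)

# Absolute frequencies: number of countries per continent
continent_counts <- table(gap_2007$continent)
continent_counts

# Relative frequencies (proportions), rounded to two decimals
continent_props <- round(prop.table(continent_counts), 2)
continent_props
```

`table()` returns the counts and `prop.table()` converts them to proportions that sum to 1 before rounding.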

## 4.6 Box Plots

Box plots are used for visualizing the distribution of continuous or discrete numerical data. They show the median, quartiles, and outliers of the data, providing a compact and informative representation of the data distribution.

```r
# Box plot of life expectancy by continent using ggplot2
ggplot(gapminder, aes(x = continent, y = lifeExp)) + geom_boxplot()
```
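To connect the picture to the numbers, the quartiles that define each box can be computed directly. This is a small sketch using base R; the object name `life_quartiles` is an assumption for illustration.

```r
library(gapminder)

# Split life expectancy by continent, then compute the 25th, 50th (median),
# and 75th percentiles within each group
life_quartiles <- sapply(
  split(gapminder$lifeExp, gapminder$continent),
  quantile,
  probs = c(0.25, 0.5, 0.75)
)

# Rows are the percentiles, columns are the continents
life_quartiles
```

In a box plot, the box spans the 25th to 75th percentile and the line inside it marks the median. By default, `geom_boxplot()` draws points lying more than 1.5 times the interquartile range beyond the box as individual outliers.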

## 4.7 Scatter Plots

Scatter plots are used to display the relationship between two continuous variables. They can be particularly helpful in identifying trends, correlations, and potential outliers in the data.

```r
# Scatter plot of life expectancy vs. GDP per capita using ggplot2
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + geom_point() + scale_x_log10()
```
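One way to make a trend in a scatter plot easier to see is to add a fitted line and to quantify the association with a correlation coefficient. The sketch below assumes a straight-line fit on the log-transformed GDP axis is a reasonable summary; the names `r` and `p_scatter` and the `alpha` value are illustrative choices.

```r
library(ggplot2)
library(gapminder)

# Pearson correlation between log10 GDP per capita and life expectancy
r <- cor(log10(gapminder$gdpPercap), gapminder$lifeExp)
r

# Scatter plot with semi-transparent points and a fitted straight line.
# scale_x_log10() transforms x before the line is fitted, so the line is
# linear in log10(gdpPercap).
p_scatter <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_log10()

p_scatter
```

Lowering `alpha` reduces overplotting where many points overlap, which can make potential outliers stand out more clearly.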

## 4.8 Line Graphs

Line graphs are used to display the relationship between a continuous variable and a discrete or ordinal variable, often representing change over time. They can be particularly useful for identifying trends and patterns in time-series data.

```r
# Line graph of average life expectancy over time using ggplot2
gapminder_agg <- aggregate(lifeExp ~ year, data = gapminder, mean)
ggplot(gapminder_agg, aes(x = year, y = lifeExp)) + geom_line()
```
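A single averaged line can hide differences between groups. As a possible extension, the sketch below draws one line per continent by aggregating over both continent and year; the names `gap_by_continent` and `p_lines` are assumptions for illustration.

```r
library(ggplot2)
library(gapminder)

# Mean life expectancy for every continent-year combination
gap_by_continent <- aggregate(
  lifeExp ~ continent + year,
  data = gapminder,
  FUN = mean
)

# Mapping continent to colour draws a separate line for each continent
p_lines <- ggplot(gap_by_continent,
                  aes(x = year, y = lifeExp, colour = continent)) +
  geom_line() +
  labs(x = "Year", y = "Mean life expectancy (years)", colour = "Continent")

p_lines
```

Mapping a categorical variable to `colour` also groups the data, so `geom_line()` connects points within each continent rather than across continents.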