# 3 How to explore a “new” data set

By a “new” data set, I mean one that is new to us: we have never seen it before. To explore such a data set, we can follow these steps.

- Read the data into **R**.
- Find the dimensions of the data set by using **dim()**.
- Understand the structure of the data by using **str()**.
- See the first 6 rows of the data by using **head()**, and the last 6 rows by using **tail()**.
- Find the names of all the (column) variables in the data set by using **names()**. Pay attention to any variable with “ID” (or “id”) as part of its name, since this variable will be used when we want to *join* this data set with another one.
- Identify the variables of interest by reading their names. If a variable of interest is *categorical*, use **unique()** to find all the possible values it can take. If it is *continuous*, use **summary()**, which reports the five-number summary (minimum, first quartile, median, third quartile and maximum) together with the mean.
- Use **View()** to have a quick look at the whole data set.
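The example below replaces the first step with fake data. With a real data set stored as a comma-separated file, one common option is base R's `read.csv()`. The following is a minimal sketch under that assumption; to stay self-contained, it first writes a small hypothetical data frame (`my_data`) to a temporary file (`csv_path`) and then reads it back. For your own data, you would pass the path of your file instead.

```r
# Hypothetical data frame standing in for a real data set on disk
my_data <- data.frame(student_id = 1:5,
                      grade = c("A", "B", "A", "C", "B"),
                      score = c(91.5, 84.0, 88.5, 72.0, 80.5))
# Write it to a temporary CSV file so the sketch does not depend on a real path
csv_path <- tempfile(fileext = ".csv")
write.csv(my_data, csv_path, row.names = FALSE)
# Step 1: read the file into R (replace csv_path with your own file's path)
new_data <- read.csv(csv_path)
# The remaining steps then apply as usual
dim(new_data)
str(new_data)
```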

Example:

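```r
rm(list = ls())  # optional: clear the workspace before starting
# Firstly, we must read the data into R.
# Here I will use fake data; set.seed() makes the random values reproducible
# (the seed value 1 is arbitrary).
set.seed(1)
fk_data <- data.frame(ob_id = 1:100,
                      l_lower_case = sample(letters, 100, replace = TRUE),
                      rand_number = rnorm(100),
                      l_upper_case = sample(LETTERS, 100, replace = TRUE),
                      true_or_false = sample(c("T", "F"), 100, replace = TRUE)
)
# Find the dimensions (number of rows, number of columns)
d <- dim(fk_data)
print(d)
# Find the structure
str(fk_data)
# See the first 6 rows
head(fk_data)
# See the last 6 rows
tail(fk_data)
# Find the column names
the_names <- names(fk_data)
print(the_names)
# Find the possible values of "l_lower_case" (categorical)
possible_values <- unique(fk_data$l_lower_case)
print(possible_values)
# Find the summary of "rand_number" (continuous)
the_summary <- summary(fk_data$rand_number)
print(the_summary)
# View the data set; View() opens a viewer window, so only call it interactively
if (interactive()) View(fk_data)
```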
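The example above checks one categorical and one continuous variable by hand. When a data set has many columns, the same idea can be applied to every column at once. A minimal sketch, assuming that character and factor columns are categorical and numeric columns are continuous (a simplification: an ID such as `ob_id` is numeric but is not really a continuous measurement). `explore_by_type()` is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: apply unique() to categorical columns and summary() to numeric ones.
# Assumption: character/factor columns are categorical, numeric columns are continuous.
explore_by_type <- function(df) {
  is_categorical <- vapply(df, function(col) is.character(col) || is.factor(col),
                           logical(1))
  is_continuous <- vapply(df, is.numeric, logical(1))
  list(
    categorical = lapply(df[is_categorical], unique),  # all possible values
    continuous  = lapply(df[is_continuous], summary)   # five-number summary plus mean
  )
}

# Fake data in the same spirit as the example (seed chosen arbitrarily)
set.seed(1)
fk_data <- data.frame(ob_id = 1:100,
                      l_lower_case = sample(letters, 100, replace = TRUE),
                      rand_number = rnorm(100),
                      true_or_false = sample(c("T", "F"), 100, replace = TRUE))
result <- explore_by_type(fk_data)
names(result$categorical)  # "l_lower_case" "true_or_false"
names(result$continuous)   # "ob_id" "rand_number"
```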
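The steps above suggest looking for a variable with “ID” (or “id”) in its name because it is the key used to join data sets. A minimal sketch of both ideas, assuming a second hypothetical data set `extra_data` that shares the `ob_id` key: `grep()` finds the ID-like column names, and base R's `merge()` performs the join. By default `merge()` keeps only rows whose key appears in both data sets (an inner join); setting `all.x = TRUE` would keep every row of the first data set instead.

```r
set.seed(1)  # arbitrary seed, for reproducibility
fk_data <- data.frame(ob_id = 1:100, rand_number = rnorm(100))
# Hypothetical second data set sharing the "ob_id" key; 200 has no match in fk_data
extra_data <- data.frame(ob_id = c(2L, 4L, 6L, 200L),
                         group = c("a", "b", "a", "c"))
# Column names containing "id" in any letter case
# (this simple pattern would also match names such as "width")
id_cols <- grep("id", names(fk_data), ignore.case = TRUE, value = TRUE)
print(id_cols)  # "ob_id"
# Inner join on the shared key: only ob_id 2, 4 and 6 appear in both data sets
joined <- merge(fk_data, extra_data, by = "ob_id")
print(joined)
```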