4.5 Look at the Top and the Bottom of your Data
It’s often useful to look at the “beginning” and “end” of a dataset right after you check the packaging. This lets you know whether the data were read in properly, whether things are properly formatted, and whether everything is there. If your data are time series data, make sure the dates at the beginning and end of the dataset match the time period you expect the data to cover.
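For the time series check, here is a minimal sketch. The data frame `toy`, its values, and the expected start and end dates are all made up for illustration; only the column name Date.Local is borrowed from the ozone data. Note that range() reports the earliest and latest dates anywhere in the column, which matches the first and last rows only if the data are sorted by date.

```r
# Toy stand-in for a real dataset; the values are invented for illustration
toy <- data.frame(
  Date.Local = as.Date(c("2014-03-01", "2014-03-02", "2014-09-29", "2014-09-30")),
  Sample.Measurement = c(0.047, 0.043, 0.040, 0.038)
)

# Earliest and latest dates actually present in the data
date_range <- range(toy$Date.Local)
print(date_range)

# The period we assume we were supposed to receive (hypothetical)
expected_start <- as.Date("2014-03-01")
expected_end <- as.Date("2014-09-30")

# TRUE only if the data start and end exactly where we expect
dates_ok <- date_range[1] == expected_start && date_range[2] == expected_end
print(dates_ok)
```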
In R, you can peek at the top and bottom of the data with the head() and tail() functions.
Here’s the top.
> head(ozone[, c(6:7, 10)])
Latitude Longitude Date.Local
1 30.498 -87.88141 2014-03-01
2 30.498 -87.88141 2014-03-01
3 30.498 -87.88141 2014-03-01
4 30.498 -87.88141 2014-03-01
5 30.498 -87.88141 2014-03-01
6 30.498 -87.88141 2014-03-01
For brevity I’ve only taken a few columns. And here’s the bottom.
> tail(ozone[, c(6:7, 10)])
Latitude Longitude Date.Local
7147879 18.17794 -65.91548 2014-09-30
7147880 18.17794 -65.91548 2014-09-30
7147881 18.17794 -65.91548 2014-09-30
7147882 18.17794 -65.91548 2014-09-30
7147883 18.17794 -65.91548 2014-09-30
7147884 18.17794 -65.91548 2014-09-30
The tail() function can be particularly useful because there is often some problem reading the end of a dataset, and if you don’t check the end specifically you’d never know. Sometimes there’s weird formatting at the end, or some extra comment lines that someone decided to stick there. This is particularly common with data that are exported from Microsoft Excel spreadsheets.
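To see what this kind of problem can look like, here is a small sketch under stated assumptions. The file contents, including the stray note on the last line, are invented for illustration; they are not taken from the ozone data.

```r
# Write a small CSV with a stray note appended after the data rows
# (file contents are made up for illustration)
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Latitude,Longitude,Date.Local",
  "30.498,-87.88141,2014-03-01",
  "18.17794,-65.91548,2014-09-30",
  "Exported from spreadsheet,,"
), path)

dat <- read.csv(path, stringsAsFactors = FALSE)

# The stray line shows up as the last row of the data frame
print(tail(dat, 2))

# It also causes Latitude to be read as character rather than numeric
print(class(dat$Latitude))

unlink(path)
```

Looking only at the top of `dat` would show nothing wrong; the bad row is visible only at the bottom, and the column type is the other hint that something went awry.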
Make sure to check all the columns and verify that the data in each column look the way they’re supposed to look. This isn’t a foolproof approach, because we’re only looking at a few rows, but it’s a decent start.
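One way to go through every column at once is sketched below, using the base R functions str() and sapply(). The data frame `toy` is a made-up stand-in with invented values, so treat this as an illustration rather than output from the ozone data.

```r
# Toy data frame with the same column names as the excerpt above
# (values are invented for illustration)
toy <- data.frame(
  Latitude = c(30.498, 18.17794),
  Longitude = c(-87.88141, -65.91548),
  Date.Local = c("2014-03-01", "2014-09-30"),
  stringsAsFactors = FALSE
)

# Compact overview of every column: its type and first few values
str(toy)

# Class of each column, returned as a named character vector
col_classes <- sapply(toy, class)
print(col_classes)
```

Here Date.Local comes back as character, which is a useful reminder that a column can look like dates on screen without being stored as dates.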