9 Introduction to R

You’ve made it this far. In theory you know how to collect your data now. You might have done that by conducting interviews or running a survey, or just by visiting an archive like the General Social Survey website. Having data is worth something, but it’s not worth everything. You have to do something with the data in order to answer any questions with it.

The rest of this book is focused on that goal - using the data you collected to answer the questions you want to ask. A lot of the time we use statistics to answer those questions, at least partially. Sometimes we’ll use the basic calculations you probably did in a high school stats class, and sometimes we’ll use something more complicated. Statistics is a substantial part of how social scientists know anything about the world. But this book won’t focus on how to calculate a standard deviation by hand, because you don’t have to. Doing it by hand is good for understanding what the measure means, but software can do that work for you lightning fast - the more important skill is knowing what to do with the result once it is calculated.

The next set of chapters will all be structured the same way. The first half of the chapter will introduce a topic (in this chapter R and programming) and the second half will focus on examples and practice. You can read the first half without being concerned about the second half, and you can just go practice the second half if you already know everything you need to about the topic.

The second half of the chapter will generally repeat the material in two forms. I’ll describe all the steps involved in whatever we’re learning, and I’ll walk you through those steps using videos too. That gives you a few opportunities to see the material. If you get stuck practicing it’ll be frustrating. I still get frustrated pretty often when coding. What I would recommend is slowing down, looking back at what you did and just trying to reproduce exactly what is in the book as closely as possible.

9.1 Concepts

9.1.1 What is R

R is a programming language and environment for data analysis that is popular with researchers from many disciplines. R refers both to the computer program one runs and to the language one uses to alter data within that environment. R only speaks R, and so, like traveling to a foreign country, it is useful to learn the local language in order to communicate. You can yell at R in English as long as you want, but it can’t produce your data unless you ask correctly. Fortunately, R’s language is based on English and it wants to be as straightforward as possible.

9.1.2 Why Use R?

There are other statistical packages that similar research methods classes use, including Stata and SPSS. One of the greatest benefits of R is the price: free. Access to Stata for a one semester class costs $45-125, and extended access costs more. And like Apple they update the software periodically, which means purchasing a new license. R is open source software that anyone can use free of charge forever. That means whatever skills you learn you can continue to develop after the class ends.

Many people have access to Excel as a spreadsheet program through Microsoft Office, but R is faster and more flexible for data analysis. Excel is a drag-and-drop program that does not produce reproducible analysis. R, as a programming language, allows users to create a ‘script’ that the computer runs in order to output analysis. That means the script can be reusable, shareable, and iterative, which will have significant benefits if you continue with data analysis after the class. Luckily, R is a relatively straightforward introduction to programming.

Me justifying that you should learn to code because it will benefit you after the class and you can write something called a script probably sounds weird though. The majority of readers won’t be interested in doing anything related to this class after the semester, and you have no idea what a script or reproducible analysis is. Using Excel would be more user friendly - there would be no language to learn, and the data you’re using is always right in front of you. I’ve done that before in a similar class, and actually using Excel as a tool is just as difficult for beginners, and the ceiling on how useful it can be for working with data is considerably lower. Take this class as an opportunity and a gentle introduction to a really valuable career skill: programming.

9.1.3 Why learn to program?

Data analytics is a quickly growing field with numerous job possibilities. The skills you learn in this class, if more fully developed, can be applicable to any industry, from Google to banks to government to a lemonade stand.

Computer programming is a flexible skill that can help you to manage laborious processes. It can stimulate creative thinking, grow your problem solving capabilities, and help teach persistence. All of that is valuable on the job market.

Data Scientist has been called the sexiest job of the 21st Century.

If you won’t take my word for it, President Obama once stated that every kid should learn how to code/program.

Let’s give it a shot in this class, and see if it’s a skill you’d like to continue developing.

I’ll make one final argument in favor of coding. It’s a bit like doing magic or casting spells. You get to speak an arcane language that not everyone understands, and when you do speak it things happen. If I write a statement like “a graph that shows the relationship between murder and assault rates for US states” the sentence does nothing. It just sits there, and you can read that sentence, but nothing happens. If I write a spell, though, like plot(USArrests$Murder, USArrests$Assault), suddenly it transforms into what I want.

Do you remember the movie the Sorcerer’s Apprentice? Mickey could have mopped the floor on his own, but that would be tedious. Instead he used magic to do it and because of that magic he was able to do hundreds of hours of manual work with the wave of a wand.

Unfortunately, that went badly in the movie. We have to be careful while coding or casting spells because having something get mistranslated might have unintended consequences. But it’s a more efficient way to use data and with a little practice you might amaze yourself with the things you can create.

9.2 Downloading R

9.2.1 The R Programming environment

R can be downloaded from r-project.org.

There is a link on the left. You’ll need to select a ‘mirror’ to download from. Don’t worry too much about that; the code for R is housed at multiple locations around the world so that it’s always available even if one site gets knocked offline. Generally, you should download from the location that is closest to you, but I have never noticed a difference. For New Orleans, that’s either Oak Ridge, Tennessee or Dallas, Texas. Click the link and follow the directions for installing the program.

9.2.2 R Studio

The R program you just downloaded can fully operate on its own, but we want to download a second program (an integrated development environment, or IDE) in order to make using R a little more straightforward. R Studio uses the R language while organizing our data sets, scripts, and outputs in a more user-friendly format. Luckily, it’s free too and can be downloaded from rstudio.com. Click the link for R Studio Desktop and follow the prompts to install.

Note: in order for R Studio to operate, R must also be installed on the computer. R can operate on its own, and you’re welcome to use it that way, but class examples will be shown using R Studio.

The following will walk you through all of those steps again.

9.2.3 Getting started in R Studio

Let’s open R Studio and see what we have downloaded.

The program opens with 3 sections (or boxes) displayed, although there are four. If you click the small green button in the upper left, you can create several types of documents in R. Let’s open a script, which should now add our fourth section.

The upper left quadrant is called the script, which is where we can write out code to be executed. You can enter code without writing it out first, but by writing it in the script we can preserve and reuse it. If you’re going to use a line of code multiple times, it’s good to have it written because then you can re-execute it without re-writing it. Because scripts can get up into thousands of lines, it’s good to have everything written out so that it can be reviewed and checked. The script is like the directions for a recipe we use in baking our data.

In order to execute code that you write in the script, you need to press the ‘run’ button in the upper-center of R Studio. That’ll send the code to be executed and provide output below.

The bottom left is called the output. If you write the command 2+5 in your script and run it, that line of code and the result will appear in the output: 7. Any code you run will display itself processing in that section, and any statistics you produce will come at the end (like the answer to 2+5).
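As a minimal sketch of that example, the line below is all that goes in the script. The first line starts with #, which marks a comment that R ignores, so it can be pasted along with the command.

```r
# R echoes this line in the bottom-left pane and prints the answer, 7, after it.
2 + 5
```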

The upper right is called the environment, which is where data you have available to you in R will appear. You won’t see the data itself, but the environment gives you a record of everything that is available in R Studio for you at that moment. If you do want to see the actual data you have saved, you can type View() with the name of the data set inside the parentheses.

The bottom right section actually has a few different uses, but we can concentrate now on the graphical output. If we produce a plot or graph of our data, that is where it will appear once the code has executed.

The picture below shows all 4 sections in use. You can see the brief script I wrote, the output of that script, the data I’m using (cars) and the graph I’ve created.

You can see those four sections described again below in this video.

9.3 Practice

So that’s an introduction to R Studio, and now you hopefully have it installed on your computer and have it open. They say that practice makes perfect, and that’s just as true with coding as anything else. No one is born knowing how to code. It’s one of the best examples I’ve seen of Malcolm Gladwell’s 10,000 hours, where the only way to get good is to keep trying. We’re almost at the end of your first hour (assuming you watched both those videos all the way through).

What we describe in this chapter won’t be exciting. We can’t jump right into the type of coding that is going to instantly give you answers to research questions like what makes people happy, but these are the basic building blocks necessary to answer those questions. We’re at the point in learning a new language when we’re practicing words like “blue” and “shirt” and “I am”. One week of Spanish class doesn’t make you fluent, and this chapter won’t make you a data scientist. But it is a necessary first step.

Let’s quickly review some of the things we can do in R that are most useful. I would recommend creating a new script (hint: top left corner) in R Studio and entering and running these commands as they are outlined. That applies to future chapters as well. As I run each command in the textbook you’ll both see the code that I ran and the output.

This chapter essentially describes in writing the contents of a video that is at the end of the chapter. You can use either source to learn the material, or both. However you prefer to learn, seeing it more directly demonstrated or reading the steps, the choice is yours. My goal is to provide as many resources as I can.

Reading a description of operations is a good start, but much of coding is muscle memory and takes practice at the syntax and structure of commands. As you enter the commands, try to tweak them and break them. Figure out what’s optional in what I’m writing and what’s necessary.

There is a tradition that you should first introduce yourself by saying “hello world”. In R, you can do that by writing it directly into the script and executing it (clicking on the run button).
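The output below can be produced with a single line. The exact line is not reproduced in the text here, so treat this as one likely version rather than the only way to do it.

```r
# print() displays whatever is placed inside its parentheses.
print("hello world!")
```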

## [1] "hello world!"

Or you can save it by giving it a name. When you save something you create or read data into R it creates an object, which will appear under the environment on the top right of the screen.
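A sketch of what saving looks like, using the object name hello that the breakdown further down refers to:

```r
# Save the text under the name hello; it now shows up in the environment pane.
hello <- "hello world!"

# Ask R to display what is stored in hello.
print(hello)
```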

## [1] "hello world!"

When you try to execute each command by hitting run, make sure that you’ve highlighted all of the code you want to enter. You can run thousands of lines of code at any given time just by highlighting them. But unless you tell R which lines of code you want to run, it won’t do anything.

Let’s take a second to really break down both lines of code we just ran.

In the first one we’re saving or assigning the value “hello world!”. Each part of that code is playing a part; there’s nothing wasted when we’re writing code. hello is going to become an object, which is what we’re creating with the line of code. The arrow (<-) tells R to assign something to hello, and the right side of the arrow is the value being assigned. The arrow always points from the value toward the name, so we can’t swap the two sides: putting the value on the left of <- and the name on the right will not do what we want. We’ll always use the same structure: name, arrow, value.

Each line of code generally includes a command, which is something you can tell R to do. For instance, print() is a command. R has it built into its system what to do if you tell it print(). But you also have to tell it what to do that to - what do you want to print? What you want to print is the object, in this case hello. You use a command to do an action to an object. You can think of commands as verbs and objects as nouns. Timmy runs. Print hello. runs(Timmy). print(hello).
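To see the verb-and-noun pattern with more than one verb, here is a short sketch. toupper() and nchar() are other commands built into R; they are not used elsewhere in this chapter and appear here only for illustration.

```r
hello <- "hello world!"   # the object (noun)

print(hello)      # display it
toupper(hello)    # rewrite it in capital letters: "HELLO WORLD!"
nchar(hello)      # count its characters: 12
```

The object stays the same in every line; only the command changes.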

We can also save numbers or anything else as an object in R. For instance, we can create an object named x and give it the value of 2.
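A minimal sketch of that step:

```r
# Create an object named x that holds the number 2.
x <- 2

# Typing the name of an object on its own line prints its value.
x
```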

## [1] 2

Or we can create an object y with multiple values. We can store a list of values as an object (R calls this kind of list a vector) by placing c in front of parentheses and separating the values with commas. The letter ‘c’ stands for concatenate or combine.
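A minimal sketch, using the values 1, 2 and 3 that appear in the output below:

```r
# c() combines several values into one object.
y <- c(1, 2, 3)

# Print the object to check what it holds.
y
```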

## [1] 1 2 3

We can use R as a calculator by entering math equations:
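The equation behind the output below is not shown in the text; 2 + 3 is an assumption, chosen only because it gives the same answer.

```r
# Any arithmetic works: +, -, * and / behave as they do on a calculator.
2 + 3
```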

## [1] 5

The reason to create objects is because we can then use them later without having to reenter their components. For instance we can multiply x by y using the values we supplied earlier.
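A sketch of that multiplication. The two objects are created again at the top so the example runs on its own.

```r
x <- 2            # a single value
y <- c(1, 2, 3)   # three values

# R multiplies every value in y by x.
x * y
```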

## [1] 2 4 6

That’s a brief introduction to creating data in R, but the goal of this class isn’t to have you entering numbers line by line into R. The whole benefit of using R is that it can work with entire data sets really efficiently. So let’s jump forward and talk about how to read data that exists outside of R into R.

R actually comes with a lot of data sets built into its software. We’ll sometimes use those data to create examples. You can see the data sets that are present in R by writing data(). You can call one of those data sets out of the background and into use by writing data() with the name of the desired data set inside the parentheses.
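A sketch of both uses of data(), with the USArrests data set that is printed below:

```r
# With nothing inside the parentheses, data() lists the data sets that ship with R.
data()

# With a name inside, it makes that data set available as an object.
data(USArrests)

# Typing the name prints the whole data set.
USArrests
```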

##                Murder Assault UrbanPop Rape
## Alabama          13.2     236       58 21.2
## Alaska           10.0     263       48 44.5
## Arizona           8.1     294       80 31.0
## Arkansas          8.8     190       50 19.5
## California        9.0     276       91 40.6
## Colorado          7.9     204       78 38.7
## Connecticut       3.3     110       77 11.1
## Delaware          5.9     238       72 15.8
## Florida          15.4     335       80 31.9
## Georgia          17.4     211       60 25.8
## Hawaii            5.3      46       83 20.2
## Idaho             2.6     120       54 14.2
## Illinois         10.4     249       83 24.0
## Indiana           7.2     113       65 21.0
## Iowa              2.2      56       57 11.3
## Kansas            6.0     115       66 18.0
## Kentucky          9.7     109       52 16.3
## Louisiana        15.4     249       66 22.2
## Maine             2.1      83       51  7.8
## Maryland         11.3     300       67 27.8
## Massachusetts     4.4     149       85 16.3
## Michigan         12.1     255       74 35.1
## Minnesota         2.7      72       66 14.9
## Mississippi      16.1     259       44 17.1
## Missouri          9.0     178       70 28.2
## Montana           6.0     109       53 16.4
## Nebraska          4.3     102       62 16.5
## Nevada           12.2     252       81 46.0
## New Hampshire     2.1      57       56  9.5
## New Jersey        7.4     159       89 18.8
## New Mexico       11.4     285       70 32.1
## New York         11.1     254       86 26.1
## North Carolina   13.0     337       45 16.1
## North Dakota      0.8      45       44  7.3
## Ohio              7.3     120       75 21.4
## Oklahoma          6.6     151       68 20.0
## Oregon            4.9     159       67 29.3
## Pennsylvania      6.3     106       72 14.9
## Rhode Island      3.4     174       87  8.3
## South Carolina   14.4     279       48 22.5
## South Dakota      3.8      86       45 12.8
## Tennessee        13.2     188       59 26.9
## Texas            12.7     201       80 25.5
## Utah              3.2     120       80 22.9
## Vermont           2.2      48       32 11.2
## Virginia          8.5     156       63 20.7
## Washington        4.0     145       73 26.2
## West Virginia     5.7      81       39  9.3
## Wisconsin         2.6      53       66 10.8
## Wyoming           6.8     161       60 15.6

We can view all of the data by writing View() and the name of the data set. View() is a command in R. It’s telling R what I want it to do, but I have to tell it what to do it to. I already have multiple objects loaded into R, specifically x and y and USArrests. I don’t want to see them all, I just want to see USArrests, so I have to enter that name into the command: View(USArrests).
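Written out in full, the command looks like the sketch below. View() opens a spreadsheet-style window, so the interactive() check (an addition for this example, not something you need to type in R Studio) simply skips that step when the code is run somewhere that has no window to open.

```r
# The capital V matters: R treats View and view as different names.
if (interactive()) {
  View(USArrests)
}
```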

Not all data is built into R though. In fact, most of the data you’ll want to use in the real world isn’t already built in - the stuff R contains is really only useful for examples and basic practice. I’ll cover how to read in data from your computer in a later chapter. For now, everything you need to run the examples in this book is saved online in an open directory on GitHub. GitHub is a website, free to use, where people can share code and data. I post the data you’ll need there to make it easy to access. You can see all of the data that is currently available by following this link: https://github.com/ejvanholm/DataProjects

You can read in one of those data sets with the command read.csv() and the web link to the raw data.

##     X district                          school         county grades
## 1   1    75119              Sunol Glen Unified        Alameda  KK-08
## 2   2    61499            Manzanita Elementary          Butte  KK-08
## 3   3    61549     Thermalito Union Elementary          Butte  KK-08
## 4   4    61457 Golden Feather Union Elementary          Butte  KK-08
## 5   5    61523        Palermo Union Elementary          Butte  KK-08
## 6   6    62042         Burrel Union Elementary         Fresno  KK-08
## 7   7    68536           Holt Union Elementary    San Joaquin  KK-08
## 8   8    63834             Vineland Elementary           Kern  KK-08
## 9   9    62331        Orange Center Elementary         Fresno  KK-08
## 10 10    67306     Del Paso Heights Elementary     Sacramento  KK-06
## 11 11    65722       Le Grand Union Elementary         Merced  KK-08
## 12 12    62174          West Fresno Elementary         Fresno  KK-08
## 13 13    71795          Allensworth Elementary         Tulare  KK-08
## 14 14    72181      Sunnyside Union Elementary         Tulare  KK-08
## 15 15    72298            Woodville Elementary         Tulare  KK-08
## 16 16    72041         Pixley Union Elementary         Tulare  KK-08
## 17 17    63594     Lost Hills Union Elementary           Kern  KK-08
## 18 18    63370   Buttonwillow Union Elementary           Kern  KK-08
## 19 19    64709               Lennox Elementary    Los Angeles  KK-08
## 20 20    63560               Lamont Elementary           Kern  KK-08
## 21 21    63230    Westmorland Union Elementary       Imperial  KK-08
## 22 22    72058        Pleasant View Elementary         Tulare  KK-08
## 23 23    63842          Wasco Union Elementary           Kern  KK-08
## 24 24    71811           Alta Vista Elementary         Tulare  KK-08
## 25 25    65748     Livingston Union Elementary         Merced  KK-08
## 26 26    72272       Woodlake Union Elementary         Tulare  KK-08
## 27 27    65961         Alisal Union Elementary       Monterey  KK-06
## 28 28    63313          Arvin Union Elementary           Kern  KK-08
## 29 29    72199    Terra Bella Union Elementary         Tulare  KK-08
## 30 30    72215               Tipton Elementary         Tulare  KK-08
## 31 31    68379           San Ysidro Elementary      San Diego  KK-08
## 32 32    75440                 Soledad Unified       Monterey  KK-08
## 33 33    64816        Mountain View Elementary    Los Angeles  KK-08
## 34 34    66050      King City Union Elementary       Monterey  KK-08
## 35 35    67819    Ontario-Montclair Elementary San Bernardino  KK-08
## 36 36    64758           Los Nietos Elementary    Los Angeles  KK-08
## 37 37    65870               Winton Elementary         Merced  KK-08
## 38 38    62380          Raisin City Elementary         Fresno  KK-08
## 39 39    68999      Ravenswood City Elementary      San Mateo  KK-08
## 40 40    63578 Richland-Lerdo Union Elementary           Kern  KK-08
## 41 41    72538               Oxnard Elementary        Ventura  KK-08
## 42 42    65680              El Nido Elementary         Merced  KK-08
## 43 43    63461              Fairfax Elementary           Kern  KK-08
## 44 44    63404         Delano Union Elementary           Kern  KK-08
## 45 45    67199               Perris Elementary      Riverside  KK-06
## 46 46    65078          Valle Lindo Elementary    Los Angeles  KK-08
## 47 47    69369      Alum Rock Union Elementary    Santa Clara  KK-08
## 48 48    63438               Edison Elementary           Kern  KK-08
## 49 49    63321     Bakersfield City Elementary           Kern  KK-08
## 50 50    69450    Franklin-McKinley Elementary    Santa Clara  KK-08
## 51 51    64592            Hawthorne Elementary    Los Angeles  KK-08
## 52 52    65193           Chowchilla Elementary         Madera  KK-08
## 53 53    66142         Salinas City Elementary       Monterey  KK-06
## 54 54    69120   Santa Maria-Bonita Elementary  Santa Barbara  KK-08
## 55 55    65110        Whittier City Elementary    Los Angeles  KK-08
## 56 56    64477       Eastside Union Elementary    Los Angeles  KK-08
## 57 57    64691             Lawndale Elementary    Los Angeles  KK-08
## 58 58    67421                Robla Elementary     Sacramento  KK-06
## 59 59    66191     Santa Rita Union Elementary       Monterey  KK-08
## 60 60    72561                  Rio Elementary        Ventura  KK-08
## 61 61    72157     Strathmore Union Elementary         Tulare  KK-08
## 62 62    67397     North Sacramento Elementary     Sacramento  KK-06
## 63 63    66423              Anaheim Elementary         Orange  KK-06
## 64 64    63974        Lemoore Union Elementary          Kings  KK-08
## 65 65    63875         Armona Union Elementary          Kings  KK-08
## 66 66    63339            Beardsley Elementary           Kern  KK-08
##    students teachers calworks    lunch computer expenditure    income
## 1       195   10.900   0.5102   2.0408       67    6384.911 22.690001
## 2       240   11.150  15.4167  47.9167      101    5099.381  9.824000
## 3      1550   82.900  55.0323  76.3226      169    5501.955  8.978000
## 4       243   14.000  36.4754  77.0492       85    7101.831  8.978000
## 5      1335   71.500  33.1086  78.4270      171    5235.988  9.080333
## 6       137    6.400  12.3188  86.9565       25    5580.147 10.415000
## 7       195   10.000  12.9032  94.6237       28    5253.331  6.577000
## 8       888   42.500  18.8063 100.0000       66    4565.746  8.174000
## 9       379   19.000  32.1900  93.1398       35    5355.548  7.385000
## 10     2247  108.000  78.9942  87.3164        0    5036.211 11.613333
## 11      446   21.000  18.6099  85.8744       86    4547.692  8.931000
## 12      987   47.000  71.7131  98.6056       56    5447.345  7.385000
## 13      103    5.000  22.4299  98.1308       25    6567.149  5.335000
## 14      487   24.340  24.6094  77.1484        0    4818.613  8.279000
## 15      649   36.000  14.6379  76.2712       31    5621.456  9.630000
## 16      852   42.070  24.2142  94.2957       80    6026.360  7.454000
## 17      491   28.920  11.2016  97.7597      100    6723.238  6.216000
## 18      421   25.500   8.5511  77.9097       50    5589.885  7.764000
## 19     6880  303.030  21.2824  94.9712      960    5064.616  7.022000
## 20     2688  135.000  23.4375  93.2292      139    5433.593  5.699000
## 21      440   24.000  34.7727 100.0000       69    5725.563  7.941000
## 22      475   21.000  21.6495  91.5464       53    4542.105  9.630000
## 23     2538  130.500  18.9111  70.8167      169    5107.086  7.405000
## 24      476   19.000  43.8559 100.0000        0    4659.662  9.630000
## 25     2357  114.000  16.8010  90.6237      216    4555.464  8.019000
## 26     1588   85.000  22.4072  85.1472      198    5415.153  8.523000
## 27     7306  319.800  17.0015  88.0349      742    4997.872  7.983181
## 28     2601  135.000  15.0711  92.1953      269    5223.912  7.305000
## 29      847   44.000  16.2928  90.2007       67    5139.165  8.934000
## 30      452   22.000  14.4989  81.0235       55    4614.252  8.554000
## 31     4142  201.000  35.5625  81.5065      569    5342.233  6.613000
## 32     2102   99.750  15.3199  90.2849      224    5347.458 12.409000
## 33    10012  464.900  29.7639  91.5934      721    5036.459  8.126616
## 34     2488  125.000  12.6920  55.0930      202    5117.142 11.431000
## 35    25151 1186.700  17.4426  80.1956     1713    5117.040 11.722225
## 36     2267  103.680  19.1517  84.4338      177    5272.192 11.332500
## 37     1657   90.400  28.8473  84.7314      204    5225.719  9.598000
## 38      284   17.500  14.5270  94.9324       18    6516.533 14.558000
## 39     5370  280.000  19.5717  81.1173      562    4559.177 22.059999
## 40     2471  121.860  23.7960  87.7782      275    5119.158  9.709000
## 41    15386  669.360  12.5064  71.4331     1762    5338.186 11.482944
## 42      184    9.000  22.2826  85.8696       40    5090.045  8.178000
## 43     1217   61.400  33.9080  88.0952       78    5485.496  8.174000
## 44     6219  268.000  21.4986 100.0000      571    4793.370  7.500000
## 45     4258  221.000  24.6595  92.7431      324    5092.917 10.050500
## 46     1235   53.000   8.2183  62.1302      175    4359.521  7.332000
## 47    16244  766.650  17.4531  69.7399     1423    5645.496 12.581577
## 48      814   39.000  19.7789  67.4447       85    4518.016 15.177000
## 49    27176 1429.000  39.2184  84.2950     3324    5864.366 12.109128
## 50    10696  487.970  22.1578  70.8957     1306    5257.997 11.785000
## 51     8935  444.500  36.9451  79.5120      786    5016.692 14.062000
## 52     1600   74.500  27.2614  79.7255      242    4720.086 10.472000
## 53     9028  449.920  13.9885  69.9425      669    5470.562 13.405117
## 54    10625  521.470   7.4975  77.5256      896    5615.442 12.301800
## 55     7151  318.580  18.5481  61.7382      560    5245.440 15.404071
## 56     2404  105.000  20.9274  52.9435      202    4838.175 13.762000
## 57     5804  283.150  28.3425  81.1165      480    5367.988 14.184000
## 58     2253  112.650  43.4976  84.5539      196    5526.237  8.865000
## 59     2807  126.120  15.4595  52.3964      152    4353.020 12.997000
## 60     3074  142.550  11.2898  66.1949      249    5034.290 11.592000
## 61      723   37.120  25.9211  83.1579       45    4692.494  8.279000
## 62     5138  290.775  58.7522  84.9990      560    5606.782 10.905643
## 63    20927  953.500  10.9114  82.3926     1048    4969.181 13.400625
## 64     3017  138.500  14.9768  56.0636      496    4675.675 11.081000
## 65      957   50.000  27.1682  82.1317      149    5306.133  9.082000
## 66     1639   90.500  46.8882  80.6042      287    5693.883 13.390000
##      english  read  math
## 1   0.000000 691.6 690.0
## 2   4.583333 660.5 661.9
## 3  30.000002 636.3 650.9
## 4   0.000000 651.9 643.5
## 5  13.857677 641.8 639.9
## 6  12.408759 605.7 605.4
## 7  68.717949 604.5 609.0
## 8  46.959461 605.5 612.5
## 9  30.079157 608.9 616.1
## 10 40.275921 611.9 613.4
## 11 52.914799 612.8 618.7
## 12 54.609932 616.6 616.0
## 13 42.718445 612.8 619.8
## 14 20.533880 610.0 622.6
## 15 80.123260 611.9 621.0
## 16 49.413143 614.8 619.9
## 17 85.539719 611.7 624.4
## 18 58.907364 614.9 621.7
## 19 77.005814 619.1 620.5
## 20 49.813988 621.3 619.3
## 21 40.681820 615.6 625.4
## 22 16.210526 619.9 622.9
## 23 45.074863 622.9 620.6
## 24 39.075630 620.7 623.4
## 25 76.665253 619.5 625.7
## 26 40.491184 625.0 621.2
## 27 73.720230 620.4 626.0
## 28 70.011536 616.5 630.4
## 29 55.962215 620.1 627.1
## 30 11.061947 627.9 620.4
## 31 80.420090 620.4 628.7
## 32 63.130356 623.0 626.9
## 33 65.121857 620.8 629.8
## 34 53.416401 626.1 625.6
## 35 49.823071 625.4 626.8
## 36 35.465370 625.4 628.2
## 37 56.125526 623.6 630.2
## 38 32.394367 628.9 625.3
## 39 65.512100 624.4 630.1
## 40 53.055443 627.5 627.1
## 41 49.642532 627.8 628.7
## 42 45.108696 621.6 635.2
## 43 30.320459 629.4 627.7
## 44 52.243126 621.1 636.2
## 45 36.801315 626.5 631.0
## 46 30.283400 630.2 629.4
## 47 49.864567 629.5 631.2
## 48 13.759213 631.9 628.9
## 49 28.863705 631.6 629.5
## 50 52.804787 628.5 632.6
## 51 44.085060 628.4 633.7
## 52 35.250000 635.7 627.1
## 53 37.494461 633.0 630.7
## 54 50.390591 629.6 634.2
## 55 31.072578 634.2 629.7
## 56 18.261232 633.5 630.5
## 57 34.700207 631.4 633.0
## 58 33.288948 637.5 627.0
## 59 33.487709 637.3 627.6
## 60 38.158752 633.2 632.5
## 61 36.929462 629.2 636.7
## 62 32.989491 630.3 635.8
## 63 58.216656 629.6 636.7
## 64 17.003647 634.4 632.9
## 65 17.659353 634.7 633.1
## 66  7.321537 638.4 629.6
##  [ reached 'max' / getOption("max.print") -- omitted 354 rows ]
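The web link itself is not reproduced in the text above, so the sketch below does not guess at it. Instead it writes a three-row file to a temporary location and reads it back, which exercises read.csv() the same way: the command takes the location of a .csv file, in quotes, and that location can be a path on your computer or a web link. The object name schools and the temporary file are assumptions made for this example; the three rows are copied from the output above.

```r
# Build a tiny .csv file so the example runs without an internet connection.
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "district,school,county",
  "75119,Sunol Glen Unified,Alameda",
  "61499,Manzanita Elementary,Butte",
  "61549,Thermalito Union Elementary,Butte"
), csv_file)

# read.csv() reads the file at the given location and returns a data set.
# A web link in quotes goes in the same spot as csv_file.
schools <- read.csv(csv_file)

# Typing the name prints what was read in.
schools
```

Saving the result with <- is what makes it appear in the environment pane; without a name, R prints the data once and does not keep it.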

We can also see the top few lines of a data set using head(), which by default shows the first 6 rows. I can also see the bottom 6 rows using tail().

##   X district                          school  county grades students
## 1 1    75119              Sunol Glen Unified Alameda  KK-08      195
## 2 2    61499            Manzanita Elementary   Butte  KK-08      240
## 3 3    61549     Thermalito Union Elementary   Butte  KK-08     1550
## 4 4    61457 Golden Feather Union Elementary   Butte  KK-08      243
## 5 5    61523        Palermo Union Elementary   Butte  KK-08     1335
## 6 6    62042         Burrel Union Elementary  Fresno  KK-08      137
##   teachers calworks   lunch computer expenditure    income   english  read
## 1    10.90   0.5102  2.0408       67    6384.911 22.690001  0.000000 691.6
## 2    11.15  15.4167 47.9167      101    5099.381  9.824000  4.583333 660.5
## 3    82.90  55.0323 76.3226      169    5501.955  8.978000 30.000002 636.3
## 4    14.00  36.4754 77.0492       85    7101.831  8.978000  0.000000 651.9
## 5    71.50  33.1086 78.4270      171    5235.988  9.080333 13.857677 641.8
## 6     6.40  12.3188 86.9565       25    5580.147 10.415000 12.408759 605.7
##    math
## 1 690.0
## 2 661.9
## 3 650.9
## 4 643.5
## 5 639.9
## 6 605.4
##       X district                    school      county grades students
## 415 415    69682 Saratoga Union Elementary Santa Clara  KK-08     2341
## 416 416    68957    Las Lomitas Elementary   San Mateo  KK-08      984
## 417 417    69518      Los Altos Elementary Santa Clara  KK-08     3724
## 418 418    72611    Somis Union Elementary     Ventura  KK-08      441
## 419 419    72744         Plumas Elementary        Yuba  KK-08      101
## 420 420    72751      Wheatland Elementary        Yuba  KK-08     1778
##     teachers calworks   lunch computer expenditure   income   english
## 415   124.09   0.1709  0.5980      286    5392.639 40.40200  2.050406
## 416    59.73   0.1016  3.5569      195    7290.339 28.71700  5.995935
## 417   208.48   1.0741  1.5038      721    5741.463 41.73411  4.726101
## 418    20.15   3.5635 37.1938       45    4402.832 23.73300 24.263039
## 419     5.00  11.8812 59.4059       14    4776.336  9.95200  2.970297
## 420    93.40   6.9235 47.5712      313    5993.393 12.50200  5.005624
##      read  math
## 415 698.9 701.7
## 416 700.9 707.7
## 417 704.0 709.5
## 418 648.3 641.7
## 419 667.9 676.5
## 420 660.5 651.0
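Both commands work on any data set. The sketch below uses the built-in USArrests data so that it runs without downloading anything; the n argument, which is optional, sets how many rows to show.

```r
# First 6 rows (the default).
head(USArrests)

# Last 6 rows.
tail(USArrests)

# First 3 rows only.
head(USArrests, n = 3)
```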

When you’re first learning to write code it’s useful to write it in an R script, as I’ve demonstrated above. That gives you the opportunity to practice and re-write your code, and copy what you did earlier for later projects. I often recycle code, borrowing it from one project to the next where it’s necessary. Coding very much follows the adage that ‘the only rule is that it has to work’. There aren’t awards for writing the most new code, just that the code you use works. Steal from yourself. Look on the internet and find code that can answer your problems. Coding is all about getting the most output for the least work.

9.3.1 Getting Help

One of my favorite things about R is how much information there is online to help someone with problems. If you feel stuck, googling “introduction to R Studio” will produce thousands of links, and if you want to search “how to create a plot in R” you’ll find lots of help from a really engaged community. If this is your first time opening R it’s probably overwhelming, but the best way to move forward is to practice. As the semester goes on we will get more comfortable.

I hardly make it through a project without searching for an answer to something. And there are some commands I just haven’t memorized. There are a few on post-it notes stuck around my computer screen, and there are others I have to search every few weeks (“how to remove duplicate results”). The goal of learning R isn’t to immediately memorize every command, it’s to know what’s possible in R. And as you get more comfortable, more will become possible.

A few things that will probably trip you up. R is finicky about spelling and capitalization. It only knows how to do things if you spell them exactly right. It’s not going to interpret what you say if you forget a letter in the command. When you get error messages, and you will, read the command you wrote closely. The error message probably won’t make sense to you, but it’s trying to tell you that it doesn’t understand what you want it to do.
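A short sketch of the kind of mistake described above. tryCatch() and conditionMessage() are used here only so that the example can capture the error and keep running; in everyday use you would simply see the error message in the output.

```r
# Correct spelling and capitalization: works.
print("hello world!")

# Capital P: R has no command called Print, so this fails.
message_from_r <- tryCatch(
  Print("hello world!"),
  error = function(e) conditionMessage(e)
)

# Show what R said about the mistake.
message_from_r
```

The message is R’s way of saying it looked for a command with exactly that name and did not find one.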

Here is all of the code we executed in this chapter.