Chapter 7 Data Frame

7.2 Develop a Project

Goal: build an online contact list for the whole class and run some basic analysis on it

Lay out your strategy

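Keys: whole class, online, contact list

Key             Strategy
whole class     Use Google Form (everyone fills in their own entry)
online          Use Google Form
contact list    Fields: name, contact email, …, etc.
basic analysis  Use R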

Google G Suite

A brand of cloud computing, productivity and collaboration tools, software and products developed by Google.

In this course, we will learn to use R to interact with:

  • Google Form

  • Google Sheets

  • Google Calendar

  • More if time allows.

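Google Form is a widely used online survey tool (for how to build a form, see the Google Form creation guide).

Please go to the following form and fill in your personal information: Google Form contact list survey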

Retrieve useful packages

To analyze the Google survey results in R, what do you need R to do? Try a Google search for: google sheets R.

Import Google Sheets

library(googlesheets)

This package only works with Google Sheets that belong to you, that is, sheets listed in your own Google Drive. So you need to:

  1. Open https://docs.google.com/spreadsheets/d/1mC9bnxj11NCNoOCw0Vmn4nxERbHtLjeGo9v9C9b2GDE/edit#gid=1783332305

  2. File -> Add to My Drive…

To get google sheet link:

Gain access authorization to your sheets

gs_auth(new_user = TRUE)

Download Google sheets

gsSurvey<-gs_key("1mC9bnxj11NCNoOCw0Vmn4nxERbHtLjeGo9v9C9b2GDE")  #download sheet information as an object
classSurvey<-gs_read(gsSurvey,ws=1) #access the 1st worksheet

Check the help of the above two functions: gs_key() and gs_read()
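
Comparing the URL in step 1 with the gs_key() call shows that the key is the part of the sheet URL between /d/ and the next slash. A minimal base-R sketch for pulling the key out of a URL follows; sheet_url and sheet_key are illustrative names and not part of the googlesheets package.

```r
# The sheet URL from step 1 above
sheet_url <- "https://docs.google.com/spreadsheets/d/1mC9bnxj11NCNoOCw0Vmn4nxERbHtLjeGo9v9C9b2GDE/edit#gid=1783332305"

# Keep only the text between "/d/" and the next "/"
sheet_key <- sub("^.*/d/([^/]+).*$", "\\1", sheet_url)
sheet_key
```

The resulting string is what would be passed to gs_key().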

If you want to share your google sheet for others to access, you need to do the following two things in your google sheets:

  1. File -> Publish to the web…

  2. File -> Share…

7.3 Understand your packages

From RStudio help

Click the package name under the Packages tab.

Look for User guides, package vignettes and other documentation.

Google the package

Search Google for the package name together with R.

Try google search: googlesheets R.

7.4 Data Frame

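  • Like a spreadsheet. Each row represents an observation, while each column represents a variable (such as Timestamp, Email Address, 姓名 (name), 學號 (student ID), 居住地行政區 (district of residence), 性別 (gender), 本學期學分數 (credits this semester), 本學期目前已參加之課外活動 (extracurricular activities joined so far this semester)).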
class(classSurvey)

7.4.1 Create a data frame

Manual creation

  • data.frame()

    • (...,stringsAsFactors) : stringsAsFactors controls whether character vectors are converted to factors. The default was TRUE before R 4.0.0 and is FALSE from R 4.0.0 onward.
StuDF <- data.frame(
  StuID=c(1,2,3,4,5),
  name=c("小明","大雄","胖虎","小新","大白"),
  score=c(80,60,90,70,50)
  )
StuDF 
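
Because the default of stringsAsFactors depends on the R version, it can help to set it explicitly. The sketch below builds the same small data frame twice; dfChr and dfFct are made-up names, and the values are illustrative only.

```r
# Character column kept as character
dfChr <- data.frame(
  name = c("Ann", "Bob", "Cid"),
  score = c(80, 60, 90),
  stringsAsFactors = FALSE
)

# Character column converted to a factor
dfFct <- data.frame(
  name = c("Ann", "Bob", "Cid"),
  score = c(80, 60, 90),
  stringsAsFactors = TRUE
)

class(dfChr$name)  # "character"
class(dfFct$name)  # "factor"
```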

7.4.2 Column/Row names

Column names

names(StuDF) 
colnames(StuDF)

names() is more general than colnames(): it can also be applied to objects other than data frames, such as lists and named vectors.

Check the variable names of classSurvey.

Row names

rownames(StuDF)

Check the rownames of classSurvey.

7.4.3 Extract observations: numerical/logical index

by matrix index

StuDF[1,2]
StuDF[,2]
StuDF[1,]
StuDF[c(1,4),]
StuDF[c(1,4),c(2,3)]
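
One detail worth trying out: on a base data.frame, selecting a single column by matrix index returns a plain vector, while selecting several columns returns a data frame. The sketch below uses a made-up data frame, toyDF, with the same shape as StuDF; drop = FALSE is the standard argument of [ that keeps the data frame structure.

```r
# A small stand-in for StuDF (hypothetical values)
toyDF <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("Ann", "Bob", "Cid", "Dee", "Eve"),
  score = c(80, 60, 90, 70, 50),
  stringsAsFactors = FALSE
)

oneCol <- toyDF[, 2]                  # a character vector
oneColDF <- toyDF[, 2, drop = FALSE]  # still a data frame with one column
sub22 <- toyDF[c(1, 4), c(2, 3)]      # rows 1 and 4, columns 2 and 3

class(oneCol)
class(oneColDF)
dim(sub22)
```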

Compare the following two

StuDF[c(1,4),]
StuDF[-c(1,4),]

If you are selecting consecutive rows/columns, you can use the : operator.

c(1:3) # same as c(1,2,3)
c(5:7) # same as c(5,6,7)
c(1,5:7,10) #same as c(1,5,6,7,10)

Use matrix indexing to reproduce the result of head(classSurvey).

by TRUE/FALSE

StuDF[c(T,F,F,F,F),c(F,T,F)]
StuDF[c(T,F,F,T,F),]
  • You can write TRUE as T, and FALSE as F; but not True, true, TRue, False, false, FAlse, etc.

  • When sum() is applied to a logical vector, it returns the total number of TRUEs.

a<-c(T,T,F,F,T,F,F)
sum(a)
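
sum() works here because R treats TRUE as 1 and FALSE as 0 in arithmetic. The same reasoning makes mean() return the share of TRUEs, as this short sketch shows (nTrue, shareTrue and nFalse are illustrative names).

```r
a <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)

nTrue <- sum(a)       # number of TRUE elements
shareTrue <- mean(a)  # proportion of TRUE elements
nFalse <- sum(!a)     # number of FALSE elements

nTrue
shareTrue
nFalse
```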

Reproduce StuDF[1,] and StuDF[c(1,4),c(2,3)] using T/F method.

7.4.4 Relational operators

It is common to ask about the relation between two values, such as “is A larger than B?” R provides the following relational operators:

Relational Operators
Operator  Description
<         Less than
>         Greater than
<=        Less than or equal to
>=        Greater than or equal to
==        Equal to
!=        Not equal to
Example

“Whose score is greater than or equal to 80?” “What is 小新's score?”

(StuDF$score >= 80)
(StuDF$name == "小新")
  1. Find the names of the people in StuDF whose score is greater than or equal to 80.
  2. Find 小新's score.

which() returns the positions of the TRUE elements in a logical vector.

which(StuDF$score >= 80)
which(StuDF$name == "小新")
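
A logical vector from a relational operator, or the positions returned by which(), can be fed straight back into [ ] to pick rows or elements. The sketch below illustrates this on a made-up data frame; toyDF, isPricey and pos are illustrative names.

```r
# Hypothetical price list
toyDF <- data.frame(
  item = c("pen", "ink", "pad", "cap"),
  price = c(12, 30, 8, 30),
  stringsAsFactors = FALSE
)

isPricey <- toyDF$price >= 30  # logical vector, one value per row
pos <- which(isPricey)         # positions of the TRUE elements

toyDF[isPricey, ]              # select rows with the logical vector
toyDF$item[pos]                # same items, selected by position
```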

7.4.5 Logical operators

Sometimes we want to combine several relational results. This can be done with the following logical operators:

Logical Operators:
Operator  Description
!         Logical NOT
&         Element-wise logical AND
&&        Logical AND on single values (short-circuit)
|         Element-wise logical OR
||        Logical OR on single values (short-circuit)
Example
(classSurvey$性別 == "男")

(classSurvey$本學期學分數 > 20)

(classSurvey$性別 == "男" | 
    classSurvey$本學期學分數 > 20)
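  1. Find the names of those whose 性別 is "男" (male) and whose 本學期學分數 is greater than 26.
  2. Find the people who live in 台北市 (Taipei City).
  3. Find the people who live in 新北市 (New Taipei City).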
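
The difference between the element-wise operators (&, |) and the single-value operators (&&, ||) can be sketched as follows. All variable names are made up for illustration. For filtering the rows of a data frame, the element-wise forms are the ones to use, since they return one TRUE/FALSE per row.

```r
x <- c(TRUE, TRUE, FALSE)
y <- c(TRUE, FALSE, FALSE)

andXY <- x & y  # compares position by position
orXY <- x | y
notX <- !x

# && and || are meant for single TRUE/FALSE values, e.g. inside if()
n <- 5
scalarAnd <- (n > 0) && (n < 10)
scalarOr <- (n < 0) || (n == 5)

andXY
orXY
notX
scalarAnd
scalarOr
```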

7.4.6 Object extraction: $ and [ ]

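$ and [] can be used to extract parts of an object, and not only of data frames.

To select elements:

  • $ applies to elements that have names.

    • It can only be used on named elements, and it extracts only one element at a time.

    • If the object itself is an atomic vector, $ cannot be used to extract its elements.

  • [] applies to elements with or without names.

    • When the object is a data frame or a matrix, use [i,j], where i and j give the row(s) and column(s) of the element(s).

    • For a one-dimensional vector, use [i].

    • A multi-dimensional array has more position indices; a three-dimensional array, for example, uses [i,j,k].

    • i, j, k, … can be numbers or names.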

StuDF$StuID
StuDF[,c("StuID")]
StuDF$name
StuDF[,c("name")]
StuDF[,c("StuID","name")]

## $ cannot extract two elements at once
# StuDF$c("StuID","name")
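
The rules listed above also hold for objects other than data frames. The following sketch tries them on a list, a named atomic vector and a three-dimensional array; every object name here is made up for illustration.

```r
# $ works on a list with named elements
info <- list(id = 7, tags = c("a", "b"))
info$id

# [] works on a named atomic vector, by name or by position
scores <- c(math = 90, art = 75)
scores["art"]
scores[2]

# $ on an atomic vector raises an error, captured here instead of stopping
res <- tryCatch(scores$art, error = function(e) e)
conditionMessage(res)

# A three-dimensional array needs three indices
arr <- array(1:24, dim = c(2, 3, 4))
arr[2, 3, 4]
```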

Joint (one-step) extraction or recursive extraction

StuDF[c(1,4),c(2)]
StuDF[,c(2)][c(1,4)]

## the second one is equivalent to 
StuDF[,c(2)] -> aa
aa[c(1,4)] 

Can elements be extracted this way?

StuDF[,c(1,3)][c(1,4)]

In recursive extraction, $ and [] can be mixed. Example:

StuDF$name[c(1,4)]
StuDF[c(1,4),]$name

Try

StuDF$c("StuID","name")
StuDF$[c("StuID","name")]

7.4.7 Generic replacement

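  • Besides querying certain built-in attributes of an object, some functions can also be used to replace those attribute values. For example, names(x) shows the names of the elements of x, and names(x) <- ... replaces those names.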
names(StuDF)
names(StuDF) <- c("學號","姓名","成績")
names(StuDF)
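
Since names() is not limited to data frames, the same replacement form works on other named objects. Here is a minimal sketch on a named vector and a list; v and lst are made-up objects.

```r
# A named numeric vector
v <- c(a = 1, b = 2, c = 3)
names(v)                      # query the names
names(v) <- c("x", "y", "z")  # replace all names at once
names(v)

# A list is handled the same way
lst <- list(first = "Ann", second = 42)
names(lst) <- c("who", "what")
lst$who
```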

Example: run the following code

library(readr)
student <- read_csv("https://raw.githubusercontent.com/tpemartin/course-107-1-programming-for-data-science/master/data/student.csv")
library(dplyr)
library(magrittr)
student %<>% mutate(
  身高級距=cut(身高,c(0,150,155,160,165,170,175,180,185,200)))

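Besides showing the categories of a factor object, levels() can also be used for generic replacement.

Recode 身高級距 (height bracket) into three categories, “小個子” (short), “中等個子” (medium) and “高個子” (tall): 170 cm and below is “小個子”, 170-180 is “中等個子”, and above 180 is “高個子”.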

levels(student$身高級距)
  • Any element extracted with $ or [] can have its value replaced.
StuDF$成績[c(4)]
StuDF$成績[c(4)] <- 75

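To change only part of an attribute, use $ and [] to select the positions of the attribute values and then replace them, for example:

First re-run the data import, then run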

student$新身高級距 <- student$身高級距

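Make good use of levels() with [] element selection so that (0,160] becomes the lowest bracket of 新身高級距 (that is, merge the three original bracket values (0,150], (150,155] and (155,160] into one category).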
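
The general technique of selecting some levels with [] and assigning them one shared label can be tried on a small made-up factor first. In the sketch below, size and sizeNew are hypothetical; assigning the same label to several levels merges them into one.

```r
# A factor with four ordered-looking levels (illustrative data)
size <- factor(c("S", "M", "L", "XL", "M", "S"),
               levels = c("S", "M", "L", "XL"))
levels(size)

# Work on a copy so the original stays available
sizeNew <- size
levels(sizeNew)[c(3, 4)] <- "L+"  # levels 3 and 4 get one shared label

levels(sizeNew)
table(sizeNew)
```

The original factor is left untouched, which makes it easy to compare the two sets of levels.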

7.5 Exercises

In-class exercises

1.

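The following questions are based on the classSurvey data frame from the class survey:

1.1 Use dim() to find out how many observations classSurvey has. How many variables does it have? (The former can also be obtained with nrow(), the latter with ncol().)

1.2 Add a new variable called 年級 (year). It must be a factor with four levels: 大一, 大二, 大三, 大四及以上 (freshman, sophomore, junior, senior and above). (Hint: extract the appropriate digits of the student ID, 學號, then use as.factor() and levels().)

1.3 How many people are there in each year?

1.4 In classSurvey, what proportion of the students in 大二 (sophomore year) or above are male? (Hint: length() counts the elements of a vector, such as one variable; dim() gives the number of rows (observations) and columns (variables) of a matrix or data frame.)

1.5 And what is the proportion of males among 大一 (freshmen)?

1.6 Which extracurricular activity do the most students take part in? Present it appropriately with table().

1.7 How are the students distributed across the counties and cities they come from? Present it appropriately with table().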
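
The helper functions mentioned in the hints can be tried on a small made-up data frame before applying them to classSurvey. The sketch below uses hypothetical columns, city and gender, and is not a solution to the exercises.

```r
# Illustrative data only
toyDF <- data.frame(
  city = c("Taipei", "Tainan", "Taipei", "Hsinchu", "Taipei"),
  gender = c("F", "M", "M", "F", "M"),
  stringsAsFactors = FALSE
)

dim(toyDF)          # rows, then columns
nrow(toyDF)         # number of observations
ncol(toyDF)         # number of variables
length(toyDF$city)  # number of elements in one variable

cityCount <- table(toyDF$city)       # frequency of each city
sort(cityCount, decreasing = TRUE)   # most frequent first

maleShare <- mean(toyDF$gender == "M")  # proportion of "M"
maleShare
```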

Homework

Go to the homework repo to get the assignment: