Chapter 7 Data Frame
7.1 Main references
7.2 Develop a Project
Goal: build an online contact list for the whole class and run a basic analysis on it
Lay out your strategy
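Keys: whole class, online, contact list, basic analysis
Key | Strategy |
---|---|
Whole class | Use a Google Form that everyone fills in themselves |
Online | Use Google Form |
Contact list | Fields: name, contact email, …, etc. |
Basic analysis | Use R |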
Google G Suite
A brand of cloud computing, productivity and collaboration tools, software and products developed by Google.
In this course, we will learn to use R to interact with:
Google Form
Google Sheets
Google Calendar
More if time allows.
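Google Form is a commonly used online survey tool (for how to build one, see Creating a Google Form).
Please open the following form and fill in your personal information: Google Form contact list survey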
Retrieve useful packages
To analyze the Google survey results in R, what do you need R to do?
Try a Google search: google sheets R
Import Google Sheets
library(googlesheets)
Note: the googlesheets package has since been retired by its authors in favor of googlesheets4, so the calls in this section may no longer work against the current Google Sheets API.
This package only works with Google Sheets that belong to you, that is, sheets stored in your own Google Drive. For a sheet shared by someone else, you first need to
File -> Add to My Drive…
To get the Google Sheet link:
Gain access authorization to your sheets
gs_auth(new_user = TRUE)
Download Google sheets
gsSurvey<-gs_key("1mC9bnxj11NCNoOCw0Vmn4nxERbHtLjeGo9v9C9b2GDE") #download sheet information as an object
classSurvey<-gs_read(gsSurvey,ws=1) #access the 1st worksheet
Check the help pages of the two functions above: gs_key() and gs_read()
If you want to share your Google Sheet so that others can access it, you need to do the following two things in the sheet:
- File -> Publish to the web…
- File -> Share…
7.3 Understand your packages
From RStudio help
Click the package under the Packages tab.
Look for User guides, package vignettes and other documentation.
Google the package
Google package name and R together.
Try a Google search: googlesheets R
7.4 Data Frame
- Like a spreadsheet. Each row represents an observation, and each column represents a variable (such as Timestamp, Email Address, 姓名, 學號, 居住地行政區, 性別, 本學期學分數, 本學期目前已參加之課外活動).
class(classSurvey)
7.4.1 Create a data frame
Manual creation
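data.frame(..., stringsAsFactors): the stringsAsFactors argument controls whether character vectors are converted to factors. The default was TRUE before R 4.0.0 and has been FALSE since R 4.0.0, so in current versions of R character columns stay character unless you ask otherwise.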
StuDF <- data.frame(
StuID=c(1,2,3,4,5),
name=c("小明","大雄","胖虎","小新","大白"),
score=c(80,60,90,70,50)
)
StuDF
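The effect of stringsAsFactors can be checked directly. The following is a minimal sketch; the object names dfChar and dfFactor are illustrative and not part of the course material.

```r
# Two data frames with the same columns as StuDF, with stringsAsFactors set explicitly.
dfChar <- data.frame(
  StuID = c(1, 2, 3),
  name = c("小明", "大雄", "胖虎"),
  stringsAsFactors = FALSE   # keep name as a character vector
)
dfFactor <- data.frame(
  StuID = c(1, 2, 3),
  name = c("小明", "大雄", "胖虎"),
  stringsAsFactors = TRUE    # convert name to a factor
)

class(dfChar$name)    # "character"
class(dfFactor$name)  # "factor"
```

Setting the argument explicitly makes the result independent of the R version in use.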
7.4.2 Column/Row names
Column names
names(StuDF)
colnames(StuDF)
names() is more general than colnames(): it can also be applied to objects other than data frames, such as lists and named vectors.
Check the variable names of classSurvey.
Row names
rownames(StuDF)
Check the rownames of classSurvey.
7.4.3 Extract observations: numerical/logical index
by matrix index
StuDF[1,2]
StuDF[,2]
StuDF[1,]
StuDF[c(1,4),]
StuDF[c(1,4),c(2,3)]
Compare the following two
StuDF[c(1,4),]
StuDF[-c(1,4),]
If you are selecting consecutive rows/columns, you can use the : operator.
c(1:3) # same as c(1,2,3)
c(5:7) # same as c(5,6,7)
c(1,5:7,10) #same as c(1,5,6,7,10)
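The : operator can be used directly inside a matrix index. A minimal sketch on the StuDF data frame defined earlier (rebuilt here so the block runs on its own); the result names firstThree, dropFirstThree and nameScore are illustrative.

```r
StuDF <- data.frame(
  StuID = c(1, 2, 3, 4, 5),
  name = c("小明", "大雄", "胖虎", "小新", "大白"),
  score = c(80, 60, 90, 70, 50)
)

firstThree <- StuDF[1:3, ]          # rows 1 to 3, all columns
# The parentheses matter: -1:3 would mean c(-1, 0, 1, 2, 3).
dropFirstThree <- StuDF[-(1:3), ]   # every row except rows 1 to 3
nameScore <- StuDF[2:4, 2:3]        # rows 2 to 4, columns 2 and 3

nrow(firstThree)      # 3
nrow(dropFirstThree)  # 2
names(nameScore)      # "name" "score"
```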
Use matrix indexing to reproduce the result of head(classSurvey).
by TRUE/FALSE
StuDF[c(T,F,F,F,F),c(F,T,F)]
StuDF[c(T,F,F,T,F),]
You can write TRUE as T, and FALSE as F; but not True, true, TRue, False, false, FAlse, etc.
When sum() is applied to a logical vector, it returns the total number of TRUEs.
a<-c(T,T,F,F,T,F,F)
sum(a)
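The same idea carries over to row selection: a logical index keeps the rows marked TRUE, and sum() counts how many were kept. A minimal sketch on StuDF (rebuilt here so the block is self-contained); the names keep and picked are illustrative.

```r
StuDF <- data.frame(
  StuID = c(1, 2, 3, 4, 5),
  name = c("小明", "大雄", "胖虎", "小新", "大白"),
  score = c(80, 60, 90, 70, 50)
)

keep <- c(TRUE, FALSE, FALSE, TRUE, FALSE)  # one value per row
picked <- StuDF[keep, ]                     # keeps rows 1 and 4

picked$StuID  # 1 4
sum(keep)     # 2, the number of rows selected
mean(keep)    # 0.4, the share of rows selected
```

Spelling out TRUE and FALSE is the safer habit in scripts, because T and F are ordinary variables that can be reassigned.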
Reproduce StuDF[1,] and StuDF[c(1,4),c(2,3)] using the T/F method.
7.4.4 Relational operators
It is common to ask about the relation between two values, such as “is A larger than B?” In R, the following relational operators are available:
Operator | Description |
---|---|
< | Less than |
> | Greater than |
<= | Less than or equal to |
>= | Greater than or equal to |
== | Equal to |
!= | Not equal to |
Example
“Whose score is greater than or equal to 80?” “What is 小新's score?”
(StuDF$score >= 80)
(StuDF$name == "小新")
- Find the names of the people in StuDF whose score is greater than or equal to 80.
- Find 小新's score.
which() returns the positions of the TRUE elements in a logical vector.
which(StuDF$score >= 80)
which(StuDF$name == "小新")
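A relational result is a logical vector, so it can be used as an index straight away. The sketch below uses conditions that differ from the two exercises above; the name isLow is illustrative.

```r
StuDF <- data.frame(
  StuID = c(1, 2, 3, 4, 5),
  name = c("小明", "大雄", "胖虎", "小新", "大白"),
  score = c(80, 60, 90, 70, 50)
)

isLow <- StuDF$score < 70            # FALSE TRUE FALSE FALSE TRUE
lowNames <- StuDF$name[isLow]        # names on the rows where isLow is TRUE
lowPos <- which(isLow)               # 2 5
oneScore <- StuDF$score[StuDF$name == "胖虎"]  # 90
```

Indexing with the logical vector and indexing with which() of that vector select the same elements here.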
7.4.5 Logical operators
Sometimes we want to join several relational results together. It can be done through the following logical operators:
Operator | Description |
---|---|
! | Logical NOT |
& | Element-wise logical AND |
&& | Logical AND for single (length-one) values |
| | Element-wise logical OR |
|| | Logical OR for single (length-one) values |
Example
(classSurvey$性別 == "男")
(classSurvey$本學期學分數 > 20)
(classSurvey$性別 == "男" |
classSurvey$本學期學分數 > 20)
- Find the names of the people whose 性別 is 男 and whose 本學期學分數 is greater than 26.
- Find the people who live in 台北市.
- Find the people who live in 新北市.
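The operators in the table can be tried on StuDF before moving to classSurvey. This is a minimal sketch with illustrative conditions and variable names (midRange, extreme, okData); it is not the answer to the exercises above.

```r
StuDF <- data.frame(
  StuID = c(1, 2, 3, 4, 5),
  name = c("小明", "大雄", "胖虎", "小新", "大白"),
  score = c(80, 60, 90, 70, 50)
)

# Element-wise AND: scores from 60 up to (but not including) 80
midRange <- StuDF$score >= 60 & StuDF$score < 80   # FALSE TRUE FALSE TRUE FALSE
# Element-wise OR: scores below 60 or at least 90
extreme <- StuDF$score < 60 | StuDF$score >= 90    # FALSE FALSE TRUE FALSE TRUE
# NOT flips every element
notMid <- !midRange                                # TRUE FALSE TRUE FALSE TRUE

# && combines two single TRUE/FALSE values, e.g. in an if() condition
okData <- nrow(StuDF) > 0 && all(StuDF$score >= 0) # TRUE

sum(midRange)  # 2 rows satisfy both conditions
```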
7.4.6 Object extraction: $ and [ ]
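$ and [] can be used to extract parts of an object, not only of data frames.
To select elements:
$ applies to elements with names. It can only be used on named elements, and it extracts one element at a time.
If the object itself is a vector, $ cannot be used to extract its elements.
[] applies to elements with or without names. When the object is a data frame or a matrix, use [i,j], where i and j give the row and column position(s) of the element(s).
For a one-dimensional vector, use [i].
A multi-dimensional array needs more position indices, for example [i,j,k] in three dimensions.
i, j, k, … can be numbers or names.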
StuDF$StuID
StuDF[,c("StuID")]
StuDF$name
StuDF[,c("name")]
StuDF[,c("StuID","name")]
## $ cannot extract two elements at once
# StuDF$c("StuID","name")
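The rules above also apply to objects that are not data frames. A minimal sketch with a named vector and a named list; scoreVec, scoreList and res are illustrative names.

```r
scoreVec <- c(math = 80, english = 60)       # named atomic vector
scoreList <- list(math = 80, english = 60)   # named list

scoreVec["math"]                   # [] works on a named vector
scoreList$math                     # $ works on a list (a data frame is a list of columns)
scoreList[c("math", "english")]    # [] can take several names at once

# $ on an atomic vector is an error, so the call is wrapped in tryCatch()
res <- tryCatch(scoreVec$math, error = function(e) "error")
res                                # "error"
```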
Joint extraction versus recursive extraction
StuDF[c(1,4),c(2)]
StuDF[,c(2)][c(1,4)]
## the second one is equivalent to
StuDF[,c(2)] -> aa
aa[c(1,4)]
Can elements be extracted this way?
StuDF[,c(1,3)][c(1,4)]
In recursive extraction, $ and [] can be mixed.
Example
StuDF$name[c(1,4)]
StuDF[c(1,4),]$name
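Whether the joint and recursive routes really agree can be checked with identical(). A minimal sketch on the score column of StuDF; joint, recursive1 and recursive2 are illustrative names.

```r
StuDF <- data.frame(
  StuID = c(1, 2, 3, 4, 5),
  name = c("小明", "大雄", "胖虎", "小新", "大白"),
  score = c(80, 60, 90, 70, 50)
)

joint <- StuDF[c(1, 4), "score"]      # rows and column chosen in one step
recursive1 <- StuDF$score[c(1, 4)]    # column first, then positions
recursive2 <- StuDF[c(1, 4), ]$score  # rows first, then column

identical(joint, recursive1)       # TRUE
identical(recursive1, recursive2)  # TRUE
```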
Try
StuDF$c("StuID","name")
StuDF$[c("StuID","name")]
7.4.7 Generic replacement
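- Some functions can be used not only to query certain attributes of an object but also to replace them. For example, names(x) shows the names of the elements of x, and names(x) <- ... replaces those names.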
names(StuDF)
names(StuDF) <- c("學號","姓名","成績")
names(StuDF)
Example: run the following code
library(readr)
student <- read_csv("https://raw.githubusercontent.com/tpemartin/course-107-1-programming-for-data-science/master/data/student.csv")
library(dplyr)
library(magrittr)
student %<>% mutate(
身高級距=cut(身高,c(0,150,155,160,165,170,175,180,185,200)))
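levels() shows the categories of a factor object and can also be used for generic replacement.
Recode 身高級距 into three categories, “小個子”, “中等個子” and “高個子”: 170 cm and below is “小個子”, 170-180 cm is “中等個子”, and above 180 cm is “高個子”.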
levels(student$身高級距)
- Any element extracted with $ or [] can have its value replaced.
StuDF$成績[c(4)]
StuDF$成績[c(4)] <- 75
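Replacement works with any of the selection styles from the earlier sections. The sketch below uses a separate, illustrative data frame called demoDF (with English column names, an assumption made here) so that it does not overwrite StuDF.

```r
demoDF <- data.frame(
  StuID = c(1, 2, 3, 4, 5),
  name = c("小明", "大雄", "胖虎", "小新", "大白"),
  score = c(80, 60, 90, 70, 50)
)

demoDF$score[demoDF$name == "大雄"] <- 65  # select by a logical condition, then replace
demoDF[5, "score"] <- 55                   # select by row number and column name
names(demoDF)[3] <- "grade"                # replace just one of the column names

demoDF$grade   # 80 65 90 70 55
names(demoDF)  # "StuID" "name" "grade"
```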
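To change only some of the attribute values, use $ or [] to select their positions and then replace them. For example:
First re-run the data import, then run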
student$新身高級距 <- student$身高級距
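Use levels() together with [] element selection so that (0,160] becomes the lowest bracket of 新身高級距 (that is, merge the three original brackets (0,150], (150,155] and (155,160] into one).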
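Merging levels relies on one property of levels(x) <- ...: when two levels are given the same label, they become a single level. A minimal sketch on an unrelated, made-up factor of letter grades (the factor grade is illustrative and not part of the student data):

```r
grade <- factor(c("A", "B", "C", "D", "B"), levels = c("A", "B", "C", "D"))
levels(grade)                        # "A" "B" "C" "D"

levels(grade)[3:4] <- "C or below"   # same label for levels 3 and 4 merges them
levels(grade)                        # "A" "B" "C or below"

counts <- table(grade)               # A: 1, B: 2, C or below: 2
```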
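7.5 Exercises
In-class exercises
1.
The following questions are based on the classSurvey data frame from the in-class survey:
1.1 Use dim() to find out how many observations classSurvey has. How many variables does it have? (The former can also be found with nrow(), the latter with ncol().)
1.2 Add a new variable called 年級. It must be a factor with four levels: 大一, 大二, 大三, 大四及以上. (hint: extract the appropriate digits of 學號 and use as.factor() and levels().)
1.3 How many people are there in each 年級?
1.4 Among the people in classSurvey who are 大二 or above, what proportion is male?
(hint: length() counts the number of elements in a vector (such as a variable); dim() gives the number of rows (observations) and columns (variables) of a matrix or data frame.)
1.5 What is the proportion of males among 大一 students?
1.6 Which extracurricular activity do students take part in the most? Use table() appropriately to show it.
1.7 How are the students distributed across counties and cities? Use table() appropriately to show it.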
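The functions named in the hints can be tried first on a small made-up data frame. Everything in this sketch (demoDF, its columns year and male, and the values) is hypothetical and only illustrates the calls, not the answers for classSurvey.

```r
demoDF <- data.frame(
  year = factor(c("Y1", "Y2", "Y1", "Y3", "Y2", "Y1")),
  male = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

dim(demoDF)    # 6 2: number of rows, then number of columns
nrow(demoDF)   # 6
ncol(demoDF)   # 2

yearCounts <- table(demoDF$year)   # Y1: 3, Y2: 2, Y3: 1

# The mean of a logical vector is the share of TRUEs,
# here the share of male among the Y1 rows (2 out of 3).
shareY1 <- mean(demoDF$male[demoDF$year == "Y1"])
```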