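Chapter 4 Selecting the Right Columns: select()
Part 1
select() subsets a data frame by its columns (variables): it keeps only the columns you name, and every row is kept.
- Basic selection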
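library(dplyr)        # select() and the %>% pipe
library(nycflights13) # the flights data
# To see basic information about each flight, including its origin and destination
flights %>% select(year,month,day,carrier,flight,tailnum,origin,dest)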
## # A tibble: 336,776 x 8
## year month day carrier flight tailnum origin
## <int> <int> <int> <chr> <int> <chr> <chr>
## 1 2013 1 1 UA 1545 N14228 EWR
## 2 2013 1 1 UA 1714 N24211 LGA
## 3 2013 1 1 AA 1141 N619AA JFK
## 4 2013 1 1 B6 725 N804JB JFK
## 5 2013 1 1 DL 461 N668DN LGA
## 6 2013 1 1 UA 1696 N39463 EWR
## 7 2013 1 1 B6 507 N516JB EWR
## 8 2013 1 1 EV 5708 N829AS LGA
## 9 2013 1 1 B6 79 N593JB JFK
## 10 2013 1 1 AA 301 N3ALAA LGA
## # ... with 336,766 more rows, and 1 more variable:
## # dest <chr>
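When the column names are already stored in a character vector, select() can take them through the tidyselect helper all_of(). The sketch below assumes dplyr 1.0 or later and the nycflights13 package. The vector name `info_cols` is purely illustrative. It also compares the result with base R bracket subsetting, which picks the same columns.

```r
library(dplyr)
library(nycflights13)

# Hypothetical vector holding the column names we want
info_cols <- c("year", "month", "day", "carrier", "flight",
               "tailnum", "origin", "dest")

# all_of() tells select() to read the names from a character vector
by_select <- flights %>% select(all_of(info_cols))

# Base R bracket subsetting keeps the same columns in the same order
by_bracket <- flights[, info_cols]

names(by_select)
```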
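- Selecting a range of consecutive columns
# year:day means "every column from year through day". How does this compare with the version above?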
flights %>% select(year:day,carrier:dest)
## # A tibble: 336,776 x 8
## year month day carrier flight tailnum origin
## <int> <int> <int> <chr> <int> <chr> <chr>
## 1 2013 1 1 UA 1545 N14228 EWR
## 2 2013 1 1 UA 1714 N24211 LGA
## 3 2013 1 1 AA 1141 N619AA JFK
## 4 2013 1 1 B6 725 N804JB JFK
## 5 2013 1 1 DL 461 N668DN LGA
## 6 2013 1 1 UA 1696 N39463 EWR
## 7 2013 1 1 B6 507 N516JB EWR
## 8 2013 1 1 EV 5708 N829AS LGA
## 9 2013 1 1 B6 79 N593JB JFK
## 10 2013 1 1 AA 301 N3ALAA LGA
## # ... with 336,766 more rows, and 1 more variable:
## # dest <chr>
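One way to check the comparison is to build both versions and compare their column names. The sketch below assumes the standard column order of nycflights13::flights, where year to day are columns 1 to 3 and carrier to dest are columns 10 to 14. Because `:` works on column positions, numeric ranges pick the same columns here.

```r
library(dplyr)
library(nycflights13)

by_name  <- flights %>% select(year, month, day, carrier, flight, tailnum, origin, dest)
by_range <- flights %>% select(year:day, carrier:dest)
# Positional ranges: columns 1 to 3 and 10 to 14 in the current column order
by_pos   <- flights %>% select(1:3, 10:14)

names(by_range)
```

Ranges by name keep working if columns are added elsewhere in the table; numeric positions silently change meaning whenever the column order changes, so names are usually the safer choice.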
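- Selecting columns whose names match a pattern
# To look at the departure and arrival data, contains() comes in handy:
# it keeps every column whose name contains the given text.
# Other helpers exist too (starts_with(), ends_with(), matches(), ...); see the dplyr cheatsheet.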
flights %>% select(contains("dep"), contains("arr"))
## # A tibble: 336,776 x 7
## dep_time sched_dep_time dep_delay arr_time
## <int> <int> <dbl> <int>
## 1 517 515 2 830
## 2 533 529 4 850
## 3 542 540 2 923
## 4 544 545 -1 1004
## 5 554 600 -6 812
## 6 554 558 -4 740
## 7 555 600 -5 913
## 8 557 600 -3 709
## 9 557 600 -3 838
## 10 558 600 -2 753
## # ... with 336,766 more rows, and 3 more variables:
## # sched_arr_time <int>, arr_delay <dbl>,
## # carrier <chr>
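The output above has 7 columns rather than 6 because contains("arr") also matches "carrier" (c-**arr**-ier). A minimal sketch, assuming dplyr 1.0 or later, that removes the unwanted match by adding a negative selection after the positive ones:

```r
library(dplyr)
library(nycflights13)

# contains("arr") also matches "carrier", so drop it explicitly afterwards
dep_arr <- flights %>% select(contains("dep"), contains("arr"), -carrier)

names(dep_arr)
```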
- Dropping certain variables
# If some variables won't help with the later analysis, put a minus sign in front of them to drop them
flights %>% select(-c(time_hour))
## # A tibble: 336,776 x 18
## year month day dep_time sched_dep_time dep_delay
## <int> <int> <int> <int> <int> <dbl>
## 1 2013 1 1 517 515 2
## 2 2013 1 1 533 529 4
## 3 2013 1 1 542 540 2
## 4 2013 1 1 544 545 -1
## 5 2013 1 1 554 600 -6
## 6 2013 1 1 554 558 -4
## 7 2013 1 1 555 600 -5
## 8 2013 1 1 557 600 -3
## 9 2013 1 1 557 600 -3
## 10 2013 1 1 558 600 -2
## # ... with 336,766 more rows, and 12 more variables:
## # arr_time <int>, sched_arr_time <int>,
## # arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>,
## # air_time <dbl>, distance <dbl>, hour <dbl>,
## # minute <dbl>
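For a single column, the c() wrapper is optional. This small sketch (assuming dplyr and nycflights13 are installed) shows that both spellings drop the same column and leave 18 of the original 19.

```r
library(dplyr)
library(nycflights13)

# A minus sign on a single bare column name is enough
without_c <- flights %>% select(-time_hour)
# Wrapping the name in c() gives the same result
with_c    <- flights %>% select(-c(time_hour))

ncol(without_c)
```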
# Can we drop several variables at once? Yes: list them all inside -c()
flights %>% select(-c(time_hour,tailnum,flight))
## # A tibble: 336,776 x 16
## year month day dep_time sched_dep_time dep_delay
## <int> <int> <int> <int> <int> <dbl>
## 1 2013 1 1 517 515 2
## 2 2013 1 1 533 529 4
## 3 2013 1 1 542 540 2
## 4 2013 1 1 544 545 -1
## 5 2013 1 1 554 600 -6
## 6 2013 1 1 554 558 -4
## 7 2013 1 1 555 600 -5
## 8 2013 1 1 557 600 -3
## 9 2013 1 1 557 600 -3
## 10 2013 1 1 558 600 -2
## # ... with 336,766 more rows, and 10 more variables:
## # arr_time <int>, sched_arr_time <int>,
## # arr_delay <dbl>, carrier <chr>, origin <chr>,
## # dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>
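Two other ways to drop several columns are sketched below, assuming dplyr and nycflights13: a minus sign in front of each name, or a negated selection helper such as starts_with(). The helper version drops every column whose name begins with "sched_".

```r
library(dplyr)
library(nycflights13)

# Same three columns as -c(time_hour, tailnum, flight)
dropped <- flights %>% select(-time_hour, -tailnum, -flight)
ncol(dropped)

# Negating a helper drops every column it matches (sched_dep_time, sched_arr_time)
no_sched <- flights %>% select(-starts_with("sched_"))
names(no_sched)
```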
- everything()
# everything() means "all variables".
# It becomes useful later together with mutate(), for example to move a newly created column to the front.
flights %>% select(everything())
## # A tibble: 336,776 x 19
## year month day dep_time sched_dep_time dep_delay
## <int> <int> <int> <int> <int> <dbl>
## 1 2013 1 1 517 515 2
## 2 2013 1 1 533 529 4
## 3 2013 1 1 542 540 2
## 4 2013 1 1 544 545 -1
## 5 2013 1 1 554 600 -6
## 6 2013 1 1 554 558 -4
## 7 2013 1 1 555 600 -5
## 8 2013 1 1 557 600 -3
## 9 2013 1 1 557 600 -3
## 10 2013 1 1 558 600 -2
## # ... with 336,766 more rows, and 13 more variables:
## # arr_time <int>, sched_arr_time <int>,
## # arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>,
## # air_time <dbl>, distance <dbl>, hour <dbl>,
## # minute <dbl>, time_hour <dttm>
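A sketch of the mutate() combination, assuming dplyr and nycflights13. The column name `gain` and its formula are illustrative only. mutate() appends the new column at the end; select(gain, everything()) then puts it first and keeps all remaining columns in their original order, without duplicating gain.

```r
library(dplyr)
library(nycflights13)

with_gain <- flights %>%
  # Hypothetical new column, appended at the end by mutate()
  mutate(gain = dep_delay - arr_delay) %>%
  # Put gain first, then every other column in its original order
  select(gain, everything())

names(with_gain)[1:4]
```

In dplyr 1.0.0 and later, relocate() is a dedicated verb for the same reordering task.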
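Example
When you're done, paste your answer directly into the open chat.
- A buyer is very interested in these 32 cars, especially their fuel economy (Miles/(US) gallon, mpg) and their power (hp, Gross horsepower). How would you prepare the data for them?
mtcars
# mtcars stores the car names as row names, not as a column, so turn them into a car_name column first
mtcars %>% tibble::rownames_to_column("car_name") %>% select(car_name, mpg, hp)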
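Practice on your own
- A team's general manager places great weight on defense and wants to find out who defends best in the league. He needs each player's name (Name), team (Team), position (Position), number of rebounds (TotalRebounds), and number of steals (Steals). How should you prepare the data?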