18 Exercises

Using the gapminder table:

  1. For each continent and each year, compute the mean of pop, lifeExp and gdpPercap, and store the result in a new table.
tabel1  <- gapminder %>%
  group_by(continent, year) %>% 
  summarise(
    mediaPop = mean(pop),
    mediaLife = mean(lifeExp),
    mediaGdp = mean(gdpPercap),
    n=n()
  )
## `summarise()` has grouped output by 'continent'. You can override using the
## `.groups` argument.
head(tabel1)
## # A tibble: 6 × 6
## # Groups:   continent [1]
##   continent  year mediaPop mediaLife mediaGdp     n
##   <fct>     <int>    <dbl>     <dbl>    <dbl> <int>
## 1 Africa     1952 4570010.      39.1    1253.    52
## 2 Africa     1957 5093033.      41.3    1385.    52
## 3 Africa     1962 5702247.      43.3    1598.    52
## 4 Africa     1967 6447875.      45.3    2050.    52
## 5 Africa     1972 7305376.      47.5    2340.    52
## 6 Africa     1977 8328097.      49.6    2586.    52
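The message printed by `summarise()` above is informational: the result stays grouped by `continent`. A minimal sketch of the same summary with the grouping dropped explicitly is shown here; the name `tabel1_ungrouped` is only illustrative, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(gapminder)

# Same summary as tabel1, but .groups = "drop" returns an ungrouped tibble
# and silences the informational message from summarise().
tabel1_ungrouped <- gapminder %>%
  group_by(continent, year) %>%
  summarise(
    mediaPop  = mean(pop),
    mediaLife = mean(lifeExp),
    mediaGdp  = mean(gdpPercap),
    n         = n(),
    .groups   = "drop"
  )

head(tabel1_ungrouped)
```

An ungrouped result avoids surprises in later steps, because grouped tibbles carry their grouping into subsequent dplyr verbs.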
  1. Convert the table to wide format, with one column per continent and, in each cell, the mean pop of that continent for the given year.
tabel1 <- gapminder %>%
  group_by(continent, year) %>%
  summarise(
    mediaPop = mean(pop)
  )
## `summarise()` has grouped output by 'continent'. You can override using the
## `.groups` argument.
head(tabel1)
## # A tibble: 6 × 3
## # Groups:   continent [1]
##   continent  year mediaPop
##   <fct>     <int>    <dbl>
## 1 Africa     1952 4570010.
## 2 Africa     1957 5093033.
## 3 Africa     1962 5702247.
## 4 Africa     1967 6447875.
## 5 Africa     1972 7305376.
## 6 Africa     1977 8328097.
tabel1_lat_pop <- spread(tabel1, continent, mediaPop)
tabel1_lat_pop
## # A tibble: 12 × 6
##     year    Africa  Americas       Asia    Europe   Oceania
##    <int>     <dbl>     <dbl>      <dbl>     <dbl>     <dbl>
##  1  1952  4570010. 13806098.  42283556. 13937362.  5343003 
##  2  1957  5093033. 15478157.  47356988. 14596345.  5970988 
##  3  1962  5702247. 17330810.  51404763. 15345172.  6641759 
##  4  1967  6447875. 19229865.  57747361. 16039299.  7300207 
##  5  1972  7305376. 21175368.  65180977. 16687835.  8053050 
##  6  1977  8328097. 23122708.  72257987. 17238818.  8619500 
##  7  1982  9602857. 25211637.  79095018. 17708897.  9197425 
##  8  1987 11054502. 27310159.  87006690. 18103139.  9787208.
##  9  1992 12674645. 29570964.  94948248. 18604760. 10459826.
## 10  1997 14304480. 31876016. 102523803. 18964805. 11120715 
## 11  2002 16033152. 33990910. 109145521. 19274129. 11727414.
## 12  2007 17875763. 35954847. 115513752. 19536618. 12274974.
  • Alternative: keep the full tabel1 and select only the columns needed
tabel1  <- gapminder %>%
  group_by(continent, year) %>% 
  summarise(
    mediaPop = mean(pop),
    mediaLife = mean(lifeExp),
    mediaGdp = mean(gdpPercap),
    n=n()
  )
## `summarise()` has grouped output by 'continent'. You can override using the
## `.groups` argument.
head(tabel1)
## # A tibble: 6 × 6
## # Groups:   continent [1]
##   continent  year mediaPop mediaLife mediaGdp     n
##   <fct>     <int>    <dbl>     <dbl>    <dbl> <int>
## 1 Africa     1952 4570010.      39.1    1253.    52
## 2 Africa     1957 5093033.      41.3    1385.    52
## 3 Africa     1962 5702247.      43.3    1598.    52
## 4 Africa     1967 6447875.      45.3    2050.    52
## 5 Africa     1972 7305376.      47.5    2340.    52
## 6 Africa     1977 8328097.      49.6    2586.    52
tabel1_lat_pop <- spread(tabel1[, c(1,2,3)], continent, mediaPop)
tabel1_lat_pop
## # A tibble: 12 × 6
##     year    Africa  Americas       Asia    Europe   Oceania
##    <int>     <dbl>     <dbl>      <dbl>     <dbl>     <dbl>
##  1  1952  4570010. 13806098.  42283556. 13937362.  5343003 
##  2  1957  5093033. 15478157.  47356988. 14596345.  5970988 
##  3  1962  5702247. 17330810.  51404763. 15345172.  6641759 
##  4  1967  6447875. 19229865.  57747361. 16039299.  7300207 
##  5  1972  7305376. 21175368.  65180977. 16687835.  8053050 
##  6  1977  8328097. 23122708.  72257987. 17238818.  8619500 
##  7  1982  9602857. 25211637.  79095018. 17708897.  9197425 
##  8  1987 11054502. 27310159.  87006690. 18103139.  9787208.
##  9  1992 12674645. 29570964.  94948248. 18604760. 10459826.
## 10  1997 14304480. 31876016. 102523803. 18964805. 11120715 
## 11  2002 16033152. 33990910. 109145521. 19274129. 11727414.
## 12  2007 17875763. 35954847. 115513752. 19536618. 12274974.
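Both variants above rely on `spread()`, which tidyr documents as superseded by `pivot_wider()`. A hedged sketch of the same reshaping with `pivot_wider()` follows; it assumes tidyr 1.0.0 or later, and the names `medii_pop` and `medii_pop_lat` are illustrative only.

```r
library(dplyr)
library(tidyr)
library(gapminder)

# Mean population per continent and year, ungrouped
medii_pop <- gapminder %>%
  group_by(continent, year) %>%
  summarise(mediaPop = mean(pop), .groups = "drop")

# One row per year, one column per continent
medii_pop_lat <- pivot_wider(
  medii_pop,
  names_from  = continent,
  values_from = mediaPop
)

medii_pop_lat
```

Naming the source columns through `names_from` and `values_from` makes the intent explicit, so there is no need to subset columns by position first.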
  1. Convert the table from the first exercise (tabel1) to wide format, with one column per continent and, in each cell, the mean lifeExp for the given year.
tabel1_lat_life <- spread(tabel1[, c(1,2,4)], continent, mediaLife)
tabel1_lat_life
## # A tibble: 12 × 6
##     year Africa Americas  Asia Europe Oceania
##    <int>  <dbl>    <dbl> <dbl>  <dbl>   <dbl>
##  1  1952   39.1     53.3  46.3   64.4    69.3
##  2  1957   41.3     56.0  49.3   66.7    70.3
##  3  1962   43.3     58.4  51.6   68.5    71.1
##  4  1967   45.3     60.4  54.7   69.7    71.3
##  5  1972   47.5     62.4  57.3   70.8    71.9
##  6  1977   49.6     64.4  59.6   71.9    72.9
##  7  1982   51.6     66.2  62.6   72.8    74.3
##  8  1987   53.3     68.1  64.9   73.6    75.3
##  9  1992   53.6     69.6  66.5   74.4    76.9
## 10  1997   53.6     71.2  68.0   75.5    78.2
## 11  2002   53.3     72.4  69.2   76.7    79.7
## 12  2007   54.8     73.6  70.7   77.6    80.7
  1. Convert the table from the first exercise (tabel1) to wide format, with one column per continent and, in each cell, the mean gdpPercap for the given year.
tabel1_lat_gdp <- spread(tabel1[, c(1,2,5)], continent, mediaGdp)
tabel1_lat_gdp
## # A tibble: 12 × 6
##     year Africa Americas   Asia Europe Oceania
##    <int>  <dbl>    <dbl>  <dbl>  <dbl>   <dbl>
##  1  1952  1253.    4079.  5195.  5661.  10298.
##  2  1957  1385.    4616.  5788.  6963.  11599.
##  3  1962  1598.    4902.  5729.  8365.  12696.
##  4  1967  2050.    5668.  5971. 10144.  14495.
##  5  1972  2340.    6491.  8187. 12480.  16417.
##  6  1977  2586.    7352.  7791. 14284.  17284.
##  7  1982  2482.    7507.  7434. 15618.  18555.
##  8  1987  2283.    7793.  7608. 17214.  20448.
##  9  1992  2282.    8045.  8640. 17062.  20894.
## 10  1997  2379.    8889.  9834. 19077.  24024.
## 11  2002  2599.    9288. 10174. 21712.  26939.
## 12  2007  3089.   11003. 12473. 25054.  29810.
  1. Convert tabel1 to wide format with one column per year and one row per continent, where each cell holds the mean population of that continent in that year.
tabel_lat_an <- spread(tabel1[,c('continent','year','mediaPop')], year, mediaPop)
tabel_lat_an
## # A tibble: 5 × 13
## # Groups:   continent [5]
##   contin…¹ `1952` `1957` `1962` `1967` `1972` `1977` `1982` `1987` `1992` `1997`
##   <fct>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 Africa   4.57e6 5.09e6 5.70e6 6.45e6 7.31e6 8.33e6 9.60e6 1.11e7 1.27e7 1.43e7
## 2 Americas 1.38e7 1.55e7 1.73e7 1.92e7 2.12e7 2.31e7 2.52e7 2.73e7 2.96e7 3.19e7
## 3 Asia     4.23e7 4.74e7 5.14e7 5.77e7 6.52e7 7.23e7 7.91e7 8.70e7 9.49e7 1.03e8
## 4 Europe   1.39e7 1.46e7 1.53e7 1.60e7 1.67e7 1.72e7 1.77e7 1.81e7 1.86e7 1.90e7
## 5 Oceania  5.34e6 5.97e6 6.64e6 7.30e6 8.05e6 8.62e6 9.20e6 9.79e6 1.05e7 1.11e7
## # … with 2 more variables: `2002` <dbl>, `2007` <dbl>, and abbreviated variable
## #   name ¹​continent
  1. Create a new variable in the table that identifies countries with lifeExp above 80.
gapminder$durataViata <- gapminder$lifeExp > 80
  1. How many such countries are there?
table(gapminder$durataViata)
## 
## FALSE  TRUE 
##  1683    21

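Note: the table above counts country-year rows (21 observations), not distinct countries. How many distinct countries are there actually?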
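One way to separate the two counts is sketched below. It filters the rows with lifeExp above 80 and then counts both the rows and the distinct countries among them. The name `tari_peste_80` and its column names are illustrative, not part of the original exercise.

```r
library(dplyr)
library(gapminder)

# Each row of gapminder is one country in one year, so a country that
# exceeds 80 in several years appears several times in the filtered data.
tari_peste_80 <- gapminder %>%
  filter(lifeExp > 80) %>%
  summarise(
    observatii = n(),                 # country-year rows above 80
    tari       = n_distinct(country)  # distinct countries above 80
  )

tari_peste_80
```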

  1. Convert the variable to a factor, with labels of your choice for the two categories of countries.
gapminder$durataViata <-  factor(gapminder$durataViata, labels=c('mai mica de 80','mai mare de 80'))
table(gapminder$durataViata)
## 
## mai mica de 80 mai mare de 80 
##           1683             21
  1. Build a table containing the mean gdpPercap for each continent, year and country type.
tabel2  <-  gapminder %>%
  group_by(continent, year, durataViata) %>%
  summarise(
    medieGdp = mean(gdpPercap)
  )
## `summarise()` has grouped output by 'continent', 'year'. You can override using
## the `.groups` argument.
head(tabel2)
## # A tibble: 6 × 4
## # Groups:   continent, year [6]
##   continent  year durataViata    medieGdp
##   <fct>     <int> <fct>             <dbl>
## 1 Africa     1952 mai mica de 80    1253.
## 2 Africa     1957 mai mica de 80    1385.
## 3 Africa     1962 mai mica de 80    1598.
## 4 Africa     1967 mai mica de 80    2050.
## 5 Africa     1972 mai mica de 80    2340.
## 6 Africa     1977 mai mica de 80    2586.
  1. Plot the mean GDP for each continent and country type.
ggplot(tabel2, aes(x=continent, y=medieGdp))+
  geom_bar(stat='identity', aes(fill=year))
ggplot(tabel2, aes(x=continent, y=medieGdp))+
  geom_bar(stat='identity', aes(fill=durataViata))
ggplot(tabel2, aes(x=continent, y=medieGdp))+
  geom_bar(stat='identity', aes(fill=durataViata),  position = 'dodge')
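Because tabel2 has one row per continent, year and country type, each bar in the plots above combines several yearly rows. One possible reading of the exercise is to average over all years first, so that each bar corresponds to exactly one row. The sketch below takes that reading; `date_grafic` and `grafic` are illustrative names, and `geom_col()` is used as the ggplot2 shorthand for `geom_bar(stat = 'identity')`.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# One row per continent and country type, averaged over all years
date_grafic <- gapminder %>%
  mutate(
    durataViata = factor(
      lifeExp > 80,
      levels = c(FALSE, TRUE),
      labels = c("mai mica de 80", "mai mare de 80")
    )
  ) %>%
  group_by(continent, durataViata) %>%
  summarise(medieGdp = mean(gdpPercap), .groups = "drop")

# Side-by-side bars: one bar per continent and country type
grafic <- ggplot(date_grafic, aes(x = continent, y = medieGdp, fill = durataViata)) +
  geom_col(position = "dodge")

grafic
```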
  1. Convert the gapminder table to long format, so that the three indicators appear on separate rows and their values in a single column. The three indicator columns are replaced by two: one holding the indicator name and one holding its value.
head(gapminder)
## # A tibble: 6 × 10
##   country   conti…¹  year lifeExp    pop gdpPe…² tara_…³ tara_…⁴ tara_…⁵ durat…⁶
##   <fct>     <fct>   <int>   <dbl>  <int>   <dbl> <lgl>   <fct>   <ord>   <fct>  
## 1 Afghanis… Asia     1952    28.8 8.43e6    779. FALSE   mare    mare    mai mi…
## 2 Afghanis… Asia     1957    30.3 9.24e6    821. FALSE   mare    mare    mai mi…
## 3 Afghanis… Asia     1962    32.0 1.03e7    853. FALSE   mare    mare    mai mi…
## 4 Afghanis… Asia     1967    34.0 1.15e7    836. FALSE   mare    mare    mai mi…
## 5 Afghanis… Asia     1972    36.1 1.31e7    740. FALSE   mare    mare    mai mi…
## 6 Afghanis… Asia     1977    38.4 1.49e7    786. FALSE   mare    mare    mai mi…
## # … with abbreviated variable names ¹​continent, ²​gdpPercap, ³​tara_mica,
## #   ⁴​tara_mica_f, ⁵​tara_mica_f_o, ⁶​durataViata
tabel3 <- gather(gapminder, indicator, valoare, lifeExp:gdpPercap)
head(tabel3)
## # A tibble: 6 × 9
##   country     continent  year tara_mica tara_m…¹ tara_…² durat…³ indic…⁴ valoare
##   <fct>       <fct>     <int> <lgl>     <fct>    <ord>   <fct>   <chr>     <dbl>
## 1 Afghanistan Asia       1952 FALSE     mare     mare    mai mi… lifeExp    28.8
## 2 Afghanistan Asia       1957 FALSE     mare     mare    mai mi… lifeExp    30.3
## 3 Afghanistan Asia       1962 FALSE     mare     mare    mai mi… lifeExp    32.0
## 4 Afghanistan Asia       1967 FALSE     mare     mare    mai mi… lifeExp    34.0
## 5 Afghanistan Asia       1972 FALSE     mare     mare    mai mi… lifeExp    36.1
## 6 Afghanistan Asia       1977 FALSE     mare     mare    mai mi… lifeExp    38.4
## # … with abbreviated variable names ¹​tara_mica_f, ²​tara_mica_f_o, ³​durataViata,
## #   ⁴​indicator
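`gather()` is documented in tidyr as superseded by `pivot_longer()`. A hedged sketch of the same conversion with the newer function follows; it assumes tidyr 1.0.0 or later, and `tabel3_longer` is an illustrative name.

```r
library(tidyr)
library(gapminder)

# Stack the three indicator columns into name/value pairs
tabel3_longer <- pivot_longer(
  gapminder,
  cols      = c(lifeExp, pop, gdpPercap),
  names_to  = "indicator",
  values_to = "valoare"
)

head(tabel3_longer)
```

The row order differs from the `gather()` result: `pivot_longer()` keeps the three indicators of each original row together, whereas `gather()` stacks one indicator after another.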