Chapter 4 Managing Variables and Data

4.1 Operations in R: Syntax

This chapter presents the basic operations, with syntax examples, that you need to understand in order to work in the R interface.

Logical operations:

& is the logical AND operator, e.g. x & y, which is TRUE only when both x and y are TRUE.

Comparison and logical operators.

| Operator | Description |
|---|---|
| < | less than |
| <= | less than or equal to |
| > | greater than |
| >= | greater than or equal to |
| == | exactly equal to |
| != | not equal to |
| !x | not x |
| x \| y | x OR y |
| x & y | x AND y |
| isTRUE(x) | test if x is TRUE |
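The following sketch applies each operator from the table to an illustrative vector x and value y (both arbitrary, chosen only for this example). Comparisons in R are vectorised, so they return one logical value per element; isTRUE() is the exception, returning TRUE only for a single, non-missing TRUE.

```r
# Illustrative values (arbitrary, chosen only for this example)
x <- c(3, 7, 10)
y <- 7

x < y                # TRUE FALSE FALSE
x <= y               # TRUE TRUE FALSE
x > y                # FALSE FALSE TRUE
x >= y               # FALSE TRUE TRUE
x == y               # FALSE TRUE FALSE
x != y               # TRUE FALSE TRUE
!(x > 5)             # TRUE FALSE FALSE
(x > 2) & (x < 10)   # TRUE TRUE FALSE: both conditions must hold
(x < 5) | (x > 8)    # TRUE FALSE TRUE: at least one condition must hold
isTRUE(all(x > 0))   # TRUE: all() reduces the comparison to a single TRUE
isTRUE(x > 0)        # FALSE: x > 0 has three elements, not one
```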

4.2 Scalars, Vectors and Matrices

Scalars are single numbers (in R, a scalar is simply a vector of length one). Vectors are one-dimensional collections of values of the same type, so a vector can also be thought of as a one-dimensional array. Vectors are created with the concatenate function, c(). For example, vector a is defined from the numbers 5, 6, 7, 8 and 9:

## vector a is defined as
a <- c(5, 6, 7, 8, 9)

## for example, we can compute the mean of vector a with the mean() function
mean(a)
## [1] 7
## the [1] is the index of the first element printed on that line; the mean of a is 7
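A few more operations on the same vector a show how vectors are indexed and how arithmetic works on them. This is a small illustrative sketch; the extra scalar s is a hypothetical name used only here.

```r
a <- c(5, 6, 7, 8, 9)

length(a)            # 5: number of elements
a[2]                 # 6: R indexes from 1
a[2:4]               # 6 7 8: a range of elements
a * 2                # 10 12 14 16 18: arithmetic is applied element by element
sum(a) / length(a)   # 7: the same value mean(a) returns

s <- 3
length(s)            # 1: a "scalar" is simply a vector of length one
is.vector(s)         # TRUE
```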

Matrices can be thought of as tables, i.e. two-dimensional structures made up of a given number of rows and columns.
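A minimal sketch of creating a matrix and addressing its rows, columns and single elements; the matrix m and its values are illustrative only. Note that matrix() fills values column by column unless byrow = TRUE is given.

```r
# A 2 x 3 matrix built from the values 1 to 6; R fills it column by column by default
m <- matrix(1:6, nrow = 2, ncol = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

dim(m)       # 2 3: rows, then columns
nrow(m)      # 2
ncol(m)      # 3
m[2, 3]      # 6: element in row 2, column 3
m[1, ]       # 1 3 5: the whole first row
m[, 2]       # 3 4: the whole second column
```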

4.3 Representing Data Models, Variables and Respondents

A single variable is represented by a column and each respondent by a row, so every individual value is identified by one respondent and one variable. In R, data organised this way is called tidy data (Wickham and Henry 2019).
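In R this layout corresponds directly to a data frame. The sketch below uses hypothetical survey data (the variable names id, age, gender and score are invented for illustration): each row is one respondent, each column one variable, and a single value is addressed by a row and a column.

```r
# Hypothetical survey data: each row is one respondent, each column one variable
respondents <- data.frame(
  id     = c(1, 2, 3),
  age    = c(34, 52, 27),
  gender = c("F", "M", "F"),
  score  = c(12, 15, 9)
)

nrow(respondents)                   # 3 respondents (rows)
ncol(respondents)                   # 4 variables (columns)
respondents[2, "age"]               # 52: respondent 2's value on the variable age
respondents$score                   # 12 15 9: all values of one variable
respondents[respondents$id == 3, ]  # every value recorded for respondent 3
```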

Wickham and Henry (2019) clearly define what a tidy format, or tidy model, of the data collected in a study (and of how its results are presented) means: A dataset is a collection of values, usually either numbers (if quantitative) or strings AKA text data (if qualitative). Values are organised in two ways. Every value belongs to a variable and an observation. A variable contains all values that measure the same underlying attribute (like height, temperature, duration) across units. An observation contains all values measured on the same unit (like a person, or a day, or a city) across attributes.

Tidy data is a standard way of mapping the meaning of a dataset to its structure. A dataset is messy or tidy depending on how rows, columns and tables are matched up with observations, variables and types. In tidy data:

1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.
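The second rule is often broken when a repeated measurement is stored in several columns. The sketch below uses a hypothetical table with test scores in columns t1 and t2 and turns it into tidy form with base R's reshape(); the tidyr package cited above provides gather() (and its newer replacement pivot_longer()) for the same task. The column names and values are assumptions made for illustration.

```r
# Hypothetical "messy" table: one measurement (score) spread over two columns
wide <- data.frame(
  id = c(1, 2, 3),
  t1 = c(10, 12, 9),
  t2 = c(11, 14, 10)
)

# Tidy version: one row per respondent and time point, one column per variable
long <- reshape(
  wide,
  direction = "long",
  varying   = c("t1", "t2"),  # columns that hold the repeated measurement
  v.names   = "score",        # name of the new value column
  timevar   = "time",         # name of the new column identifying the time point
  times     = c("t1", "t2"),  # values written into the time column
  idvar     = "id"
)
long
```

Each of the six rows of long now holds exactly one observation (a respondent at a time point), and id, time and score are separate variables.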

The following figure shows schematically what tidy data means (available at: http://r4ds.had.co.nz/tidy-data.html).


Figure 4.1: Proper representation of the data model and results (Wickham and Henry 2019)

4.4 Processing Results for Selected Respondents

Subsetting Data

R has powerful indexing features for accessing object elements. These features can be used to select and exclude variables and observations. The following code snippets demonstrate ways to keep or delete variables and observations and to take random samples from a dataset.

Selecting (Keeping) Variables

# select variables v1, v2, v3
myvars <- c("v1", "v2", "v3")
newdata <- mydata[myvars]

# another method
myvars <- paste("v", 1:3, sep = "")
newdata <- mydata[myvars]

# select 1st and 5th through 10th variables
newdata <- mydata[c(1, 5:10)]
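The snippets above assume an existing data frame mydata. To try them, a small hypothetical mydata with variables v1 to v5 can be built first; because it has only five columns, the positional example keeps the 1st and the 4th through 5th variables instead of 5:10.

```r
# Hypothetical example data: 4 observations of 5 variables named v1 ... v5
mydata <- as.data.frame(matrix(1:20, nrow = 4, dimnames = list(NULL, paste0("v", 1:5))))

# Keep variables by name
byname <- mydata[c("v1", "v2", "v3")]
names(byname)             # "v1" "v2" "v3"

# Keep variables by position: the 1st and the 4th through 5th
bypos <- mydata[c(1, 4:5)]
names(bypos)              # "v1" "v4" "v5"

# The same selection as byname, with the names generated by paste()
identical(byname, mydata[paste("v", 1:3, sep = "")])   # TRUE
```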


Excluding (Dropping) Variables

# exclude variables v1, v2, v3
myvars <- names(mydata) %in% c("v1", "v2", "v3")
newdata <- mydata[!myvars]

# exclude 3rd and 5th variable
newdata <- mydata[c(-3, -5)]
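The two exclusion methods work differently: %in% produces a logical flag per column that ! then inverts, while negative positions tell R which columns to leave out. A sketch on a hypothetical mydata with variables v1 to v5:

```r
# Hypothetical example data: 4 observations of 5 variables named v1 ... v5
mydata <- as.data.frame(matrix(1:20, nrow = 4, dimnames = list(NULL, paste0("v", 1:5))))

# names(mydata) %in% ... gives TRUE for every column to be dropped
drop_flags <- names(mydata) %in% c("v1", "v2", "v3")
drop_flags                    # TRUE TRUE TRUE FALSE FALSE

# ! flips the flags, so only the remaining columns are kept
names(mydata[!drop_flags])    # "v4" "v5"

# Negative positions remove columns: everything except the 3rd and 5th
names(mydata[c(-3, -5)])      # "v1" "v2" "v4"
```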

# delete variables v3 and v5
mydata$v3 <- mydata$v5 <- NULL

Selecting Observations

# first 5 observations
newdata <- mydata[1:5, ]

# based on variable values
newdata <- mydata[which(mydata$gender == "F" & mydata$age > 65), ]
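Wrapping the condition in which() matters when a variable has missing values. In the hypothetical data below, respondent 6 has no recorded age, so the condition is NA for that row; which() returns only the row numbers where the condition is TRUE, whereas indexing with the raw logical vector would add a row filled with NA.

```r
# Hypothetical data with the variables used in the example above
mydata <- data.frame(
  id     = 1:6,
  gender = c("F", "M", "F", "F", "M", "F"),
  age    = c(70, 68, 45, 81, 30, NA)
)

mydata[1:5, ]    # first 5 observations

cond <- mydata$gender == "F" & mydata$age > 65
cond             # TRUE FALSE FALSE TRUE FALSE NA
rows <- which(cond)
rows             # 1 4: the NA for respondent 6 is dropped
mydata[rows, ]
```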

# or
attach(mydata)
newdata <- mydata[which(gender == "F" & age > 65), ]
detach(mydata)

Selection Using the Subset Function

The subset() function is the easiest way to select variables and observations. In the following example, we select all rows that have a value of age greater than or equal to 20 or less than 10. We keep the ID and Weight columns.

# using subset function
newdata <- subset(mydata, age >= 20 | age < 10, select = c(ID, Weight))

In the next example, we select all men over the age of 25 and we keep variables weight through income (weight, income and all columns between them).

# using subset function (part 2)
newdata <- subset(mydata, sex == "m" & age > 25, select = weight:income)
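Both subset() calls can be tried on a small hypothetical data frame. Column names in R are case-sensitive, so the select argument must match the spelling in the data; this sketch uses a column named weight throughout, so the first call is written as select = c(ID, weight). All names and values are invented for illustration.

```r
# Hypothetical data for the two subset() examples
mydata <- data.frame(
  ID     = 1:5,
  sex    = c("m", "f", "m", "m", "f"),
  age    = c(8, 31, 26, 19, 44),
  weight = c(30, 62, 80, 75, 58),
  height = c(130, 168, 182, 177, 165),
  income = c(0, 2500, 3100, 900, 4200)
)

# Rows with age >= 20 or age < 10; keep ID and weight
young_or_adult <- subset(mydata, age >= 20 | age < 10, select = c(ID, weight))
young_or_adult$ID     # 1 2 3 5: only the 19-year-old is excluded

# Men older than 25; keep weight through income (weight, height, income)
older_men <- subset(mydata, sex == "m" & age > 25, select = weight:income)
older_men             # one row: weight 80, height 182, income 3100
```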

Random Samples

Use the sample( ) function to take a random sample of size n from a dataset.

# take a random sample of size 50 from a dataset mydata
# sample without replacement
mysample <- mydata[sample(1:nrow(mydata), 50, replace = FALSE), ]
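Because sample() draws at random, each run gives a different subset. Calling set.seed() first makes the draw reproducible. The sketch below applies the line above to a hypothetical mydata with 200 rows (the columns id and value are invented for illustration).

```r
set.seed(123)   # fix the random number generator so the draw can be repeated
mydata <- data.frame(id = 1:200, value = rnorm(200))   # hypothetical data, 200 rows

# Draw 50 row numbers without replacement and keep those rows
mysample <- mydata[sample(1:nrow(mydata), 50, replace = FALSE), ]

nrow(mysample)                # 50
anyDuplicated(mysample$id)    # 0: without replacement no row appears twice
```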

4.5 Processing and Presenting Results for Selected Variables

4.6 Creating New Variables


4.7 Recoding Variables

4.8 Transposing Variables and the Layout of Results

Transposing and aggregating data with the melt and cast functions is described at https://www.r-statistics.com/tag/transpose/ and in Kabacoff (2015).


Figure 4.2: Using the melt and cast functions (Kabacoff 2015)
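melt() and cast() come from add-on packages (in the reshape2 package, for example, melt(mydata, id.vars = c("id", "time")) and dcast()). The sketch below reproduces the same three steps with base R only, so it runs without extra packages: t() transposes, a manually built long ("molten") table has one row per id, time and measured variable, and tapply() does a cast-like aggregation. The data are hypothetical, and the base R code is a stand-in rather than the packages' own implementation.

```r
# Hypothetical data: two respondents (id) measured at two time points on x1 and x2
mydata <- data.frame(
  id   = c(1, 1, 2, 2),
  time = c(1, 2, 1, 2),
  x1   = c(5, 3, 6, 2),
  x2   = c(6, 5, 1, 4)
)

# Transposing: rows become columns and vice versa (the result is a matrix)
t(mydata)

# "Melting": one row per id, time and measured variable (long format)
molten <- data.frame(
  id       = rep(mydata$id, times = 2),
  time     = rep(mydata$time, times = 2),
  variable = rep(c("x1", "x2"), each = nrow(mydata)),
  value    = c(mydata$x1, mydata$x2)
)

# "Casting" with aggregation: mean of each variable for each id
means <- tapply(molten$value, list(molten$id, molten$variable), mean)
means   # id 1: x1 = 4, x2 = 5.5; id 2: x1 = 4, x2 = 2.5
```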

References

Wickham, Hadley, and Lionel Henry. 2019. tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr.

Kabacoff, Robert. 2015. R in Action: Data Analysis and Graphics with R. Second Edition. Manning Publications.