A Glossary of important R commands
Basic usage
The following table contains important R commands for basic usage.
| Description | R | Example |
|---|---|---|
| Assign values to a variable | `<-` | `x <- 1` |
| Compute several expressions at once | `;` | `x <- 1; 2 + 2; 3 * 8` |
| Create vectors by concatenating numbers | `c` | `c(1, 2, -1)` |
| Create sequential integer vectors | `:` | `1:10` |
| Create a matrix by columns | `cbind` | `cbind(1:3, c(0, 2, 0))` |
| Create a matrix by rows | `rbind` | `rbind(1:3, c(0, 2, 0))` |
| Create a data frame | `data.frame` | `data.frame(name1 = c(-1, 3), name2 = c(0.4, 1))` |
| Create a list | `list` | `list(obj1 = c(-1, 3), obj2 = -1:5, obj3 = rbind(1:2, 3:2))` |
| Access elements of a… | | |
| … vector | `[]` | `c(0.5, 2)[1]; c(0.5, 2)[-1]; c(0.5, 2)[2:1]` |
| … matrix | `[, ]` | `cbind(1:2, 3:4)[1, 2]; cbind(1:2, 3:4)[1, ]` |
| … data frame | `[, ]` and `$` | `data.frame(name1 = c(-1, 3), name2 = c(0.4, 1))$name1; data.frame(name1 = c(-1, 3), name2 = c(0.4, 1))[2, 1]` |
| … list | `$` | `list(x = 2, y = 7:0)$y` |
| Summarize any object | `summary` | `summary(1:10)` |
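As a quick check of the commands in the table above, the following sketch runs them in sequence and stores each result in an object. The object names (`v`, `s`, `m_col`, `m_row`, `dat`, `lst`) are illustrative choices and are not part of the original table.

```r
# Assignment and several expressions on one line
x <- 1; y <- 2 + 2
v <- c(0.5, 2, -1)               # concatenation
s <- 1:10                        # sequential integers

# Matrices built by columns and by rows
m_col <- cbind(1:3, c(0, 2, 0))  # 3 x 2 matrix
m_row <- rbind(1:3, c(0, 2, 0))  # 2 x 3 matrix

# Data frame and list
dat <- data.frame(name1 = c(-1, 3), name2 = c(0.4, 1))
lst <- list(obj1 = c(-1, 3), obj2 = -1:5, obj3 = rbind(1:2, 3:2))

# Accessing elements
v[1]          # first element
v[-1]         # all elements except the first
m_col[1, 2]   # row 1, column 2
m_col[1, ]    # whole first row
dat$name1     # data frame column by name
dat[2, 1]     # row 2, column 1
lst$obj2      # list element by name

summary(s)    # minimum, quartiles, mean and maximum
```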
Linear regression
Some useful commands for performing simple and multiple linear regression are given in the next table. We assume that:
- `dataset` is an imported dataset such that
  - `resp` is the response variable
  - `pred1` is the first predictor
  - `pred2` is the second predictor
  - …
  - `predk` is the last predictor
- `model` is the result of applying `lm`
- `newPreds` is a `data.frame` with variables named as the predictors
- `num` is `1`, `2` or `3`
- `level` is a number between 0 and 1
| Description | R |
|---|---|
| Fit a simple linear model | `lm(resp ~ pred1, data = dataset)` |
| Fit a multiple linear model… | |
| … on two predictors | `lm(resp ~ pred1 + pred2, data = dataset)` |
| … on all predictors | `lm(resp ~ ., data = dataset)` |
| … on all predictors except `pred1` | `lm(resp ~ . - pred1, data = dataset)` |
| Summarize linear model: coefficient estimates, standard errors, \(t\)-values, \(p\)-values for \(H_0:\beta_j=0\), \(\hat\sigma\) (residual standard error), degrees of freedom, \(R^2\), adjusted \(R^2\), \(F\)-test, \(p\)-value for \(H_0:\beta_1=\ldots=\beta_k=0\) | `summary(model)` |
| ANOVA decomposition | `anova(model)` |
| CIs for the coefficients | `confint(model, level = level)` |
| Prediction | `predict(model, newdata = newPreds)` |
| CIs for the predicted mean | `predict(model, newdata = newPreds, interval = "confidence", level = level)` |
| CIs for the predicted response | `predict(model, newdata = newPreds, interval = "prediction", level = level)` |
| Variable selection (package `RcmdrMisc`) | `stepwise(model)` |
| Multicollinearity detection (package `car`) | `vif(model)` |
| Compare model coefficients (package `car`) | `compareCoefs(model1, model2)` |
| Diagnostic plots | `plot(model, num)` |
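The commands above can be chained into a small workflow. The sketch below uses a hypothetical simulated `dataset` instead of an imported one; the true coefficients 1, 2 and -1 are arbitrary. It leaves out `stepwise`, `vif` and `compareCoefs` so that it needs only base R. It also checks numerically that, for the same `level`, the interval for a predicted response is wider than the interval for the predicted mean, because the former also accounts for the error variance.

```r
set.seed(42)  # reproducible simulated data (hypothetical example)
n <- 100
dataset <- data.frame(pred1 = rnorm(n), pred2 = runif(n))
dataset$resp <- 1 + 2 * dataset$pred1 - dataset$pred2 + rnorm(n, sd = 0.5)

# Multiple linear model on both predictors
model <- lm(resp ~ pred1 + pred2, data = dataset)
summary(model)
anova(model)

# Confidence intervals for the coefficients
level <- 0.95
ci_coef <- confint(model, level = level)

# New predictor values: column names must match the predictors
newPreds <- data.frame(pred1 = c(0, 1), pred2 = c(0.5, 0.5))
pred_mean <- predict(model, newdata = newPreds, interval = "confidence", level = level)
pred_resp <- predict(model, newdata = newPreds, interval = "prediction", level = level)

# Interval widths: the prediction interval adds the error variance
width_mean <- pred_mean[, "upr"] - pred_mean[, "lwr"]
width_resp <- pred_resp[, "upr"] - pred_resp[, "lwr"]

# First diagnostic plot: residuals vs. fitted values
plot(model, 1)
```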
More basic usage
The following table contains further important R commands for basic usage. We assume the following dataset is available:

`data <- data.frame(x = 1:10, y = c(-1, 2, 3, 0, 3, 1, -1, 3, 0, -1))`

| Description | R | Example |
|---|---|---|
| Data frame management | | |
| variable names | `names` | `names(data)` |
| structure | `str` | `str(data)` |
| dimensions | `dim` | `dim(data)` |
| first rows | `head` | `head(data)` |
| Vector related functions | | |
| create sequences | `seq` | `seq(0, 1, length.out = 10); seq(0, 1, by = 0.25)` |
| reverse a vector | `rev` | `rev(1:5)` |
| length of a vector | `length` | `length(1:5)` |
| count repetitions in a vector | `table` | `table(c(1:5, 4:2))` |
| Logical conditions | | |
| relational operators | `<`, `<=`, `>`, `>=`, `==`, `!=` | `1 < 0; 1 <= 1; 2 > 1; 3 >= 4; 1 == 0; 1 != 0` |
| “and” | `&` | `TRUE & FALSE` |
| “or” | `\|` | `TRUE \| FALSE` |
| Subsetting | | |
| vector | `[]` | `data$x[data$x > 0]; data$x[data$x > 2 & data$x < 8]` |
| data frame | `[, ]` | `data[data$x > 0, ]; data[data$x < 2 \| data$x > 8, ]` |
| Distributions (`xxxx` stands for the distribution name, e.g. `norm`) | | |
| sampling | `rxxxx` | `rnorm(n = 10, mean = 0, sd = 1)` |
| density | `dxxxx` | `x <- seq(-4, 4, length.out = 20); dnorm(x = x, mean = 0, sd = 1)` |
| distribution function | `pxxxx` | `x <- seq(-4, 4, length.out = 20); pnorm(q = x, mean = 0, sd = 1)` |
| quantiles | `qxxxx` | `p <- seq(0.1, 0.9, length.out = 10); qnorm(p = p, mean = 0, sd = 1)` |
| Plotting | | |
| scatterplot | `plot` | `plot(rnorm(100), rnorm(100))` |
| plot a curve | `plot`, `seq` | `x <- seq(0, 1, length.out = 100); plot(x, x^2, type = "l")` |
| add lines | `lines` | `x <- seq(0, 1, length.out = 100); plot(x, x^2 + rnorm(100, sd = 0.1)); lines(x, x^2, col = 2, lwd = 2)` |
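The following sketch exercises the subsetting and distribution commands on the `data` object defined above. It relies on the fact that, for a continuous distribution such as the normal, `pxxxx` and `qxxxx` are inverses of each other, which makes an easy sanity check. The object names are illustrative.

```r
data <- data.frame(x = 1:10, y = c(-1, 2, 3, 0, 3, 1, -1, 3, 0, -1))

# Subsetting with logical conditions
sub_vec <- data$x[data$x > 2 & data$x < 8]  # 3 4 5 6 7
sub_df <- data[data$x < 2 | data$x > 8, ]   # rows with x equal to 1, 9 and 10

# Count how many times each value of y appears
counts <- table(data$y)

# d/p/q functions of the standard normal distribution
p <- seq(0.1, 0.9, length.out = 9)
q <- qnorm(p, mean = 0, sd = 1)
p_back <- pnorm(q, mean = 0, sd = 1)  # pnorm undoes qnorm
dens0 <- dnorm(0)                     # equals 1 / sqrt(2 * pi)
```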
Logistic regression
Some useful commands for performing logistic regression are given in the next table. We assume that:
- `dataset` is an imported dataset such that
  - `resp` is the binary response variable
  - `pred1` is the first predictor
  - `pred2` is the second predictor
  - …
  - `predk` is the last predictor
- `model` is the result of applying `glm`
- `newPreds` is a `data.frame` with variables named as the predictors
- `level` is a number between 0 and 1
| Description | R |
|---|---|
| Fit a simple logistic model | `glm(resp ~ pred1, data = dataset, family = "binomial")` |
| Fit a multiple logistic model… | |
| … on two predictors | `glm(resp ~ pred1 + pred2, data = dataset, family = "binomial")` |
| … on all predictors | `glm(resp ~ ., data = dataset, family = "binomial")` |
| … on all predictors except `pred1` | `glm(resp ~ . - pred1, data = dataset, family = "binomial")` |
| Summarize logistic model: coefficient estimates, standard errors, Wald statistics ('z value'), \(p\)-values for \(H_0:\beta_j=0\), null deviance, deviance ('Residual deviance'), AIC, number of Fisher scoring iterations | `summary(model)` |
| CIs for the coefficients (profile likelihood; Wald) | `confint(model, level = level); confint.default(model, level = level)` |
| CIs for the exp-coefficients (profile likelihood; Wald) | `exp(confint(model, level = level)); exp(confint.default(model, level = level))` |
| Prediction | `predict(model, newdata = newPreds, type = "response")` |
| CIs for the predicted probability | Not immediate. Use `predictCIsLogistic(model, newdata = newPreds, level = level)` as seen in Section 4.6 |
| Variable selection (package `RcmdrMisc`) | `stepwise(model)` |
| Multicollinearity detection (package `car`) | `vif(model)` |
| \(R^2\) | Not immediate. Use `r2Log(model = model)` as seen in Section 4.8 |
| Hit matrix | `table(dataset$resp, model$fitted.values > 0.5)` |
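A minimal end-to-end sketch with a hypothetical simulated binary response follows. `predictCIsLogistic` from Section 4.6 is not reproduced here, so the block defines its own hypothetical helper, `waldProbCI`. This helper builds a plain Wald interval on the linear predictor scale and maps it to the probability scale with `plogis`. That is a common approach, but not necessarily the one used in Section 4.6. The accuracy is computed with the same 0.5 threshold used by the hit matrix.

```r
set.seed(1)  # hypothetical simulated data
n <- 200
dataset <- data.frame(pred1 = rnorm(n), pred2 = rnorm(n))
eta <- -0.5 + 1.5 * dataset$pred1 - dataset$pred2
dataset$resp <- rbinom(n, size = 1, prob = plogis(eta))

model <- glm(resp ~ pred1 + pred2, data = dataset, family = "binomial")
level <- 0.95

# Wald CIs for the coefficients and for the exp-coefficients
ci_wald <- confint.default(model, level = level)
ci_odds <- exp(ci_wald)

# Predicted probabilities for new predictor values
newPreds <- data.frame(pred1 = c(-1, 0, 1), pred2 = c(0, 0, 0))
prob <- predict(model, newdata = newPreds, type = "response")

# Hypothetical helper (not the predictCIsLogistic function of Section 4.6):
# Wald CI on the linear predictor scale, mapped through the logistic
# function so that both limits stay inside (0, 1)
waldProbCI <- function(model, newdata, level = 0.95) {
  lp <- predict(model, newdata = newdata, type = "link", se.fit = TRUE)
  z <- qnorm(1 - (1 - level) / 2)
  cbind(prob = plogis(lp$fit),
        lwr = plogis(lp$fit - z * lp$se.fit),
        upr = plogis(lp$fit + z * lp$se.fit))
}
ci_prob <- waldProbCI(model, newPreds, level = level)

# Hit matrix: observed classes vs. predicted classes at threshold 0.5
hits <- table(dataset$resp, model$fitted.values > 0.5)
accuracy <- mean((model$fitted.values > 0.5) == dataset$resp)
```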
Principal component analysis
Some useful commands for performing principal component analysis are given in the next table. We assume that:
- `dataset` is an imported dataset with several non-categorical variables (the variables must be continuous or discrete)
- `pca` is a PCA object, that is, the output of `princomp`
| Description | R |
|---|---|
| Compute a PCA… | |
| … unnormalized (if variables have the same scale) | `princomp(dataset)` |
| … normalized (if variables have different scales) | `princomp(dataset, cor = TRUE)` |
| Summarize PCA: standard deviation of each PC, proportion of variance explained by each PC, cumulative proportion of variance explained up to a given component | `summary(pca)` |
| Weights (loadings) | `pca$loadings` |
| Scores | `pca$scores` |
| Standard deviations of the PCs | `pca$sdev` |
| Means of the original variables | `pca$center` |
| Screeplot | `plot(pca); plot(pca, type = "l")` |
| Biplot | `biplot(pca)` |
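To see the PCA commands at work, the sketch below uses `USArrests`, a dataset shipped with R whose four variables are measured on different scales, so the normalized PCA is the appropriate choice. It also recomputes the proportion of variance explained from `pca$sdev`, which should agree with the figures reported by `summary(pca)`. The object names are illustrative.

```r
# USArrests ships with R (package datasets); its variables have different scales
pca <- princomp(USArrests, cor = TRUE)  # normalized PCA
summary(pca)

# Proportion of variance explained by each PC, computed from the PC standard deviations
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(prop_var)

loadings_mat <- unclass(pca$loadings)  # weights as a plain matrix
scores <- pca$scores                   # one row per observation, one column per PC

plot(pca, type = "l")  # screeplot
biplot(pca)
```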