4.4 RStudio calculations

To calculate the estimation of the total renewable electricity potential in La Plata we will use RStudio. If you have no prior experience with R or RStudio it is recommended to follow this Getting Started with Data in R tutorial.

The methodology, calculations and equations in this section are based on the book Sampling by Steven K. Thompson published in 2012. (See 6)

The full calculations for this section can be found here

First, we will open a new R Script, install the required packages for this analysis and set our work directory:

#install packages
install.packages("dplyr")
install.packages("sf")

#load pacakges
library(dplyr)
library(sf)

#set work directory
setwd(".../La Plata/data")

4.4.1 Random sampling

To start with our random sampling computations we will read the sample100.shp from our directory to RStudio with the read_sf function and set its CRS with the st_transform function. Both functions are from the Simple Features for R (sf) library.

sample1 <- read_sf("data/sample100.shp")
sample1 <- st_transform(sample1, "+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")

Next, we will assign all the building rooftops with an area of 30 and under, the value 0 to include our density factor. In addition, we will create a column with megawatt per hour (mWh) so the results will be easier to read.

#change to 0 above and equal 30 sqm
sample1$elec_prod[sample1$area <= 30] <- 0
#create mega watt per hour (mWh) column
sample1$elec_prod_mwh <- sample1$elec_prod/1000

head(sample1)
## Simple feature collection with 6 features and 6 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -6480273 ymin: -4138361 xmax: -6455331 ymax: -4118445
## CRS:           +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
## # A tibble: 6 x 7
##      id    area X_mean usable_sr elec_prod                              geometry
##   <dbl>   <dbl>  <dbl>     <dbl>     <dbl>                         <POLYGON [m]>
## 1     3  91.0    1728.   157246.    20285. ((-6461504 -4132034, -6461500 -41320~
## 2     4 131.     1735.   227935.    29404. ((-6466451 -4118457, -6466438 -41184~
## 3     8  59.1    1736.   102639.    13240. ((-6465099 -4118684, -6465094 -41186~
## 4     1   0.189  1729.      327.        0  ((-6455332 -4132428, -6455331 -41324~
## 5     2   0.063  1725.      109.        0  ((-6480273 -4138360, -6480272 -41383~
## 6     5   0.077  1726.      133.        0  ((-6468889 -4135469, -6468889 -41354~
## # ... with 1 more variable: elec_prod_mwh <dbl>
summary(sample1)
##        id              area               X_mean       usable_sr       
##  Min.   :  1.00   Min.   :    0.002   Min.   :1721   Min.   :       3  
##  1st Qu.: 25.75   1st Qu.:    0.021   1st Qu.:1724   1st Qu.:      36  
##  Median : 50.50   Median :    0.078   Median :1727   Median :     134  
##  Mean   : 50.50   Mean   :  243.218   Mean   :1727   Mean   :  419697  
##  3rd Qu.: 75.25   3rd Qu.:    0.240   3rd Qu.:1729   3rd Qu.:     413  
##  Max.   :100.00   Max.   :10405.603   Max.   :1740   Max.   :17939051  
##    elec_prod                geometry   elec_prod_mwh    
##  Min.   :      0   POLYGON      :100   Min.   :   0.00  
##  1st Qu.:      0   epsg:NA      :  0   1st Qu.:   0.00  
##  Median :      0   +proj=merc...:  0   Median :   0.00  
##  Mean   :  54056                       Mean   :  54.06  
##  3rd Qu.:      0                       3rd Qu.:   0.00  
##  Max.   :2314138                       Max.   :2314.14

To start with the estimation we will need to compute the primary parameters of the sample. The following steps show what functions or equations are used to calculate the parameters in RStudio. Most of the functions used are from base R and their descriptions can be found here.

First, we calculate the sample mean \(\overline{y}\) and the sample variance \(s\).

n1 <- as.numeric(nrow(sample1))
y1 <- mean(sample1$elec_prod_mwh)
paste("The mean of sample 1: ", round(y1,2), "(mWh)")
## [1] "The mean of sample 1:  54.06 (mWh)"
N_all <- 2482 #all buildings in the city
s1 <- var(sample1$elec_prod_mwh)
paste("The variance of sample 1: ", round(s1,2), "(mWh)")
## [1] "The variance of sample 1:  78930.14 (mWh)"

The following equation calculates the unbiased variance of the estimator \(\overline{y}\):

\[\begin{equation} \hat{var}(\overline{y})= (\frac{N-n}{N})(\frac{s^2}{n}) \tag{4.3} \end{equation}\]

  • \(N\) is the total population size -> the number of all the buildings in La Plata (2,482).19
  • \(n\) is the sample size -> 100.
  • \(s^2\) is the sample variance.

The following equation calculates the estimated standard error of the estimator \(\overline{y}\) (SEM):

\[\begin{equation} SEM = \sqrt{\hat{var}(\overline{y})} \tag{4.4} \end{equation}\]

var_un1 <- ((N_all-n1)/N_all)*(s1/n1)
paste("The variance of the sample mean: ", round(var_un1,2) , "(mWh)")
## [1] "The variance of the sample mean:  757.5 (mWh)"
ese1 = sqrt(var_un1) #estimated standard error of variance
paste("The estimated standard error of the sample mean: ", round(ese1,2), "(mWh)")
## [1] "The estimated standard error of the sample mean:  27.52 (mWh)"

The following equation calculates an unbiased estimator of the population total \(\hat{t}\): \[\begin{equation} \hat{t} = N{\overline{y}} \tag{4.5} \end{equation}\]

The following equation calculates the unbiased variance of the estimator \(\hat{t}\):

\[\begin{equation} \hat{var}(\hat{t})= N^2\hat{var}(\overline{y}) \tag{4.6} \end{equation}\]

The following equation calculates the estimated standard error of the estimator \(\hat{t}\) (SET): \[\begin{equation} SET = \sqrt{\hat{var}(\hat{t})} \tag{4.7} \end{equation}\]

t_un1 = N_all*y1 
paste("The estimation of the renewable electricity production potential by all the buildings in the city:", round(t_un1,2), "(mWh)")
## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 134167.17 (mWh)"
t_var_un1 = (N_all^2) * var_un1 #unbiased variance with the estimate of the total
paste("The variance of the estimated total:", round(t_var_un1,2), "(mWh)")
## [1] "The variance of the estimated total: 4666447948.84 (mWh)"
ese_t1 = sqrt(t_var_un1) #estimated standard error of total
paste("The estimated standard error of the total:", round(ese_t1,2), "(mWh)")
## [1] "The estimated standard error of the total: 68311.4 (mWh)"

The following code calculates a 95% confidence interval for the sample.

#confidence interval with 95% 
CI1 <- t_un1 + qt(c(0.05,0.95), df=99) * ese_t1
paste("The 95% confidence interval estimation for sample 1 is: ", "(",round(CI1[1],2)," (mWh), ",round(CI1[2],2)," (mwh))", sep = "")
## [1] "The 95% confidence interval estimation for sample 1 is: (20743.51 (mWh), 247590.82 (mwh))"

We repeat these calculations for the sample with 300 building rooftops and both samples combined.

4.4.2 Equal size stratification

For the equal size strata, we have 60 building rooftops sampled for the built up area and non built up area. We combine the two strata to one sf object using the rbind function and compute the same calculations as in 4.4.1.

bu <- read_sf("strata_bu.shp")
bu <- st_transform(bu, "+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
#add a landuse column to differ between strata when combined
bu$landuse <- "Built up"
nonbu <- read_sf("strata_nonbu.shp")
nonbu <- st_transform(nonbu, "+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
nonbu$landuse <- "Non built up"

#combine strata and change to 0 above and equal 30 sqm
strat <- rbind(bu, nonbu)
strat$elec_prod[strat$area <= 30] <- 0
strat$elec_prod_mwh <- strat$elec_prod/1000

4.4.3 Optimal allocation stratification

The sample calculations from the built up area present more variable strata. To achieve a better estimation we would like to divide the city into strata using an optimal allocation design. The optimum scheme allocates a larger sample size to the more variable strata and a smaller sample size to the more difficult-to-sample strata.

Therefore, we will follow section 4.2 and compute more spatial random points in the built up strata and less in the non built up strata.

After, we will add again the optimally allocated strata to RStudio and calculate the parameters for this sample, as described in 4.4.1 and 4.4.2.

In this analysis for the built up strata 95 building rooftops were sampled, and for the non built up strata 5.


  1. DATA FROM 2019, CONTRIBUTED BY JULIETA FREDIANI FROM THE INSTITUTE OF RESEARCH AND POLICIES OF THE BUILT ENVIRONMENT - FACULTY OF ARCHITECTURE AND URBANISM - NATIONAL UNIVERSITY OF LA PLATA.↩︎