4.4 RStudio calculations
To calculate the estimation of the total renewable electricity potential in La Plata we will use RStudio. If you have no prior experience with R or RStudio it is recommended to follow this Getting Started with Data in R tutorial.
The methodology, calculations and equations in this section are based on the book Sampling by Steven K. Thompson published in 2012. (See 6)
The full calculations for this section can be found here
First, we will open a new R Script, install the required packages for this analysis and set our work directory:
#install packages
install.packages("dplyr")
install.packages("sf")
#load pacakges
library(dplyr)
library(sf)
#set work directory
setwd(".../La Plata/data")
4.4.1 Random sampling
To start with our random sampling computations we will read the sample100.shp from our directory to RStudio with the read_sf
function and set its CRS with the st_transform
function. Both functions are from the Simple Features for R (sf) library.
<- read_sf("data/sample100.shp")
sample1 <- st_transform(sample1, "+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs") sample1
Next, we will assign all the building rooftops with an area of 30 and under, the value 0 to include our density factor. In addition, we will create a column with megawatt per hour (mWh) so the results will be easier to read.
#change to 0 above and equal 30 sqm
$elec_prod[sample1$area <= 30] <- 0
sample1#create mega watt per hour (mWh) column
$elec_prod_mwh <- sample1$elec_prod/1000
sample1
head(sample1)
## Simple feature collection with 6 features and 6 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -6480273 ymin: -4138361 xmax: -6455331 ymax: -4118445
## CRS: +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
## # A tibble: 6 x 7
## id area X_mean usable_sr elec_prod geometry
## <dbl> <dbl> <dbl> <dbl> <dbl> <POLYGON [m]>
## 1 3 91.0 1728. 157246. 20285. ((-6461504 -4132034, -6461500 -41320~
## 2 4 131. 1735. 227935. 29404. ((-6466451 -4118457, -6466438 -41184~
## 3 8 59.1 1736. 102639. 13240. ((-6465099 -4118684, -6465094 -41186~
## 4 1 0.189 1729. 327. 0 ((-6455332 -4132428, -6455331 -41324~
## 5 2 0.063 1725. 109. 0 ((-6480273 -4138360, -6480272 -41383~
## 6 5 0.077 1726. 133. 0 ((-6468889 -4135469, -6468889 -41354~
## # ... with 1 more variable: elec_prod_mwh <dbl>
summary(sample1)
## id area X_mean usable_sr
## Min. : 1.00 Min. : 0.002 Min. :1721 Min. : 3
## 1st Qu.: 25.75 1st Qu.: 0.021 1st Qu.:1724 1st Qu.: 36
## Median : 50.50 Median : 0.078 Median :1727 Median : 134
## Mean : 50.50 Mean : 243.218 Mean :1727 Mean : 419697
## 3rd Qu.: 75.25 3rd Qu.: 0.240 3rd Qu.:1729 3rd Qu.: 413
## Max. :100.00 Max. :10405.603 Max. :1740 Max. :17939051
## elec_prod geometry elec_prod_mwh
## Min. : 0 POLYGON :100 Min. : 0.00
## 1st Qu.: 0 epsg:NA : 0 1st Qu.: 0.00
## Median : 0 +proj=merc...: 0 Median : 0.00
## Mean : 54056 Mean : 54.06
## 3rd Qu.: 0 3rd Qu.: 0.00
## Max. :2314138 Max. :2314.14
To start with the estimation we will need to compute the primary parameters of the sample. The following steps show what functions or equations are used to calculate the parameters in RStudio. Most of the functions used are from base R and their descriptions can be found here.
First, we calculate the sample mean \(\overline{y}\) and the sample variance \(s\).
<- as.numeric(nrow(sample1))
n1 <- mean(sample1$elec_prod_mwh)
y1 paste("The mean of sample 1: ", round(y1,2), "(mWh)")
## [1] "The mean of sample 1: 54.06 (mWh)"
<- 2482 #all buildings in the city
N_all <- var(sample1$elec_prod_mwh)
s1 paste("The variance of sample 1: ", round(s1,2), "(mWh)")
## [1] "The variance of sample 1: 78930.14 (mWh)"
The following equation calculates the unbiased variance of the estimator \(\overline{y}\):
\[\begin{equation} \hat{var}(\overline{y})= (\frac{N-n}{N})(\frac{s^2}{n}) \tag{4.3} \end{equation}\]
- \(N\) is the total population size -> the number of all the buildings in La Plata (2,482).19
- \(n\) is the sample size -> 100.
- \(s^2\) is the sample variance.
The following equation calculates the estimated standard error of the estimator \(\overline{y}\) (SEM):
\[\begin{equation} SEM = \sqrt{\hat{var}(\overline{y})} \tag{4.4} \end{equation}\]
<- ((N_all-n1)/N_all)*(s1/n1)
var_un1 paste("The variance of the sample mean: ", round(var_un1,2) , "(mWh)")
## [1] "The variance of the sample mean: 757.5 (mWh)"
= sqrt(var_un1) #estimated standard error of variance
ese1 paste("The estimated standard error of the sample mean: ", round(ese1,2), "(mWh)")
## [1] "The estimated standard error of the sample mean: 27.52 (mWh)"
The following equation calculates an unbiased estimator of the population total \(\hat{t}\): \[\begin{equation} \hat{t} = N{\overline{y}} \tag{4.5} \end{equation}\]
The following equation calculates the unbiased variance of the estimator \(\hat{t}\):
\[\begin{equation} \hat{var}(\hat{t})= N^2\hat{var}(\overline{y}) \tag{4.6} \end{equation}\]
The following equation calculates the estimated standard error of the estimator \(\hat{t}\) (SET): \[\begin{equation} SET = \sqrt{\hat{var}(\hat{t})} \tag{4.7} \end{equation}\]
= N_all*y1
t_un1 paste("The estimation of the renewable electricity production potential by all the buildings in the city:", round(t_un1,2), "(mWh)")
## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 134167.17 (mWh)"
= (N_all^2) * var_un1 #unbiased variance with the estimate of the total
t_var_un1 paste("The variance of the estimated total:", round(t_var_un1,2), "(mWh)")
## [1] "The variance of the estimated total: 4666447948.84 (mWh)"
= sqrt(t_var_un1) #estimated standard error of total
ese_t1 paste("The estimated standard error of the total:", round(ese_t1,2), "(mWh)")
## [1] "The estimated standard error of the total: 68311.4 (mWh)"
The following code calculates a 95% confidence interval for the sample.
#confidence interval with 95%
<- t_un1 + qt(c(0.05,0.95), df=99) * ese_t1
CI1 paste("The 95% confidence interval estimation for sample 1 is: ", "(",round(CI1[1],2)," (mWh), ",round(CI1[2],2)," (mwh))", sep = "")
## [1] "The 95% confidence interval estimation for sample 1 is: (20743.51 (mWh), 247590.82 (mwh))"
We repeat these calculations for the sample with 300 building rooftops and both samples combined.
4.4.2 Equal size stratification
For the equal size strata, we have 60 building rooftops sampled for the built up area and non built up area. We combine the two strata to one sf
object using the rbind
function and compute the same calculations as in 4.4.1.
<- read_sf("strata_bu.shp")
bu <- st_transform(bu, "+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
bu #add a landuse column to differ between strata when combined
$landuse <- "Built up"
bu<- read_sf("strata_nonbu.shp")
nonbu <- st_transform(nonbu, "+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
nonbu $landuse <- "Non built up"
nonbu
#combine strata and change to 0 above and equal 30 sqm
<- rbind(bu, nonbu)
strat $elec_prod[strat$area <= 30] <- 0
strat$elec_prod_mwh <- strat$elec_prod/1000 strat
4.4.3 Optimal allocation stratification
The sample calculations from the built up area present more variable strata. To achieve a better estimation we would like to divide the city into strata using an optimal allocation design. The optimum scheme allocates a larger sample size to the more variable strata and a smaller sample size to the more difficult-to-sample strata.
Therefore, we will follow section 4.2 and compute more spatial random points in the built up strata and less in the non built up strata.
After, we will add again the optimally allocated strata to RStudio and calculate the parameters for this sample, as described in 4.4.1 and 4.4.2.
In this analysis for the built up strata 95 building rooftops were sampled, and for the non built up strata 5.
DATA FROM 2019, CONTRIBUTED BY JULIETA FREDIANI FROM THE INSTITUTE OF RESEARCH AND POLICIES OF THE BUILT ENVIRONMENT - FACULTY OF ARCHITECTURE AND URBANISM - NATIONAL UNIVERSITY OF LA PLATA.↩︎