1 Simple Random Spatial Sampling

1.1 La Plata sample size calculation

The following formula can help decide the most suitable size for a sample:

\[\begin{equation} n_0 = \frac{Z^2p(1-p)}{e^2} \tag{1.1} \end{equation}\]

Where \(n_0\) is the sample size, \(Z\) is the Z-score for the chosen confidence level (can be found in a standard normal table), \(p\) is the estimated proportion in the population (which determines the variability \(p(1-p)\)) and \(e\) is the margin of error, i.e. the half-width of the confidence interval.

Usually, for a large population where nothing is known about the proportion, \(p=0.5\) is used (the maximum variability).

Suitable sample size for La Plata with a 95% confidence level (\(Z = 1.96\)) and a 5% margin of error:

\[\begin{equation} n_0 = \frac{1.96^2 \times 0.5 \times (1-0.5)}{0.05^2} = 384.16 \tag{1.2} \end{equation}\]

Therefore the most suitable sample to collect for La Plata would be 384 building rooftops.
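
The arithmetic in equation (1.2) can be reproduced with a minimal Python sketch. The function name `sample_size_for_proportion` is illustrative and not part of the original analysis.

```python
def sample_size_for_proportion(z: float, p: float, e: float) -> float:
    """Equation (1.1): n0 = z^2 * p * (1 - p) / e^2."""
    return (z ** 2) * p * (1 - p) / (e ** 2)


# Inputs taken from equation (1.2): Z = 1.96, p = 0.5, e = 0.05.
n0 = sample_size_for_proportion(z=1.96, p=0.5, e=0.05)
print(round(n0, 2))
```

Changing `p` away from 0.5 or widening `e` lowers the required sample size, which is why \(p=0.5\) is the conservative default.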

In this analysis, two random spatial samples were drawn from the city of La Plata and analysed separately and together:

  1. A sample of 100 building rooftops
  2. A sample of 300 building rooftops
  3. Both samples combined (400 building rooftops)

The sample points were created in QGIS [1] using Vector -> Research Tools -> Random points inside a polygon. The building rooftops were then digitized in QGIS using Google Satellite Hybrid [2], a Tile Map Service (TMS) layer.

For every sample the rooftop area \((m^2)\), the mean global horizontal irradiation \((\frac{kWh}{m^2})\), the usable solar radiation \((kWh)\) and the renewable electricity production \((kWh)\) were calculated.

Rooftops with an area of 30 \(m^2\) or less are assigned an electricity production of 0. Some of the sample points fell on non-built-up areas (roads, parks, etc.); these were also given a 0 so that the calculation includes a density factor.
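
The zeroing rule can be sketched in Python as follows. The helper name and the raw production value of the third row are hypothetical; the first two rows reuse the area and production figures printed for sample 1 below.

```python
MIN_ROOFTOP_AREA_M2 = 30.0  # threshold stated in the text


def effective_production_kwh(area_m2: float, production_kwh: float,
                             min_area_m2: float = MIN_ROOFTOP_AREA_M2) -> float:
    """Return the production, or 0 when the rooftop is at or below the threshold."""
    return production_kwh if area_m2 > min_area_m2 else 0.0


# (area in m^2, raw production in kWh); the last raw value is hypothetical.
rows = [(91.0, 20285.0), (131.0, 29404.0), (0.189, 42.0)]
production_kwh = [effective_production_kwh(a, p) for a, p in rows]
production_mwh = [p / 1000 for p in production_kwh]  # kWh -> MWh
print(production_kwh)
```

Keeping the zero rows in the sample, rather than dropping them, is what lets the sample mean reflect how often a random point lands on a usable rooftop.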

## Simple feature collection with 6 features and 6 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -6480273 ymin: -4138361 xmax: -6455331 ymax: -4118445
## CRS:           +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
## # A tibble: 6 x 7
##      id    area X_mean usable_sr elec_prod                              geometry
##   <dbl>   <dbl>  <dbl>     <dbl>     <dbl>                         <POLYGON [m]>
## 1     3  91.0    1728.   157246.    20285. ((-6461504 -4132034, -6461500 -41320~
## 2     4 131.     1735.   227935.    29404. ((-6466451 -4118457, -6466438 -41184~
## 3     8  59.1    1736.   102639.    13240. ((-6465099 -4118684, -6465094 -41186~
## 4     1   0.189  1729.      327.        0  ((-6455332 -4132428, -6455331 -41324~
## 5     2   0.063  1725.      109.        0  ((-6480273 -4138360, -6480272 -41383~
## 6     5   0.077  1726.      133.        0  ((-6468889 -4135469, -6468889 -41354~
## # ... with 1 more variable: elec_prod_mwh <dbl>
##        id              area               X_mean       usable_sr       
##  Min.   :  1.00   Min.   :    0.002   Min.   :1721   Min.   :       3  
##  1st Qu.: 25.75   1st Qu.:    0.021   1st Qu.:1724   1st Qu.:      36  
##  Median : 50.50   Median :    0.078   Median :1727   Median :     134  
##  Mean   : 50.50   Mean   :  243.218   Mean   :1727   Mean   :  419697  
##  3rd Qu.: 75.25   3rd Qu.:    0.240   3rd Qu.:1729   3rd Qu.:     413  
##  Max.   :100.00   Max.   :10405.603   Max.   :1740   Max.   :17939051  
##    elec_prod                geometry   elec_prod_mwh    
##  Min.   :      0   POLYGON      :100   Min.   :   0.00  
##  1st Qu.:      0   epsg:NA      :  0   1st Qu.:   0.00  
##  Median :      0   +proj=merc...:  0   Median :   0.00  
##  Mean   :  54056                       Mean   :  54.06  
##  3rd Qu.:      0                       3rd Qu.:   0.00  
##  Max.   :2314138                       Max.   :2314.14
## Simple feature collection with 6 features and 6 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -6459285 ymin: -4157358 xmax: -6437314 ymax: -4133646
## CRS:           +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
## # A tibble: 6 x 7
##      id   area X_mean usable_sr elec_prod                               geometry
##   <dbl>  <dbl>  <dbl>     <dbl>     <dbl>                          <POLYGON [m]>
## 1     1 90.4    1732.  156456.     20183. ((-6444962 -4133747, -6444957 -413374~
## 2     2  0.036  1722.      62.0        0  ((-6459285 -4157357, -6459285 -415735~
## 3     3  0.144  1730.     249.         0  ((-6444164 -4138267, -6444164 -413826~
## 4     4  0.065  1731.     112.         0  ((-6447967 -4135898, -6447966 -413589~
## 5     5  0.23   1732.     398.         0  ((-6445041 -4133647, -6445040 -413364~
## 6     6  0.212  1732.     367.         0  ((-6437315 -4138498, -6437314 -413849~
## # ... with 1 more variable: elec_prod_mwh <dbl>
##        id              area               X_mean       usable_sr       
##  Min.   :  1.00   Min.   :    0.007   Min.   :1721   Min.   :      12  
##  1st Qu.: 75.75   1st Qu.:    0.057   1st Qu.:1725   1st Qu.:      98  
##  Median :150.50   Median :    0.117   Median :1727   Median :     204  
##  Mean   :150.50   Mean   :  365.856   Mean   :1728   Mean   :  631475  
##  3rd Qu.:225.25   3rd Qu.:    0.405   3rd Qu.:1730   3rd Qu.:     699  
##  Max.   :300.00   Max.   :13225.659   Max.   :1740   Max.   :22824922  
##    elec_prod                geometry   elec_prod_mwh   
##  Min.   :      0   POLYGON      :300   Min.   :   0.0  
##  1st Qu.:      0   epsg:NA      :  0   1st Qu.:   0.0  
##  Median :      0   +proj=merc...:  0   Median :   0.0  
##  Mean   :  81401                       Mean   :  81.4  
##  3rd Qu.:      0                       3rd Qu.:   0.0  
##  Max.   :2944415                       Max.   :2944.4

1.2 Sample 1 computations

The sample mean is denoted by \(\overline{y}\) and the sample variance by \(s^2\).

## [1] "The mean of sample 1:  54.06 (mWh)"
## [1] "The variance of sample 1:  78930.14 (mWh)"

The following equation calculates an unbiased estimate of the variance of the estimator \(\overline{y}\):

\[\begin{equation} \hat{var}(\overline{y})= (\frac{N-n}{N})(\frac{s^2}{n}) \tag{1.3} \end{equation}\]

  • \(N\) is the total population size -> the number of all the buildings in La Plata (2,482).
  • \(n\) is the sample size -> 100.
  • \(s^2\) is the sample variance.

The following equation calculates the estimated standard error of the estimator \(\overline{y}\):

\[\begin{equation} SEM = \sqrt{\hat{var}(\overline{y})} \tag{1.4} \end{equation}\]

## [1] "The variance of the sample mean:  757.5 (mWh)"
## [1] "The estimated standard error of the sample mean:  27.52 (mWh)"
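
Equations (1.3) and (1.4) can be checked against the printed values with a short Python sketch. The function names are illustrative; the inputs are the sample variance, \(n\) and \(N\) quoted above.

```python
import math


def variance_of_mean(s2: float, n: int, N: int) -> float:
    """Equation (1.3): finite population correction times s^2 / n."""
    return ((N - n) / N) * (s2 / n)


def standard_error(variance: float) -> float:
    """Equation (1.4): square root of the estimated variance."""
    return math.sqrt(variance)


var_mean = variance_of_mean(s2=78930.14, n=100, N=2482)
sem = standard_error(var_mean)
print(round(var_mean, 2), round(sem, 2))
```

The factor \((N-n)/N\) shrinks the variance because the sample covers a non-negligible share of the 2,482 buildings.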

The following equation calculates an unbiased estimator of the population total \(\hat{t}\):

\[\begin{equation} \hat{t} = N{\overline{y}} \tag{1.5} \end{equation}\]

The following equation calculates an unbiased estimate of the variance of the estimator \(\hat{t}\):

\[\begin{equation} \hat{var}(\hat{t})= N^2\hat{var}(\overline{y}) \tag{1.6} \end{equation}\]

The following equation calculates the estimated standard error of the estimator \(\hat{t}\):

\[\begin{equation} SET = \sqrt{\hat{var}(\hat{t})} \tag{1.7} \end{equation}\]
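
A minimal Python sketch of equations (1.5) to (1.7), using the rounded sample 1 mean (54.06) and variance of the mean (757.5) as inputs. The function names are illustrative. Because the inputs are rounded, the results agree with the printed figures only approximately.

```python
import math

N = 2482  # number of buildings in La Plata, as stated above


def total_estimate(mean: float, N: int) -> float:
    """Equation (1.5): estimated population total."""
    return N * mean


def variance_of_total(var_mean: float, N: int) -> float:
    """Equation (1.6): estimated variance of the total."""
    return N ** 2 * var_mean


t_hat = total_estimate(54.06, N)
var_t = variance_of_total(757.5, N)
se_total = math.sqrt(var_t)  # equation (1.7)
print(round(t_hat, 2), round(se_total, 2))
```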

## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 134167.17 (mWh)"
## [1] "The variance of the estimated total: 4666447948.84 (mWh)"
## [1] "The estimated standard error of the total: 68311.4 (mWh)"
## [1] "The 95% confidence interval estimation for sample 1 is: (20743.51 (mWh), 247590.82 (mwh))"
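
The printed interval has the form \(\hat{t} \pm c \cdot SET\), but the critical value \(c\) does not appear in the output. A minimal Python sketch for backing it out of the reported bounds (the function name `implied_multiplier` is illustrative, not from the original analysis):

```python
def implied_multiplier(lower: float, upper: float, standard_error: float) -> float:
    """Half-width of the interval expressed in standard errors."""
    return (upper - lower) / (2 * standard_error)


# Bounds and SET as printed for sample 1.
c = implied_multiplier(lower=20743.51, upper=247590.82, standard_error=68311.4)
print(round(c, 2))
```

For sample 1 this gives a value of about 1.66, which is smaller than the 1.96 used for 95% confidence in equation (1.2), so the confidence level actually attached to the interval may be worth double-checking.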

1.3 Sample 2 computations

The sample mean is denoted by \(\overline{y}\) and the sample variance by \(s^2\).

## [1] "The mean of sample 2:  81.4 (mWh)"
## [1] "The variance of sample 2:  109098.9 (mWh)"

Equation (1.3) calculates the unbiased variance of the estimator \(\overline{y}\).

Equation (1.4) calculates the estimated standard error of the estimator \(\overline{y}\).

## [1] "The variance of the sample mean:  319.71 (mWh)"
## [1] "The estimated standard error of the sample mean:  17.88 (mWh)"

Equation (1.5) calculates an unbiased estimator of the population total \(\hat{t}\).

Equation (1.6) calculates the unbiased variance of the estimator \(\hat{t}\).

Equation (1.7) calculates the estimated standard error of the estimator \(\hat{t}\).

## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 202036.58 (mWh)"
## [1] "The variance of the estimated total: 1969498517.02 (mWh)"
## [1] "The estimated standard error of the total: 44379.03 (mWh)"
## [1] "The 95% confidence interval estimation for sample 2 is: (128812.7 (mWh), 275260.47 (mwh))"

1.4 Sample 1 and 2 combined computations

The following calculations use sample 1 and sample 2 together, giving a total sample of 400 building rooftops. This combined sample is referred to as sample 3.

The sample mean is denoted by \(\overline{y}\) and the sample variance by \(s^2\).

## [1] "The mean of sample 3:  74.56 (mWh)"
## [1] "The variance of sample 3:  101480.54 (mWh)"
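
Since sample 3 is the union of samples 1 and 2, its mean should equal the size-weighted average of the two sample means. A small Python sketch of that check, using the rounded means printed earlier (the function name is illustrative):

```python
def pooled_mean(means: list[float], sizes: list[int]) -> float:
    """Size-weighted average of group means."""
    total_n = sum(sizes)
    return sum(m * n for m, n in zip(means, sizes)) / total_n


# Means (MWh) and sizes of sample 1 and sample 2.
combined = pooled_mean(means=[54.06, 81.40], sizes=[100, 300])
print(round(combined, 3))
```

The result is consistent with the printed mean of 74.56, up to rounding of the inputs.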

Equation (1.3) calculates the unbiased variance of the estimator \(\overline{y}\).

Equation (1.4) calculates the estimated standard error of the estimator \(\overline{y}\).

## [1] "The variance of the sample mean:  212.81 (mWh)"
## [1] "The estimated standard error of the sample mean:  14.59 (mWh)"

Equation (1.5) calculates an unbiased estimator of the population total \(\hat{t}\).

Equation (1.6) calculates the unbiased variance of the estimator \(\hat{t}\).

Equation (1.7) calculates the estimated standard error of the estimator \(\hat{t}\).

## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 185069.23 (mWh)"
## [1] "The variance of the estimated total: 1311007843.95 (mWh)"
## [1] "The estimated standard error of the total: 36207.84 (mWh)"
## [1] "The 95% confidence interval estimation for sample 3 is: (125374.03 (mWh), 244764.43 (mwh))"

2 Stratified Random Spatial Sampling

2.1 Equal sample size stratification

La Plata is divided into two strata based on satellite imagery provided by the Copernicus Land Monitoring Service global maps of land cover, cover changes and related surface area statistics [3].

Stratum 1 represents the built-up area of the city and stratum 2 represents the non-built-up area.

For each stratum, 60 random points were computed and then 20 building rooftops were digitized using the same methods as in chapter 1.

## Simple feature collection with 6 features and 7 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -6476517 ymin: -4140116 xmax: -6455221 ymax: -4125471
## CRS:           +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
## # A tibble: 6 x 8
##      id    area X_mean  usable_sr elec_prod                     geometry landuse
##   <dbl>   <dbl>  <dbl>      <dbl>     <dbl>                <POLYGON [m]> <chr>  
## 1     1   0.004  1725.       6.90        0  ((-6467786 -4139229, -64677~ Built ~
## 2     2 834.     1727. 1440503.     185825. ((-6476517 -4129667, -64764~ Built ~
## 3     3   0.143  1727.     247.          0  ((-6459586 -4135615, -64595~ Built ~
## 4     4  69.1    1728.  119366.      15398. ((-6455235 -4134598, -64552~ Built ~
## 5     5   0.057  1726.      98.4         0  ((-6464188 -4140116, -64641~ Built ~
## 6     6   0.06   1728.     104.          0  ((-6471555 -4125471, -64715~ Built ~
## # ... with 1 more variable: elec_prod_mwh <dbl>
##        id             area               X_mean       usable_sr       
##  Min.   : 1.00   Min.   :    0.001   Min.   :1721   Min.   :       2  
##  1st Qu.:15.75   1st Qu.:    0.016   1st Qu.:1727   1st Qu.:      27  
##  Median :30.50   Median :    0.039   Median :1729   Median :      68  
##  Mean   :30.50   Mean   :  143.529   Mean   :1730   Mean   :  247773  
##  3rd Qu.:45.25   3rd Qu.:    0.078   3rd Qu.:1734   3rd Qu.:     134  
##  Max.   :60.00   Max.   :11988.234   Max.   :1740   Max.   :20680615  
##    elec_prod                geometry     landuse          elec_prod_mwh    
##  Min.   :      0   POLYGON      :120   Length:120         Min.   :   0.00  
##  1st Qu.:      0   epsg:NA      :  0   Class :character   1st Qu.:   0.00  
##  Median :      0   +proj=merc...:  0   Mode  :character   Median :   0.00  
##  Mean   :  31870                                          Mean   :  31.87  
##  3rd Qu.:      0                                          3rd Qu.:   0.00  
##  Max.   :2667799                                          Max.   :2667.80

After stratification, the strata are combined into one sample and the computations are the same as for simple random sampling.

##        landuse elec_prod_mwh
## 1     Built up      63.73959
## 2 Non built up       0.00000
## [1] "The mean of the stratified sample is:  31.87 (mWh)"
## [1] "The variance of the stratified sample is:  62862.69 (mWh)"

Equation (1.3) calculates the unbiased variance of the estimator \(\overline{y}\).

Equation (1.4) calculates the estimated standard error of the estimator \(\overline{y}\).

## [1] "The variance of the sample mean:  498.53 (mWh)"
## [1] "The estimated standard error of the sample mean:  22.33 (mWh)"

Equation (1.5) calculates an unbiased estimator of the population total \(\hat{t}\).

Equation (1.6) calculates the unbiased variance of the estimator \(\hat{t}\).

Equation (1.7) calculates the estimated standard error of the estimator \(\hat{t}\).

## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 79100.84 (mWh)"
## [1] "The variance of the estimated total: 3071095766.73 (mWh)"
## [1] "The estimated standard error of the total: 55417.47 (mWh)"
## [1] "The 95% confidence interval estimation for the stratified sample is: (-12767.99 (mWh), 170969.66 (mwh))"

2.2 Optimal allocation stratification

When the city of La Plata is divided into built-up and non-built-up areas, the sample calculations show that the built-up stratum is the more variable one. The optimum scheme allocates a larger sample size to the more variable strata and a smaller sample size to the more difficult-to-sample strata [4].

Therefore, two new stratum samples are computed:

  1. Optimally allocated stratum in the built-up area: a sample of 95 building rooftops
  2. Optimally allocated stratum in the non-built-up area: a sample of 5 building rooftops

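
The text does not state how the 95/5 split was derived. When sampling costs are equal across strata, optimum allocation is commonly written as \(n_h = n \frac{N_h \sigma_h}{\sum_k N_k \sigma_k}\) (Neyman allocation). The Python sketch below illustrates that rule only; the stratum sizes and standard deviations are hypothetical and are not taken from the La Plata data.

```python
def neyman_allocation(n_total: int, stratum_sizes: list[float],
                      stratum_sds: list[float]) -> list[float]:
    """Split n_total across strata in proportion to N_h * sigma_h."""
    weights = [N_h * s_h for N_h, s_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all strata have zero variability")
    return [n_total * w / total for w in weights]


# Hypothetical strata: equal sizes, the first three times as variable.
allocation = neyman_allocation(100, stratum_sizes=[1000, 1000],
                               stratum_sds=[3.0, 1.0])
print(allocation)
```

Under this rule a stratum with zero observed variability would receive no sample at all, so in practice a small minimum per stratum is often kept.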
## Simple feature collection with 6 features and 7 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -6463597 ymin: -4130860 xmax: -6448816 ymax: -4123499
## CRS:           +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
## # A tibble: 6 x 8
##      id    area X_mean usable_sr elec_prod                      geometry landuse
##   <dbl>   <dbl>  <dbl>     <dbl>     <dbl>                 <POLYGON [m]> <chr>  
## 1     1 175.     1732.  303058.     39094. ((-6452163 -4130836, -645215~ Built ~
## 2     2  91.7    1735.  159086.     20522. ((-6448832 -4127833, -644882~ Built ~
## 3     3   0.066  1739.     115.         0  ((-6457547 -4125248, -645754~ Built ~
## 4     4   0.006  1739.      10.4        0  ((-6457997 -4123499, -645799~ Built ~
## 5     5  54.8    1731.   94900.     12242. ((-6463593 -4124949, -646358~ Built ~
## 6     6  56.2    1735.   97564.     12586. ((-6456051 -4129352, -645604~ Built ~
## # ... with 1 more variable: elec_prod_mwh <dbl>
##        id             area              X_mean       usable_sr       
##  Min.   : 1.00   Min.   :   0.001   Min.   :   0   Min.   :       0  
##  1st Qu.:20.75   1st Qu.:   0.006   1st Qu.:1729   1st Qu.:      10  
##  Median :45.50   Median :   0.021   Median :1733   Median :      36  
##  Mean   :45.75   Mean   : 131.517   Mean   :1716   Mean   :  227277  
##  3rd Qu.:70.25   3rd Qu.:  53.038   3rd Qu.:1736   3rd Qu.:   91834  
##  Max.   :95.00   Max.   :6109.904   Max.   :1741   Max.   :10546739  
##    elec_prod                geometry     landuse          elec_prod_mwh    
##  Min.   :      0   POLYGON      :100   Length:100         Min.   :   0.00  
##  1st Qu.:      0   epsg:NA      :  0   Class :character   1st Qu.:   0.00  
##  Median :      0   +proj=merc...:  0   Mode  :character   Median :   0.00  
##  Mean   :  29316                                          Mean   :  29.32  
##  3rd Qu.:  11847                                          3rd Qu.:  11.85  
##  Max.   :1360529                                          Max.   :1360.53

After stratification, the strata are combined into one sample and the computations are the same as for simple random sampling.

##        landuse elec_prod_mwh
## 1     Built up      30.85859
## 2 Non built up       0.00000
## [1] "The mean of the stratified sample is:  29.32 (mWh)"
## [1] "The variance of the stratified sample is:  25362.42 (mWh)"

Equation (1.3) calculates the unbiased variance of the estimator \(\overline{y}\).

Equation (1.4) calculates the estimated standard error of the estimator \(\overline{y}\).

## [1] "The variance of the sample mean:  243.41 (mWh)"
## [1] "The estimated standard error of the sample mean:  15.6 (mWh)"

Equation (1.5) calculates an unbiased estimator of the population total \(\hat{t}\).

Equation (1.6) calculates the unbiased variance of the estimator \(\hat{t}\).

Equation (1.7) calculates the estimated standard error of the estimator \(\hat{t}\).

## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 72761.47 (mWh)"
## [1] "The variance of the estimated total: 1499457748.65 (mWh)"
## [1] "The estimated standard error of the total: 38722.83 (mWh)"
## [1] "The 95% confidence interval estimation for the stratified sample is: (8466.42 (mWh), 137056.52 (mwh))"

3 95% Confidence Intervals

4 Summarizing Table

Table 4.1: Summary of Computation Results (MWh)

| Sample | Mean | Var of Mean | SEM | Total Estimation | Var of Total | SET | Lower CI | Upper CI |
|---|---|---|---|---|---|---|---|---|
| Sample 1 - 100 | 54.06 | 757.50 | 27.52 | 134,167.17 | 4,666,447,949 | 68,311.40 | 20,743.51 | 247,590.8 |
| Sample 2 - 300 | 81.40 | 319.71 | 17.88 | 202,036.58 | 1,969,498,517 | 44,379.03 | 128,812.70 | 275,260.5 |
| Both Samples - 400 | 74.56 | 212.81 | 14.59 | 185,069.23 | 1,311,007,844 | 36,207.84 | 125,374.03 | 244,764.4 |
| Stratified Sample - 120 | 31.87 | 498.53 | 22.33 | 79,100.84 | 3,071,095,767 | 55,417.47 | -12,767.99 | 170,969.7 |
| Allocated Stratified Sample - 100 | 29.32 | 243.41 | 15.60 | 72,761.47 | 1,499,457,749 | 38,722.83 | 8,466.42 | 137,056.5 |
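
The columns of Table 4.1 are linked by equations (1.4), (1.5) and (1.7), so the table can be checked for internal consistency. A minimal Python sketch, with tolerances chosen to absorb the rounding of the printed values (the helper `check_row` is illustrative):

```python
import math

N = 2482  # population size used throughout

# label: (mean, var of mean, SEM, total estimation, SET)
rows = {
    "Sample 1 - 100": (54.06, 757.50, 27.52, 134167.17, 68311.40),
    "Sample 2 - 300": (81.40, 319.71, 17.88, 202036.58, 44379.03),
    "Both Samples - 400": (74.56, 212.81, 14.59, 185069.23, 36207.84),
    "Stratified Sample - 120": (31.87, 498.53, 22.33, 79100.84, 55417.47),
    "Allocated Stratified Sample - 100": (29.32, 243.41, 15.60, 72761.47, 38722.83),
}


def check_row(mean, var_mean, sem, total, se_total, N=N):
    """Return whether SEM, total and SET agree with equations (1.4), (1.5), (1.7)."""
    return (
        math.isclose(math.sqrt(var_mean), sem, abs_tol=0.01),
        math.isclose(N * mean, total, rel_tol=1e-3),
        math.isclose(N * sem, se_total, rel_tol=1e-3),
    )


results = {label: check_row(*values) for label, values in rows.items()}
for label, flags in results.items():
    print(label, all(flags))
```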

  1. https://www.qgis.org/en/site/

  2. https://qms.nextgis.com/geoservices/1135/

  3. https://lcviewer.vito.be/about

  4. Thompson, S. (2012). Sampling. Hoboken, N.J.: Wiley, p. 147.