1 Simple Random Spatial Sampling

1.1 Rosario sample size calculation

The following formula can help decide the most suitable size for a sample:

\[\begin{equation} n_0 = \frac{Z^2p(1-p)}{e^2} \tag{1.1} \end{equation}\]

where \(n_0\) is the sample size, \(Z\) is the Z-score for the chosen confidence level (1.96 for 95%; values can be found in a standard normal table), \(p\) is the estimated proportion of variability in the population and \(e\) is the margin of error, i.e. half the width of the confidence interval.

For a large population where nothing is known about the proportion of variability, \(p=0.5\) is usually assumed, since it gives the maximum variability.

Suitable sample size for Rosario with a 95% confidence level and a 5% margin of error:

\[\begin{equation} n_0 = \frac{1.96^2 \times 0.5 \times (1-0.5)}{0.05^2} = 384.16 \tag{1.2} \end{equation}\]

Therefore the most suitable sample to collect for Rosario would be 384 building rooftops.
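As a quick check of Equation (1.2), the calculation can be reproduced in a few lines of Python. This is only a sketch; the function name `sample_size` is introduced here for illustration and is not part of the original analysis.

```python
def sample_size(z: float, p: float, e: float) -> float:
    """Sample size n_0 from Equation (1.1): z^2 * p * (1 - p) / e^2."""
    return z ** 2 * p * (1 - p) / e ** 2


# Values used in Equation (1.2): z = 1.96 (95% confidence), p = 0.5, e = 0.05.
n0 = sample_size(z=1.96, p=0.5, e=0.05)
print(f"{n0:.2f}")
```

Because \(e\) enters as a square in the denominator, halving the margin of error to 2.5% would roughly quadruple the required sample size.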

In this analysis two random spatial samples were drawn from the city of Rosario and analyzed separately and together:

  1. A sample of 100 building rooftops
  2. A sample of 300 building rooftops
  3. Both samples combined (400 building rooftops)

The samples were created in QGIS [1] using Vector -> Research Tools -> Random points inside polygons. The building rooftops were then digitized in QGIS using Google Satellite Hybrid [2], a Tile Map Service (TMS) layer.

For every sample, the rooftop area (\(m^2\)), the mean global horizontal irradiation (\(\frac{kWh}{m^2}\)), the usable solar radiation (\(kWh\)) and the renewable electricity production (\(kWh\)) were calculated.

Rooftops with an area of 30 \(m^2\) or less are assigned an electricity production of 0. Some sample points fell on non-built-up areas (roads, parks, etc.); these were also given a value of 0 so that building density is reflected in the calculation.
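The zero-assignment rule can be sketched as a small Python function. The names `MIN_ROOFTOP_AREA_M2` and `production_mwh` are hypothetical, and the division by 1000 assumes that the `elec_prod_mwh` column in the output that follows is simply `elec_prod` (kWh) converted to MWh, which is what the printed summaries suggest.

```python
MIN_ROOFTOP_AREA_M2 = 30.0  # threshold stated in the text


def production_mwh(area_m2: float, elec_prod_kwh: float) -> float:
    """Return production in MWh, or 0 for rooftops of 30 m^2 or less."""
    if area_m2 <= MIN_ROOFTOP_AREA_M2:
        return 0.0
    return elec_prod_kwh / 1000.0  # kWh -> MWh (assumed conversion)


print(production_mwh(99.9, 23261.0))  # rooftop above the threshold
print(production_mwh(0.171, 40.0))    # hypothetical value, below the threshold
```

Keeping the zeros in the sample, rather than dropping those points, is what lets the sample mean be multiplied by the total number of buildings later on.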

## Simple feature collection with 6 features and 6 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -6763699 ymin: -3875027 xmax: -6755168 ymax: -3855457
## CRS:           +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
## # A tibble: 6 x 7
##      id  area X_mean usable_sr elec_prod                  geometry elec_prod_mwh
##   <dbl> <dbl>  <dbl>     <dbl>     <dbl>             <POLYGON [m]>         <dbl>
## 1     1 0.764  1804.    1379.          0 ((-6755170 -3875027, -67~             0
## 2     2 0.089  1819.     162.          0 ((-6756412 -3858792, -67~             0
## 3     3 0.094  1809.     170.          0 ((-6763699 -3857453, -67~             0
## 4     4 0.043  1804.      77.6         0 ((-6758804 -3871917, -67~             0
## 5     5 0.07   1810.     127.          0 ((-6763218 -3855457, -67~             0
## 6     6 0.121  1807.     219.          0 ((-6761505 -3863212, -67~             0
##        id              area              X_mean       usable_sr       
##  Min.   :  1.00   Min.   :   0.033   Min.   :1802   Min.   :      60  
##  1st Qu.: 25.75   1st Qu.:   0.117   1st Qu.:1806   1st Qu.:     212  
##  Median : 50.50   Median :   0.239   Median :1808   Median :     431  
##  Mean   : 50.50   Mean   : 139.849   Mean   :1810   Mean   :  253442  
##  3rd Qu.: 75.25   3rd Qu.:  66.111   3rd Qu.:1812   3rd Qu.:  119567  
##  Max.   :100.00   Max.   :7062.816   Max.   :1823   Max.   :12808176  
##    elec_prod                geometry   elec_prod_mwh    
##  Min.   :      0   POLYGON      :100   Min.   :   0.00  
##  1st Qu.:      0   epsg:NA      :  0   1st Qu.:   0.00  
##  Median :      0   +proj=merc...:  0   Median :   0.00  
##  Mean   :  32615                       Mean   :  32.62  
##  3rd Qu.:  15424                       3rd Qu.:  15.42  
##  Max.   :1652255                       Max.   :1652.25
## Simple feature collection with 6 features and 6 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -6763874 ymin: -3872301 xmax: -6749869 ymax: -3856997
## CRS:           +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
## # A tibble: 6 x 7
##      id X_mean   area usable_sr elec_prod                               geometry
##   <dbl>  <dbl>  <dbl>     <dbl>     <dbl>                          <POLYGON [m]>
## 1     1  1809.  0.171     309.         0  ((-6749870 -3868427, -6749869 -386842~
## 2     2  1808.  0.301     544.         0  ((-6750369 -3872301, -6750369 -387230~
## 3     3  1804. 99.9    180316.     23261. ((-6760088 -3868114, -6760077 -386811~
## 4     4  1812.  0.038      68.8        0  ((-6762068 -3856998, -6762068 -385699~
## 5     5  1809.  0.096     174.         0  ((-6753258 -3871456, -6753257 -387145~
## 6     6  1808.  0.077     139.         0  ((-6763874 -3858031, -6763874 -385803~
## # ... with 1 more variable: elec_prod_mwh <dbl>
##        id            X_mean          area            usable_sr       
##  Min.   :  1.0   Min.   :1803   Min.   :   0.009   Min.   :      16  
##  1st Qu.: 75.5   1st Qu.:1806   1st Qu.:   0.056   1st Qu.:     101  
##  Median :150.0   Median :1809   Median :   0.099   Median :     179  
##  Mean   :150.1   Mean   :1810   Mean   : 108.857   Mean   :  196903  
##  3rd Qu.:224.5   3rd Qu.:1813   3rd Qu.:  54.150   3rd Qu.:   98422  
##  Max.   :300.0   Max.   :1823   Max.   :6030.465   Max.   :10880979  
##  NA's   :1                                                           
##    elec_prod                geometry   elec_prod_mwh    
##  Min.   :      0   POLYGON      :300   Min.   :   0.00  
##  1st Qu.:      0   epsg:NA      :  0   1st Qu.:   0.00  
##  Median :      0   +proj=merc...:  0   Median :   0.00  
##  Mean   :  25360                       Mean   :  25.36  
##  3rd Qu.:  12696                       3rd Qu.:  12.70  
##  Max.   :1403646                       Max.   :1403.65  
## 

1.2 Sample 1 computations

The sample mean is denoted \(\overline{y}\) and the sample variance \(s^2\). Production values are in megawatt-hours (MWh, printed as "mWh" in the outputs), so variances are in \(MWh^2\).

## [1] "The mean of sample 1:  32.62 (mWh)"
## [1] "The variance of sample 1:  28445.82 (mWh)"
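For illustration, the two statistics can be computed with Python's standard library. The values in `y` are hypothetical and are not the actual sample data; the sketch assumes the sample variance uses the \(n-1\) denominator.

```python
import statistics

# Hypothetical per-rooftop production values in MWh (not the real sample).
y = [0.0, 0.0, 12.5, 0.0, 40.0]

y_bar = statistics.mean(y)   # sample mean
s2 = statistics.variance(y)  # sample variance, n - 1 denominator

print(y_bar, s2)
```

As in the real samples, a few large rooftops among many zeros produce a variance that is large relative to the mean.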

The following equation gives an unbiased estimator of the variance of \(\overline{y}\), including the finite population correction \(\frac{N-n}{N}\):

\[\begin{equation} \hat{var}(\overline{y})= (\frac{N-n}{N})(\frac{s^2}{n}) \tag{1.3} \end{equation}\]

  • \(N\) is the total population size -> the number of all the buildings in Rosario (357,126).
  • \(n\) is the sample size -> 100.
  • \(s^2\) is the sample variance.

The following equation calculates the estimated standard error of the estimator \(\overline{y}\):

\[\begin{equation} SEM = \sqrt{\hat{var}(\overline{y})} \tag{1.4} \end{equation}\]

## [1] "The variance of the sample mean:  284.38 (mWh)"
## [1] "The estimated standard error of the sample mean:  16.86 (mWh)"
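Equations (1.3) and (1.4) can be checked against the reported figures with a short Python sketch. The inputs are the rounded values printed above, so agreement is expected only to the printed precision.

```python
import math

N = 357_126    # buildings in Rosario (from the text)
n = 100        # size of sample 1
s2 = 28445.82  # reported sample variance (rounded)

var_mean = ((N - n) / N) * (s2 / n)  # Equation (1.3)
sem = math.sqrt(var_mean)            # Equation (1.4)

print(f"{var_mean:.2f} {sem:.2f}")
```

With \(n\) this small relative to \(N\), the finite population correction is very close to 1 and barely changes the result.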

The following equation gives an unbiased estimator of the population total \(\hat{t}\):

\[\begin{equation} \hat{t} = N{\overline{y}} \tag{1.5} \end{equation}\]

The following equation gives an unbiased estimator of the variance of \(\hat{t}\):

\[\begin{equation} \hat{var}(\hat{t})= N^2\hat{var}(\overline{y}) \tag{1.6} \end{equation}\]

The following equation gives the estimated standard error of \(\hat{t}\):

\[\begin{equation} SET = \sqrt{\hat{var}(\hat{t})} \tag{1.7} \end{equation}\]

## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 11647736.41 (mWh)"
## [1] "The variance of the estimated total: 36269344306960.9 (mWh)"
## [1] "The estimated standard error of the total: 6022403.53 (mWh)"
## [1] "The 95% confidence interval estimation for sample 1 is: (1648190.85 (mWh), 21647281.97 (mwh))"
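Equations (1.5) to (1.7) can be sketched in Python in the same way. The inputs are the rounded mean and variance reported earlier, so the results agree with the printed totals only to about three or four significant figures.

```python
import math

N = 357_126        # buildings in Rosario (from the text)
y_bar = 32.62      # mean of sample 1 in MWh (rounded)
var_mean = 284.38  # estimated variance of the sample mean (rounded)

total = N * y_bar                # Equation (1.5)
var_total = N ** 2 * var_mean    # Equation (1.6)
se_total = math.sqrt(var_total)  # Equation (1.7)

print(f"{total:.0f} {var_total:.3e} {se_total:.0f}")
```

Since \(\hat{t}\) is just \(N\) times \(\overline{y}\), its standard error is \(N\) times the standard error of the mean.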

1.3 Sample 2 computations

The sample mean is denoted \(\overline{y}\) and the sample variance \(s^2\).

## [1] "The mean of sample 2:  25.36 (mWh)"
## [1] "The variance of sample 2:  15541.21 (mWh)"

Equations (1.3) and (1.4) give the estimated variance and the estimated standard error of \(\overline{y}\):

## [1] "The variance of the sample mean:  51.76 (mWh)"
## [1] "The estimated standard error of the sample mean:  7.19 (mWh)"

Equations (1.5), (1.6) and (1.7) give the estimated population total \(\hat{t}\), its estimated variance and its estimated standard error:

## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 9056833.33 (mWh)"
## [1] "The variance of the estimated total: 6601484325570.02 (mWh)"
## [1] "The estimated standard error of the total: 2569335.39 (mWh)"
## [1] "The 95% confidence interval estimation for sample 2 is: (4817517.9 (mWh), 13296148.76 (mwh))"

1.4 Sample 1 and 2 combined computations

The following calculations use sample 1 and sample 2 together, giving a total sample of 400 building rooftops. We refer to this combined sample as sample 3.

The sample mean is denoted \(\overline{y}\) and the sample variance \(s^2\).

## [1] "The mean of sample 3:  27.17 (mWh)"
## [1] "The variance of sample 3:  18714.05 (mWh)"
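As a consistency check, the mean of the combined sample should equal the size-weighted average of the two individual sample means. A minimal Python sketch, using the rounded means reported above:

```python
n1, mean1 = 100, 32.62  # sample 1: size and mean (MWh, rounded)
n2, mean2 = 300, 25.36  # sample 2: size and mean (MWh, rounded)

# Mean of the pooled sample as the size-weighted average of the two means.
mean3 = (n1 * mean1 + n2 * mean2) / (n1 + n2)

print(f"{mean3:.3f}")
```

The result agrees with the reported mean of sample 3 to within the rounding of the inputs.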

Equations (1.3) and (1.4) give the estimated variance and the estimated standard error of \(\overline{y}\):

## [1] "The variance of the sample mean:  46.73 (mWh)"
## [1] "The estimated standard error of the sample mean:  6.84 (mWh)"

Equations (1.5), (1.6) and (1.7) give the estimated population total \(\hat{t}\), its estimated variance and its estimated standard error:

## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 9704559.1 (mWh)"
## [1] "The variance of the estimated total: 5960243624377.49 (mWh)"
## [1] "The estimated standard error of the total: 2441361.02 (mWh)"
## [1] "The 95% confidence interval estimation for sample 3 is: (5679532.27 (mWh), 13729585.93 (mwh))"

2 Stratified Random Spatial Sampling

Rosario comprises six districts:

  1. The northern district (Distrito Norte)
  2. The northwest district (Distrito Noroeste)
  3. The western district (Distrito Oeste)
  4. The southwest district (Distrito Sudoeste)
  5. The southern district (Distrito Sur)
  6. The central district (Distrito Centro)

The division of Rosario into municipal districts was part of a general restructuring that included building and connecting north-south routes to create a new metropolitan axis. The formation of the districts had an important impact on the urban structure by connecting the agricultural and peripheral districts (the north, northwest and west districts) with the city. This made it possible to open large avenues, build major road junctions, create public, community and service spaces and, in some districts, build new residential housing.

For each district 20 random points were computed and then 20 building rooftops were digitized using the same methods as in chapter 1.

## Simple feature collection with 6 features and 8 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -6765175 ymin: -3857893 xmax: -6760942 ymax: -3856008
## CRS:           +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
## # A tibble: 6 x 9
##      id   area usable_sr elec_prod X_mean                         geometry group
##   <dbl>  <dbl>     <dbl>     <dbl>  <dbl>                    <POLYGON [m]> <dbl>
## 1     1  2.37      4284.        0   1809. ((-6765175 -3856011, -6765173 -~     1
## 2     2  2.66      4820.        0   1809. ((-6764898 -3856632, -6764898 -~     1
## 3     3  6.22     11248.        0   1809. ((-6764207 -3856125, -6764204 -~     1
## 4     4  7.70     13922.        0   1809. ((-6763204 -3857893, -6763201 -~     1
## 5     5 76.0     137933.    17793.  1815. ((-6760955 -3857635, -6760945 -~     1
## 6     6  0.990     1795.        0   1813. ((-6761655 -3856098, -6761654 -~     1
## # ... with 2 more variables: district <chr>, elec_prod_mwh <dbl>
##        id             area             usable_sr          elec_prod      
##  Min.   : 1.00   Min.   :    0.040   Min.   :      72   Min.   :      0  
##  1st Qu.: 5.75   1st Qu.:    0.909   1st Qu.:    1644   1st Qu.:      0  
##  Median :10.50   Median :    8.923   Median :   17760   Median :      0  
##  Mean   :10.50   Mean   :  525.832   Mean   :  955700   Mean   : 122105  
##  3rd Qu.:15.25   3rd Qu.:  156.555   3rd Qu.:  293250   3rd Qu.:  36554  
##  Max.   :20.00   Max.   :27683.743   Max.   :50023871   Max.   :6453079  
##      X_mean              geometry       group       district        
##  Min.   :1802   POLYGON      :120   Min.   :1.0   Length:120        
##  1st Qu.:1806   epsg:NA      :  0   1st Qu.:2.0   Class :character  
##  Median :1808   +proj=merc...:  0   Median :3.5   Mode  :character  
##  Mean   :1809                       Mean   :3.5                     
##  3rd Qu.:1812                       3rd Qu.:5.0                     
##  Max.   :1822                       Max.   :6.0                     
##  elec_prod_mwh    
##  Min.   :   0.00  
##  1st Qu.:   0.00  
##  Median :   0.00  
##  Mean   : 122.11  
##  3rd Qu.:  36.55  
##  Max.   :6453.08

After stratification, the strata are combined into one sample of 120 rooftops, and the computations are the same as for simple random sampling.

##            district elec_prod_mwh
## 1   Distrito Centro    299.390164
## 2 Distrito Noroeste      8.515353
## 3    Distrito Norte     17.158441
## 4    Distrito Oesto     24.763423
## 5 Distrito Sudoeste    353.656280
## 6      Distrito Sur     29.146892
## [1] "The mean of the stratified sample is:  122.11 (mWh)"
## [1] "The variance of the stratified sample is:  436446.63 (mWh)"
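Because each district contributes the same number of rooftops, the mean of the combined sample is just the plain average of the six district means. The following Python sketch reproduces the reported mean from the district values printed above; the variable names are introduced here for illustration only.

```python
# District means of elec_prod_mwh, copied from the output above.
district_means = {
    "Distrito Centro": 299.390164,
    "Distrito Noroeste": 8.515353,
    "Distrito Norte": 17.158441,
    "Distrito Oesto": 24.763423,  # spelled as in the output
    "Distrito Sudoeste": 353.656280,
    "Distrito Sur": 29.146892,
}
n_per_district = 20  # rooftops digitized per district (from the text)

# Size-weighted mean; with equal sizes this is the plain average.
pooled_mean = sum(n_per_district * m for m in district_means.values()) / (
    n_per_district * len(district_means)
)
print(f"{pooled_mean:.2f}")
```

Each district therefore carries the same weight in this estimate regardless of how many buildings it contains; the per-district building counts are not given in the text.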

Equations (1.3) and (1.4) give the estimated variance and the estimated standard error of \(\overline{y}\):

## [1] "The variance of the sample mean:  3635.83 (mWh)"
## [1] "The estimated standard error of the sample mean:  60.3 (mWh)"

Equations (1.5), (1.6) and (1.7) give the estimated population total \(\hat{t}\), its estimated variance and its estimated standard error:

## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 43606903.1 (mWh)"
## [1] "The variance of the estimated total: 463710453937805 (mWh)"
## [1] "The estimated standard error of the total: 21533937.26 (mWh)"
## [1] "The 95% confidence interval estimation for the stratified sample is: (7908818.67 (mWh), 79304987.54 (mwh))"

3 95% Confidence Intervals
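The outputs in chapters 1 and 2 do not state how the interval bounds were obtained. Assuming the usual form \(\hat{t} \pm k \cdot SET\), the multiplier \(k\) can be backed out from the reported numbers. The Python sketch below does this and prints two standard normal quantiles for comparison; it is a check on the printed values, not a reconstruction of the original code.

```python
from statistics import NormalDist

# (total estimate, standard error of the total, reported upper bound), in MWh,
# copied from the outputs in chapters 1 and 2.
reported = {
    "Sample 1": (11647736.41, 6022403.53, 21647281.97),
    "Sample 2": (9056833.33, 2569335.39, 13296148.76),
    "Sample 3": (9704559.10, 2441361.02, 13729585.93),
    "Stratified": (43606903.10, 21533937.26, 79304987.54),
}

# Multiplier implied by each interval: (upper bound - estimate) / SE.
multipliers = {
    name: (upper - t_hat) / se for name, (t_hat, se, upper) in reported.items()
}

z_975 = NormalDist().inv_cdf(0.975)  # quantile for a two-sided 95% interval
z_950 = NormalDist().inv_cdf(0.95)   # quantile for a two-sided 90% interval

for name, k in multipliers.items():
    print(f"{name}: k = {k:.3f}")
print(f"z(0.975) = {z_975:.3f}, z(0.95) = {z_950:.3f}")
```

The implied multipliers come out at roughly 1.65 to 1.66, which is closer to the 0.95 quantile than to the 0.975 quantile normally used for a two-sided 95% interval. It may therefore be worth checking which quantile the original computation used.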

4 Summarizing Table

Table 4.1: Summary of Computation Results (MWh; variances in MWh²)

| Sample | Mean | Var of Mean | SEM | Total Estimation | Var of Total | SET | Lower CI | Upper CI |
|---|---|---|---|---|---|---|---|---|
| Sample 1 - 100 | 32.62 | 284.38 | 16.86 | 11,647,736 | 36,269,344,306,961 | 6,022,404 | 1,648,191 | 21,647,282 |
| Sample 2 - 300 | 25.36 | 51.76 | 7.19 | 9,056,833 | 6,601,484,325,570 | 2,569,335 | 4,817,518 | 13,296,149 |
| Both Samples - 400 | 27.17 | 46.73 | 6.84 | 9,704,559 | 5,960,243,624,377 | 2,441,361 | 5,679,532 | 13,729,586 |
| Stratified Sample - 120 | 122.11 | 3,635.83 | 60.30 | 43,606,903 | 463,710,453,937,805 | 21,533,937 | 7,908,819 | 79,304,988 |

  1. https://www.qgis.org/en/site/

  2. https://qms.nextgis.com/geoservices/1135/