The following formula can help decide the most suitable size for a sample:
\[\begin{equation} n_0 = \frac{Z^2p(1-p)}{e^2} \tag{1.1} \end{equation}\]
where \(n_0\) is the sample size, \(Z\) is the Z-score for the chosen confidence level (available from a standard normal table; 1.96 for 95%), \(p\) is the estimated proportion of variability in the population and \(e\) is the margin of error, i.e. the half-width of the confidence interval.
Usually, for a large population where nothing is known about the proportion of variability, \(p=0.5\) is used, which corresponds to the maximum variability.
Suitable sample size for Rosario with a 5% margin of error at a 95% confidence level (\(Z = 1.96\)):
\[\begin{equation} n_0 = \frac{1.96^2 \times 0.5 \times (1-0.5)}{0.05^2} = 384.16 \tag{1.2} \end{equation}\]
Therefore, the most suitable sample size for Rosario would be 384 building rooftops.
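The arithmetic in equation (1.2) can be reproduced with a few lines of Python. This is a minimal sketch: the function name `sample_size` and its parameter names are illustrative and not part of the original analysis.

```python
from statistics import NormalDist


def sample_size(z, p=0.5, e=0.05):
    """Equation (1.1): n0 = Z^2 * p * (1 - p) / e^2."""
    return z ** 2 * p * (1 - p) / e ** 2


# With the rounded Z-score used in equation (1.2).
n0_rounded_z = sample_size(1.96)

# With the Z-score derived from a two-sided 95% confidence level
# instead of the rounded table value.
z_95 = NormalDist().inv_cdf(1 - (1 - 0.95) / 2)
n0_exact_z = sample_size(z_95)

print(round(n0_rounded_z, 2))  # 384.16
```

Leaving `p` at 0.5 keeps the most conservative (largest) sample size, as described above.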
In this analysis, two sets of random spatial samples were drawn from the city of Rosario: sample 1, with 100 points, and sample 2, with 300 points.
The sample points were created in QGIS¹ using Vector -> Research Tools -> Random points inside a polygon. The building rooftops were then digitized in QGIS using Google Satellite Hybrid², a Tile Map Service (TMS) layer.
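To illustrate what drawing random points inside a polygon involves, the sketch below uses rejection sampling: candidate points are drawn uniformly in the bounding box and kept only if a ray-casting test places them inside the polygon. This is not the QGIS implementation; the function names and the L-shaped `boundary` are hypothetical stand-ins for the city polygon.

```python
import random


def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(vertices, count, seed=None):
    """Draw `count` uniform random points inside the polygon."""
    rng = random.Random(seed)
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    points = []
    while len(points) < count:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, vertices):
            points.append((x, y))
    return points


# Hypothetical L-shaped boundary (not the Rosario city limits).
boundary = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
sample_points = random_points_in_polygon(boundary, 100, seed=42)
```

Because every location inside the boundary is equally likely, points can also land on roads, parks or other non built-up land.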
For every sample, the rooftop area (\(m^2\)), the mean global horizontal irradiation (\(\frac{kWh}{m^2}\)), the usable solar radiation (\(kWh\)) and the renewable electricity production (\(kWh\)) were calculated.
Rooftops with an area of 30 \(m^2\) or less are assigned an electricity production of 0. Some of the sample points fell on non built-up areas, roads, parks etc.; these were also given a value of 0, so that a density factor is included in the calculation.
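The zeroing rule can be written as a small helper. The sketch below is an assumption about how the rule might be coded, not the original script: the function name, the use of `None` for points on non built-up land and the conversion from kWh to MWh (dividing by 1000, as the `elec_prod_mwh` column suggests) are all illustrative.

```python
MIN_AREA_M2 = 30.0  # rooftops at or below this area count as 0


def electricity_mwh(area_m2, elec_prod_kwh):
    """Electricity production in MWh after applying the zeroing rules.

    area_m2 is None when the sample point fell on non built-up land
    (a hypothetical encoding used only in this sketch).
    """
    if area_m2 is None or area_m2 <= MIN_AREA_M2:
        return 0.0
    return elec_prod_kwh / 1000.0


small_roof = electricity_mwh(7.70, 1795.0)     # below the threshold
large_roof = electricity_mwh(99.9, 23261.0)    # area and kWh taken from the sample 2 preview
no_building = electricity_mwh(None, 0.0)       # point on a road or park
```

Keeping the zeros in the sample, rather than dropping those points, is what lets the sample mean reflect how densely the city is built up.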
## Simple feature collection with 6 features and 6 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -6763699 ymin: -3875027 xmax: -6755168 ymax: -3855457
## CRS: +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
## # A tibble: 6 x 7
## id area X_mean usable_sr elec_prod geometry elec_prod_mwh
## <dbl> <dbl> <dbl> <dbl> <dbl> <POLYGON [m]> <dbl>
## 1 1 0.764 1804. 1379. 0 ((-6755170 -3875027, -67~ 0
## 2 2 0.089 1819. 162. 0 ((-6756412 -3858792, -67~ 0
## 3 3 0.094 1809. 170. 0 ((-6763699 -3857453, -67~ 0
## 4 4 0.043 1804. 77.6 0 ((-6758804 -3871917, -67~ 0
## 5 5 0.07 1810. 127. 0 ((-6763218 -3855457, -67~ 0
## 6 6 0.121 1807. 219. 0 ((-6761505 -3863212, -67~ 0
## id area X_mean usable_sr
## Min. : 1.00 Min. : 0.033 Min. :1802 Min. : 60
## 1st Qu.: 25.75 1st Qu.: 0.117 1st Qu.:1806 1st Qu.: 212
## Median : 50.50 Median : 0.239 Median :1808 Median : 431
## Mean : 50.50 Mean : 139.849 Mean :1810 Mean : 253442
## 3rd Qu.: 75.25 3rd Qu.: 66.111 3rd Qu.:1812 3rd Qu.: 119567
## Max. :100.00 Max. :7062.816 Max. :1823 Max. :12808176
## elec_prod geometry elec_prod_mwh
## Min. : 0 POLYGON :100 Min. : 0.00
## 1st Qu.: 0 epsg:NA : 0 1st Qu.: 0.00
## Median : 0 +proj=merc...: 0 Median : 0.00
## Mean : 32615 Mean : 32.62
## 3rd Qu.: 15424 3rd Qu.: 15.42
## Max. :1652255 Max. :1652.25
## Simple feature collection with 6 features and 6 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -6763874 ymin: -3872301 xmax: -6749869 ymax: -3856997
## CRS: +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
## # A tibble: 6 x 7
## id X_mean area usable_sr elec_prod geometry
## <dbl> <dbl> <dbl> <dbl> <dbl> <POLYGON [m]>
## 1 1 1809. 0.171 309. 0 ((-6749870 -3868427, -6749869 -386842~
## 2 2 1808. 0.301 544. 0 ((-6750369 -3872301, -6750369 -387230~
## 3 3 1804. 99.9 180316. 23261. ((-6760088 -3868114, -6760077 -386811~
## 4 4 1812. 0.038 68.8 0 ((-6762068 -3856998, -6762068 -385699~
## 5 5 1809. 0.096 174. 0 ((-6753258 -3871456, -6753257 -387145~
## 6 6 1808. 0.077 139. 0 ((-6763874 -3858031, -6763874 -385803~
## # ... with 1 more variable: elec_prod_mwh <dbl>
## id X_mean area usable_sr
## Min. : 1.0 Min. :1803 Min. : 0.009 Min. : 16
## 1st Qu.: 75.5 1st Qu.:1806 1st Qu.: 0.056 1st Qu.: 101
## Median :150.0 Median :1809 Median : 0.099 Median : 179
## Mean :150.1 Mean :1810 Mean : 108.857 Mean : 196903
## 3rd Qu.:224.5 3rd Qu.:1813 3rd Qu.: 54.150 3rd Qu.: 98422
## Max. :300.0 Max. :1823 Max. :6030.465 Max. :10880979
## NA's :1
## elec_prod geometry elec_prod_mwh
## Min. : 0 POLYGON :300 Min. : 0.00
## 1st Qu.: 0 epsg:NA : 0 1st Qu.: 0.00
## Median : 0 +proj=merc...: 0 Median : 0.00
## Mean : 25360 Mean : 25.36
## 3rd Qu.: 12696 3rd Qu.: 12.70
## Max. :1403646 Max. :1403.65
##
The sample mean is denoted by \(\overline{y}\) and the sample variance by \(s^2\).
## [1] "The mean of sample 1: 32.62 (mWh)"
## [1] "The variance of sample 1: 28445.82 (mWh)"
The following equation gives an unbiased estimate of the variance of the estimator \(\overline{y}\), where \(N\) is the population size and \(n\) is the sample size:
\[\begin{equation} \hat{var}(\overline{y})= (\frac{N-n}{N})(\frac{s^2}{n}) \tag{1.3} \end{equation}\]
The following equation calculates the estimated standard error of the estimator \(\overline{y}\):
\[\begin{equation} SEM = \sqrt{\hat{var}(\overline{y})} \tag{1.4} \end{equation}\]
## [1] "The variance of the sample mean: 284.38 (mWh)"
## [1] "The estimated standard error of the sample mean: 16.86 (mWh)"
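Equations (1.3) and (1.4) can be sketched in Python as follows. The values in `y` and the population size `N` are made up for illustration (the population size used in the analysis is not restated here), and the sample variance is assumed to use the \(n-1\) divisor.

```python
import math
from statistics import mean, variance


def mean_estimates(y, N):
    """Sample mean, sample variance, var(y_bar) and SEM for a simple random sample."""
    n = len(y)
    y_bar = mean(y)
    s2 = variance(y)                  # sample variance, divisor n - 1
    var_y_bar = (N - n) / N * s2 / n  # equation (1.3), with finite population correction
    sem = math.sqrt(var_y_bar)        # equation (1.4)
    return y_bar, s2, var_y_bar, sem


# Illustrative values in MWh and a hypothetical population size.
y = [0.0, 0.0, 23.261, 0.0, 17.793, 0.0]
N = 1000
y_bar, s2, var_y_bar, sem = mean_estimates(y, N)

# Consistency check on the printed sample 1 results:
# the square root of 284.38 is about 16.86.
sem_reported = math.sqrt(284.38)
```

When \(n\) is small relative to \(N\), the factor \(\frac{N-n}{N}\) is close to 1 and the variance of the mean is approximately \(\frac{s^2}{n}\).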
The following equation gives \(\hat{t}\), an unbiased estimator of the population total:
\[\begin{equation} \hat{t} = N{\overline{y}} \tag{1.5} \end{equation}\]
The following equation gives an unbiased estimate of the variance of the estimator \(\hat{t}\):
\[\begin{equation} \hat{var}(\hat{t})= N^2\hat{var}(\overline{y}) \tag{1.6} \end{equation}\]
The following equation calculates the estimated standard error of the estimator \(\hat{t}\):
\[\begin{equation} SET = \sqrt{\hat{var}(\hat{t})} \tag{1.7} \end{equation}\]
## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 11647736.41 (mWh)"
## [1] "The variance of the estimated total: 36269344306960.9 (mWh)"
## [1] "The estimated standard error of the total: 6022403.53 (mWh)"
## [1] "The 95% confidence interval estimation for sample 1 is: (1648190.85 (mWh), 21647281.97 (mwh))"
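Equations (1.5) to (1.7) and the interval can be sketched as below. The critical value is left as a parameter: the bounds printed for sample 1 appear to correspond to a multiplier of about 1.66 (the 0.95 quantile of a t distribution with \(n-1 = 99\) degrees of freedom, which gives a two-sided 90% interval) rather than 1.96. The function name and the toy inputs are illustrative.

```python
import math


def total_estimates(y_bar, var_y_bar, N, critical_value):
    """Estimated total, its variance, its standard error and an interval."""
    t_hat = N * y_bar              # equation (1.5)
    var_t = N ** 2 * var_y_bar     # equation (1.6)
    se_t = math.sqrt(var_t)        # equation (1.7)
    half_width = critical_value * se_t
    return t_hat, var_t, se_t, (t_hat - half_width, t_hat + half_width)


# Toy inputs with a hypothetical population of 10 units.
toy = total_estimates(y_bar=2.0, var_y_bar=0.25, N=10, critical_value=2.0)

# Check against the printed sample 1 results.
t_hat_1 = 11647736.41
se_t_1 = math.sqrt(36269344306960.9)
multiplier = 1.660391  # approximate 0.95 quantile of t with 99 degrees of freedom
lower_1 = t_hat_1 - multiplier * se_t_1
upper_1 = t_hat_1 + multiplier * se_t_1
```

With these inputs `lower_1` and `upper_1` land within a few MWh of the printed bounds, whereas a multiplier of 1.96 would push the lower bound below zero.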
The sample mean is denoted by \(\overline{y}\) and the sample variance by \(s^2\).
## [1] "The mean of sample 2: 25.36 (mWh)"
## [1] "The variance of sample 2: 15541.21 (mWh)"
Equation (1.3) gives an unbiased estimate of the variance of the estimator \(\overline{y}\).
Equation (1.4) calculates the estimated standard error of the estimator \(\overline{y}\).
## [1] "The variance of the sample mean: 51.76 (mWh)"
## [1] "The estimated standard error of the sample mean: 7.19 (mWh)"
Equation (1.5) calculates an unbiased estimator of the population total \(\hat{t}\).
Equation (1.6) gives an unbiased estimate of the variance of the estimator \(\hat{t}\).
Equation (1.7) calculates the estimated standard error of the estimator \(\hat{t}\).
## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 9056833.33 (mWh)"
## [1] "The variance of the estimated total: 6601484325570.02 (mWh)"
## [1] "The estimated standard error of the total: 2569335.39 (mWh)"
## [1] "The 95% confidence interval estimation for sample 2 is: (4817517.9 (mWh), 13296148.76 (mwh))"
The following calculations use sample 1 and sample 2 together, giving a combined sample of 400 building rooftops, referred to here as sample 3.
The sample mean is denoted by \(\overline{y}\) and the sample variance by \(s^2\).
## [1] "The mean of sample 3: 27.17 (mWh)"
## [1] "The variance of sample 3: 18714.05 (mWh)"
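Because sample 3 is simply samples 1 and 2 pooled, its mean is the size-weighted average of the two sample means. A minimal check using the rounded means reported for samples 1 and 2 (the helper name `pooled_mean` is illustrative):

```python
def pooled_mean(means_and_sizes):
    """Size-weighted mean of several samples given as (mean, n) pairs."""
    total_n = sum(n for _, n in means_and_sizes)
    return sum(m * n for m, n in means_and_sizes) / total_n


# Reported means (MWh) and sizes of sample 1 and sample 2.
mean_3 = pooled_mean([(32.62, 100), (25.36, 300)])
```

The result agrees with the reported sample 3 mean of 27.17 MWh up to the rounding of the inputs. The pooled variance cannot be recovered from the two means alone; it needs the individual observations or the two sample variances.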
Equation (1.3) gives an unbiased estimate of the variance of the estimator \(\overline{y}\).
Equation (1.4) calculates the estimated standard error of the estimator \(\overline{y}\).
## [1] "The variance of the sample mean: 46.73 (mWh)"
## [1] "The estimated standard error of the sample mean: 6.84 (mWh)"
Equation (1.5) calculates an unbiased estimator of the population total \(\hat{t}\).
Equation (1.6) gives an unbiased estimate of the variance of the estimator \(\hat{t}\).
Equation (1.7) calculates the estimated standard error of the estimator \(\hat{t}\).
## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 9704559.1 (mWh)"
## [1] "The variance of the estimated total: 5960243624377.49 (mWh)"
## [1] "The estimated standard error of the total: 2441361.02 (mWh)"
## [1] "The 95% confidence interval estimation for sample 3 is: (5679532.27 (mWh), 13729585.93 (mwh))"
Rosario comprises 6 districts: Centro, Noroeste, Norte, Oeste, Sudoeste and Sur.
The division of Rosario into municipal districts was part of a general restructuring that envisaged building and connecting north-south routes in order to create a new metropolitan axis. The formation of the districts had an important impact on the urban structure by connecting the agricultural and peripheral districts (north, northwest and west) with the city. This made it possible to open large avenues, build important road junctions, create public, community and service spaces and, in some districts, build new residential housing.
For each district, 20 random points were generated and 20 building rooftops were then digitized using the same methods as in chapter 1.
## Simple feature collection with 6 features and 8 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -6765175 ymin: -3857893 xmax: -6760942 ymax: -3856008
## CRS: +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
## # A tibble: 6 x 9
## id area usable_sr elec_prod X_mean geometry group
## <dbl> <dbl> <dbl> <dbl> <dbl> <POLYGON [m]> <dbl>
## 1 1 2.37 4284. 0 1809. ((-6765175 -3856011, -6765173 -~ 1
## 2 2 2.66 4820. 0 1809. ((-6764898 -3856632, -6764898 -~ 1
## 3 3 6.22 11248. 0 1809. ((-6764207 -3856125, -6764204 -~ 1
## 4 4 7.70 13922. 0 1809. ((-6763204 -3857893, -6763201 -~ 1
## 5 5 76.0 137933. 17793. 1815. ((-6760955 -3857635, -6760945 -~ 1
## 6 6 0.990 1795. 0 1813. ((-6761655 -3856098, -6761654 -~ 1
## # ... with 2 more variables: district <chr>, elec_prod_mwh <dbl>
## id area usable_sr elec_prod
## Min. : 1.00 Min. : 0.040 Min. : 72 Min. : 0
## 1st Qu.: 5.75 1st Qu.: 0.909 1st Qu.: 1644 1st Qu.: 0
## Median :10.50 Median : 8.923 Median : 17760 Median : 0
## Mean :10.50 Mean : 525.832 Mean : 955700 Mean : 122105
## 3rd Qu.:15.25 3rd Qu.: 156.555 3rd Qu.: 293250 3rd Qu.: 36554
## Max. :20.00 Max. :27683.743 Max. :50023871 Max. :6453079
## X_mean geometry group district
## Min. :1802 POLYGON :120 Min. :1.0 Length:120
## 1st Qu.:1806 epsg:NA : 0 1st Qu.:2.0 Class :character
## Median :1808 +proj=merc...: 0 Median :3.5 Mode :character
## Mean :1809 Mean :3.5
## 3rd Qu.:1812 3rd Qu.:5.0
## Max. :1822 Max. :6.0
## elec_prod_mwh
## Min. : 0.00
## 1st Qu.: 0.00
## Median : 0.00
## Mean : 122.11
## 3rd Qu.: 36.55
## Max. :6453.08
After stratification, the strata are combined into one sample of 120 rooftops and the computations are the same as for simple random sampling.
## district elec_prod_mwh
## 1 Distrito Centro 299.390164
## 2 Distrito Noroeste 8.515353
## 3 Distrito Norte 17.158441
## 4 Distrito Oesto 24.763423
## 5 Distrito Sudoeste 353.656280
## 6 Distrito Sur 29.146892
## [1] "The mean of the stratfied sample is: 122.11 (mWh)"
## [1] "The variance of the stratfied sample is: 436446.63 (mWh)"
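The per-district table is a grouped mean. The sketch below shows one way to compute such a table and checks that, with the same number of rooftops in every district, the combined mean equals the plain average of the six district means. The helper `district_means` and the toy records are illustrative; the dictionary `reported` copies the printed district means, with the district names spelled as in the output.

```python
from collections import defaultdict
from statistics import mean


def district_means(records):
    """Mean value per district from (district, value) pairs."""
    groups = defaultdict(list)
    for district, value in records:
        groups[district].append(value)
    return {district: mean(values) for district, values in groups.items()}


# Toy records (MWh), not the real sample.
toy_means = district_means([("A", 0.0), ("A", 10.0), ("B", 4.0), ("B", 6.0)])

# District means as printed above (MWh).
reported = {
    "Distrito Centro": 299.390164,
    "Distrito Noroeste": 8.515353,
    "Distrito Norte": 17.158441,
    "Distrito Oesto": 24.763423,
    "Distrito Sudoeste": 353.656280,
    "Distrito Sur": 29.146892,
}
overall_mean = mean(list(reported.values()))
```

Here `overall_mean` matches the reported stratified sample mean of 122.11 MWh to two decimals, which is expected because each district contributes 20 rooftops.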
Equation (1.3) gives an unbiased estimate of the variance of the estimator \(\overline{y}\).
Equation (1.4) calculates the estimated standard error of the estimator \(\overline{y}\).
## [1] "The variance of the sample mean: 3635.83 (mWh)"
## [1] "The estimated standard error of the sample mean: 60.3 (mWh)"
Equation (1.5) calculates an unbiased estimator of the population total \(\hat{t}\).
Equation (1.6) gives an unbiased estimate of the variance of the estimator \(\hat{t}\).
Equation (1.7) calculates the estimated standard error of the estimator \(\hat{t}\).
## [1] "The estimation of the renewable electricity production potential by all the buildings in the city: 43606903.1 (mWh)"
## [1] "The variance of the estimated total: 463710453937805 (mWh)"
## [1] "The estimated standard error of the total: 21533937.26 (mWh)"
## [1] "The 95% confidence interval estimation for the stratified sample is: (7908818.67 (mWh), 79304987.54 (mwh))"
| Sample | Mean (MWh) | Var of Mean (MWh²) | SEM (MWh) | Total Estimation (MWh) | Var of Total (MWh²) | SET (MWh) | Lower CI (MWh) | Upper CI (MWh) |
|---|---|---|---|---|---|---|---|---|
| Sample 1 (n = 100) | 32.62 | 284.38 | 16.86 | 11,647,736 | 36,269,344,306,961 | 6,022,404 | 1,648,191 | 21,647,282 |
| Sample 2 (n = 300) | 25.36 | 51.76 | 7.19 | 9,056,833 | 6,601,484,325,570 | 2,569,335 | 4,817,518 | 13,296,149 |
| Both Samples (n = 400) | 27.17 | 46.73 | 6.84 | 9,704,559 | 5,960,243,624,377 | 2,441,361 | 5,679,532 | 13,729,586 |
| Stratified Sample (n = 120) | 122.11 | 3,635.83 | 60.30 | 43,606,903 | 463,710,453,937,805 | 21,533,937 | 7,908,819 | 79,304,988 |