Chapter 4 SOEP-IS

4.1 Analysis

The original publication contains three tables based on SOEP-IS 2012 data:

Table 15 (Appendix): Descriptive statistics of items and dimensions of the short BSJO scale (SOEP-IS 2012 panel).
Table 7 (Article): Factor structure of the justice principles on the basis of the short version of the BSJO scale
Table A6 (Supplementary materials): Reference values of the short BSJO scale by socio-demographic characteristics

The following analyses focus on the first two tables (Table 15 and Table 7) because they contain the main results for this data source. The table from the supplementary materials is not examined; it should not matter as long as the factor loadings in Table 7 are correct.

4.1.1 Table 15

The descriptive statistics of the eight items and four subscales in Table 15 of the original article are retrieved from the article's website and displayed in Table 4.1.

4.1.1.1 Original Table 15

library(rvest)       # read_html(), html_nodes(), html_table() and the %>% pipe
library(kableExtra)  # kable_styling(), footnote()

url <- "https://link.springer.com/article/10.1007/s11205-017-1580-x"
tab.15 <- url %>% 
        read_html() %>% 
        html_nodes(xpath = '//*[@id="Tab15"]/div[2]/table') %>% 
        html_table(fill = TRUE)

# Keep the dimension and item columns plus the SOEP-IS columns;
# the first row of the scraped table holds the statistic names (M, SD, Min, Max)
tab.15.df <- as.data.frame(tab.15)
tab.15.soep <- tab.15.df[,c(1,2, grep("SOEP", names(tab.15.df)))]
tab.15.header <- c("Dimension", "Item", as.character(tab.15.soep[1,3:6]))
tab.15.soep <- tab.15.soep[2:13, 1:6]
names(tab.15.soep) <- tab.15.header

knitr::kable(tab.15.soep, "html", row.names = FALSE,
             caption = "Published SOEP-IS descriptive statistics") %>% 
        kable_styling() %>% 
        footnote(general = "M = mean; SD = standard deviation; Min = minimum; Max = maximum; listwise deletion in all cases; all respondents; SOEP-IS 2012: scale ranging from (1) ‘do not agree at all’ to (7) ‘agree completely,’ N = 1557")

Table 4.1: Published SOEP-IS descriptive statistics
Dimension	Item	M	SD	Min	Max
Equality	C	4.60	2.03	1	7
Equality	K	2.91	1.91	1	7
Equality	Subscale	3.75	1.60	1	7
Need	E	6.20	1.17	1	7
Need	J	6.29	1.04	1	7
Need	Subscale	6.24	0.91	1	7
Equity	B	6.20	1.20	1	7
Equity	I	5.21	1.72	1	7
Equity	Subscale	5.70	1.13	1	7
Entitlement	D	2.06	1.48	1	7
Entitlement	L	2.87	1.75	1	7
Entitlement	Subscale	2.47	1.31	1	7
Note:
M = mean; SD = standard deviation; Min = minimum; Max = maximum; listwise deletion in all cases; all respondents; SOEP-IS 2012: scale ranging from (1) ‘do not agree at all’ to (7) ‘agree completely,’ N = 1557

4.1.1.2 Reproduced Table 15

Using the SOEP-IS data, the same table is reproduced in Table 4.2; each subscale score is the mean of its two items:

# Item scores and subscale scores (each subscale is the mean of its two items)
bsjo.soep.sb <- with(bsjo.soep, 
                     data.frame(C, K, CK = (C+K)/2, E, J, EJ = (E+J)/2,
                                B, I, BI = (B+I)/2, D, L, DL = (D+L)/2))

# Column-wise statistics; no na.rm, so bsjo.soep must contain complete cases only
mean.soep <- round(colMeans(bsjo.soep.sb),2)
sd.soep   <- round(apply(bsjo.soep.sb, 2, sd),2)
min.bsjo  <- apply(bsjo.soep.sb, 2, min)
max.bsjo  <- apply(bsjo.soep.sb, 2, max)

tab.15.rep <- data.frame(Dimension = tab.15.soep$Dimension, Item = tab.15.soep$Item, 
                         M = mean.soep, SD = sd.soep, Min = min.bsjo, Max = max.bsjo)

knitr::kable(tab.15.rep, "html", row.names = FALSE, caption = "Reproduced SOEP-IS descriptive statistics") %>% 
        kable_styling() %>% 
        footnote(general = paste0("M = mean; SD = standard deviation; Min = minimum; Max = maximum; listwise deletion in all cases; all respondents; SOEP-IS 2012: scale ranging from (1) ‘do not agree at all’ to (7) ‘agree completely,’ N = ", nrow(bsjo.soep)))

Table 4.2: Reproduced SOEP-IS descriptive statistics
Dimension	Item	M	SD	Min	Max
Equality	C	4.60	2.03	1	7
Equality	K	2.91	1.91	1	7
Equality	Subscale	3.75	1.60	1	7
Need	E	6.20	1.17	1	7
Need	J	6.29	1.04	1	7
Need	Subscale	6.24	0.91	1	7
Equity	B	6.20	1.20	1	7
Equity	I	5.21	1.72	1	7
Equity	Subscale	5.70	1.13	1	7
Entitlement	D	2.06	1.48	1	7
Entitlement	L	2.87	1.75	1	7
Entitlement	Subscale	2.47	1.31	1	7
Note:
M = mean; SD = standard deviation; Min = minimum; Max = maximum; listwise deletion in all cases; all respondents; SOEP-IS 2012: scale ranging from (1) ‘do not agree at all’ to (7) ‘agree completely,’ N = 1557
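Because each subscale score is the mean of its two items and, after listwise deletion, every column is computed over the same respondents, the subscale mean must equal the average of the two item means. The sketch below checks this with the rounded values published in Table 4.1; since each printed value can be off by up to 0.005, a gap below 0.01 is expected.

```r
# Published means from Table 4.1 (rounded to two decimals)
item_means <- c(C = 4.60, K = 2.91, E = 6.20, J = 6.29,
                B = 6.20, I = 5.21, D = 2.06, L = 2.87)
subscale_means <- c(Equality = 3.75, Need = 6.24, Equity = 5.70, Entitlement = 2.47)

# Item pairs forming each subscale, as in (C+K)/2, (E+J)/2, (B+I)/2, (D+L)/2
pairs <- list(Equality = c("C", "K"), Need = c("E", "J"),
              Equity = c("B", "I"), Entitlement = c("D", "L"))

# Mean of pairwise averages = average of the two item means (same rows for all columns)
implied <- sapply(pairs, function(p) mean(item_means[p]))
gap <- abs(implied - subscale_means[names(pairs)])
print(round(cbind(implied, published = subscale_means[names(pairs)], gap), 3))

# Rounding of the printed values alone can produce gaps just below 0.01
stopifnot(all(gap < 0.01))
```

All four gaps come out at 0.005, which is consistent with the values having been rounded independently.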

4.1.1.3 Checks

The number of observations is the same in both analyses (N = 1557). We then check whether the means and standard deviations in both tables are identical.

identical(as.numeric(tab.15.soep$M), tab.15.rep$M)

## [1] TRUE

identical(as.numeric(tab.15.soep$SD), tab.15.rep$SD)

## [1] TRUE
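The two identical() calls succeed because both sides are already rounded to two decimals: the published values are parsed from text and the reproduced ones are passed through round(). When comparing against unrounded or independently computed numbers, an exact test can fail on floating-point noise alone. A more forgiving alternative is sketched below as a hypothetical helper, compare_published() (not part of any package), which counts a published value as reproduced when the reproduced number, rounded to the printed precision, lands on it.

```r
# Hypothetical helper: does a reproduced number match a published, printed value?
# `published` may be character (as scraped) or numeric; `digits` is the number
# of decimals printed in the publication.
compare_published <- function(published, reproduced, digits = 2) {
  pub  <- as.numeric(published)
  diff <- abs(pub - round(reproduced, digits))
  # after rounding, any genuine mismatch is at least one unit of the last decimal,
  # so this threshold only absorbs floating-point noise
  diff < 0.5 * 10^(-digits)
}

# Exact comparison of floating-point results is fragile ...
identical(0.1 + 0.2, 0.3)             # FALSE
# ... whereas the rounding-aware check reports a match
compare_published("0.30", 0.1 + 0.2)  # TRUE

# Possible use in this chapter (assuming the objects created above):
# all(compare_published(tab.15.soep$M, colMeans(bsjo.soep.sb)))
```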

4.1.2 Table 7

Table 7 of the original article displays the factor loadings from the principal component analysis (PCA) of the short BSJO scale with SOEP-IS 2012. As stated in the footnote of the original table, the sample for this analysis is restricted to full-time and part-time employees (N = 742).
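The sample restriction and the listwise deletion can be illustrated on a toy data frame. The sketch below assumes, as the reproduction code for Table 4.4 does, that codes 1 and 2 of the employment-status variable identify full-time and part-time employment and that valid answers lie between 1 and 7; the data and the column name emp are made up for illustration.

```r
# Toy data (made up): employment-status codes and two 7-point items
toy <- data.frame(pid = 1:6,
                  emp = c(1, 2, 3, 1, 2, 5),   # assumed: 1 = full-time, 2 = part-time
                  C   = c(4, -1, 7, 2, 5, 6),  # -1 stands in for a missing-value code
                  K   = c(3, 2, 1, 8, 6, 1))   # 8 lies outside the 1-7 answer scale

# 1. Keep employees only
employees <- subset(toy, emp %in% c(1, 2))

# 2. Treat every value outside the 1-7 answer scale as missing
items <- sapply(employees[, c("C", "K")],
                function(x) ifelse(x >= 1 & x <= 7, x, NA))

# 3. Listwise deletion: drop respondents with any missing item
complete <- as.data.frame(na.omit(items))
nrow(complete)  # 2 of the 6 toy respondents remain
```

The order matters only for bookkeeping: filtering first means the reported N counts complete cases among employees, which is how the footnote of Table 7 defines the sample.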

4.1.2.1 Original Table 7

Because Table 7 can only be downloaded with access to the full article, its values were transcribed into a CSV file (tab7_original.csv). The file is semicolon-delimited, as read.csv2() expects by default, and is imported and displayed in Table 4.3.
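As a minimal sketch of the format read.csv2() handles, the snippet below parses an inline string instead of the file. The layout shown (semicolons between fields, commas as decimal marks, a header row with item and dimension names) is an assumption about tab7_original.csv; the values are the first two rows of the published table.

```r
# Assumed layout of the first lines of tab7_original.csv
# (semicolon-separated, decimal comma); values from the published Table 7
csv_text <- "Item;Equality;Need;Equity;Entitlement;Uniqueness
C;0,690;0,190;0,297;-0,251;0,336
K;0,794;-0,010;-0,106;0,153;0,334"

tab7_head <- read.csv2(text = csv_text)  # sep = ";" and dec = "," are the defaults
str(tab7_head)                           # loading columns are parsed as numeric
```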

tab7 <- read.csv2(paste0(data_path, "tab7_original.csv"))

knitr::kable(tab7, "html", row.names = F, caption = "Published SOEP-IS PCA results") %>% 
  kable_styling() %>% 
  footnote(general = "Factor loadings of the principal component analysis; rotation method: varimax; eigenvalues: at least 0.95; total explained variance: 67.2 percent; data: SOEP-IS 2012; full-time and part-time employees; listwise deletion; N = 742")

Table 4.3: Published SOEP-IS PCA results
Item	Equality	Need	Equity	Entitlement	Uniqueness
C	0.690	0.190	0.297	-0.251	0.336
K	0.794	-0.010	-0.106	0.153	0.334
E	0.137	0.780	0.017	-0.133	0.356
J	-0.001	0.773	-0.021	-0.074	0.396
B	-0.321	0.468	0.508	0.166	0.392
I	0.074	-0.035	0.889	-0.016	0.203
D	0.417	-0.123	-0.005	0.670	0.362
L	-0.130	-0.061	0.013	0.854	0.250
Note:
Factor loadings of the principal component analysis; rotation method: varimax; eigenvalues: at least 0.95; total explained variance: 67.2 percent; data: SOEP-IS 2012; full-time and part-time employees; listwise deletion; N = 742
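In Table 4.3, the Uniqueness column equals 1 minus each item's communality, i.e. 1 minus the sum of its squared loadings on the four components. As a plausibility check, the sketch below recomputes the uniquenesses from the published, rounded loadings; agreement is expected only up to rounding error, and the tolerance of 0.002 is a choice made for this sketch.

```r
# Published loadings and uniquenesses from Table 4.3
loadings_pub <- matrix(c( 0.690,  0.190,  0.297, -0.251,
                          0.794, -0.010, -0.106,  0.153,
                          0.137,  0.780,  0.017, -0.133,
                         -0.001,  0.773, -0.021, -0.074,
                         -0.321,  0.468,  0.508,  0.166,
                          0.074, -0.035,  0.889, -0.016,
                          0.417, -0.123, -0.005,  0.670,
                         -0.130, -0.061,  0.013,  0.854),
                       ncol = 4, byrow = TRUE,
                       dimnames = list(c("C", "K", "E", "J", "B", "I", "D", "L"),
                                       c("Equality", "Need", "Equity", "Entitlement")))
uniq_pub <- c(C = 0.336, K = 0.334, E = 0.356, J = 0.396,
              B = 0.392, I = 0.203, D = 0.362, L = 0.250)

# Uniqueness = 1 - communality = 1 - sum of squared loadings across components
uniq_calc <- 1 - rowSums(loadings_pub^2)
print(round(cbind(published = uniq_pub, recomputed = uniq_calc), 3))

# Remaining differences stem from rounding the loadings to three decimals
max(abs(uniq_calc - uniq_pub))
```

The largest gap (item E, about 0.001) is within what rounding each loading to three decimals can produce.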

4.1.2.2 Reproduced Table 7

The same table is reproduced directly from the dataset in Table 4.4.

library(haven)       # read_dta()
library(dplyr)       # select(), right_join(), filter()
library(psych)       # principal(), fa.organize()
library(kableExtra)  # kable_styling(), add_header_above(), footnote()

soep.is.bio <- read_dta(paste0(data_path, "bio.dta"))
soep.is.es <- soep.is.bio %>% 
                        select(pid, l0879) %>% # Employment status
                        right_join(y = soep.is, by = "pid") %>% 
                        filter(l0879 == 1 | l0879 == 2) # full-time and part-time employees

# Short BSJO items, labelled with the item letters used in the article
bsjo.soep.rd <- data.frame(C = soep.is.es$sim0403, K = soep.is.es$sim0411, 
                        E = soep.is.es$sim0405, J = soep.is.es$sim0410,
                        B = soep.is.es$sim0402, I = soep.is.es$sim0409,
                        D = soep.is.es$sim0404, L = soep.is.es$sim0412)

# Values outside the 1-7 answer scale become NA, followed by listwise deletion
bsjo.soep.rd <- sapply(bsjo.soep.rd, function(x) ifelse(x >= 1 & x <= 7, x, NA))
bsjo.soep.rd <- as.data.frame(na.omit(bsjo.soep.rd))

# Four varimax-rotated components; normalize and eps are extra arguments
# that principal() passes on to the rotation function
pca.soep <- principal(bsjo.soep.rd, rotate = "varimax", nfactors = 4, 
                      normalize = FALSE, eps = 1e-14)
# Reorder the components to follow the column order of the published table
pca.soep <- fa.organize(pca.soep, c(2,1,4,3))

# Loadings plus uniqueness (1 - communality), rounded to three decimals
tab7.rep <- round(unclass(pca.soep$loadings),3)
tab7.rep <- cbind(tab7.rep, as.numeric(1-round(pca.soep$communality,3)))
colnames(tab7.rep) <- c("Factor 1", "Factor 2", "Factor 4", "Factor 3", "")

knitr::kable(tab7.rep, "html", align = "r", caption = "Reproduced SOEP-IS PCA results") %>% 
    kable_styling() %>% 
    add_header_above(header = c("Item", as.character(unique(tab.15.rep$Dimension)), "Uniqueness"),
                     align = "r") %>% 
    footnote(general = paste0("Factor loadings of the principal component analysis; rotation method: varimax; eigenvalues: at least 0.95; total explained variance: ", 100*round(sum(pca.soep$Vaccounted[2,]),3), " percent; data: SOEP-IS 2012; full-time and part-time employees; listwise deletion; N = ", 
                              nrow(bsjo.soep.rd)))

Table 4.4: Reproduced SOEP-IS PCA results
Item	Equality	Need	Equity	Entitlement	Uniqueness
	Factor 1	Factor 2	Factor 4	Factor 3	
C	0.690	0.190	0.297	-0.251	0.336
K	0.794	-0.010	-0.106	0.154	0.334
E	0.137	0.780	0.017	-0.133	0.356
J	-0.001	0.773	-0.021	-0.074	0.396
B	-0.321	0.468	0.508	0.166	0.392
I	0.074	-0.035	0.889	-0.016	0.203
D	0.417	-0.123	-0.005	0.670	0.362
L	-0.130	-0.061	0.013	0.854	0.250
Note:
Factor loadings of the principal component analysis; rotation method: varimax; eigenvalues: at least 0.95; total explained variance: 67.2 percent; data: SOEP-IS 2012; full-time and part-time employees; listwise deletion; N = 742
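The Uniqueness column does not depend on the rotation or on the order of the components: an orthogonal rotation such as varimax, and a permutation of columns like the one requested from fa.organize(), leave each item's communality (row sum of squared loadings) unchanged. A minimal base-R check with stats::varimax() on a made-up loading matrix, using normalize = FALSE and eps = 1e-14 as in the principal() call:

```r
set.seed(1)
# Made-up unrotated loading matrix: 8 items x 4 components
load_mat <- matrix(runif(32, -0.8, 0.8), nrow = 8,
                   dimnames = list(c("C", "K", "E", "J", "B", "I", "D", "L"), NULL))

rot   <- stats::varimax(load_mat, normalize = FALSE, eps = 1e-14)
L_rot <- unclass(rot$loadings)

# Column reordering as requested with the order c(2, 1, 4, 3);
# sign flips would not change squared loadings either
L_ord <- L_rot[, c(2, 1, 4, 3)]

# Communalities before rotation, after rotation, and after reordering
h2 <- cbind(unrotated = rowSums(load_mat^2),
            rotated   = rowSums(L_rot^2),
            reordered = rowSums(L_ord^2))
print(round(h2, 4))
```

The three columns agree up to floating-point error because the rotation matrix returned by varimax() is orthogonal.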

4.1.2.3 Checks

The number of observations is the same in both analyses (N = 742). We then check whether the factor loadings in both tables are identical:

tab7.ori <- as.matrix(tab7[,2:6])
identical(as.numeric(tab7.ori[,1]), as.numeric(tab7.rep[,1])) # Equality

## [1] TRUE

identical(as.numeric(tab7.ori[,2]), as.numeric(tab7.rep[,2])) # Need

## [1] TRUE

identical(as.numeric(tab7.ori[,3]), as.numeric(tab7.rep[,3])) # Equity

## [1] TRUE

identical(as.numeric(tab7.ori[,4]), as.numeric(tab7.rep[,4])) # Entitlement

## [1] FALSE

All factor loadings are identical except for the loading of item K on the “Entitlement” factor, which is 0.154 in the reproduction and 0.153 in the original table, a difference of 0.001. In addition, the footnotes of both tables report the same total variance explained by the four factors (67.2 percent).
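Column-wise identical() calls reveal which column differs but not which cell. A hypothetical helper, diff_cells(), that lists every differing cell is sketched below. To keep it self-contained, the Entitlement loadings printed in Tables 4.3 and 4.4 are typed in; in the chapter one would pass the numeric loading columns of both tables instead.

```r
# Hypothetical helper: list the cells in which two numeric tables differ
diff_cells <- function(original, reproduced, items) {
  original   <- as.matrix(original)
  reproduced <- as.matrix(reproduced)
  delta <- reproduced - original
  idx <- which(abs(delta) > 1e-9, arr.ind = TRUE)  # ignore floating-point noise
  data.frame(item       = items[idx[, "row"]],
             column     = idx[, "col"],
             original   = original[idx],
             reproduced = reproduced[idx],
             difference = round(delta[idx], 3))
}

items <- c("C", "K", "E", "J", "B", "I", "D", "L")
# Entitlement loadings as printed in Table 4.3 (original) and Table 4.4 (reproduced)
ent_ori <- c(-0.251, 0.153, -0.133, -0.074, 0.166, -0.016, 0.670, 0.854)
ent_rep <- c(-0.251, 0.154, -0.133, -0.074, 0.166, -0.016, 0.670, 0.854)

diff_cells(ent_ori, ent_rep, items)
```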

4.2 Conclusion

This brief reproduction analysis shows that the results of the original publication based on SOEP-IS 2012 can be reproduced exactly, apart from a difference of 0.001 in the loading of item K on the entitlement factor.