Chapter 21 Other quantitative methods
21.1 Activity-Based Costing (ABC)
Usually seen in the context of management accounting, ABC is a method that measures the cost and volume of inputs required to produce a fixed amount of output.
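As a rough illustration of the idea (not taken from any of the references below), the following sketch allocates overhead to products in proportion to the activity volume each one consumes. The activity names, product names, and all figures are invented for illustration.

```python
# Hypothetical activity cost pools: total overhead cost per activity.
activity_costs = {"machine_setup": 20000.0, "quality_inspection": 9000.0}

# Hypothetical cost-driver volumes consumed by each product
# (number of setups, number of inspections).
driver_usage = {
    "widget": {"machine_setup": 30, "quality_inspection": 100},
    "gadget": {"machine_setup": 10, "quality_inspection": 200},
}


def activity_rates(costs, usage):
    """Cost per unit of driver = activity cost / total driver volume."""
    rates = {}
    for activity, cost in costs.items():
        total_volume = sum(product[activity] for product in usage.values())
        rates[activity] = cost / total_volume
    return rates


def allocate(costs, usage):
    """Overhead assigned to each product: sum of rate * volume consumed."""
    rates = activity_rates(costs, usage)
    return {
        name: sum(rates[activity] * volume for activity, volume in drivers.items())
        for name, drivers in usage.items()
    }


allocated = allocate(activity_costs, driver_usage)
for product, cost in allocated.items():
    print(product, round(cost, 2))
```

The allocated amounts add up to the total of the cost pools, which is a quick sanity check on any allocation of this kind.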
21.1.1 Theory and methods
Activity-Based Costing, at Inc.com
Robert S. Kaplan and Steven R. Anderson, Time-Driven Activity-Based Costing (November 2003). Available at SSRN: https://ssrn.com/abstract=485443 or http://dx.doi.org/10.2139/ssrn.485443
Robert S. Kaplan and Steven R. Anderson, Rethinking Activity-Based Costing, 2005-01-24
Fariborz Y. Partovi, An analytic hierarchy approach to activity-based costing, International Journal of Production Economics, 1991, 151-161
21.1.2 R
Ryan K McBain, et al., “Activity-based costing of health-care delivery, Haiti”, Bulletin of the World Health Organization, 2018; 96:10-17.
21.2 Ecological inference
Ecological inference is a method for inferring individual behavior from group-level data.
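Why this is hard can be seen with the deterministic method of bounds, a much simpler technique than the statistical models in the references that follow. The sketch below is an illustration only, with hypothetical function and argument names: given the share of a unit's population that belongs to a group and the share that has some outcome (for example, voted), it returns the narrowest range that must contain the group's own outcome rate.

```python
def bounds_for_group(group_share, outcome_share):
    """Bounds on the fraction of a group that has the outcome.

    group_share:   fraction of the unit's population in the group (0 < x <= 1)
    outcome_share: fraction of the unit's population with the outcome (0..1)
    """
    if not 0 < group_share <= 1:
        raise ValueError("group_share must be in (0, 1]")
    if not 0 <= outcome_share <= 1:
        raise ValueError("outcome_share must be in [0, 1]")
    # Lowest rate: everyone outside the group has the outcome first.
    lower = max(0.0, (outcome_share - (1 - group_share)) / group_share)
    # Highest rate: everyone with the outcome belongs to the group.
    upper = min(1.0, outcome_share / group_share)
    return lower, upper


# Made-up precinct: 60% of residents are in the group, 50% turned out.
lower, upper = bounds_for_group(0.6, 0.5)
print(round(lower, 3), round(upper, 3))
```

The bounds are often wide, which is why aggregate data alone rarely pins down individual behavior without further modeling assumptions.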
21.2.1 Theory and methods
Gary King, Ecological Inference – topic page by a leader in the field, with links to assorted research and methodology papers.
- Gary King, 1997, A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data; part 1 {PDF}
Michael Stoto, “Ecological Inference in Public Health”, book review of King, Ecological Inference
21.2.2 R
Arranged by package
21.2.2.1 {ei}
package
CRAN page: ei: Ecological Inference
articles
Gary King and Margaret Roberts, EI: A(n R) Program for Ecological Inference – website with assorted resources
21.3 Gini coefficient
From the Wikipedia entry:
The Gini coefficient (also known as the Gini index or Gini ratio) is a measure of statistical dispersion intended to represent the income distribution of a nation’s residents, and is the most commonly used measure of inequality. It was developed by the Italian statistician and sociologist Corrado Gini and published in his 1912 paper “Variability and Mutability” (Italian: Variabilità e mutabilità).
The Gini coefficient measures the inequality among values of a frequency distribution (for example, levels of income). A Gini coefficient of zero expresses perfect equality, where all values are the same (for example, where everyone has the same income). A Gini coefficient of 1 (or 100%) expresses maximal inequality among values (e.g., for a large number of people, where only one person has all the income or consumption, and all others have none, the Gini coefficient will be very nearly one).
Gini coefficient: Wikipedia entry, 2016-05-07
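The two boundary cases in the quoted passage can be checked numerically. The sketch below uses one common rank-based formula for the Gini coefficient of a finite list of non-negative values, with no small-sample correction; the function name and the income figures are illustrative assumptions.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    total = sum(xs)
    if total == 0:
        return 0.0  # everyone holds the same (zero) amount
    # Rank-weighted sum over the values sorted ascending, ranks 1..n.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Everyone has the same income: coefficient of zero.
equal = gini([40_000] * 5)

# One person out of 1,000 has all the income: very nearly one.
one_has_all = gini([0] * 999 + [1_000_000])

print(equal, one_has_all)
```

With n people and a single holder of all income, this formula gives (n - 1) / n, which approaches 1 as n grows, in line with the "very nearly one" wording above.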
21.3.0.1 Further reading
Lamb, Evelyn (2012-11-12) “Ask Gini: How to Measure Inequality”, Scientific American “The Sciences”.
World Bank (date unknown) “Measuring Inequality”
21.4 Imputation of missing data (or missing values)
Missing data can pose a challenge for a data analysis, and can limit or compromise the models and conclusions that can be drawn.
One method of dealing with missing data is through imputation.
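The simplest form of imputation replaces each missing value with the mean of the observed values. The sketch below shows only that single-imputation idea on made-up data, with `None` standing in for a missing value. The multiple-imputation approaches covered in the references and packages below are considerably more elaborate, and mean imputation is known to understate the variability of the data.

```python
from statistics import mean


def impute_mean(values):
    """Replace each None with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


ages = [34.0, None, 29.0, 41.0, None, 36.0]  # made-up data
completed = impute_mean(ages)
print(completed)
```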
21.4.1 Theory and methods
Missing data – Wikipedia
Allison, P. (2000). Multiple Imputation for Missing Data: A Cautionary Tale, Sociological Methods and Research, 28, 301-309. (Preprint)
Fichman, Mark and Jonathon N. Cummings (2003) “Multiple Imputation for Missing Data: Making the most of What you Know”, Organizational Research Methods, 6(3), 282-308.
Gelman, Andrew and Jennifer Hill (2006) Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press. (see “Chapter 25: Missing Data Imputation”)
Gelman, Andrew, et al. (2014) Bayesian Data Analysis, (3rd edition). (see chapter 18, “Models for missing data”, pp.449-467)
Karen Grace-Martin (2016?) “Two Recommended Solutions for Missing Data: Multiple Imputation and Maximum Likelihood”
Neil J Perkins, Stephen R Cole, et al. (2017) “Principled Approaches to Missing Data in Epidemiologic Studies”, American Journal of Epidemiology
Karen Grace-Martin, The Analysis Factor, Multiple Imputation of Categorical Variables
Jeff Meyer, The Analysis Factor, Multiple Imputation for Missing Data: Indicator Variables versus Categorical Variables
21.4.2 R
Robert I. Kabacoff (2011) R in Action: Data analysis and graphics with R, Manning. (see chapter 15, “Advanced methods for missing data”, pp.352-372)
Joseph Rickert, “Missing Values, Data Science and R”, 2016-11-30
Thomas Leeper, Multiple imputation {tutorial for Amelia, mi, and mice}
“Tutorial on 5 Powerful R Packages used for imputing missing values” {MICE, Amelia, missForest, Hmisc, mi}
21.4.2.1 {Amelia}
package
CRAN page: Amelia: A Program for Missing Data
vignette: Amelia II: A Package for Missing Data {PDF version}
description: Amelia II: A Program for Missing Data
github page for Amelia II
21.4.2.4 {mi}
package
CRAN page: mi: Missing Data Imputation and Model Checking
articles
Su, Gelman, Hill and Yajima (2011) Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box, Journal of Statistical Software, vol. 45.
Ben Goodrich and Jonathan Kropko, 2014-06-16, “An Example of mi Usage”
21.4.2.5 {mice}
package
CRAN page: mice: Multivariate Imputation by Chained Equations
see also
package miceadds
on CRAN: miceadds: Some Additional Multiple Imputation Functions, Especially for ‘mice’
articles
Stef van Buuren & Karin Groothuis-Oudshoorn, 2011-12-12, “mice: Multivariate Imputation by Chained Equations in R”, Journal of Statistical Software, Vol 45, Issue 3.
Michy Alice, “Imputing missing data with R; MICE package”
datascience+, 2015-10-04 and updated 2017-04-28, Imputing Missing Data with R; MICE package
21.4.2.6 {missMDA}
package
CRAN page: missMDA: Handling Missing Values with Multivariate Data Analysis
articles
francoishusson, 2017-08-15, Multiple imputation for continuous and categorical data
21.4.2.7 {missForest}
package
CRAN page: missForest: Nonparametric Missing Value Imputation using Random Forest
21.4.2.8 {NPBayesImpute}
CRAN page: NPBayesImpute: Non-Parametric Bayesian Multiple Imputation for Categorical Data
21.4.2.9 {VIM}
package
CRAN page: VIM: Visualization and Imputation of Missing Values
articles
Alexander Kowarik, Matthias Templ (2016) “Imputation with the R Package VIM”, Journal of Statistical Software, vol. 74.
https://www.jstatsoft.org/article/view/v074i07
21.5 Moving Window (for raster data)
21.6 Multivariate Analysis
(Not to be confused with multivariable analysis)
21.6.1 {explor}
GitHub page – “an R package to allow interactive exploration of multivariate analysis results.”
- Covers Principal Component Analysis, Correspondence Analysis, Multiple Correspondence Analysis, among other methods.
21.7 Principal Component Analysis (PCA)
Luis G. Serrano (@luis_likes_math), 2019-02-10, video: PCA (Principal Component Analysis). https://t.co/9jvOIE4xAh
21.8 Random walk
From the Wikipedia entry on random walk:
A random walk is a mathematical object, known as a stochastic or random process, that describes a path that consists of a succession of random steps on some mathematical space such as the integers.
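A minimal sketch of the integer case mentioned in the definition: a simple symmetric walk that starts at zero and moves up or down by one at each step with equal probability. The function name, the `seed` parameter, and the equal step probabilities are assumptions made for illustration.

```python
import random


def random_walk(n_steps, seed=None):
    """Positions of a simple symmetric random walk on the integers."""
    rng = random.Random(seed)  # local generator, so runs can be reproduced
    position = 0
    path = [position]
    for _ in range(n_steps):
        position += rng.choice((-1, 1))  # one random step: down or up
        path.append(position)
    return path


path = random_walk(20, seed=1)
print(path)
```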
21.8.1 Theory and methods
Karl Pearson (1905). “The Problem of the Random Walk”. Nature. 72 (1865): 294.
The Problem of the Random Walk
Can any of your readers refer me to a work wherein I should find a solution of the following problem, or failing the knowledge of any existing solution provide me with an original one? I should be extremely grateful for aid in the matter.
A man starts from a point O and walks l yards in a straight line; he then turns at any angle whatever and walks another l yards in a second straight line. He repeats this process n times. I require the probability that after these n stretches he is at a distance between r and r + δr from his starting point, O.
The problem is one of considerable interest, but I have only succeeded in obtaining an integrated solution for two stretches. I think, however, that a solution ought to be found, if only in the form of a series in powers of 1/n, where n is large.
Karl Pearson
The Gables, East Ilsley, Berks.
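Pearson's question can be explored by simulation. The sketch below is a Monte Carlo estimate rather than an analytical solution, and it assumes that "any angle whatever" means each direction is drawn uniformly and independently. The function names and default values are hypothetical.

```python
import math
import random


def pearson_walk_distance(n, step_length, rng):
    """Distance from the origin after n steps of fixed length in random directions."""
    x = y = 0.0
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)  # assumed uniform direction
        x += step_length * math.cos(angle)
        y += step_length * math.sin(angle)
    return math.hypot(x, y)


def estimate_probability(n, step_length, r, delta_r, trials=20000, seed=0):
    """Estimated probability that the final distance lies in [r, r + delta_r)."""
    rng = random.Random(seed)
    hits = sum(
        r <= pearson_walk_distance(n, step_length, rng) < r + delta_r
        for _ in range(trials)
    )
    return hits / trials


# Ten stretches of one yard each: how often does the walker end up
# between 2 and 3 yards from the starting point?
print(estimate_probability(n=10, step_length=1.0, r=2.0, delta_r=1.0))
```

The estimate becomes more precise as `trials` increases; shrinking `delta_r` approximates the probability density that Pearson asks for.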
21.9 Raking
Also known as the iterative proportional fitting procedure, or IPFP. Uses include weighting survey responses so that they match the population proportions.
Includes post-stratification weights in surveying.
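The iterative proportional fitting idea can be sketched for a two-way table: rescale the rows to match one set of known margins, then the columns to match the other, and repeat until both agree. This is a minimal illustration on made-up counts, not the API of any package listed below. It assumes that the two sets of target margins have the same total and that the starting table has no empty rows or columns.

```python
def ipf(table, row_targets, col_targets, max_iter=1000, tol=1e-10):
    """Rake a two-way table so its margins match the target margins."""
    fitted = [[float(cell) for cell in row] for row in table]
    for _ in range(max_iter):
        # Step 1: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(fitted[i])
            if total > 0:
                factor = target / total
                fitted[i] = [cell * factor for cell in fitted[i]]
        # Step 2: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in fitted)
            if total > 0:
                factor = target / total
                for row in fitted:
                    row[j] *= factor
        # Columns now match exactly; stop once the rows still match too.
        if all(abs(sum(fitted[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return fitted


# Made-up sample counts: rows are two age groups, columns are two regions.
sample = [[30, 20], [10, 40]]
raked = ipf(sample, row_targets=[50, 50], col_targets=[45, 55])
for row in raked:
    print([round(cell, 2) for cell in row])
```

Dividing each raked cell by the corresponding sample count gives a weight for the respondents in that cell.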
21.9.1 Theory and methods
The primary method of raking is iterative proportional fitting, or IPF.
LCDR Lew Anderson and Dr. Ronald D. Fricker, Jr. “Raking: An Important and Often Overlooked Survey Analysis Tool” {PDF}
Michael P. Battaglia, David Izrael, David C. Hoaglin, and Martin R. Frankel, “Tips and Tricks for Raking Survey Data (a.k.a. Sample Balancing)” {PDF}
Andrew Gelman, Tracking public opinion with biased polls, Washington Post, 2014-04-09.
Eddie Hunsinger, “Iterative Proportional Fitting For A Two-Dimensional Table”, May 2008
Sven Kurras, “Symmetric Iterative Proportional Fitting”, Appearing in Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS) 2015, San Diego, CA, USA. JMLR: W&CP volume 38.
Robin Lovelace, “Population synthesis with R”, from Spatial Microsimulation with R
21.9.2 R
DIY Solution
- Christopher Waldhauser (2014-04-13) Survey: Computing Your Own Post-Stratification Weights in R (at R-Bloggers)
21.9.2.1 {anesrake}
package
CRAN page: anesrake: ANES Raking Implementation
articles
Josh Pasek (2010-03-15) “ANES Weighting Algorithm: A Description” {PDF}
Josh Pasek, Matthew DeBell, Jon A. Krosnick (2014-07-26) “Standardizing and Democratizing Survey Weights: The ANES Weighting System and anesrake” {PDF}
21.9.2.2 {ipfp}
package
CRAN page: ipfp: Fast Implementation of the Iterative Proportional Fitting Procedure in C
github page: awblocker/ipfp
articles
21.9.2.3 {survey}
package
CRAN page: survey: analysis of complex survey samples
homepage: Survey analysis in R
articles
Lumley, Thomas (2010) Complex Surveys: A Guide to Analysis Using R, John Wiley & Sons, Inc.
21.9.2.4 rake() function in {survey}
articles
1/2 Social Science Goes R: Weighted Survey Data
2/2 Survey: Computing Your Own Post-Stratification Weights in R
21.10 Structural equation modeling (SEM)
21.10.1 R
Arranged by package
21.10.1.1 {lavaan}
package
CRAN page: lavaan: Latent Variable Analysis
articles
Yves Rosseel, 2012-05-24, “lavaan: An R Package for Structural Equation Modeling”, Journal of Statistical Software, Vol. 48, Issue 2.
Grace Charles, 2015-05-20, First Steps with Structural Equation Modeling – blog post by Noam Ross, re: Charles’ presentation at Davis R Users’ Group.
21.10.1.2 {sem}
package
CRAN page: sem: Structural Equation Models
articles
Jeremy Albright, 2015-02-26, “Structural Equation Models Using the SEM Package in R”
John Fox, “Structural Equation Modeling With the sem Package in R” {PDF}
“Structural Equation Modeling in R”
-30-