Chapter 21 Other quantitative methods

21.1 Activity-Based Costing (ABC)

Usually seen in the context of management accounting, ABC is a costing method that identifies the activities an organization performs and assigns the cost of each activity to products and services according to how much of that activity they actually consume.

Wikipedia page
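As a rough illustration of the allocation idea behind ABC, the sketch below spreads overhead across two products in proportion to their use of each activity's cost driver. The activity names, product names, and all figures are invented for the example; real ABC models involve far more judgment about which drivers to use.

```r
# Hypothetical activity cost pools (total overhead per activity, in dollars)
activity_cost <- c(setup = 20000, inspection = 10000, shipping = 6000)

# Hypothetical cost-driver volumes consumed by each product
# (rows = products, columns = activities, in the same order as activity_cost)
driver_use <- rbind(
  widget = c(setup = 30, inspection = 400, shipping = 100),
  gadget = c(setup = 10, inspection = 100, shipping = 200)
)

# Cost per unit of driver = activity cost / total driver volume
driver_rate <- activity_cost / colSums(driver_use)

# Overhead assigned to each product = sum over activities of (use x rate)
allocated <- drop(driver_use %*% driver_rate)
allocated
```

Because every activity's cost is fully distributed, `sum(allocated)` equals `sum(activity_cost)`, which is a useful sanity check on any allocation of this kind.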

21.1.1 Theory and methods

Activity-Based Costing, at Inc.com

Robert S. Kaplan and Steven R. Anderson, Time-Driven Activity-Based Costing (November 2003). Available at SSRN: https://ssrn.com/abstract=485443 or http://dx.doi.org/10.2139/ssrn.485443

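Robert S. Kaplan and Steven R. Anderson, Time-Driven Activity-Based Costing, Harvard Business Review, November 2004: https://hbr.org/2004/11/time-driven-activity-based-costing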

Robert S. Kaplan and Steven R. Anderson, Rethinking Activity-Based Costing, 2005-01-24

Fariborz Y. Partovi, An analytic hierarchy approach to activity-based costing, International Journal of Production Economics, 1991, 151-161

21.1.2 R

Ryan K McBain, et al., “Activity-based costing of health-care delivery, Haiti”, Bulletin of the World Health Organization, 2018; 96:10-17.

Shiny app

21.2 Ecological inference

Ecological inference is a method for inferring individual behavior from group-level data.
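To make the problem concrete, the sketch below computes the classical deterministic bounds (the "method of bounds", usually credited to Duncan and Davis) on a group's rate from aggregate data alone. It is not the statistical model implemented in `{ei}`; the precinct figures and variable names are invented for illustration.

```r
# Hypothetical precinct data:
# x_share = share of each precinct's population that belongs to group A
# t_rate  = share of the whole precinct showing the behavior (e.g. turnout)
x_share <- c(0.20, 0.50, 0.90)
t_rate  <- c(0.60, 0.40, 0.85)

# Accounting identity: t_rate = beta_A * x_share + beta_other * (1 - x_share),
# with both group rates constrained to lie in [0, 1].
# Solving for beta_A at the extremes of beta_other gives the bounds.
lower <- pmax(0, (t_rate - (1 - x_share)) / x_share)
upper <- pmin(1, t_rate / x_share)

data.frame(x_share, t_rate, lower, upper)
```

The bounds are uninformative where group A is a small share of the precinct and tighten as the precinct becomes more homogeneous, which is why aggregate data alone rarely pin down individual behavior.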

21.2.1 Theory and methods

Gary King, Ecological Inference – topic page by a leader in the field, with links to assorted research and methodology papers.

Gary King, 1997, A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data; part 1 {PDF}

Michael Stoto, “Ecological Inference in Public Health”, book review of King, Ecological Inference

21.2.2 R

Arranged by package

21.2.2.1 `{ei}`

package

CRAN page: ei: Ecological Inference

articles

Gary King and Margaret Roberts, EI: A(n R) Program for Ecological Inference – website with assorted resources

21.3 Gini coefficient

From the Wikipedia entry:

The Gini coefficient (also known as the Gini index or Gini ratio) is a measure of statistical dispersion intended to represent the income distribution of a nation’s residents, and is the most commonly used measure of inequality. It was developed by the Italian statistician and sociologist Corrado Gini and published in his 1912 paper “Variability and Mutability” (Italian: Variabilità e mutabilità).

The Gini coefficient measures the inequality among values of a frequency distribution (for example, levels of income). A Gini coefficient of zero expresses perfect equality, where all values are the same (for example, where everyone has the same income). A Gini coefficient of 1 (or 100%) expresses maximal inequality among values (e.g., for a large number of people, where only one person has all the income or consumption, and all others have none, the Gini coefficient will be very nearly one).

Gini coefficient: Wikipedia entry, 2016-05-07
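The two limiting cases in the quotation can be checked with a few lines of base R. This is a minimal sketch using the rank-weighted formula with no small-sample correction; the helper name `gini` is introduced here for illustration, and a maintained package should be preferred for real work.

```r
# Gini coefficient from the rank-weighted formula
# G = sum((2i - n - 1) * x_i) / (n * sum(x)), with x sorted in ascending order
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

equal_case   <- gini(rep(10, 5))          # everyone has the same income
extreme_case <- gini(c(rep(0, 99), 100))  # one person in 100 has everything

equal_case
extreme_case
```

With this formula the extreme case evaluates to (n - 1) / n, so it approaches 1 only as the number of people grows, matching the "very nearly one" wording above.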

21.3.0.1 Further reading

Lamb, Evelyn (2012-11-12) “Ask Gini: How to Measure Inequality”, Scientific American “The Sciences”.

World Bank (date unknown) “Measuring Inequality”

21.3.1 R

21.3.1.1 `{ineq}`

package

CRAN: ineq: Measuring Inequality, Concentration, and Poverty

21.4 Imputation of missing data (or missing values)

Missing data can pose a challenge for a data analysis, and can limit or compromise the models and conclusions that can be drawn.

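One way of dealing with missing data is imputation: replacing each missing value with a plausible estimate so that the analysis can use the complete set of cases.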
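The simplest form of imputation can be sketched in base R as below. The data are invented, and single mean imputation is shown only because it is easy to follow, not because it is recommended; the packages listed later in this section implement more defensible methods.

```r
# Hypothetical variable with two missing values
income <- c(42, NA, 55, 61, NA, 48)

# Single mean imputation: replace each NA with the mean of the observed values
observed_mean <- mean(income, na.rm = TRUE)
imputed <- ifelse(is.na(income), observed_mean, income)
imputed

# The mean is preserved, but the variance shrinks because the filled-in
# values add no spread of their own
var(income, na.rm = TRUE)
var(imputed)
```

The shrinking variance is one reason multiple imputation, which carries the uncertainty of the filled-in values through to the final estimates, is often preferred over a single fill.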

21.4.1 Theory and methods

Missing data – Wikipedia

Allison, P. (2000). Multiple Imputation for Missing Data: A Cautionary Tale, Sociological Methods and Research, 28, 301-309. (Preprint)

Fichman, Mark and Jonathon N. Cummings (2003) “Multiple Imputation for Missing Data: Making the most of What you Know”, Organizational Research Methods, 6(3), 282-308.

Gelman, Andrew and Jennifer Hill (2006) Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press. (see “Chapter 25: Missing Data Imputation”)

Gelman, Andrew, et al. (2014) Bayesian Data Analysis, (3rd edition). (see chapter 18, “Models for missing data”, pp.449-467)

Karen Grace-Martin (2016?) “Two Recommended Solutions for Missing Data: Multiple Imputation and Maximum Likelihood”

Neil J Perkins, Stephen R Cole, et al. (2017) “Principled Approaches to Missing Data in Epidemiologic Studies”, American Journal of Epidemiology

Karen Grace-Martin, The Analysis Factor, Multiple Imputation of Categorical Variables

Jeff Meyer, The Analysis Factor, Multiple Imputation for Missing Data: Indicator Variables versus Categorical Variables

21.4.2 R

Robert I. Kabacoff (2011) R in Action: Data analysis and graphics with R, Manning. (see chapter 15, “Advanced methods for missing data”, pp.352-372)

Joseph Rickert, “Missing Values, Data Science and R”, 2016-11-30

Thomas Leeper, Multiple imputation {tutorial for Amelia, mi, and mice}

“Tutorial on 5 Powerful R Packages used for imputing missing values” {MICE, Amelia, missForest, Hmisc, mi}

21.4.2.1 `{Amelia}`

package

CRAN page: Amelia: A Program for Missing Data

vignette: Amelia II: A Package for Missing Data {PDF version}

description: Amelia II: A Program for Missing Data

GitHub page for Amelia II

21.4.2.2 `{BaBooN}`

CRAN page: BaBooN: Bayesian Bootstrap Predictive Mean Matching - Multiple and Single Imputation for Discrete Data

21.4.2.3 `{Hmisc}`

package

CRAN page: Hmisc: Harrell Miscellaneous

21.4.2.4 `{mi}`

package

CRAN page: mi: Missing Data Imputation and Model Checking

articles

Su, Gelman, Hill and Yajima (2011) Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box, Journal of Statistical Software, vol. 45.

Ben Goodrich and Jonathan Kropko, 2014-06-16, “An Example of mi Usage”

21.4.2.5 `{mice}`

package

CRAN page: mice: Multivariate Imputation by Chained Equations

articles

Stef van Buuren & Karin Groothuis-Oudshoorn, 2011-12-12, “mice: Multivariate Imputation by Chained Equations in R”, Journal of Statistical Software, Vol 45, Issue 3.

Michy Alice, “Imputing missing data with R; MICE package”

original source

datascience+, 2015-10-04 and updated 2017-04-28, Imputing Missing Data with R; MICE package

21.4.2.6 `{missMDA}`

package

CRAN page: missMDA: Handling Missing Values with Multivariate Data Analysis

articles

francoishusson, 2017-08-15, Multiple imputation for continuous and categorical data

21.4.2.7 `{missForest}`

package

CRAN page: missForest: Nonparametric Missing Value Imputation using Random Forest

21.4.2.8 `{NPBayesImpute}`

CRAN page: NPBayesImpute: Non-Parametric Bayesian Multiple Imputation for Categorical Data

21.4.2.9 `{VIM}`

package

CRAN page: VIM: Visualization and Imputation of Missing Values

articles

Alexander Kowarik, Matthias Templ (2016) “Imputation with the R Package VIM”, Journal of Statistical Software, vol. 74.

https://www.jstatsoft.org/article/view/v074i07

21.5 Moving Window (for raster data)

21.5.1 `{grainchanger}`

“The grainchanger package provides functionality for data aggregation to a grid via moving-window or direct methods.”
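The moving-window idea itself can be sketched in base R on a plain matrix standing in for a raster. This does not use `{grainchanger}` or reproduce its interface; the matrix values, the window radius `w`, and the choice to leave edge cells as `NA` are all assumptions made for the example.

```r
# A small matrix standing in for a raster layer
r <- matrix(1:25, nrow = 5)

w <- 1  # window radius: 1 gives a 3 x 3 window around each cell

# Cells whose window would run off the edge are left as NA
out <- matrix(NA_real_, nrow(r), ncol(r))
for (i in (1 + w):(nrow(r) - w)) {
  for (j in (1 + w):(ncol(r) - w)) {
    # Summarise the neighbourhood centred on cell (i, j)
    out[i, j] <- mean(r[(i - w):(i + w), (j - w):(j + w)])
  }
}
out
```

Swapping `mean` for another summary function (a maximum, a variance, a diversity index) gives other moving-window metrics without changing the loop.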

21.6 Multivariate Analysis

(Not to be confused with multivariable analysis)

21.6.1 `{explor}`

GitHub page – “an R package to allow interactive exploration of multivariate analysis results.”

Covers Principal Component Analysis, Correspondence Analysis, Multiple Correspondence Analysis, among other methods.

21.7 Principal Component Analysis (PCA)

Luis G. Serrano (@luis_likes_math), video introduction to PCA (Principal Component Analysis), shared on Twitter, 2019-02-10: https://t.co/9jvOIE4xAh

21.8 Random walk

From the Wikipedia entry on random walk:

A random walk is a mathematical object, known as a stochastic or random process, that describes a path that consists of a succession of random steps on some mathematical space such as the integers.

21.8.1 Theory and methods

Karl Pearson (1905). “The Problem of the Random Walk”. Nature. 72 (1865): 294.

The Problem of the Random Walk

Can any of your readers refer me to a work wherein I should find a solution of the following problem, or failing the knowledge of any existing solution provide me with an original one? I should be extremely grateful for aid in the matter.

A man starts from a point O and walks l yards in a straight line; he then turns at any angle whatever and walks another l yards in a second straight line. He repeats this process n times. I require the probability that after these n stretches he is at a distance between r and r + δr from his starting point, O.

The problem is one of considerable interest, but I have only succeeded in obtaining an integrated solution for two stretches. I think, however, that a solution ought to be found, if only in the form of a series in powers of 1/n, where n is large.

Karl Pearson

The Gables, East Ilsley, Berks.
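Pearson's question can be explored numerically. The sketch below simulates many walks, assuming each heading is drawn independently and uniformly from 0 to 2π (one reading of "turns at any angle whatever"), and estimates the probability of ending in a given distance band. The number of steps, the step length, and the band from 2 to 3 yards are arbitrary choices for the example.

```r
set.seed(42)

n_steps  <- 10     # n in Pearson's letter
step_len <- 1      # l in Pearson's letter, in yards
n_walks  <- 10000  # number of simulated walkers

# Each row holds the independent, uniformly distributed headings of one walk
theta <- matrix(runif(n_walks * n_steps, 0, 2 * pi), nrow = n_walks)

# Final position is the sum of the step vectors
x_end <- rowSums(step_len * cos(theta))
y_end <- rowSums(step_len * sin(theta))
dist  <- sqrt(x_end^2 + y_end^2)

# Mean squared distance; in theory this equals n_steps * step_len^2
mean(dist^2)

# Estimated probability of finishing between 2 and 3 yards from O
mean(dist >= 2 & dist < 3)
```

The mean squared distance grows linearly in the number of steps, so the typical distance from O grows only with the square root of n.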

21.8.2 R

Zhijun Yang, “Brownian Motion Simulation Project in R”

21.9 Raking

Also known as the iterative proportional fitting procedure, or IPFP. Uses include weighting survey responses so that they match known population proportions.

Includes post-stratification weights in surveying.

21.9.1 Theory and methods

The primary method of raking is iterative proportional fitting, or IPF.
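A minimal two-dimensional IPF loop can be written in base R as follows. The sample counts, the population margins, and the variable names are all invented for the example, and the fixed cap of 100 iterations is an arbitrary safeguard rather than a recommended setting.

```r
# Hypothetical sample counts: rows = age group, columns = sex
sample_tab <- matrix(c(30, 20,
                       25, 25),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(age = c("young", "old"),
                                     sex = c("f", "m")))

# Hypothetical known population margins (both sum to the same total)
row_target <- c(young = 40, old = 60)
col_target <- c(f = 52, m = 48)

tab <- sample_tab
for (iter in 1:100) {
  # Scale each row so the row totals match their targets
  tab <- tab * (row_target / rowSums(tab))
  # Scale each column so the column totals match their targets
  tab <- sweep(tab, 2, col_target / colSums(tab), "*")
  # Columns now match exactly; stop once the rows match as well
  if (max(abs(rowSums(tab) - row_target)) < 1e-8) break
}

round(tab, 2)

# Raking weight for a respondent in each cell
rake_weight <- tab / sample_tab
round(rake_weight, 3)
```

Each pass fixes one margin and slightly disturbs the other, and the alternation settles on a table that matches both margins while keeping the association structure of the original sample.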

IPF resources

LCDR Lew Anderson and Dr. Ronald D. Fricker, Jr. “Raking: An Important and Often Overlooked Survey Analysis Tool” {PDF}

Michael P. Battaglia, David Izrael, David C. Hoaglin, and Martin R. Frankel, “Tips and Tricks for Raking Survey Data (a.k.a. Sample Balancing)” {PDF}

Andrew Gelman, Tracking public opinion with biased polls, Washington Post, 2014-04-09.

Eddie Hunsinger, “Iterative Proportional Fitting For A Two-Dimensional Table”, May 2008

Sven Kurras, “Symmetric Iterative Proportional Fitting”, Appearing in Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS) 2015, San Diego, CA, USA. JMLR: W&CP volume 38.

Robin Lovelace, “Population synthesis with R”, from Spatial Microsimulation with R