Chapter 20 Other quantitative methods

20.1 Activity-Based Costing (ABC)

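Usually seen in the context of management accounting, ABC is a method that identifies the activities an organization performs, measures the cost and volume of the inputs each activity consumes, and assigns those costs to products or services according to how much of each activity they use.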
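A minimal sketch of the allocation step in ABC, written in Python under stated assumptions: each activity's cost pool is divided by its total cost-driver volume to give a rate (for example, cost per machine setup), and each product is charged that rate times the driver units it consumes. The activity names, costs, and volumes below are hypothetical illustration values, not taken from any of the references listed here.

```python
def activity_rates(cost_pools, driver_totals):
    """Cost per unit of cost driver for each activity (e.g., cost per setup)."""
    return {activity: cost_pools[activity] / driver_totals[activity]
            for activity in cost_pools}


def product_costs(cost_pools, driver_totals, consumption):
    """Charge each product rate x driver units for every activity it uses.

    consumption maps product -> {activity: driver units used by that product}.
    """
    rates = activity_rates(cost_pools, driver_totals)
    return {product: sum(rates[activity] * units for activity, units in usage.items())
            for product, usage in consumption.items()}


# Hypothetical example: two activities shared by two products.
cost_pools = {"machine_setups": 20_000.0, "inspections": 10_000.0}
driver_totals = {"machine_setups": 100, "inspections": 500}
consumption = {
    "product_A": {"machine_setups": 60, "inspections": 200},
    "product_B": {"machine_setups": 40, "inspections": 300},
}

costs = product_costs(cost_pools, driver_totals, consumption)
print(costs)  # {'product_A': 16000.0, 'product_B': 14000.0}
```

Because every driver unit in this toy example is consumed by some product, the product costs add back up to the total of the activity pools (30,000).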

20.1.1 Theory and methods

Activity-Based Costing, at Inc.com

Robert S. Kaplan and Steven R. Anderson, Time-Driven Activity-Based Costing (November 2003). Available at SSRN: https://ssrn.com/abstract=485443 or http://dx.doi.org/10.2139/ssrn.485443

Robert S. Kaplan and Steven R. Anderson, Rethinking Activity-Based Costing, 2005-01-24

Fariborz Y. Partovi, An analytic hierarchy approach to activity-based costing, International Journal of Production Economics, 1991, pp. 151-161

20.1.2 R

Ryan K McBain, et al., “Activity-based costing of health-care delivery, Haiti”, Bulletin of the World Health Organization, 2018; 96:10-17.


20.2 Ecological inference

Ecological inference is a method for inferring individual behavior from group-level data.

20.2.1 Theory and methods

Gary King, Ecological Inference – topic page by a leader in the field, with links to assorted research and methodology papers.

Michael Stoto, “Ecological Inference in Public Health”, book review of Gary King, Ecological Inference

20.2.2 R

Arranged by package

20.2.2.1 {ei}

package

CRAN page: ei: Ecological Inference

articles

Gary King and Margaret Roberts, EI: A(n R) Program for Ecological Inference – website with assorted resources


20.3 Gini coefficient

From the Wikipedia entry:

The Gini coefficient (also known as the Gini index or Gini ratio) is a measure of statistical dispersion intended to represent the income distribution of a nation’s residents, and is the most commonly used measure of inequality. It was developed by the Italian statistician and sociologist Corrado Gini and published in his 1912 paper “Variability and Mutability” (Italian: Variabilità e mutabilità).

The Gini coefficient measures the inequality among values of a frequency distribution (for example, levels of income). A Gini coefficient of zero expresses perfect equality, where all values are the same (for example, where everyone has the same income). A Gini coefficient of 1 (or 100%) expresses maximal inequality among values (e.g., for a large number of people, where only one person has all the income or consumption, and all others have none, the Gini coefficient will be very nearly one).

Gini coefficient: Wikipedia entry, 2016-05-07
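The definition above can be made concrete with the mean-absolute-difference form of the coefficient, G = sum over all pairs i, j of |x_i - x_j|, divided by 2 n^2 times the mean. The Python sketch below applies that formula directly (O(n^2), fine for illustration but not for large data); the function name is hypothetical. With n people and one person holding everything, the formula gives (n - 1)/n, which is why the quoted text says "very nearly one" rather than exactly one.

```python
def gini(values):
    """Gini coefficient via the mean absolute difference.

    G = sum_i sum_j |x_i - x_j| / (2 * n**2 * mean)
    Assumes non-negative values with a positive total.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    if mean <= 0:
        raise ValueError("need a positive total")
    # Sum of absolute differences over all ordered pairs (each pair counted twice).
    abs_diffs = sum(abs(a - b) for a in values for b in values)
    return abs_diffs / (2 * n * n * mean)


print(gini([5, 5, 5, 5]))            # 0.0: perfect equality
print(gini([0, 0, 0, 10]))           # 0.75 = (4 - 1) / 4
print(gini([0] * 99 + [1]))          # about 0.99 = (100 - 1) / 100
```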

20.3.0.1 Further reading

Lamb, Evelyn (2012-11-12) “Ask Gini: How to Measure Inequality”, Scientific American “The Sciences”.

World Bank (date unknown) “Measuring Inequality”

20.3.1 R

20.4 Imputation of missing data (or missing values)

Missing data can pose a challenge for data analysis, limiting or compromising both the models that can be fit and the conclusions that can be drawn from them.

One method of dealing with missing data is through imputation.
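The simplest form of imputation fills each gap with a single value, such as the mean of the observed values. The Python sketch below (function name hypothetical; `None` marks a missing value by assumption) shows both how this works and its main drawback: every imputed point sits exactly at the mean, so the spread of the completed data is understated. That understatement of uncertainty is one reason the references below favour multiple imputation.

```python
from statistics import mean, pvariance


def mean_impute(values):
    """Single imputation: replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


data = [2.0, 4.0, None, 6.0, None, 8.0]
observed = [v for v in data if v is not None]
imputed = mean_impute(data)

print(imputed)               # [2.0, 4.0, 5.0, 6.0, 5.0, 8.0]
print(pvariance(observed))   # 5.0
print(pvariance(imputed))    # smaller, because the filled-in points add no spread
```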

20.4.1 Theory and methods

Missing data – Wikipedia

Allison, P. (2000). Multiple Imputation for Missing Data: A Cautionary Tale, Sociological Methods and Research, 28, 301-309. (Preprint)

Fichman, Mark and Jonathon N. Cummings (2003) “Multiple Imputation for Missing Data: Making the Most of What You Know”, Organizational Research Methods, 6(3), 282-308.

Gelman, Andrew and Jennifer Hill (2006) Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press. (see chapter 25, “Missing Data Imputation”)

Gelman, Andrew, et al. (2014) Bayesian Data Analysis, (3rd edition). (see chapter 18, “Models for missing data”, pp. 449-467)

Karen Grace-Martin (2016?) “Two Recommended Solutions for Missing Data: Multiple Imputation and Maximum Likelihood”

Neil J. Perkins, Stephen R. Cole, et al. (2017) “Principled Approaches to Missing Data in Epidemiologic Studies”, American Journal of Epidemiology

Karen Grace-Martin, The Analysis Factor, Multiple Imputation of Categorical Variables

Jeff Meyer, The Analysis Factor, Multiple Imputation for Missing Data: Indicator Variables versus Categorical Variables
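In multiple imputation, the same analysis is run on each of m imputed datasets and the results are then combined. A widely used way to combine them is Rubin's rules: the pooled estimate is the average of the m estimates, and the total variance adds the average within-imputation variance W to the between-imputation variance B inflated by (1 + 1/m). The Python sketch below implements those two formulas (function name hypothetical); the R packages listed below perform this pooling for you, with further refinements such as adjusted degrees of freedom.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates and their squared standard errors.

    Returns (pooled estimate, total variance) using
    q_bar = mean(q_i), W = mean(u_i), B = sample variance of q_i,
    T = W + (1 + 1/m) * B.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b          # total variance of q_bar
    return q_bar, t


q_bar, total_var = pool_rubin([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
print(q_bar)       # 2.0
print(total_var)   # about 1.433 = 0.1 + (4/3) * 1.0
```

The between-imputation term is what single imputation leaves out: when the imputed datasets disagree, the pooled standard error grows accordingly.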

20.4.2 R

Robert I. Kabacoff (2011) R in Action: Data Analysis and Graphics with R, Manning. (see chapter 15, “Advanced methods for missing data”, pp. 352-372)

Joseph Rickert, “Missing Values, Data Science and R”, 2016-11-30

Thomas Leeper, Multiple imputation {tutorial for Amelia, mi, and mice}

“Tutorial on 5 Powerful R Packages used for imputing missing values” {MICE, Amelia, missForest, Hmisc, mi}

20.4.2.1 {Amelia}

package

CRAN page: Amelia: A Program for Missing Data

vignette: Amelia II: A Package for Missing Data {PDF version}

description: Amelia II: A Program for Missing Data

github page for Amelia II

20.4.2.3 {Hmisc}

package

CRAN page: Hmisc: Harrell Miscellaneous

20.4.2.4 {mi}

package

CRAN page: mi: Missing Data Imputation and Model Checking

articles

Su, Gelman, Hill and Yajima (2011) Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box, Journal of Statistical Software, vol. 45.

Ben Goodrich and Jonathan Kropko, 2014-06-16, “An Example of mi Usage”

20.4.2.5 {mice}

package

CRAN page: mice: Multivariate Imputation by Chained Equations

see also

package miceadds on CRAN: miceadds: Some Additional Multiple Imputation Functions, Especially for ‘mice’

articles

Stef van Buuren & Karin Groothuis-Oudshoorn, 2011-12-12, “mice: Multivariate Imputation by Chained Equations in R”, Journal of Statistical Software, Vol 45, Issue 3.

Michy Alice, 2015-10-04 (updated 2017-04-28), “Imputing Missing Data with R; MICE package”, datascience+

20.4.2.9 {VIM}

package

CRAN page: VIM: Visualization and Imputation of Missing Values

articles

Alexander Kowarik, Matthias Templ (2016) “Imputation with the R Package VIM”, Journal of Statistical Software, vol. 74. https://www.jstatsoft.org/article/view/v074i07


20.5 Moving Window (for raster data)

20.5.1 {grainchanger}

“The grainchanger package provides functionality for data aggregation to a grid via moving-window or direct methods.”
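To illustrate the two ideas in that description, the plain-Python sketch below first computes a moving-window (focal) mean around each fine-resolution cell, then aggregates the resulting surface to a coarser grid by averaging non-overlapping blocks. This is an assumption-laden illustration, not grainchanger's API or its exact algorithm: the function names are hypothetical, the window is a square of side 2 * radius + 1, and edge cells use a truncated window containing only cells inside the raster.

```python
def moving_window_mean(grid, radius):
    """Focal mean over a (2*radius + 1) x (2*radius + 1) square window.

    Edge cells use a truncated window: only cells inside the raster are averaged.
    """
    nrows, ncols = len(grid), len(grid[0])
    out = [[0.0] * ncols for _ in range(nrows)]
    for i in range(nrows):
        for j in range(ncols):
            window = [
                grid[r][c]
                for r in range(max(0, i - radius), min(nrows, i + radius + 1))
                for c in range(max(0, j - radius), min(ncols, j + radius + 1))
            ]
            out[i][j] = sum(window) / len(window)
    return out


def block_mean(grid, factor):
    """Aggregate to a coarser grid by averaging non-overlapping factor x factor blocks."""
    nrows, ncols = len(grid), len(grid[0])
    coarse = []
    for i in range(0, nrows, factor):
        row = []
        for j in range(0, ncols, factor):
            block = [
                grid[r][c]
                for r in range(i, min(i + factor, nrows))
                for c in range(j, min(j + factor, ncols))
            ]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


# Hypothetical 4 x 4 binary land-cover raster (1 = habitat, 0 = other).
fine = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
smoothed = moving_window_mean(fine, radius=1)  # neighbourhood proportion of habitat
coarse = block_mean(smoothed, factor=2)        # summarised onto a 2 x 2 grid
print(coarse)
```

The "direct" alternative mentioned in the quote would skip the smoothing step and apply the summary function to each coarse cell's fine cells directly, roughly what `block_mean(fine, 2)` does here.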


20.6 Multivariate Analysis

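(Not to be confused with multivariable analysis: multivariate analysis examines several variables jointly, often several outcomes at once, whereas "multivariable" usually describes a model relating one outcome to several predictors.)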

20.6.1 {explor}

GitHub page – “an R package to allow interactive exploration of multivariate analysis results.”

  • Covers Principal Component Analysis, Correspondence Analysis, Multiple Correspondence Analysis, among other methods.

20.7 Principal Component Analysis (PCA)


20.8 Random walk

From the Wikipedia entry on random walks:

A random walk is a mathematical object, known as a stochastic or random process, that describes a path that consists of a succession of random steps on some mathematical space such as the integers.

20.8.1 Theory and methods

Karl Pearson (1905). “The Problem of the Random Walk”. Nature. 72 (1865): 294.

The Problem of the Random Walk

Can any of your readers refer me to a work wherein I should find a solution of the following problem, or failing the knowledge of any existing solution provide me with an original one? I should be extremely grateful for aid in the matter.

A man starts from a point O and walks l yards in a straight line; he then turns at any angle whatever and walks another l yards in a second straight line. He repeats this process n times. I require the probability that after these n stretches he is at a distance between r and r + δr from his starting point, O.

The problem is one of considerable interest, but I have only succeeded in obtaining an integrated solution for two stretches. I think, however, that a solution ought to be found, if only in the form of a series in powers of 1/n, where n is large.

Karl Pearson

The Gables, East Ilsley, Berks.
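Pearson's walk is easy to explore by Monte Carlo simulation. The Python sketch below (function names, seed, and trial count are arbitrary choices, not from the source) takes n steps of fixed length l in independent, uniformly random directions and records the final distance r. Because the directions are independent and uniform, the cross terms in r² average to zero, so the expected squared distance is exactly n·l²; for large n, r is approximately Rayleigh-distributed, with P(r ≤ ρ) ≈ 1 − exp(−ρ² / (n·l²)).

```python
import math
import random


def pearson_walk(n, step=1.0, rng=random):
    """Distance from the start after n steps of fixed length in uniformly random directions."""
    x = y = 0.0
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += step * math.cos(angle)
        y += step * math.sin(angle)
    return math.hypot(x, y)


def simulate(n, step=1.0, trials=20_000, seed=1905):
    """Final distances from many independent walks, using a seeded generator."""
    rng = random.Random(seed)
    return [pearson_walk(n, step, rng) for _ in range(trials)]


n = 10
distances = simulate(n)
mean_sq = sum(r * r for r in distances) / len(distances)
print(mean_sq)  # close to n * step**2 = 10

# Empirical vs. large-n approximate probability of ending within sqrt(n) of O.
within = sum(r <= math.sqrt(n) for r in distances) / len(distances)
print(within, 1 - math.exp(-1))
```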

20.9 Raking

Also known as the iterative proportional fitting procedure (IPFP). Uses include weighting survey responses so that the weighted sample matches known population proportions.

The topic also includes post-stratification weights for surveys.
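Post-stratification is the simplest case of matching a sample to population proportions: each respondent in cell c receives the weight (population share of c) / (sample share of c), so the weighted sample reproduces the population distribution across cells. The Python sketch below shows that calculation; the function name and the 30/70 versus 50/50 figures are hypothetical. It assumes population shares are known for the full cross-classification and that every cell appears in the sample; when only the margins are known, raking is used instead.

```python
from collections import Counter


def poststrat_weights(sample_cells, population_shares):
    """Weight for each respondent = population share of their cell / sample share of that cell."""
    n = len(sample_cells)
    counts = Counter(sample_cells)
    missing = set(counts) - set(population_shares)
    if missing:
        raise ValueError(f"no population share for cells: {sorted(missing)}")
    return [population_shares[c] / (counts[c] / n) for c in sample_cells]


# Hypothetical sample: 30 respondents in cell "A" and 70 in "B"; the population is 50/50.
cells = ["A"] * 30 + ["B"] * 70
weights = poststrat_weights(cells, {"A": 0.5, "B": 0.5})

weighted_a = sum(w for c, w in zip(cells, weights) if c == "A")
print(weighted_a / sum(weights))  # about 0.5, matching the population share
```

The weights here also sum to the sample size (100), a common normalisation.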

20.9.1 Theory and methods

The primary method of raking is iterative proportional fitting (IPF).
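A minimal Python sketch of IPF for a two-dimensional table, under stated assumptions: starting from a seed table (for example, sample counts), rows are rescaled to hit their target totals, then columns, and the two steps repeat until both sets of margins match. The function name, tolerance, and iteration cap are hypothetical choices, and the sketch assumes the row and column targets share the same grand total; otherwise the loop cannot converge.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Two-dimensional iterative proportional fitting of a seed table to target margins."""
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Scale each row so that it sums to its target.
        for i, row in enumerate(table):
            s = sum(row)
            if s > 0:
                table[i] = [v * row_targets[i] / s for v in row]
        # Scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly; stop once the rows match as well.
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_targets)):
            return table
    raise RuntimeError("IPF did not converge")


# A uniform seed converges to the "independence" table of the two margins.
fitted = ipf([[1, 1], [1, 1]], row_targets=[30, 70], col_targets=[40, 60])
print(fitted)  # approximately [[12.0, 18.0], [28.0, 42.0]]
```

With a non-uniform seed, IPF preserves the seed's interaction structure (its odds ratios) while matching the new margins, which is why it is used to reweight survey samples.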

IPF resources

LCDR Lew Anderson and Dr. Ronald D. Fricker, Jr. “Raking: An Important and Often Overlooked Survey Analysis Tool” {PDF}

Michael P. Battaglia, David Izrael, David C. Hoaglin, and Martin R. Frankel, “Tips and Tricks for Raking Survey Data (a.k.a. Sample Balancing)” {PDF}

Andrew Gelman, Tracking public opinion with biased polls, Washington Post, 2014-04-09.

Eddie Hunsinger, “Iterative Proportional Fitting For A Two-Dimensional Table”, May 2008

Sven Kurras, “Symmetric Iterative Proportional Fitting”, Appearing in Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS) 2015, San Diego, CA, USA. JMLR: W&CP volume 38.

Robin Lovelace, “Population synthesis with R”, from Spatial Microsimulation with R

20.9.2 R

DIY Solution

20.9.2.1 {anesrake}

package

CRAN page: anesrake: ANES Raking Implementation

articles

Josh Pasek (2010-03-15) “ANES Weighting Algorithm: A Description” {PDF}

Josh Pasek, Matthew DeBell, Jon A. Krosnick (2014-07-26) “Standardizing and Democratizing Survey Weights: The ANES Weighting System and anesrake” {PDF}

Raking weights with R

20.9.2.3 {survey}

package

CRAN page: survey: analysis of complex survey samples

homepage: Survey analysis in R

articles

Lumley, Thomas (2010) Complex Surveys: A Guide to Analysis Using R, John Wiley & Sons, Inc.

20.9.2.5 {weights}

package

CRAN page: weights: Weighting and Weighted Statistics



20.10 Structural equation modeling (SEM)

20.10.1 R

Arranged by package

20.10.1.1 {lavaan}

package

CRAN page: lavaan: Latent Variable Analysis

articles

“The lavaan project”

Yves Rosseel, 2012-05-24, “lavaan: An R Package for Structural Equation Modeling”, Journal of Statistical Software, Vol. 48, Issue 2.

Grace Charles, 2015-05-20, First Steps with Structural Equation Modeling – blog post by Noam Ross, re: Charles’ presentation at the Davis R Users’ Group.