Chapter 17 Specific techniques

17.1 Activity-Based Costing (ABC)

Usually seen in the context of management accounting, ABC is a costing method that identifies the activities involved in producing an output and assigns costs (including overheads) to products or services according to how much of each activity they consume.
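The following minimal Python sketch shows the basic ABC calculation. It assumes two hypothetical activity cost pools (machine setups and quality inspections) with made-up costs and driver volumes. Each pool's cost is divided by its total driver volume to give a rate, and each product is charged rate × driver units consumed. All names and figures are illustrative only. Time-driven ABC (Kaplan and Anderson, below) derives the rates differently, from the cost of supplying capacity and the time each activity takes.

```python
# Minimal activity-based costing sketch; all pools, drivers and figures are hypothetical.

# Total cost of each activity pool for the period (currency units)
ACTIVITY_COSTS = {"machine_setup": 20_000.0, "quality_inspection": 9_000.0}
# Total quantity of each activity's cost driver for the period
DRIVER_TOTALS = {"machine_setup": 40, "quality_inspection": 300}


def activity_rates(costs, driver_totals):
    """Cost per unit of driver, e.g. cost per setup or per inspection."""
    return {activity: costs[activity] / driver_totals[activity] for activity in costs}


def allocate(rates, usage):
    """Cost charged to one product: sum of rate * driver units it consumed."""
    return sum(rate * usage.get(activity, 0) for activity, rate in rates.items())


rates = activity_rates(ACTIVITY_COSTS, DRIVER_TOTALS)

# Driver units consumed by each (hypothetical) product
products = {
    "product_A": {"machine_setup": 30, "quality_inspection": 100},
    "product_B": {"machine_setup": 10, "quality_inspection": 200},
}

for name, usage in products.items():
    print(f"{name}: {allocate(rates, usage):.2f}")
```

Every driver unit in this example is assigned to some product, so the two allocations add up to the total pooled cost of 29,000.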

Wikipedia page

17.1.1 Theory and methods

Activity-Based Costing, at Inc.com

Robert S. Kaplan and Steven R. Anderson, Time-Driven Activity-Based Costing (November 2003). Available at SSRN: https://ssrn.com/abstract=485443 or http://dx.doi.org/10.2139/ssrn.485443

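Robert S. Kaplan and Steven R. Anderson, “Time-Driven Activity-Based Costing”, Harvard Business Review, November 2004: https://hbr.org/2004/11/time-driven-activity-based-costing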

Robert S. Kaplan and Steven R. Anderson, Rethinking Activity-Based Costing, 2005-01-24

Fariborz Y. Partovi, “An analytic hierarchy approach to activity-based costing”, International Journal of Production Economics, 1991, pp. 151-161

17.1.2 R

Ryan K McBain, et al., “Activity-based costing of health-care delivery, Haiti”, Bulletin of the World Health Organization, 2018; 96:10-17.

Shiny app

17.2 Ecological inference

Ecological inference is a method for inferring individual behavior from group-level data.
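One ingredient of King's approach is the deterministic "method of bounds" (due to Duncan and Davis). In each aggregate unit, the accounting identity T = X·βb + (1 − X)·βw restricts each β to an interval. Here X is the group's share of the unit, T is the overall rate (for example, turnout), and βb and βw are the unknown rates inside and outside the group. The Python sketch below computes those intervals for a single unit. It is not the {ei} package's estimator, which combines the bounds with a statistical model across units, and the function names are hypothetical.

```python
def bounds_in_group(x, t):
    """Bounds on the rate among group members (beta_b) in one unit.

    x: share of the unit that belongs to the group, strictly between 0 and 1
    t: overall rate in the unit (e.g. turnout), between 0 and 1
    Solves t = x*beta_b + (1 - x)*beta_w for beta_b with beta_w anywhere in [0, 1].
    """
    if not (0.0 < x < 1.0 and 0.0 <= t <= 1.0):
        raise ValueError("need 0 < x < 1 and 0 <= t <= 1")
    lower = max(0.0, (t - (1.0 - x)) / x)  # case beta_w = 1
    upper = min(1.0, t / x)                # case beta_w = 0
    return lower, upper


def bounds_outside_group(x, t):
    """Bounds on the rate among non-members (beta_w), from the same identity."""
    if not (0.0 < x < 1.0 and 0.0 <= t <= 1.0):
        raise ValueError("need 0 < x < 1 and 0 <= t <= 1")
    lower = max(0.0, (t - x) / (1.0 - x))  # case beta_b = 1
    upper = min(1.0, t / (1.0 - x))        # case beta_b = 0
    return lower, upper


# A unit where the group is 90% of residents and overall turnout is 75%
print(bounds_in_group(0.9, 0.75))       # a narrow interval
print(bounds_outside_group(0.9, 0.75))  # the whole range [0, 1]
```

The bounds tend to be informative for a group that makes up most of a unit and uninformative for the minority. This is why the bounds alone rarely settle the problem.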

17.2.1 Theory and methods

Gary King, Ecological Inference – topic page by a leader in the field, with links to assorted research and methodology papers.

Gary King, 1997, A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data; part 1 {PDF}

Michael Stoto, “Ecological Inference in Public Health”, book review of King, Ecological Inference

17.2.2 R

Arranged by package

17.2.2.1 {ei}

package

CRAN page: ei: Ecological Inference

articles

Gary King and Margaret Roberts, EI: A(n R) Program for Ecological Inference – website with assorted resources

17.3 Forecasting

Forecasting methods extrapolate patterns in past observations into the future. There is a wealth of material on the theory and methods, much of it from econometrics.
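The Python sketch below illustrates extrapolation from past observations using simple exponential smoothing, the most basic member of the exponential-smoothing family that {fable} fits as state space models. The sketch makes several assumptions: no trend or seasonality, a fixed smoothing parameter alpha chosen by the user, and an initial level equal to the first observation. {fable} estimates these from the data instead, so its results will differ.

```python
def ses_level(series, alpha):
    """Final smoothed level from simple exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * level_(t-1), starting at the first value.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return level


def ses_forecast(series, alpha, horizon):
    """Simple exponential smoothing gives a flat forecast at the last level."""
    return [ses_level(series, alpha)] * horizon


print(ses_forecast([10, 12, 14], alpha=0.5, horizon=3))  # [12.5, 12.5, 12.5]
```

A larger alpha weights recent observations more heavily. With alpha = 1 the method simply repeats the last observation (a naive forecast).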

See also Time series analysis and Seasonal adjustment

17.3.1 Theory and methods

Kamala Kanta Mishra, Selecting Forecasting Methods in Data Science (2017-02-13)

17.3.2 R

Kostiantyn Kravchuk, “Forecasting: Time Series Exploration Exercises (Part-1)” (2017-04-10)

17.3.2.1 {fable}

“…provides methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling. Data, model and forecast objects are all stored in a tidy format.”

package

documentation: fable

17.3.2.2 {prophet}

package

CRAN page: prophet: Automatic Forecasting Procedure

documentation: Prophet: forecasting at scale

articles

“Prophet: How Facebook operationalizes time series forecasting at scale” at Revolutions, the Revolution Analytics blog (2017-02-24)

17.4 Gini coefficient

From the Wikipedia entry:

The Gini coefficient (also known as the Gini index or Gini ratio) is a measure of statistical dispersion intended to represent the income distribution of a nation’s residents, and is the most commonly used measure of inequality. It was developed by the Italian statistician and sociologist Corrado Gini and published in his 1912 paper “Variability and Mutability” (Italian: Variabilità e mutabilità).

The Gini coefficient measures the inequality among values of a frequency distribution (for example, levels of income). A Gini coefficient of zero expresses perfect equality, where all values are the same (for example, where everyone has the same income). A Gini coefficient of 1 (or 100%) expresses maximal inequality among values (e.g., for a large number of people, where only one person has all the income or consumption, and all others have none, the Gini coefficient will be very nearly one).

Gini coefficient: Wikipedia entry, 2016-05-07
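The definition quoted above can be computed directly. This minimal Python sketch assumes non-negative values with a positive total. It uses the mean-absolute-difference form G = Σi Σj |xi − xj| / (2 n² x̄), evaluated through the equivalent sorted-data formula G = 2 Σi i·x(i) / (n Σi xi) − (n + 1)/n, where the x(i) are sorted in ascending order and i runs from 1 to n. This is the uncorrected version. Some software also offers a small-sample correction that multiplies the result by n/(n − 1).

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality).

    Uses the sorted-data form of G = sum_i sum_j |x_i - x_j| / (2 * n**2 * mean):
    G = 2 * sum_i(i * x_(i)) / (n * sum(x)) - (n + 1) / n, with x sorted ascending
    and ranks i = 1..n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need at least one value, all non-negative, with a positive sum")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


print(gini([5, 5, 5, 5]))  # 0.0: everyone has the same income
print(gini([0, 0, 0, 1]))  # 0.75: one person has everything, i.e. (n - 1) / n
```

The second example explains why the quoted text says "very nearly one" for a large population. The maximum value is (n − 1)/n, which approaches 1 as n grows.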

17.4.0.1 Further reading

Lamb, Evelyn (2012-11-12) “Ask Gini: How to Measure Inequality”, Scientific American “The Sciences”.

World Bank (date unknown) “Measuring Inequality”

17.4.1 R

17.4.1.1 {ineq}

package

CRAN page: ineq: Measuring Inequality, Concentration, and Poverty

17.5 Imputation of missing data (or missing values)

Missing data can complicate an analysis. It can limit or bias the models that can be fitted and the conclusions that can be drawn.

One way of dealing with missing data is imputation: replacing each missing value with a plausible value estimated from the observed data.
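In multiple imputation, each of m completed datasets is analysed separately, and the m results are then combined. Below is a minimal Python sketch of the standard combining rules (Rubin's rules) for a single scalar parameter. It assumes you already have a point estimate and its squared standard error from each completed dataset. In R, {mice}'s pool() function performs this pooling step for fitted models; the function here is a hypothetical stand-alone illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules.

    estimates: point estimate of the parameter from each imputed dataset
    variances: squared standard error of that estimate from each dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (denominator m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


est, total_var = pool_rubin([1.9, 2.1, 2.0, 2.2, 1.8], [0.04, 0.05, 0.04, 0.06, 0.05])
print(est, total_var ** 0.5)  # pooled estimate and its standard error
```

The between-imputation term is what single imputation leaves out. Treating one filled-in dataset as if it were complete understates the uncertainty.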

17.5.1 Theory and methods

Missing data – Wikipedia

Allison, P. (2000). Multiple Imputation for Missing Data: A Cautionary Tale, Sociological Methods and Research, 28, 301-309. (Preprint)

Fichman, Mark and Jonathon N. Cummings (2003) “Multiple Imputation for Missing Data: Making the Most of What You Know”, Organizational Research Methods, 6(3), 282-308.

Gelman, Andrew and Jennifer Hill (2006) Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press. (see chapter 25, “Missing Data Imputation”)

Gelman, Andrew, et al. (2014) Bayesian Data Analysis, (3rd edition). (see chapter 18, “Models for missing data”, pp.449-467)

Karen Grace-Martin (2016?) “Two Recommended Solutions for Missing Data: Multiple Imputation and Maximum Likelihood”

Neil J Perkins, Stephen R Cole, et al. (2017) “Principled Approaches to Missing Data in Epidemiologic Studies”, American Journal of Epidemiology

Karen Grace-Martin, The Analysis Factor, Multiple Imputation of Categorical Variables

Jeff Meyer, The Analysis Factor, Multiple Imputation for Missing Data: Indicator Variables versus Categorical Variables

17.5.2 R

Robert I. Kabacoff (2011) R in Action: Data Analysis and Graphics with R, Manning. (see chapter 15, “Advanced methods for missing data”, pp.352-372)

Joseph Rickert, “Missing Values, Data Science and R”, 2016-11-30

Thomas Leeper, Multiple imputation {tutorial for Amelia, mi, and mice}

“Tutorial on 5 Powerful R Packages used for imputing missing values” {MICE, Amelia, missForest, Hmisc, mi}

17.5.2.1 {Amelia}

package

CRAN page: Amelia: A Program for Missing Data

vignette: Amelia II: A Package for Missing Data {PDF version}

description: Amelia II: A Program for Missing Data

github page for Amelia II

17.5.2.2 {BaBooN}

CRAN page: BaBooN: Bayesian Bootstrap Predictive Mean Matching - Multiple and Single Imputation for Discrete Data

17.5.2.3 {Hmisc}

package

CRAN page: Hmisc: Harrell Miscellaneous

17.5.2.4 {mi}

package

CRAN page: mi: Missing Data Imputation and Model Checking

articles

Su, Gelman, Hill and Yajima (2011) Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box, Journal of Statistical Software, vol. 45.

Ben Goodrich and Jonathan Kropko, 2014-06-16, “An Example of mi Usage”

17.5.2.5 {mice}

package

CRAN page: mice: Multivariate Imputation by Chained Equations

articles

Stef van Buuren & Karin Groothuis-Oudshoorn, 2011-12-12, “mice: Multivariate Imputation by Chained Equations in R”, Journal of Statistical Software, Vol 45, Issue 3.

Michy Alice, “Imputing missing data with R; MICE package”

original source

datascience+, 2015-10-04 and updated 2017-04-28, Imputing Missing Data with R; MICE package

17.5.2.6 {missMDA}

package

CRAN page: missMDA: Handling Missing Values with Multivariate Data Analysis

articles

francoishusson, 2017-08-15, Multiple imputation for continuous and categorical data

17.5.2.7 {missForest}

package

CRAN page: missForest: Nonparametric Missing Value Imputation using Random Forest

17.5.2.8 {NPBayesImpute}

CRAN page: NPBayesImpute: Non-Parametric Bayesian Multiple Imputation for Categorical Data

17.5.2.9 {VIM}

package

CRAN page: VIM: Visualization and Imputation of Missing Values

articles

Alexander Kowarik, Matthias Templ (2016) “Imputation with the R Package VIM”, Journal of Statistical Software, vol. 74: https://www.jstatsoft.org/article/view/v074i07

17.6 Moving Window (for raster data)

17.6.1 {grainchanger}

“The grainchanger package provides functionality for data aggregation to a grid via moving-window or direct methods.”
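The Python sketch below shows what a moving-window calculation does. It is not grainchanger's interface. It replaces each cell of a small raster, stored as a list of rows, with the mean of the square window of side 2·radius + 1 centred on that cell. Cells near the edge average only the neighbours that exist. This edge rule is an assumption of the sketch, and raster tools handle edges in different ways.

```python
def moving_window_mean(grid, radius):
    """Mean of each cell's square neighbourhood (side 2*radius + 1), clipped at the edges."""
    nrow, ncol = len(grid), len(grid[0])
    out = [[0.0] * ncol for _ in range(nrow)]
    for r in range(nrow):
        for c in range(ncol):
            total, count = 0.0, 0
            # Visit only the window cells that fall inside the grid
            for rr in range(max(0, r - radius), min(nrow, r + radius + 1)):
                for cc in range(max(0, c - radius), min(ncol, c + radius + 1)):
                    total += grid[rr][cc]
                    count += 1
            out[r][c] = total / count
    return out


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(moving_window_mean(grid, 1))
# [[3.0, 3.5, 4.0], [4.5, 5.0, 5.5], [6.0, 6.5, 7.0]]
```

A larger radius smooths over a wider neighbourhood. Averaging the smoothed cells within each block of a coarser grid would then give a moving-window aggregate at that coarser resolution.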

17.7 Multivariate Analysis

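(Not to be confused with multivariable analysis. Multivariate methods analyse several outcome variables jointly, whereas multivariable analysis models a single outcome using several explanatory variables.)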

17.7.1 {explor}

GitHub page – “an R package to allow interactive exploration of multivariate analysis results.”

Covers Principal Component Analysis, Correspondence Analysis, Multiple Correspondence Analysis, among other methods.

17.8 Principal Component Analysis (PCA)

Luis G. Serrano (@luis_likes_math), video “PCA (Principal Component Analysis)”, announced on Twitter 2019-02-10: https://t.co/9jvOIE4xAh

17.9 Random walk

From the Wikipedia entry on random walks:

A random walk is a mathematical object, known as a stochastic or random process, that describes a path that consists of a succession of random steps on some mathematical space such as the integers.

17.9.1 Theory and methods

Karl Pearson (1905). “The Problem of the Random Walk”. Nature. 72 (1865): 294.

The Problem of the Random Walk

Can any of your readers refer me to a work wherein I should find a solution of the following problem, or failing the knowledge of any existing solution provide me with an original one? I should be extremely grateful for aid in the matter.

A man starts from a point O and walks l yards in a straight line; he then turns at any angle whatever and walks another l yards in a second straight line. He repeats this process n times. I require the probability that after these n stretches he is at a distance between r and r + δr from his starting point, O.

The problem is one of considerable interest, but I have only succeeded in obtaining an integrated solution for two stretches. I think, however, that a solution ought to be found, if only in the form of a series in powers of 1/n, where n is large.

Karl Pearson

The Gables, East Ilsley, Berks.
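Pearson's question can be explored by simulation. The Python sketch below is a Monte Carlo approximation, not the analytical solution he asked for. It walks n stretches of length l in uniformly random directions and estimates the probability that the final distance lies between r and r + δr. For large n, the answer Lord Rayleigh gave in reply is approximately P(R ≤ r) ≈ 1 − exp(−r²/(n l²)), and the simulation is compared against this value. The function names, the seed, and the number of trials are arbitrary choices.

```python
import math
import random


def pearson_walk_distance(n, step, rng):
    """Distance from the start O after n stretches of length `step`,
    each taken in a uniformly random direction."""
    x = y = 0.0
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += step * math.cos(angle)
        y += step * math.sin(angle)
    return math.hypot(x, y)


def prob_distance_between(n, step, r, dr, trials=20_000, seed=1):
    """Monte Carlo estimate of P(r <= distance < r + dr) after n stretches."""
    rng = random.Random(seed)
    hits = sum(
        r <= pearson_walk_distance(n, step, rng) < r + dr for _ in range(trials)
    )
    return hits / trials


n, step, r, dr = 20, 1.0, 3.0, 1.0
simulated = prob_distance_between(n, step, r, dr)
# Large-n (Rayleigh) approximation: P(R <= r) is about 1 - exp(-r**2 / (n * step**2))
approx = math.exp(-r**2 / (n * step**2)) - math.exp(-(r + dr) ** 2 / (n * step**2))
print(f"simulated {simulated:.3f}, Rayleigh approximation {approx:.3f}")
```

In Pearson's notation, `step` is l, `n` is n, and `r` and `dr` are r and δr.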

17.9.2 R

Zhijun Yang, “Brownian Motion Simulation Project in R”

17.10 Raking

Also known as the iterative proportional fitting procedure (IPFP). Its uses include weighting survey responses so that the weighted sample matches known population proportions.

It is also used to compute post-stratification weights for surveys.

17.10.1 Theory and methods

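The primary method of raking is iterative proportional fitting (IPF). The weights are rescaled to match one set of marginal totals (for example, age groups), then the next (for example, region), cycling through the margins until all of them match to within a tolerance.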
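For the simplest case, a two-way table with known row and column totals, IPF fits in a few lines of Python. The sketch assumes a seed table of non-negative counts or weights, with row and column targets that share the same grand total. It alternately rescales rows and columns until the row totals also match within a tolerance. It illustrates the algorithm only and does not reproduce the interface of {ipfp}, {anesrake} or survey's rake(), which work on survey weights and can handle many margins at once. The example data are hypothetical.

```python
def ipf_2d(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting of a two-way table to row and column totals.

    seed: list of rows of non-negative starting values (e.g. sample counts)
    row_targets, col_targets: known margins; they must have the same grand total
    """
    if abs(sum(row_targets) - sum(col_targets)) > tol:
        raise ValueError("row and column targets must sum to the same total")
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Step 1: scale each row to its target total
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Step 2: scale each column to its target total
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly; stop once the rows still match too
        if max(abs(sum(row) - t) for row, t in zip(table, row_targets)) < tol:
            break
    return table


# Hypothetical sample counts: rows = two sex categories, columns = two age groups
sample = [[20, 30], [25, 25]]
fitted = ipf_2d(sample, row_targets=[49, 51], col_targets=[40, 60])
# A survey weight for each cell is fitted count / sample count
weights = [[f / s for f, s in zip(frow, srow)] for frow, srow in zip(fitted, sample)]
print(fitted)
print(weights)
```

Only the margins are matched, not the full joint distribution. This is the practical difference from full post-stratification, which requires known population counts for every cell.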

IPF resources

LCDR Lew Anderson and Dr. Ronald D. Fricker, Jr. “Raking: An Important and Often Overlooked Survey Analysis Tool” {PDF}

Michael P. Battaglia, David Izrael, David C. Hoaglin, and Martin R. Frankel, “Tips and Tricks for Raking Survey Data (a.k.a. Sample Balancing)” {PDF}

Andrew Gelman, Tracking public opinion with biased polls, Washington Post, 2014-04-09.

Eddie Hunsinger, “Iterative Proportional Fitting For A Two-Dimensional Table”, May 2008

Sven Kurras, “Symmetric Iterative Proportional Fitting”, Appearing in Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS) 2015, San Diego, CA, USA. JMLR: W&CP volume 38.

Robin Lovelace, “Population synthesis with R”, from Spatial Microsimulation with R

17.10.2 R

DIY Solution

Christopher Waldhauser (2014-04-13) Survey: Computing Your Own Post-Stratification Weights in R (at R-Bloggers)

17.10.2.1 {anesrake}

package

CRAN page: anesrake: ANES Raking Implementation

articles

Josh Pasek (2010-03-15) “ANES Weighting Algorithm: A Description” {PDF}

Josh Pasek, Matthew DeBell, Jon A. Krosnick (2014-07-26) “Standardizing and Democratizing Survey Weights: The ANES Weighting System and anesrake” {PDF}

Raking weights with R

17.10.2.2 {ipfp}

package

CRAN page: ipfp: Fast Implementation of the Iterative Proportional Fitting Procedure in C

github page: awblocker/ipfp

articles

Iterative proportional fitting in R (stackexchange)

17.10.2.3 {survey}

package

CRAN page: survey: analysis of complex survey samples

homepage: Survey analysis in R

articles

Lumley, Thomas (2010) Complex Surveys: A Guide to Analysis Using R, John Wiley & Sons, Inc.

17.10.2.4 rake() function in {survey}

articles

1/2 Social Science Goes R: Weighted Survey Data

2/2 Survey: Computing Your Own Post-Stratification Weights in R

rake {survey}: Raking of replicate weight design

17.10.2.5 {weights}

package

CRAN page: weights: Weighting and Weighted Statistics

17.11 Seasonal adjustment

From the Wikipedia entry:

Seasonal adjustment is a statistical method for removing the seasonal component of a time series that exhibits a seasonal pattern. It is usually done when wanting to analyse the trend of a time series independently of the seasonal components. It is normal to report seasonally adjusted data for unemployment rates to reveal the underlying trends in labor markets. Many economic phenomena have seasonal cycles, such as agricultural production and consumer consumption, e.g. greater consumption leading up to Christmas. It is necessary to adjust for this component in order to understand what underlying trends are in the economy and so official statistics are often adjusted to remove seasonal components.

Seasonal adjustment: Wikipedia entry, 2016-05-07

See also Forecasting and Time series analysis
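X-13ARIMA-SEATS is far more elaborate, but classical additive decomposition, the approach behind R's decompose(), shows the basic idea of removing a seasonal component. The minimal Python sketch below makes three assumptions: a regular series with a fixed, known period (for example, 12 for monthly data); at least two full cycles of data; and a seasonal effect that adds to the trend rather than multiplying it. The sketch works in three steps. First, it estimates the trend with a centred moving average. Second, it averages the detrended values at each position in the cycle to get seasonal indices, centred so they sum to zero. Third, it subtracts those indices from the original series.

```python
def seasonally_adjust(series, period):
    """Classical additive seasonal adjustment; returns (adjusted series, seasonal indices)."""
    n = len(series)
    if period < 2 or n < 2 * period:
        raise ValueError("need period >= 2 and at least two full cycles")
    half = period // 2

    # 1. Trend: centred moving average (a 2 x period average when period is even)
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period

    # 2. Seasonal indices: average detrended value at each position in the cycle
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    centre = sum(raw) / period
    seasonal = [s - centre for s in raw]  # indices sum to zero

    # 3. Seasonally adjusted series = original minus its seasonal index
    adjusted = [series[t] - seasonal[t % period] for t in range(n)]
    return adjusted, seasonal


pattern = [1.0, -1.0, 2.0, -2.0]                  # hypothetical quarterly effects
series = [t + pattern[t % 4] for t in range(16)]  # linear trend plus seasonality
adjusted, seasonal = seasonally_adjust(series, 4)
print(seasonal)  # recovers the pattern, up to rounding
print(adjusted)  # recovers the trend 0, 1, ..., 15, up to rounding
```

For a multiplicative seasonal pattern, a common alternative is to apply the same steps to the logarithm of the series.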

17.11.1 Theory and methods

Statistics Canada, “Seasonal adjustment and trend-cycle estimation” (part of Statistics Canada Quality Guidelines, Catalogue 12-539-X)

U.S. Census Bureau, The X-13ARIMA-SEATS Seasonal Adjustment Program

17.11.2 R

17.11.2.1 ggsdc() function in {ggseas}

ggsdc() is the seasonal decomposition function in the {ggseas} package; see the {ggseas} entry for the CRAN page and vignette.

17.11.2.2 {ggseas}

package

CRAN page: ggseas: ‘stats’ for Seasonal Adjustment on the Fly with ‘ggplot2’

Vignette

articles

Ellis, Peter. 2016-10-12. “Update of ggseas for seasonal decomposition on the fly”, blog entry

Ellis, Peter. 2016-03-28. “Seasonal decomposition in the ggplot2 universe with ggseas”, blog entry.

Ellis, Peter. 2016-02-08. “ggseas package for seasonal adjustment on the fly with ggplot2”, blog entry.

17.11.2.3 {seasonal}

seasonal: R-interface to X-13ARIMA-SEATS

Provides an R interface to the U.S. Census Bureau’s gold-standard X-13ARIMA-SEATS seasonal adjustment program.

“…the best interface on the planet to the X13-SEATS-ARIMA time series analysis application from the US Census Department, which is the industry standard particularly for official statistics agencies doing seasonal adjustment.” (Peter Ellis, vignette for ggsdc)

package

CRAN page: seasonal: R Interface to X-13-ARIMA-SEATS

github page: christophsax/seasonal

17.11.2.4 {x13binary}

(US Census Bureau X-13, packaged for easy loading. Loads as a dependency for most of the other seasonal adjustment packages.)

package

CRAN page: x13binary: Provide the ‘x13ashtml’ Seasonal Adjustment Binary

17.12 Structural equation modeling (SEM)

17.12.1 R

Arranged by package

17.12.1.1 {lavaan}

package

CRAN page: lavaan: Latent Variable Analysis

articles

“The lavaan project”

Yves Rosseel, 2012-05-24, “lavaan: An R Package for Structural Equation Modeling”, Journal of Statistical Software, Vol. 48, Issue 2.

Grace Charles, 2015-05-20, First Steps with Structural Equation Modeling – blog post by Noam Ross, re: Charles’ presentation at Davis R Users’ Group.

17.12.1.2 {sem}

package

CRAN page: sem: Structural Equation Models

articles

Jeremy Albright, 2015-02-26, “Structural Equation Models Using the SEM Package in R”

John Fox, “Structural Equation Modeling With the sem Package in R” {PDF}

“Structural Equation Modeling in R”

17.13 Time series analysis

A common theme in data analysis is comparing the values of a variable at multiple points in time.
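One basic way to compare a series with itself at different points in time is the sample autocorrelation at lag k, the correlation between observations k periods apart. The minimal Python sketch below uses the usual estimator, the same definition as R's acf(), which divides by the sum of squares of the whole series. Autocorrelations of this kind are also among the features {tsfeatures} computes.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation r_k of a series at lag k.

    r_k = sum_{t=1}^{n-k} (y_t - ybar) * (y_{t+k} - ybar) / sum_{t=1}^{n} (y_t - ybar)**2
    """
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    ybar = sum(series) / n
    denom = sum((y - ybar) ** 2 for y in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((series[t] - ybar) * (series[t + lag] - ybar) for t in range(n - lag))
    return num / denom


alternating = [1, -1, 1, -1]
print(autocorrelation(alternating, 1))  # -0.75
print(autocorrelation(alternating, 2))  # 0.5
```

The estimates shrink toward zero at longer lags, even for a perfectly periodic series. This happens because the numerator sums fewer terms than the denominator.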

17.13.1 Theory and methods

Tavish Srivastava, 2015-12-16, “A Complete Tutorial on Time Series Modeling in R”

17.13.2 R

Mara Averick (2019-03-08), tweet recommending the #rstudioconf talk “Melt the clock: tidy time series analysis”, with links to the talk (https://t.co/5xkkMpAsxn, https://t.co/yvyU6RpW8U), {tsibble} (https://t.co/Gth8ZimfOz) and {fable} (https://t.co/YTfWMo4VYV)

Earo Wang, “Melt the clock: Tidy time series analysis” (presentation at RStudio conference, 2019)

17.13.2.1 {tsfeatures}

Methods for extracting various features from time series data

package

CRAN page: tsfeatures: Time Series Feature Extraction

package webpage

articles

getting started article

17.13.2.2 {tsibble}

package

CRAN page: tsibble: Tidy Temporal Data Frames and Tools

github page: tsibble: Tidy Temporal Data Frames and Tools

articles

Earo Wang, 2018-12-20, “Reintroducing tsibble: data tools that melt the clock”

Earo Wang, Dianne Cook and Rob J Hyndman, January 2019, “A new tidy data structure to support exploration and modeling of temporal data” (Wang, Cook, and Hyndman 2019)

17.13.2.3 {padr}

package

CRAN page: padr: Quickly Get Datetime Data Ready for Analysis

articles

Andrew Clark, 2017-07-19, padr package example

17.13.2.4 {zoo}

package

CRAN page: zoo: S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations)

References

Wang, Earo, Dianne Cook, and Rob J Hyndman. 2019. “A New Tidy Data Structure to Support Exploration and Modeling of Temporal Data.”