Chapter 17 Specific techniques
17.1 Activity-Based Costing (ABC)
Usually seen in the context of management accounting, ABC is a method that measures the cost and volume of inputs required to produce a fixed amount of output.
17.1.1 Theory and methods
Activity-Based Costing, at Inc.com
Robert S. Kaplan and Steven R. Anderson, Time-Driven Activity-Based Costing (November 2003). Available at SSRN: https://ssrn.com/abstract=485443 or http://dx.doi.org/10.2139/ssrn.485443
Robert S. Kaplan and Steven R. Anderson, Rethinking Activity-Based Costing, 2005-01-24
Fariborz Y. Partovi, An analytic hierarchy approach to activity-based costing, International Journal of Production Economics, 1991, 151-161
17.1.2 R
Ryan K McBain, et al., “Activity-based costing of health-care delivery, Haiti”, Bulletin of the World Health Organization, 2018; 96:10-17.
17.2 Ecological inference
Ecological inference is a method for inferring individual behavior from group-level data.
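To make the problem concrete, here is a minimal sketch (in Python, purely for illustration) of the deterministic "method of bounds", usually attributed to Duncan and Davis (1953). It is not King's full method, which combines such bounds with a statistical model. The function name and the precinct figures are hypothetical.

```python
def duncan_davis_bounds(group_share, outcome_rate):
    """Deterministic bounds on the outcome rate inside one group.

    group_share  -- fraction of the unit's population in the group (0 < x <= 1)
    outcome_rate -- fraction of the unit's whole population with the outcome
    """
    if not 0 < group_share <= 1 or not 0 <= outcome_rate <= 1:
        raise ValueError("shares and rates must be proportions")
    other_share = 1 - group_share
    # Lowest possible rate: people outside the group account for as much
    # of the outcome as they can.
    lower = max(0.0, (outcome_rate - other_share) / group_share)
    # Highest possible rate: the group accounts for as much as it can.
    upper = min(1.0, outcome_rate / group_share)
    return lower, upper


# Hypothetical precinct: 60% of voters belong to the group, turnout is 70%.
low, high = duncan_davis_bounds(0.6, 0.7)
print(round(low, 3), round(high, 3))
```

The aggregate data alone only narrow the group's turnout to an interval; anything tighter requires modelling assumptions, which is where the references below come in.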
17.2.1 Theory and methods
Gary King, Ecological Inference – topic page by a leader in the field, with links to assorted research and methodology papers.
- Gary King, 1997, A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data; part 1 {PDF}
Michael Stoto, “Ecological Inference in Public Health”, book review of King, Ecological Inference
17.2.2 R
Arranged by package
17.2.2.1 {ei}
package
CRAN page: ei: Ecological Inference
articles
Gary King and Margaret Roberts, EI: A(n R) Program for Ecological Inference – website with assorted resources
17.3 Forecasting
Forecasting methods extrapolate past trends. There is a wealth of material supporting the theory and methods around this, much of it coming from econometrics.
- See also Time series analysis and Seasonal adjustment
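As a small illustration of what a forecasting method does, the sketch below implements simple exponential smoothing, one of the most basic techniques, in plain Python. It is not taken from any of the packages listed in this section. The starting level (the first observation), the `alpha` value, and the sample data are assumptions made for the example.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after the last observation."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # assumption: initialise the level at the first value
    for value in series[1:]:
        # New level = weighted average of the latest value and the old level.
        level = alpha * value + (1 - alpha) * level
    return level


demand = [10.0, 12.0, 11.0, 13.0]  # hypothetical observations
forecast = simple_exponential_smoothing(demand, alpha=0.5)
print(forecast)
```

A larger `alpha` makes the forecast react faster to recent values; with `alpha=1` it simply repeats the last observation. This basic form produces a flat forecast and does not model trend or seasonality.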
17.3.1 Theory and methods
Kamala Kanta Mishra, Selecting Forecasting Methods in Data Science (2017-02-13)
17.3.2 R
Kostiantyn Kravchuk, “Forecasting: Time Series Exploration Exercises (Part-1)” (2017-04-10)
17.3.2.1 {fable}
“…provides methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling. Data, model and forecast objects are all stored in a tidy format.”
package
documentation: fable
17.3.2.2 {prophet}
package
CRAN page: prophet: Automatic Forecasting Procedure
documentation: Prophet: forecasting at scale
articles
“Prophet: How Facebook operationalizes time series forecasting at scale” at Revolutions, the Revolution Analytics blog (2017-02-24)
17.4 Gini coefficient
From the wikipedia entry:
The Gini coefficient (also known as the Gini index or Gini ratio) is a measure of statistical dispersion intended to represent the income distribution of a nation’s residents, and is the most commonly used measure of inequality. It was developed by the Italian statistician and sociologist Corrado Gini and published in his 1912 paper “Variability and Mutability” (Italian: Variabilità e mutabilità).
The Gini coefficient measures the inequality among values of a frequency distribution (for example, levels of income). A Gini coefficient of zero expresses perfect equality, where all values are the same (for example, where everyone has the same income). A Gini coefficient of 1 (or 100%) expresses maximal inequality among values (e.g., for a large number of people, where only one person has all the income or consumption, and all others have none, the Gini coefficient will be very nearly one).
Gini coefficient: wikipedia entry, 2016-05-07
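The definition above can be sketched in a few lines of Python (shown for illustration only; no R package is used). The function below uses the sorted-values form of the mean absolute difference formula. The function name and the sample incomes are hypothetical.

```python
def gini(values):
    """Gini coefficient of a list of non-negative values."""
    n = len(values)
    if n == 0 or min(values) < 0 or sum(values) <= 0:
        raise ValueError("need non-negative values with a positive total")
    ordered = sorted(values)
    total = sum(ordered)
    # Equivalent to sum_i sum_j |x_i - x_j| / (2 * n * total),
    # written as a single pass over the sorted values.
    weighted = sum((2 * i - n + 1) * x for i, x in enumerate(ordered))
    return weighted / (n * total)


equal = gini([5, 5, 5, 5])        # everyone has the same income
one_has_all = gini([0, 0, 0, 10])  # one person has everything
print(equal, one_has_all)
```

With one person holding everything the result is (n - 1) / n, which is why the quoted definition says the coefficient is "very nearly one" only for a large number of people.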
17.4.0.1 Further reading
Lamb, Evelyn (2012-11-12) “Ask Gini: How to Measure Inequality”, Scientific American “The Sciences”.
World Bank (date unknown) “Measuring Inequality”
17.4.1 R
17.5 Imputation of missing data (or missing values)
Missing data can pose a challenge for a data analysis, and can limit or compromise the models and conclusions that can be drawn.
One method of dealing with missing data is through imputation.
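The simplest form of imputation replaces each missing value with the mean of the observed values. The Python sketch below shows the idea; the function name and data are hypothetical, and missing values are assumed to be encoded as `None`. This is single imputation, not the multiple imputation performed by the packages listed later in this section.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


ages = [34, None, 29, 41, None]  # hypothetical survey responses
completed = impute_mean(ages)
print(completed)
```

Mean imputation keeps the sample mean unchanged but understates the variance and ignores the uncertainty about the filled-in values, which is one reason the references below recommend multiple imputation or maximum likelihood instead.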
17.5.1 Theory and methods
Missing data – wikipedia
Allison, P. (2000). Multiple Imputation for Missing Data: A Cautionary Tale, Sociological Methods and Research, 28, 301-309. (Preprint)
Fichman, Mark and Jonathon N. Cummings (2003) “Multiple Imputation for Missing Data: Making the most of What you Know”, Organizational Research Methods, Volume: 6 issue: 3, page(s): 282-308.
Gelman, Andrew and Jennifer Hill (2006) Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press. (see “Chapter 25: Missing Data Imputation”)
Gelman, Andrew, et al. (2014) Bayesian Data Analysis, (3rd edition). (see chapter 18, “Models for missing data”, pp.449-467)
Karen Grace-Martin (2016?) “Two Recommended Solutions for Missing Data: Multiple Imputation and Maximum Likelihood”
Neil J Perkins, Stephen R Cole, et al. (2017) “Principled Approaches to Missing Data in Epidemiologic Studies”, American Journal of Epidemiology
Karen Grace-Martin, The Analysis Factor, Multiple Imputation of Categorical Variables
Jeff Meyer, The Analysis Factor, Multiple Imputation for Missing Data: Indicator Variables versus Categorical Variables
17.5.2 R
Robert I. Kabacoff (2011) R in Action: Data analysis and graphics with R, Manning. (see chapter 15, “Advanced methods for missing data”, pp.352-372)
Joseph Rickert, “Missing Values, Data Science and R”, 2016-11-30
Thomas Leeper, Multiple imputation {tutorial for Amelia, mi, and mice}
“Tutorial on 5 Powerful R Packages used for imputing missing values” {MICE, Amelia, missForest, Hmisc, mi}
17.5.2.1 {Amelia}
package
CRAN page: Amelia: A Program for Missing Data
vignette: Amelia II: A Package for Missing Data {PDF version}
description: Amelia II: A Program for Missing Data
github page for Amelia II
17.5.2.2 {BaBooN}
17.5.2.4 {mi}
package
CRAN page: mi: Missing Data Imputation and Model Checking
articles
Su, Gelman, Hill and Yajima (2011) Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box, Journal of Statistical Software, vol. 45.
Ben Goodrich and Jonathan Kropko, 2014-06-16, “An Example of mi Usage”
17.5.2.5 {mice}
package
CRAN page: mice: Multivariate Imputation by Chained Equations
see also
package miceadds
on CRAN: miceadds: Some Additional Multiple Imputation Functions, Especially for ‘mice’
articles
Stef van Buuren & Karin Groothuis-Oudshoorn, 2011-12-12, “mice: Multivariate Imputation by Chained Equations in R”, Journal of Statistical Software, Vol 45, Issue 3.
Michy Alice, “Imputing Missing Data with R; MICE package”, datascience+, 2015-10-04 (updated 2017-04-28)
17.5.2.6 {missMDA}
package
CRAN page: missMDA: Handling Missing Values with Multivariate Data Analysis
articles
François Husson, 2017-08-15, Multiple imputation for continuous and categorical data
17.5.2.7 {missForest}
package
CRAN page: missForest: Nonparametric Missing Value Imputation using Random Forest
17.5.2.8 {NPBayesImpute}
CRAN page: NPBayesImpute: Non-Parametric Bayesian Multiple Imputation for Categorical Data
17.5.2.9 {VIM}
package
CRAN page: VIM: Visualization and Imputation of Missing Values
articles
Alexander Kowarik, Matthias Templ (2016) “Imputation with the R Package VIM”, Journal of Statistical Software, vol. 74.
https://www.jstatsoft.org/article/view/v074i07
17.6 Moving Window (for raster data)
17.6.1 {grainchanger}
“The grainchanger package provides functionality for data aggregation to a grid via moving-window or direct methods.”
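The moving-window idea itself can be sketched without any package: each cell of the output is a summary (here the mean) of the cells within a window around it. The Python below is an illustration only and is not how grainchanger is implemented. The square window, the truncation of the window at the grid edges, and the sample raster are all assumptions.

```python
def moving_window_mean(grid, radius=1):
    """Mean of each cell's square neighbourhood; windows are cut at edges."""
    rows, cols = len(grid), len(grid[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the cells inside the window, clipped to the grid.
            cells = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            result[r][c] = sum(cells) / len(cells)
    return result


raster = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]  # hypothetical 3 x 3 raster
smoothed = moving_window_mean(raster)
print(smoothed[1][1], smoothed[0][0])
```

The centre cell averages all nine values, while a corner cell averages only the four cells that fall inside the grid.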
17.7 Multivariate Analysis
(Not to be confused with multivariable analysis)
17.7.1 {explor}
GitHub page – “an R package to allow interactive exploration of multivariate analysis results.”
- Covers Principal Component Analysis, Correspondence Analysis, Multiple Correspondence Analysis, among other methods.
17.8 Principal Component Analysis (PCA)
Luis G. Serrano (@luis_likes_math), 2019-02-10, video: PCA (Principal Component Analysis) https://t.co/9jvOIE4xAh
17.9 Random walk
From wikipedia entry on random walk:
A random walk is a mathematical object, known as a stochastic or random process, that describes a path that consists of a succession of random steps on some mathematical space such as the integers.
17.9.1 Theory and methods
Karl Pearson (1905). “The Problem of the Random Walk”. Nature. 72 (1865): 294.
The Problem of the Random Walk
Can any of your readers refer me to a work wherein I should find a solution of the following problem, or failing the knowledge of any existing solution provide me with an original one? I should be extremely grateful for aid in the matter.
A man starts from a point O and walks l yards in a straight line; he then turns at any angle whatever and walks another l yards in a second straight line. He repeats this process n times. I require the probability that after these n stretches he is at a distance between r and r + δr from his starting point, O.
The problem is one of considerable interest, but I have only succeeded in obtaining an integrated solution for two stretches. I think, however, that a solution ought to be found, if only in the form of a series in powers of 1/n, where n is large.
Karl Pearson
The Gables, East Ilsley, Berks.
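Pearson asked for an analytical answer, but the quantity he describes can also be estimated by simulation. The Python sketch below is a Monte Carlo illustration under the assumption that each turn angle is uniform on [0, 2π). The function names, the number of simulated walks, and the seed are hypothetical choices.

```python
import math
import random


def walk_distance(n_steps, step_length, rng):
    """Distance from the origin after n_steps stretches of equal length."""
    x = y = 0.0
    for _ in range(n_steps):
        angle = rng.uniform(0.0, 2.0 * math.pi)  # assumed uniform direction
        x += step_length * math.cos(angle)
        y += step_length * math.sin(angle)
    return math.hypot(x, y)


def probability_in_ring(n_steps, step_length, r, delta_r,
                        n_walks=5000, seed=1):
    """Estimate P(r <= distance < r + delta_r) by repeated simulation."""
    rng = random.Random(seed)
    hits = sum(
        r <= walk_distance(n_steps, step_length, rng) < r + delta_r
        for _ in range(n_walks)
    )
    return hits / n_walks


estimate = probability_in_ring(n_steps=10, step_length=1.0, r=2.0, delta_r=1.0)
print(estimate)
```

The estimate is only as precise as the number of simulated walks allows; increasing `n_walks` reduces the sampling noise.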
17.10 Raking
Also known as the iterative proportional fitting procedure, or IPFP. Uses include weighting survey responses so that they match known population proportions, including post-stratification weights in surveying.
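A minimal sketch of iterative proportional fitting on a two-way table, written in Python for illustration, may help before diving into the references. It assumes every cell is positive and that the row and column targets add up to the same total. The function name, the convergence tolerance, and the sample counts are hypothetical, and the R packages below handle many details this sketch ignores.

```python
def rake(table, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Scale a two-way table until its margins match the targets."""
    fitted = [[float(cell) for cell in row] for row in table]
    for _ in range(max_iter):
        # Step 1: scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(fitted[i])
            fitted[i] = [cell * target / total for cell in fitted[i]]
        # Step 2: scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in fitted)
            for row in fitted:
                row[j] *= target / total
        # Columns now match exactly; stop once the rows match as well.
        if all(abs(sum(row) - t) < tol
               for row, t in zip(fitted, row_targets)):
            break
    return fitted


sample = [[30, 20], [10, 40]]  # hypothetical survey counts
weighted = rake(sample, row_targets=[50, 50], col_targets=[60, 40])
print([[round(cell, 2) for cell in row] for row in weighted])
```

Dividing each fitted cell by the corresponding sample cell gives the weight applied to respondents in that cell.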
17.10.1 Theory and methods
The primary method of raking is iterative proportional fitting, or IPF.
LCDR Lew Anderson and Dr. Ronald D. Fricker, Jr. “Raking: An Important and Often Overlooked Survey Analysis Tool” {PDF}
Michael P. Battaglia, David Izrael, David C. Hoaglin, and Martin R. Frankel, “Tips and Tricks for Raking Survey Data (a.k.a. Sample Balancing)” {PDF}
Andrew Gelman, Tracking public opinion with biased polls, Washington Post, 2014-04-09.
Eddie Hunsinger, “Iterative Proportional Fitting For A Two-Dimensional Table”, May 2008
Sven Kurras, “Symmetric Iterative Proportional Fitting”, Appearing in Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS) 2015, San Diego, CA, USA. JMLR: W&CP volume 38.
Robin Lovelace, “Population synthesis with R”, from Spatial Microsimulation with R
17.10.2 R
DIY Solution
- Christopher Waldhauser (2014-04-13) Survey: Computing Your Own Post-Stratification Weights in R (at R-Bloggers)
17.10.2.1 {anesrake}
package
CRAN page: anesrake: ANES Raking Implementation
articles
Josh Pasek (2010-03-15) “ANES Weighting Algorithm: A Description” {PDF}
Josh Pasek, Matthew DeBell, Jon A. Krosnick (2014-07-26) “Standardizing and Democratizing Survey Weights: The ANES Weighting System and anesrake” {PDF}
17.10.2.2 {ipfp}
package
CRAN page: ipfp: Fast Implementation of the Iterative Proportional Fitting Procedure in C
github page: awblocker/ipfp
articles
17.10.2.3 {survey}
package
CRAN page: survey: Analysis of Complex Survey Samples
homepage: Survey analysis in R
articles
Lumley, Thomas (2010) Complex Surveys: A Guide to Analysis Using R, John Wiley & Sons, Inc.
17.10.2.4 rake() function in {survey}
articles
1/2 Social Science Goes R: Weighted Survey Data
2/2 Survey: Computing Your Own Post-Stratification Weights in R
17.11 Seasonal adjustment
From the wikipedia entry:
Seasonal adjustment is a statistical method for removing the seasonal component of a time series that exhibits a seasonal pattern. It is usually done when wanting to analyse the trend of a time series independently of the seasonal components. It is normal to report seasonally adjusted data for unemployment rates to reveal the underlying trends in labor markets. Many economic phenomena have seasonal cycles, such as agricultural production and consumer consumption, e.g. greater consumption leading up to Christmas. It is necessary to adjust for this component in order to understand what underlying trends are in the economy and so official statistics are often adjusted to remove seasonal components.
Seasonal adjustment: wikipedia entry, 2016-05-07
- see also Forecasting and Time series analysis
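The idea of removing a seasonal component can be shown with a deliberately crude Python sketch. It assumes an additive seasonal pattern, a whole number of cycles, and no meaningful trend. The function name and the quarterly figures are hypothetical. Methods used in official statistics, such as X-13ARIMA-SEATS, are far more elaborate.

```python
from statistics import mean


def seasonally_adjust(series, period):
    """Subtract each season's average deviation from the overall mean."""
    if len(series) == 0 or len(series) % period != 0:
        raise ValueError("series must cover a whole number of cycles")
    overall = mean(series)
    # Seasonal effect for each position in the cycle (e.g. each quarter).
    seasonal = [mean(series[pos::period]) - overall for pos in range(period)]
    return [value - seasonal[i % period] for i, value in enumerate(series)]


quarterly = [12, 8, 9, 11, 12, 8, 9, 11]  # hypothetical two years of data
adjusted = seasonally_adjust(quarterly, period=4)
print(adjusted)
```

In this artificial example the series is pure seasonality around a constant level, so the adjusted series is flat. With a real trend, the trend would normally be estimated first (for example with a centred moving average) before the seasonal effects are computed.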
17.11.1 Theory and methods
Statistics Canada, “Seasonal adjustment and trend-cycle estimation” (part of Statistics Canada Quality Guidelines, Catalogue 12-539-X)
U.S. Census Bureau, The X-13ARIMA-SEATS Seasonal Adjustment Program
17.11.2 R
17.11.2.1 ggsdc() function in {ggseas}
package
CRAN page: ggseas: ‘stats’ for Seasonal Adjustment on the Fly with ‘ggplot2’
17.11.2.2 {ggseas}
package
CRAN page: ggseas: ‘stats’ for Seasonal Adjustment on the Fly with ‘ggplot2’
articles
Ellis, Peter. 2016-10-12. “Update of ggseas for seasonal decomposition on the fly”, blog entry.
Ellis, Peter. 2016-03-28. “Seasonal decomposition in the ggplot2 universe with ggseas”, blog entry.
Ellis, Peter. 2016-02-08. “ggseas package for seasonal adjustment on the fly with ggplot2”, blog entry.
17.11.2.3 {seasonal}
seasonal: R-interface to X-13ARIMA-SEATS
Packages the U.S. Census Bureau’s gold-standard X-13ARIMA-SEATS for use in R.
“…the best interface on the planet to the X13-SEATS-ARIMA time series analysis application from the US Census Department, which is the industry standard particularly for official statistics agencies doing seasonal adjustment.” (Peter Ellis, vignette for ggsdc)
package
CRAN page: seasonal: R Interface to X-13-ARIMA-SEATS
github page: christophsax/seasonal
17.11.2.4 {x13binary}
(US Census Bureau X-13, packaged for easy loading. Loads as a dependency for most of the other seasonal adjustment packages.)
package
CRAN page: x13binary: Provide the ‘x13ashtml’ Seasonal Adjustment Binary
17.12 Structural equation modeling (SEM)
17.12.1 R
Arranged by package
17.12.1.1 {lavaan}
package
CRAN page: lavaan: Latent Variable Analysis
articles
Yves Rosseel, 2012-05-24, “lavaan: An R Package for Structural Equation Modeling”, Journal of Statistical Software, Vol. 48, Issue 2.
Grace Charles, 2015-05-20, First Steps with Structural Equation Modeling – blog post by Noam Ross, re: Charles’ presentation at Davis R Users’ Group.
17.12.1.2 {sem}
package
CRAN page: sem: Structural Equation Models
articles
Jeremy Albright, 2015-02-26, “Structural Equation Models Using the SEM Package in R”
John Fox, “Structural Equation Modeling With the sem Package in R” {PDF}
“Structural Equation Modeling in R”
17.13 Time series analysis
A common theme in data analysis: comparing measurements taken at multiple points in time.
- See also Forecasting and Seasonal adjustment
17.13.1 Theory and methods
Tavish Srivastava, 2015-12-16, “A Complete Tutorial on Time Series Modeling in R”
17.13.2 R
Mara Averick, 2019-03-08, tweet recommending the #rstudioconf talk “Melt the clock: tidy time series analysis”: https://t.co/5xkkMpAsxn, https://t.co/yvyU6RpW8U; {tsibble} https://t.co/Gth8ZimfOz; {fable} https://t.co/YTfWMo4VYV
Earo Wang, “Melt the clock: Tidy time series analysis” (presentation at RStudio conference, 2019)
17.13.2.1 {tsfeatures}
Methods for extracting various features from time series data
package
CRAN: tsfeatures: Time Series Feature Extraction
articles
17.13.2.2 {tsibble}
package
CRAN page: tsibble: Tidy Temporal Data Frames and Tools
github page: tsibble: Tidy Temporal Data Frames and Tools
articles
Earo Wang, 2018-12-20, “Reintroducing tsibble: data tools that melt the clock”
Earo Wang, Dianne Cook and Rob J Hyndman, January 2019, “A new tidy data structure to support exploration and modeling of temporal data” (Wang, Cook, and Hyndman 2019)
17.13.2.3 {padr}
package
CRAN page: padr: Quickly Get Datetime Data Ready for Analysis
articles
Andrew Clark, 2017-07-19, padr package example
17.13.2.4 {zoo}
package
CRAN page: zoo: S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations)
References
Wang, Earo, Dianne Cook, and Rob J Hyndman. 2019. “A New Tidy Data Structure to Support Exploration and Modeling of Temporal Data.”