12.1 Continuous Variables

Transforming continuous variables can be useful for various reasons, including:

  • Changing the scale of variables to make them more interpretable or comparable.
  • Reducing skewness to approximate a normal distribution, which can improve statistical inference.
  • Stabilizing variance in cases of heteroskedasticity.
  • Enhancing interpretability in business applications (e.g., logarithmic transformations for financial data).

12.1.1 Standardization (Z-score Normalization)

A common transformation to center and scale data:

x_i' = \frac{x_i - \bar{x}}{s}

where:

  • x_i is the original value,
  • \bar{x} is the sample mean,
  • s is the sample standard deviation.

When to Use:

  • When variables have different units of measurement and need to be on a common scale.

  • When a few large numbers dominate the dataset.


12.1.2 Min-Max Scaling (Normalization)

Rescales data to a fixed range, typically [0,1]:

x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}

When to Use:

  • When working with fixed-interval data (e.g., percentages, proportions).

  • When preserving relative relationships between values is important.

  • Caution: This method is sensitive to outliers, as extreme values determine the range.
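A minimal sketch (the helper minmax() below is illustrative, not from a package):

```r
data(cars)

# Rescale to [0, 1]; the result is driven entirely by the sample min and max,
# which is why this method is sensitive to outliers
minmax <- function(x) (x - min(x)) / (max(x) - min(x))
mm_speed <- minmax(cars$speed)
range(mm_speed)  # 0 1
```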


12.1.3 Square Root and Cube Root Transformations

Useful for handling positive skewness and heteroskedasticity:

  • Square root: Reduces moderate skewness and variance.
  • Cube root: Works on more extreme skewness and allows negative values.

Common Use Cases:

  • Frequency count data (e.g., website visits, sales transactions).

  • Data with many small values or zeros (e.g., income distributions in microfinance).
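For example (cuberoot() is a hypothetical helper; note that x^(1/3) alone returns NaN for negative inputs in R, so the sign must be handled explicitly):

```r
data(cars)
sqrt_dist <- sqrt(cars$dist)  # compresses moderate right skew

# Cube root that preserves sign, so negative inputs are allowed
cuberoot <- function(x) sign(x) * abs(x)^(1/3)
cuberoot(c(-8, 0, 27))  # -2 0 3
```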


12.1.4 Logarithmic Transformation

Logarithmic transformations are particularly useful for handling highly skewed data. They compress large values while expanding small values, which helps with heteroskedasticity and normality assumptions.

12.1.4.1 Common Log Transformations

Formula                                              When to Use
x_i' = \log(x_i)                                     When all values are positive.
x_i' = \log(x_i + 1)                                 When data contain zeros.
x_i' = \log(x_i + c)                                 Choosing c depends on context.
x_i' = \frac{x_i}{|x_i|} \log |x_i|                  When data contain negative values.
x_i'^\lambda = \log(x_i + \sqrt{x_i^2 + \lambda})    Generalized log transformation.

Selecting the constant c is critical:

  • If c is too large, it can obscure the true nature of the data.
  • If c is too small, the transformation might not effectively reduce skewness.

From a statistical modeling perspective:

  • For inference-based models, the choice of c can significantly affect the fit; see Ekwaru and Veugelers (2018) for an approach to choosing it.
  • In causal inference (e.g., DID, IV), improper log transformations (e.g., logging zero values) can introduce bias (J. Chen and Roth 2023).

12.1.4.2 When is Log Transformation Problematic?

  • When zero values have a meaningful interpretation (e.g., income of unemployed individuals).
  • When data are censored (e.g., income data truncated at reporting thresholds).
  • When measurement error exists (e.g., rounding errors from survey responses).

If zeros are small but meaningful (e.g., revenue from startups), then using \log(x + c) may be acceptable.


library(tidyverse)

# Load dataset
data(cars)

# Original speed values
head(cars$speed)
#> [1] 4 4 7 7 8 9

# Log transformation (basic)
log(cars$speed) %>% head()
#> [1] 1.386294 1.386294 1.945910 1.945910 2.079442 2.197225

# Log transformation for zero-inflated data
log1p(cars$speed) %>% head()
#> [1] 1.609438 1.609438 2.079442 2.079442 2.197225 2.302585

12.1.5 Exponential Transformation

The exponential transformation is useful when data exhibit negative skewness or when an underlying logarithmic trend is suspected, such as in survival analysis and decay models.

When to Use:

  • Negatively skewed distributions.

  • Processes that follow an exponential trend (e.g., population growth, depreciation of assets).
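A small sketch on simulated data (the left-skewed sample here is hypothetical, drawn as the negative of an exponential variate):

```r
set.seed(1)
x <- -rexp(500)   # negatively (left-) skewed sample
exp_x <- exp(x)   # maps (-Inf, 0] into (0, 1], pulling in the long left tail

hist(x)
hist(exp_x)
```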


12.1.6 Power Transformation

Power transformations help adjust skewness, particularly for negatively skewed data.

When to Use:

  • When variables have a negatively skewed distribution.

  • When the relationship between variables is non-linear.

Common power transformations include:

  • Square transformation: x^2 (moderate adjustment).

  • Cubic transformation: x^3 (stronger adjustment).

  • Fourth-root transformation: x^{1/4} (more subtle than square root).
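These are one-liners in R; a brief sketch on the cars data:

```r
data(cars)

# Larger exponents push mass to the right (useful for negative skew);
# fractional exponents act like milder root transformations
sq_speed    <- cars$speed^2
cube_speed  <- cars$speed^3
fourth_root <- cars$dist^(1/4)
```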


12.1.7 Inverse (Reciprocal) Transformation

The inverse transformation is useful for handling platykurtic (flat) distributions or positively skewed data. Note that it is undefined at zero and reverses the ordering of values.

Formula:

x_i' = \frac{1}{x_i}

When to Use:

  • Reducing extreme values in positively skewed distributions.

  • Ratio data (e.g., speed = distance/time).

  • When the variable has a natural lower bound (e.g., time to completion).

data(cars)

# Original distribution
head(cars$dist)
#> [1]  2 10  4 22 16 10
plot(cars$dist)


# Reciprocal transformation
plot(1 / cars$dist)

12.1.8 Hyperbolic Arcsine Transformation

The arcsinh (inverse hyperbolic sine) transformation is useful for handling proportion variables (0-1) and skewed distributions. It behaves similarly to the logarithmic transformation but has the advantage of handling zero and negative values.

Formula:

\text{arcsinh}(Y) = \log(\sqrt{1 + Y^2} + Y)

When to Use:

  • Proportion variables (e.g., market share, probability estimates).

  • Data with extreme skewness where log transformation is problematic.

  • Variables containing zeros or negative values (unlike log, arcsinh handles both naturally).

# Visualize original distribution 
cars$dist %>% hist() 

# Alternative histogram  
cars$dist %>% MASS::truehist()  


# Apply arcsinh transformation 
as_dist <- bestNormalize::arcsinh_x(cars$dist) 
as_dist
#> Standardized asinh(x) Transformation with 50 nonmissing obs.:
#>  Relevant statistics:
#>  - mean (before standardization) = 4.230843 
#>  - sd (before standardization) = 0.7710887
as_dist$x.t %>% hist()

Interpretation of coefficients on arcsinh-transformed variables in published studies:

Paper                                       Interpretation
Azoulay, Fons-Rosen, and Zivin (2019)       Elasticity
Faber and Gaubert (2019)                    Percentage
Hjort and Poulsen (2019)                    Percentage
M. S. Johnson (2020)                        Percentage
Beerli et al. (2021)                        Percentage
Norris, Pecenco, and Weaver (2021)          Percentage
Berkouwer and Dean (2022)                   Percentage
Cabral, Cui, and Dworsky (2022)             Elasticity
Carranza et al. (2022)                      Percentage
Mirenda, Mocetti, and Rizzica (2022)        Percentage

Consider a simple regression model:

Y = \beta X + \epsilon

When both Y and X are arcsinh-transformed:

  • The coefficient estimate \beta can be interpreted as an elasticity: a 1% increase in X is associated with an approximate \beta% change in Y.

When only Y is transformed:

  • The coefficient estimate approximates the proportional (percentage) change in Y for a one-unit change in X.

This makes the arcsinh transformation particularly valuable for log-linear models where zero values exist.

12.1.9 Ordered Quantile Normalization (Rank-Based Transformation)

The Ordered Quantile Normalization (OQN) technique transforms data into a normal distribution using rank-based methods (Bartlett 1947).

Formula:

x_i' = \Phi^{-1} \left( \frac{\text{rank}(x_i) - 1/2}{\text{length}(x)} \right)

where \Phi^{-1} is the inverse normal cumulative distribution function.

When to Use:

  • When data are heavily skewed or contain extreme values.

  • When normality is required for parametric tests.

ord_dist <- bestNormalize::orderNorm(cars$dist)
ord_dist
#> orderNorm Transformation with 50 nonmissing obs and ties
#>  - 35 unique values 
#>  - Original quantiles:
#>   0%  25%  50%  75% 100% 
#>    2   26   36   56  120
ord_dist$x.t %>% hist()

12.1.10 Lambert W x F Transformation

The Lambert W transformation is a more advanced method that normalizes data by removing skewness and heavy tails.

When to Use:

  • When traditional transformations (e.g., log, Box-Cox) fail.

  • When dealing with heavy-tailed distributions.

data(cars)
head(cars$dist)
#> [1]  2 10  4 22 16 10
cars$dist %>% hist()


# Apply Lambert W transformation
l_dist <- LambertW::Gaussianize(cars$dist)
l_dist %>% hist()

12.1.11 Inverse Hyperbolic Sine Transformation

The Inverse Hyperbolic Sine (IHS) transformation is similar to the log transformation but handles zero and negative values (N. L. Johnson 1949).

Formula:

f(x,\theta) = \frac{\sinh^{-1} (\theta x)}{\theta} = \frac{\log(\theta x + (\theta^2 x^2 + 1)^{1/2})}{\theta}

When to Use:

  • When data contain zeros or negative values.

  • Alternative to log transformation in economic and financial modeling.
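The formula above can be sketched directly with base R's asinh() (ihs() is an illustrative helper; \theta = 1 recovers the plain arcsinh):

```r
# IHS with scale parameter theta; theta = 1 gives asinh(x)
ihs <- function(x, theta = 1) asinh(theta * x) / theta

ihs(c(-5, 0, 5))      # antisymmetric in x, and defined at zero
ihs(1e6) - log(2e6)   # near 0: IHS behaves like log(2x) for large x
```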

12.1.12 Box-Cox Transformation

The Box-Cox transformation is a power transformation designed to improve linearity and normality (Manly 1976; Bickel and Doksum 1981; Box and Cox 1981).

Formula:

x_i'^\lambda = \begin{cases} \frac{x_i^\lambda-1}{\lambda} & \text{if } \lambda \neq 0\\ \log(x_i) & \text{if } \lambda = 0 \end{cases}

When to Use:

  • To address non-normality or non-constant variance in the error terms of regression models.

  • When the response variable is strictly positive.

library(MASS)
data(cars)
mod <- lm(speed ~ dist, data = cars)

# Check residual diagnostics
plot(mod)


# Find the lambda that maximizes the profile log-likelihood
bc <- boxcox(mod, lambda = seq(-3, 3, 0.1))
best_lambda <- bc$x[which.max(bc$y)]

# Refit with the transformed response (for lambda != 0, y^lambda is a
# monotone equivalent of the Box-Cox form)
mod_lambda <- lm(speed^best_lambda ~ dist, data = cars)
plot(mod_lambda)

For the two-parameter Box-Cox transformation, we use:

x_i' (\lambda_1, \lambda_2) = \begin{cases} \frac{(x_i + \lambda_2)^{\lambda_1}-1}{\lambda_1} & \text{if } \lambda_1 \neq 0 \\ \log(x_i + \lambda_2) & \text{if } \lambda_1 = 0 \end{cases}

# Two-parameter Box-Cox transformation
two_bc <- geoR::boxcoxfit(cars$speed)
two_bc
#> Fitted parameters:
#>    lambda      beta   sigmasq 
#>  1.028798 15.253008 31.935297 
#> 
#> Convergence code returned by optim: 0
plot(two_bc)

12.1.13 Yeo-Johnson Transformation

Similar to the Box-Cox transformation (for nonnegative values it is equivalent to Box-Cox applied to x_i + 1), but it is also defined for zero and negative values.

Formula:

x_i'^\lambda = \begin{cases} \frac{(x_i+1)^\lambda -1}{\lambda} & \text{if } \lambda \neq0, x_i \ge 0 \\ \log(x_i + 1) & \text{if } \lambda = 0, x_i \ge 0 \\ \frac{-[(-x_i+1)^{2-\lambda}-1]}{2 - \lambda} & \text{if } \lambda \neq 2, x_i <0 \\ -\log(-x_i + 1) & \text{if } \lambda = 2, x_i <0 \end{cases}

data(cars)
yj_speed <- bestNormalize::yeojohnson(cars$speed)
yj_speed$x.t %>% hist()

12.1.14 RankGauss Transformation

A rank-based transformation that maps values to a normal distribution.

When to Use:

  • To handle skewed data while preserving rank order.
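A minimal sketch (rank_gauss() is an illustrative helper; it follows the same rank-to-normal-quantile idea as the Ordered Quantile Normalization formula above):

```r
data(cars)

# Map ranks to standard normal quantiles; ties share an average rank,
# so tied observations receive identical transformed values
rank_gauss <- function(x) {
  qnorm((rank(x, ties.method = "average") - 0.5) / length(x))
}
rg_dist <- rank_gauss(cars$dist)
hist(rg_dist)
```

Because qnorm() is strictly increasing, the transformation preserves the original rank order exactly.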

12.1.15 Automatically Choosing the Best Transformation

The bestNormalize package compares a set of candidate transformations (e.g., log, square root, Box-Cox, Yeo-Johnson, orderNorm) using a cross-validated normality statistic and selects the best-performing one for a given dataset.

bestdist <- bestNormalize::bestNormalize(cars$dist)
bestdist$x.t %>% hist()

References

Azoulay, Pierre, Christian Fons-Rosen, and Joshua S Graff Zivin. 2019. “Does Science Advance One Funeral at a Time?” American Economic Review 109 (8): 2889–2920.
Bartlett, Maurice S. 1947. “The Use of Transformations.” Biometrics 3 (1): 39–52.
Beerli, Andreas, Jan Ruffner, Michael Siegenthaler, and Giovanni Peri. 2021. “The Abolition of Immigration Restrictions and the Performance of Firms and Workers: Evidence from Switzerland.” American Economic Review 111 (3): 976–1012.
Berkouwer, Susanna B, and Joshua T Dean. 2022. “Credit, Attention, and Externalities in the Adoption of Energy Efficient Technologies by Low-Income Households.” American Economic Review 112 (10): 3291–3330.
Bickel, Peter J, and Kjell A Doksum. 1981. “An Analysis of Transformations Revisited.” Journal of the American Statistical Association 76 (374): 296–311.
Box, George EP, and David R Cox. 1981. An Analysis of Transformations Revisited, Rebutted. University of Wisconsin-Madison. Mathematics Research Center.
Cabral, Marika, Can Cui, and Michael Dworsky. 2022. “The Demand for Insurance and Rationale for a Mandate: Evidence from Workers’ Compensation Insurance.” American Economic Review 112 (5): 1621–68.
Carranza, Eliana, Robert Garlick, Kate Orkin, and Neil Rankin. 2022. “Job Search and Hiring with Limited Information about Workseekers’ Skills.” American Economic Review 112 (11): 3547–83.
Chen, Jiafeng, and Jonathan Roth. 2023. “Logs with Zeros? Some Problems and Solutions.” The Quarterly Journal of Economics, qjad054.
Ekwaru, John Paul, and Paul J Veugelers. 2018. “The Overlooked Importance of Constants Added in Log Transformation of Independent Variables with Zero Values: A Proposed Approach for Determining an Optimal Constant.” Statistics in Biopharmaceutical Research 10 (1): 26–29.
Faber, Benjamin, and Cecile Gaubert. 2019. “Tourism and Economic Development: Evidence from Mexico’s Coastline.” American Economic Review 109 (6): 2245–93.
Hjort, Jonas, and Jonas Poulsen. 2019. “The Arrival of Fast Internet and Employment in Africa.” American Economic Review 109 (3): 1032–79.
Johnson, Matthew S. 2020. “Regulation by Shaming: Deterrence Effects of Publicizing Violations of Workplace Safety and Health Laws.” American Economic Review 110 (6): 1866–1904.
Johnson, N. L. 1949. “Systems of Frequency Curves Generated by Methods of Translation.” Biometrika 36 (1/2): 149. https://doi.org/10.2307/2332539.
Manly, Bryan FJ. 1976. “Exponential Data Transformations.” Journal of the Royal Statistical Society Series D: The Statistician 25 (1): 37–42.
Mirenda, Litterio, Sauro Mocetti, and Lucia Rizzica. 2022. “The Economic Effects of Mafia: Firm Level Evidence.” American Economic Review 112 (8): 2748–73.
Norris, Samuel, Matthew Pecenco, and Jeffrey Weaver. 2021. “The Effects of Parental and Sibling Incarceration: Evidence from Ohio.” American Economic Review 111 (9): 2926–63.