# Chapter 17 Demand Estimation with aggregate market shares

## 17.1 Micro versus Market level data

One of the advantages of having microdata is that you can estimate demand directly from individual consumer choices.

Often, however, we do not observe individual transaction data.

Instead, we observe market-level data: for each product we know its market share, its price, and its characteristics.

## 17.2 Another problem: price endogeneity?

Note: in deriving all these examples, we implicitly assumed that the distribution of the $$\epsilon_{ij}$$’s is independent of prices. This is analogous to assuming that prices are exogenous.

Case study: Trajtenberg’s (1989) study of the demand for CAT scanners. Disturbing finding: the estimated coefficient on price is positive, implying that buyers prefer more expensive machines!

Possible explanation: quality differences across products are not adequately controlled for. In differentiated product markets, where each product is valued on the basis of its characteristics, brands with highly desired characteristics (higher quality) may command higher prices. If any of these characteristics are unobserved, and hence not controlled for, they end up in the error term, which is then correlated with price: $$E[p\epsilon] \ne 0$$. This is an endogeneity problem.
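A small simulation can make this mechanism concrete. The sketch below is not taken from Trajtenberg’s study: the sample size, the true price coefficient, and the way price depends on unobserved quality are all invented for illustration. It shows that leaving quality out of the regression can flip the sign of the estimated price coefficient.

```r
set.seed(17)

n     <- 5000    # number of simulated products (hypothetical)
alpha <- 0.3     # true price sensitivity: utility falls by alpha per unit of price

quality <- rnorm(n)                  # quality, unobserved by the researcher
z       <- rnorm(n)                  # cost shifter, unrelated to quality
p       <- 5 + 0.5 * z + 2 * quality # higher quality commands a higher price

# Mean utility: the true effect of price is -alpha
utility <- -alpha * p + quality

# Regressing on price alone leaves quality in the error term
ols     <- lm(utility ~ p)
b_price <- unname(coef(ols)["p"])

cat("true price coefficient:", -alpha, "\n")
cat("OLS price coefficient: ", round(b_price, 3), "\n")
```

Under these assumed numbers the true coefficient is negative, but the OLS estimate comes out positive, because price is standing in for the omitted quality.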

## 17.3 Estimation with aggregate market shares

Next we consider how to estimate demand functions in the presence of price endogeneity, and when the researcher only has access to aggregate market shares.

This section summarizes the approach of Berry (1994).

## 17.4 Data

Our data for a particular market look like this:

| j | $$\widehat{s}_{jm}$$ | $$p_j$$ | $$X_{1j}$$ | $$X_{2j}$$ |
|---|---|---|---|---|
| A | 25% | \$1.50 | red | large |
| B | 35% | \$2.00 | blue | small |
| C | 45% | \$2.50 | green | large |

Total market size = $$M$$

Total number of brands = $$J$$

We want to use these data to estimate the demand for different products, using differences in market shares and characteristics across brands and across markets.

## 17.5 Model

Let a consumer’s utility function be

$U_{ijm}=X_{jm}\beta-\alpha p_{jm}+\xi_{jm}+\epsilon_{ijm}$

where i indexes consumers, j indexes products, and m indexes markets. Let $$\delta_{jm}=X_{jm}\beta-\alpha p_{jm}+\xi_{jm}$$ denote the mean utility of product j in market m.

The econometrician observes neither $$\xi_{jm}$$ nor $$\epsilon_{ijm}$$, but consumer i observes both.

You can think of $$\xi_{jm}$$ as a product-specific unobserved quality shock. High-quality products (high $$\xi$$) raise consumers’ willingness to pay, so their prices tend to be higher as well: $$E[\xi p] \neq 0$$.

The other error, $$\epsilon_{ijm}$$, is an idiosyncratic shock, which we assume is distributed Type I Extreme Value, independently across consumers, brands, and markets.

### 17.5.1 Probabilities

Given our assumption on the idiosyncratic error, the choice probabilities take the conditional logit form:

$Pr(y_{ijm}=j)=\frac{\exp(\delta_{jm})}{1+\sum_{k=1}^J \exp(\delta_{km})}$

The 1 in the denominator comes from a normalized good. In practice, not everyone buys a particular type of good, like cereal. So we create a fictitious good; let’s call it “No Cereal”. “No Cereal” is free, and everyone who does not buy an actual cereal product is buying “No Cereal”. Its term in the denominator equals 1 because we normalize the mean utility of this product to zero, and exp(0) = 1.

We can’t estimate this probability directly from individual choices, but we do observe the actual market shares. So we reinterpret this probability as a predicted market share. That is, if we knew the values of the $$\delta$$’s, then we could construct the predicted market shares and compare them with the actual ones.
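The logit formula above is easy to evaluate once the mean utilities are given. Here is a minimal sketch for a single market; the function name `logit_shares` and the three values of $$\delta$$ are hypothetical and chosen only for illustration.

```r
# delta: vector of mean utilities for the J inside goods in one market
logit_shares <- function(delta) {
  denom <- 1 + sum(exp(delta))   # the 1 is exp(0) for the outside good
  list(inside = exp(delta) / denom, outside = 1 / denom)
}

delta <- c(A = -1.0, B = -0.5, C = 0.2)   # hypothetical mean utilities
s <- logit_shares(delta)

print(round(s$inside, 3))    # shares of the inside goods
print(round(s$outside, 3))   # share of the outside good
print(sum(s$inside) + s$outside)
```

By construction, the inside shares and the outside share add up to one, and a product with a higher mean utility gets a larger share.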
### 17.5.2 Predicted market share

$\widetilde{s}_{jm}=Pr(y_{ijm}=j)=\frac{\exp(\delta_{jm})}{1+\sum_{k=1}^J \exp(\delta_{km})}$

Remember, we need to normalize one product so that $$\delta_{0m}=0$$. We call this the “outside good”.

$\widetilde{s}_{0m}=Pr(y_{ijm}=0)=\frac{1}{1+\sum_{k=1}^J \exp(\delta_{km})}$

### 17.5.3 Transform shares

Both predicted shares have the same denominator, so it cancels when we take the difference in logs. This gives an equation that is linear in the parameters:

$\log(\widetilde{s}_{jm})-\log(\widetilde{s}_{0m})=\delta_{jm}=X_{jm}\beta-\alpha p_{jm}+\xi_{jm}$

We can now construct our new dependent variable. Since we observe the market share of each product in each market, we can compute the values of $$\delta_{jm}$$ implied by the data. Let $$s_{jm}$$ be the actual market share of product j in market m. Then the actual mean utility is $$\widehat{\delta}_{jm}=\log(s_{jm})-\log(s_{0m})$$

### 17.5.4 Objective Function

We calculate this for every product in every market. We then choose the parameters so that the sample analog of $$E[\xi Z]$$ is as close to zero as possible. Our objective function becomes

\begin{align*}
E[\xi Z]&=\frac{1}{MM}\frac{1}{J}\sum_{m=1}^{MM} \sum_{j=1}^J \xi_{jm}Z_{jm} \\
&=\frac{1}{MM}\frac{1}{J}\sum_{m=1}^{MM} \sum_{j=1}^J [\widehat{\delta}_{jm}-X_{jm}\beta+\alpha p_{jm}]Z_{jm}
\end{align*}

where Z is a set of instrumental variables and MM is the total number of markets. Basically, we reduce the whole problem to an instrumental variables problem: a linear regression of $$\widehat{\delta}_{jm}$$ on $$X_{jm}$$ and $$p_{jm}$$, with Z instrumenting for price.

### 17.5.5 What are appropriate instruments?

- Cost shifters (prices of raw materials or workers; industry dependent)
- Characteristics of competitors’ products
- If we have a panel, the price of the same good in other markets

## 17.6 Berry 1994 in R: Data

We now apply the above method, from Berry (1994), to the market for cars. The data are the automobile data set shipped with the `hdm` package (the data used in Berry, Levinsohn, and Pakes, 1995). We observe the model name, model id, the manufacturer, the market location, the log of price, miles per gallon, miles per dollar, horsepower per weight, air conditioning (these data are old; AC was an option), the size of the car, and the market share of the car. We will estimate the share regression using OLS and using the BLP method (IV with instruments built from the characteristics of other products).
I want you to notice what happens to the sensitivity to price.

```r
library(AER)  # provides ivreg(); the hdm package must also be installed

blpdata <- hdm::BLP$BLP        # BLP data
blpZ <- hdm::BLP$Z             # instrumental variables
blpZ <- as.data.frame(blpZ)

# Descriptive labels for the variables in the data
blp.var.names <- c("model name", "model id", "firm id", "cdid",
                   "log price", "miles per gallon", "miles per dollar",
                   "horse power per weight", "air conditioning",
                   "size of the car", "market share", "outside option share",
                   "log share j - log share 0", "time trend")

# OLS: treats price as exogenous
ols.1 <- lm(y ~ price + hpwt + space + mpg + air + mpd + factor(model.id),
            data = blpdata)

# IV: price is instrumented with sums of the characteristics of the firm's
# other products (sum.other.*) and of rival firms' products (sum.rival.*)
iv.1 <- ivreg(y ~ price + hpwt + space + mpg + air + mpd + factor(model.id) |
                hpwt + space + mpg + air + mpd + factor(model.id) +
                blpZ$sum.other.1 + blpZ$sum.other.hpwt +
                blpZ$sum.other.air + blpZ$sum.other.mpd +
                blpZ$sum.other.space + blpZ$sum.rival.1 + blpZ$sum.rival.hpwt +
                blpZ$sum.rival.air + blpZ$sum.rival.mpd + blpZ$sum.rival.space,
              data = blpdata)
```

| | OLS | IV |
|---|---|---|
| price | -0.085\*\*\* (0.009) | -0.197\*\*\* (0.021) |
| Horse power per wgt | 0.065 (0.246) | 0.754\*\*\* (0.285) |
| size of car | 0.945\*\*\* (0.197) | 0.952\*\*\* (0.207) |
| miles per gallon | 0.016 (0.063) | 0.004 (0.067) |
| air conditioning | 0.165\*\* (0.067) | 0.457\*\*\* (0.087) |
| miles per dollar | -0.018 (0.040) | 0.018 (0.043) |
| Constant | -1.030\*\* (0.419) | -1.734\*\*\* (0.457) |
| Observations | 2,217 | 2,217 |
| $$R^2$$ | 0.874 | 0.861 |
| F Statistic | 20.446\*\*\* (df = 562; 1654) | |

Notes: Standard errors in parentheses. \*\*\*Significant at the 1 percent level. \*\*Significant at the 5 percent level. \*Significant at the 10 percent level.

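You will notice that the IV coefficient on price (-0.197) is more than twice the magnitude of the OLS coefficient (-0.085). This is very important: OLS understates how sensitive consumers are to price, because unobserved quality raises both price and demand, which pushes the estimated price coefficient toward zero.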
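The same pattern can be reproduced on simulated data, where the true price coefficient is known. The sketch below is an illustration under invented assumptions, not a replication of the car estimates: the number of markets, the parameter values, and the cost shifter `z` are all hypothetical. It builds logit shares, recovers mean utility as the log share minus the log outside share, and compares OLS with two-stage least squares done by hand.

```r
set.seed(1994)

n_markets <- 200                  # hypothetical number of markets
J <- 5                            # hypothetical products per market
n <- n_markets * J
market <- rep(seq_len(n_markets), each = J)
alpha <- 1                        # true price sensitivity (assumed)
beta  <- 0.5                      # true taste for x (assumed)

x  <- rnorm(n)                    # observed characteristic
xi <- rnorm(n, sd = 0.5)          # unobserved quality
z  <- rnorm(n)                    # cost shifter: moves price, not demand
p  <- 2 + 0.5 * x + z + xi        # price rises with unobserved quality

# Logit market shares with an outside good in every market
delta <- beta * x - alpha * p + xi
denom <- 1 + ave(exp(delta), market, FUN = sum)
s  <- exp(delta) / denom          # inside-good shares
s0 <- 1 / denom                   # outside-good share, repeated within market

# Recover mean utility from the shares alone
delta_hat <- log(s) - log(s0)

# OLS ignores the correlation between price and unobserved quality
b_ols <- unname(coef(lm(delta_hat ~ x + p))["p"])

# 2SLS by hand: first stage predicts price from the exogenous variables.
# (Coefficients are correct; second-stage standard errors would not be.)
p_fit <- fitted(lm(p ~ x + z))
b_iv  <- unname(coef(lm(delta_hat ~ x + p_fit))["p_fit"])

print(round(c(true = -alpha, ols = b_ols, iv = b_iv), 3))
```

In this setup the OLS estimate is pulled toward zero while the IV estimate lands close to the true value. The second stage is run by hand only to keep the example free of extra packages; a dedicated routine such as `ivreg` also reports correct standard errors.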

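In economics, we have a concept called elasticity. The price elasticity of demand tells us how sensitive consumers are to price changes: it is the percent change in quantity demanded for a 1 percent change in price. The larger the elasticity is in absolute value, the more sensitive consumers are to price changes.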
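To see how much the price coefficient matters for elasticity, we can use the standard own-price elasticity of the simple logit model with price entering utility in levels, $$-\alpha p_{j}(1-s_{j})$$. This formula is brought in here as an assumption; it is not derived in this chapter. The sketch plugs in the OLS and IV price coefficients reported above (0.085 and 0.197 in absolute value). The price and market share are hypothetical, and the car data record price in logs, so the numbers illustrate the mechanics rather than actual car elasticities.

```r
# Own-price elasticity in the simple logit model with price in levels:
#   elasticity = -alpha * p * (1 - s)
logit_own_elasticity <- function(alpha, p, s) {
  -alpha * p * (1 - s)
}

p <- 10      # hypothetical price
s <- 0.01    # hypothetical market share (1 percent)

e_ols <- logit_own_elasticity(alpha = 0.085, p = p, s = s)
e_iv  <- logit_own_elasticity(alpha = 0.197, p = p, s = s)

print(c(ols = e_ols, iv = e_iv))
```

With these made-up values, the OLS coefficient implies inelastic demand (an elasticity below 1 in absolute value), while the IV coefficient implies elastic demand.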

A firm with market power wants to price on the elastic part of the demand curve. Let’s think about why. If the current price is on the inelastic part of the demand curve, then a 1 percent increase in price leads to a less than 1 percent decrease in quantity. Although we sell fewer cars at the higher price, total revenue is actually higher, and because we produce fewer cars, total cost is no higher. Profit therefore rises, so we should keep pushing the price up until the percent change in quantity is greater than the percent change in price. At that point an additional increase in price leads to less revenue, which means we are on the elastic part of the demand curve.
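The revenue argument can be checked with a few lines of arithmetic. The sketch below assumes a constant-elasticity demand curve, which is a convenient stand-in and not the logit model used in this chapter; the function `revenue`, the scale `A`, and the starting price are hypothetical.

```r
# Constant-elasticity demand: q = A * p^e, with elasticity e < 0
revenue <- function(p, e, A = 100) {
  q <- A * p^e
  p * q
}

p0 <- 10           # hypothetical starting price
p1 <- p0 * 1.01    # a 1 percent price increase

for (e in c(-0.5, -2)) {
  change <- 100 * (revenue(p1, e) / revenue(p0, e) - 1)
  cat("elasticity", e, ": revenue changes by", round(change, 2), "percent\n")
}
```

Revenue rises after the price increase when demand is inelastic (elasticity of -0.5) and falls when demand is elastic (elasticity of -2).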