Chapter 6 All about Volatility

6.1 The many flavours of volatility

6.1.1 Some facets of volatility

There are many ways to look at volatility.

The intuitive meaning of the word is that volatility measures the level of fluctuations for a particular price.

The way we measure it, the unit we use, and the time-scale at which we are looking all have an impact and should be specified in order to transform a single volatility number into a solid understanding of how much fluctuation there is in that particular stock.

By taking a more academic approach based on statistics, one can argue that the value of the stock in one year is uncertain and assign a probability distribution to it. It could be desirable to use the width or standard deviation of this distribution to link to the volatility of the stock. This point of view is exactly what has become the market standard.

It is clear that the statistical approach is focused on the one-year horizon, whereas a trader, who wants to delta-hedge on a daily basis, is not so interested in knowing the uncertainty accumulated over the year. What he is really interested in is understanding how the uncertainty plays a role on a much smaller scale, such that, piled up over the year, it leads to the same distribution as the statistician has presented.

In mathematical terms, knowing the distribution at one time (or multiple times) is not enough to complete the dynamic picture. One needs to know how the distribution changes over time. Clearly, on a very short time-scale, the uncertainty is very small and the distribution function should be sharply peaked about the current level of the stock. As the time horizon increases, the density should widen up.

One can show that at any time t, the solution of the Black-Scholes SDE, describing a model for the movement of the stock, is a random variable S(t) that behaves according to a lognormal distribution. So at any time t, we have a density that depends on the original parameters in the equation, being the drift \(µ\) and the volatility parameter \(σ\).

Note that the volatility is not the standard deviation of this distribution, but it does control the width of the distribution. Clearly, if we used another model for the stock price, it would lead to another family of density functions, and to other formulas for the moments.
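To make the distinction concrete, the standard closed-form results for geometric Brownian motion can be written out. The symbols \(S_0\) (the stock price today) and \(W_t\) (a Brownian motion) are notation assumed for this illustration:

\(S_t = S_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right) t + \sigma W_t\right)\)

\(E[S_t] = S_0 e^{\mu t}, \qquad \mathrm{Std}[S_t] = S_0 e^{\mu t} \sqrt{e^{\sigma^2 t} - 1}\)

The standard deviation of \(\log S_t\) is \(\sigma \sqrt{t}\), so \(\sigma\) sets the width of the distribution without being the standard deviation of \(S_t\) itself.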

In a way, when people use the word volatility, they also agree on the underlying mathematical model!

A higher volatility means more uncertainty about the size of an asset’s fluctuations and, as such, it can be considered a measurement of uncertainty.

Volatility is dynamic and changes a great deal over time. It experiences high and low regimes, but it also has a long-term mean to which it reverts. Also, as a stock market witnesses a large decline, volatility tends to shoot up: we therefore generally see a negative correlation between such assets and their volatilities.

6.1.2 Realized Volatility

This is probably the most common volatility measure.

One could imagine selecting a stock and a certain time period from the past, and trying to estimate the \(σ\) parameter in the Black-Scholes model based on this data.

This requires knowledge of Ito’s formula, which allows us to transform the Black-Scholes equation into a more suitable format. The solution of this equation follows a lognormal distribution, so the logarithm of the stock price follows a normal distribution.

By applying Ito’s lemma we can write down the dynamics this quantity follows:

\(\boxed{d\log(S_t) = (\mu - \frac{\sigma^2}{2}) \; dt + \sigma \; dW_t}\)

This equation is saying that the change in logarithmic stock price is composed of two parts:

The drift term, \((\mu - \frac{\sigma^2}{2}) \; dt\), is proportional to the period of time over which we observe this change.

The volatility term, \(\sigma \; dW_t\), is determined by noise (Brownian motion).

If one wants to estimate the \(σ\) parameter of the stock, one can take the daily log returns and calculate their standard deviation. This gives the volatility over one day, which is \(σ\sqrt{dt}\), where \(dt\) is one day expressed in years.

This follows from the property of Brownian motion that the variance of \(W_t\) is equal to the time t. All that remains is to divide by \(\sqrt{dt}\) to extract the value of \(σ\).

The first obvious question is how many days there are in a year. In your dataset, there are no quotes for weekends and holidays. So there is no volatility to observe there. Therefore a common approach is to use the number of trading days in a year.

To move from the standard deviation of daily log returns to an annualized volatility therefore requires you to multiply by \(\sqrt{252} \approx 16\). So a volatility of 16% equates to a standard deviation of daily moves of around 1%.
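The estimation procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a production estimator: the function name `realized_volatility`, the default of 252 trading days and the sample prices are all assumptions made for the example.

```python
import math
import statistics


def realized_volatility(prices, periods_per_year=252):
    """Annualized close-to-close volatility from a list of prices."""
    if len(prices) < 3:
        raise ValueError("need at least three prices (two returns)")
    # Log returns between consecutive observations.
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    # Sample standard deviation of the per-period log returns.
    per_period_vol = statistics.stdev(log_returns)
    # Scale by the square root of the number of periods in a year.
    return per_period_vol * math.sqrt(periods_per_year)


# Hypothetical closing prices, for illustration only.
closes = [100.0, 101.0, 100.0, 101.0, 100.0]
print(f"annualized realized vol: {realized_volatility(closes):.2%}")
```

Changing `periods_per_year` is the place where the trading-day versus calendar-day choice enters the estimate.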

A fair challenge would be to ask if there is really no volatility during the weekend. In other words, is it fair to regard the return from Friday to Monday in the same manner as from Monday to Tuesday? Wouldn’t that imply that the world stops turning at the weekend? For this reason, filtering techniques are sometimes applied to estimate the volatility parameter.

Also, it makes a difference which frequency of data you use. Typically, the longer the time period, the more normal the returns tend to be.

6.1.3 Implied Volatility

This is a more market-related volatility concept. This takes everything to the next step, taking into account the options market.

By observing the price of the option, one can back out the \(\sigma\) parameter one has to push into the formula in order to find that price. The market has adjusted for the shortcomings of the Black-Scholes model and the market-implied distribution is not lognormal anymore.

However, the beauty of the Black-Scholes formula is that you can tune your \(\sigma\) parameter such that you match this market price of the option.

“Implied volatility is the wrong number you put in the wrong formula to get the right price.”
Riccardo Rebonato

Vanilla options are quoted in terms of their implied volatilities, since an implied volatility and a price carry the same information. The implied volatilities are in fact the market’s consensus on the forward-looking volatility of the asset: they incorporate the forward views of all market participants on the asset’s volatility.
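Backing the volatility out of a quoted price amounts to a one-dimensional root search, which can be sketched as follows. The sketch assumes a European call on a stock paying no dividends and uses plain bisection; the function names, the search interval and the sample quote are hypothetical choices for the illustration.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, maturity, rate, sigma):
    """Black-Scholes price of a European call on a non-dividend-paying stock."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma**2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def implied_vol(price, spot, strike, maturity, rate, lo=1e-6, hi=5.0, tol=1e-10):
    """Back out sigma by bisection; the call price is increasing in sigma."""
    if not bs_call(spot, strike, maturity, rate, lo) <= price <= bs_call(spot, strike, maturity, rate, hi):
        raise ValueError("price is outside the range attainable on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, maturity, rate, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical market quote generated from a known volatility of 25%.
quote = bs_call(100.0, 110.0, 1.0, 0.02, 0.25)
print(round(implied_vol(quote, 100.0, 110.0, 1.0, 0.02), 6))
```

Bisection is slow but cannot diverge, which makes it a reasonable default for an illustration; a Newton step using vega would converge faster near the money.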

Sometimes analysts try to use implied volatility to draw conclusions about market direction. It is very tempting to think that this information is present in the option market, but the motivation for buying an OTM put does not have to be that the buyer is expecting a decrease in the stock price.

From the trader’s point of view, knowing there is a huge interest in these options can bring about two thoughts:

First, he should increase his price. This is a basic feature of supply and demand.

Second, if the scale of the orders becomes really big, he might become less comfortable with the risk and he might want a bigger premium for that.

This clearly means that the implied volatility does not imply anything for the future direction of the market. There are extensive studies on comparing the realized volatility to the implied volatility.

Implied volatility and realized volatility do not necessarily coincide, and although they may be close, they are typically not equal.

Using the correct implied volatility of an asset allows one to price other derivatives on the asset, in particular those that are not liquidly traded. Where the implied volatility of an asset cannot be implied from traded instruments, one may resort to using the realized volatility as a proxy for implied volatility to get an idea of what volatility would be correct to use. In addition, the realized volatility of an asset can be used as a sanity check to ensure that the implied volatilities being used make sense.

The two are different, with implied volatility generally being higher than realized volatility, but too far a spread could imply a mistake, or if correct, an arbitrage opportunity.

6.1.4 Hedging Volatility

What volatility to use in the hedging procedure dictated by the Black-Scholes model?

It may seem a stupid question at first, because it has already been answered by the implied volatility. If you can price a particular option, meaning you have the implied volatility available, the straightforward solution is to use this value as the volatility to hedge with.

However, let us assume you have a crystal ball and you know that the realized volatility is going to be 20%, but you can negotiate your client into paying an implied volatility of 25%. So you are selling the option for more than it is really worth.

So which of the two values do I plug into the dynamic hedging strategy to hedge this sold option?

As it turns out, it does not make that much of a difference anyway. This brings us to another strength of the Black-Scholes model. It turns out that the model is so robust that almost any value will do. Most traders choose to use the implied volatility for consistency reasons.

6.2 The Volatility Surface

6.2.1 Introduction

Vanilla options are quoted in volatility terms.

For this to work, both counterparties have first to agree on the values of the inputs to the Black-Scholes equation. The volatility one must plug into the Black-Scholes formula to get the true market price of a vanilla option is called the implied volatility.

In liquid markets, brokers will quote fairly tight two-way prices for vanilla options at several strikes and several maturities.

Plotting the associated implied volatilities in a three-dimensional plot results in what is called the volatility surface.

Fig. 6.1: Volatility Surface

Strike Dependence: Smile/Skew

In Black-Scholes, a single constant volatility is used for the stochastic process followed by the spot. It means that options with different strikes would have the same volatility. In other words, the skew is flat.

In reality, it is not at all the case since volatilities for strikes that are far ITM or OTM are typically higher than the ATM volatility.

In FX markets, the implied volatility curve is usually quite symmetrical around ATM strike. We speak about the smile.

In Equity and fixed-income markets, the implied volatility curve is often far from being symmetrical, but heavily skewed in one direction. We speak about the skew. To say there is a skew means that European options with low strikes have higher implied volatilities than those with higher strikes.

Markets determine vanilla prices, which in turn determine implied volatilities.

Time Dependence: Term Structure

Implied volatility for a given contract also depends on its expiry date, T. We therefore also have an implied volatility term structure.

6.2.2 Trading the Term Structure of Implied Volatilities

For a given strike, implied volatilities vary depending on the maturity of the option. In most cases, the term structure is an increasing function of maturity. It is generally the case in calm periods where short-term volatilities are relatively low.

This curve could be decreasing if the market is volatile and short-term volatility is exceptionally high.

This term structure can also reflect the market’s expectations of an anticipated near term event in terms of the volatility that such an event would imply. The term structure also reflects the mean-reversion characteristic of volatility.

One can take a view on the term structure’s shape. A simple trade to express this view is the calendar spread, which is the difference of two call options with the same characteristics but different maturities.

6.2.3 Why a Smile/Skew?

We have seen that Black-Scholes is not the true process followed by the underlying. Therefore using Black-Scholes formula to back out the volatility given prices in the market will not give a constant number.

So what causes the volatility smile?

Supply and Demand of Vanilla Options

Every market has its own participants with their own behavior and risk profile. Therefore the volatility patterns are particular for each asset class.

FX Markets

We have already said that volatility smiles in FX markets are often fairly symmetrical, e.g. EURUSD. This is quite intuitive because euro investors see the market as the inverse of the way dollar investors see it.

This explains the symmetry, but not why far OTM/ITM options have higher implied volatility than ATM options.

Well, investors usually want to protect themselves from adverse moves in the FX rate. So those people for whom a drop in the FX rate would be bad buy OTM put options (low strikes) to protect themselves. Meanwhile, those for whom an increase in the FX rate would hurt buy OTM call options (high strikes) to protect themselves. Since there is greater demand from buyers than sellers, the prices are a little higher than you would otherwise expect, and as price increases with volatility, this means the volatilities at low and high strikes are higher.

Equity Markets

A similar argument can be used in equity markets to show why the volatility smile is heavily skewed. In equity markets, investors typically need to protect against decreases rather than increases in the index.

The skew is often explained as this concept of insurance. It is a market where operators cover their downside risk. The market also tends to consider a large downward move in an asset to be more probable than a large upward move. A downward jump also increases the possibility of another such move (volatility clustering), again reflected by higher volatilities. Additionally, one can discuss the leverage effect: a leverage increase given by a decline in the firm’s stock price, with debt levels unchanged, generally results in higher levels of equity volatility.

It is therefore not surprising that this skew phenomenon emerged after the 1987 market crash.

Commodity Markets

Commodities often exhibit an inverse or positive skew because the risk sits on the upside. If commodities prices rise very sharply, this becomes a risk for industrials who process these commodities into their final products. They are often looking for protection on the upside, flipping the story around.

Fixed Income Markets

Interest rates have their own behavior depending on what instrument is being considered.

The true underlying dynamics are not Black-Scholes

It expresses the fact that market participants are well aware that the returns are not Gaussian.

Both suggestions are good ways of understanding why the volatility smile is as it is. Investors’ view of the market affects the way they trade in the spot as well as the vanilla hedges they put on. By no-arbitrage arguments, if it seems clear that volatilities should be higher at certain strikes for supply and demand reasons, then the true market dynamics must reflect this, and vice versa.

Think in terms of realised gamma P&L

The reason behind the skew becomes apparent when thinking in terms of realized gamma losses as a result of rebalancing the delta of the option in order to be delta hedged.

In a downward-spiraling market, the gamma on lower strikes increases, which, combined with a higher realized volatility, forces the option seller to rebalance his delta more frequently, resulting in higher losses for the option seller.

Option sellers want to get compensated for this and charge the option buyers a higher implied volatility for these options.

One thing that can certainly be said is that inverted smiles are rare.

Since the payout of a put option increases as strike increases it must be true that the value of a put option increases with strike, and the opposite is true for calls. This puts a constraint on the shape of no-arbitrage smile curve.

Misconceptions about the Skew

When markets go down, they tend to become more volatile. While this statement is true, it does not explain the skew as this realized volatility is the same regardless of any strike price!

The existence of skew is actually saying that this increase of volatility has a bigger impact on lower strike options than on higher strike options.

As stated earlier, it is very tempting to use the implied volatilities to predict the direction of the market. However, this is extremely difficult, if not impossible (see this nice article from Elie Ayache). It brings us back to the very basics of the Black-Scholes model. The price is set by the anticipated cost of hedging. If anything, this cost is related to the magnitude of moves or volatility rather than the direction of the market.

6.2.4 Measuring and Trading the Implied Skew

The first thing in measuring the skew is to note its level, which is given by the ATM implied volatility. The word skew is also used to refer to the slope of the implied volatility skew. Equity markets have a negative skew since its slope is negative.

Assuming we had the set of implied volatilities as a function of strike \(σ(K)\), then the slope is given by the first derivative, at a specific point, possibly the ATM point.

In reality, we only have implied volatilities for a discrete set of strikes. One can use some form of interpolation to obtain the function \(σ(K)\) in order to have a parametric form, but in practice, and to have a standard method of measuring skew, we take the difference between the implied volatilities of the 90% and 100% (or 110%) strike vanillas.
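The standard measure described above is simple enough to write down directly. In this sketch the smile is a dictionary keyed by strike in percent of spot; the function names and the sample volatilities are hypothetical.

```python
def skew_90_100(smile):
    """Implied vol at the 90% strike minus implied vol at the 100% strike.

    `smile` maps strike (in percent of spot) to implied volatility.
    """
    return smile[90] - smile[100]


def skew_slope(smile, low=90, high=100):
    """Finite-difference slope of implied vol per unit of strike (strike as a fraction of spot)."""
    return (smile[high] - smile[low]) / ((high - low) / 100.0)


# Hypothetical one-year equity smile, for illustration only.
smile_1y = {90: 0.215, 100: 0.200, 110: 0.185}
print(f"90-100 skew: {skew_90_100(smile_1y):.4f}")
print(f"slope between 90% and 100%: {skew_slope(smile_1y):.4f}")
```

The first number is positive for a typical equity skew while the slope is negative, which is why quoting conventions should always state which of the two is meant.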

If we compare the implied volatility skews of an index and that of a stock we find that index volatilities are more skewed than those of a single stock. The reason for this is that if stocks are all dropping during a market decline, the realized correlation between them rises, and an equity index is a weighted average of different stocks.

This is a useful property as one can use the skew of an index as a proxy for pricing skew-dependent payoffs on stocks whose implied skews are not as liquid as those of the index. Knowing that the index’s implied volatilities are more skewed than those of the single stock, it is possible to take a percentage of the index skew and use this in the pricing. What percentage to use is primarily a function of whether the structure in question sets the seller short skew or long skew, and from there it is a function of how aggressive/conservative the trader wants to be on the skew position.

6.2.5 Skew through Time

The skew for any specific stock is steeper for short-term maturities than for long-term maturities. A long maturity could have a skew at a level higher than the short-term maturity, but the short-term skew will be more pronounced.

To understand this, remember that skew is mainly there because traders are afraid to lose money on downside strikes in case the market goes down and becomes more volatile.

Obviously, the larger the gamma the more imminent the problem! Since short-term downside options have larger gammas when the stock price moves down to the lower strikes, the effect of skew is largest for those options.

Another reason is that for short-term maturities the trader knows exactly whether an option is a downside option or not. For long-term options, the trader cannot tell whether a strike is a downside strike, as he does not know where the stock will be trading in X years’ time.

A jump in the underlying’s price in the immediate future would have a large impact on the price of the put; for the short term this is more severe as the market may not have time to recover.

6.2.6 Effect of Skew on Delta Hedging

The skew curve of a particular stock can have a big impact on a trader’s delta hedge against any option positions he has.

Let us take a simple example and assume a trader is long a 1-year Call 120% on a stock. The skew curve for the 1-year maturity indicates that every 10% decrease in strike translates into a 1.5% increase in implied volatility. In other words, if ATM volatility is 20%, then the 120% volatility is 17%.

Suppose instead that the trader decides to mark this 120% implied volatility at 21%, implying a smile.

Because of that, his delta is larger than it should be as he assigns too high a probability of the option expiring in-the-money. Therefore the trader sells more shares than he actually should to hedge the upside call. To make matters worse, he will continue to sell too many shares as the spot increases. With the spot increasing, the stock is likely to realize even less volatility, making the probability and therefore the proper delta of this option even lower.
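The size of the effect in this example can be checked with the Black-Scholes call delta. The sketch below compares the delta at 17% and at 21% implied volatility for the 1-year 120% call; a spot of 100, zero interest rates and no dividends are assumptions made to keep the example short, and the function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_delta(spot, strike, maturity, rate, sigma):
    """Black-Scholes delta of a European call (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma**2) * maturity) / (
        sigma * math.sqrt(maturity)
    )
    return norm_cdf(d1)


# 1-year call struck at 120% of spot; zero rates assumed for simplicity.
spot, strike, maturity, rate = 100.0, 120.0, 1.0, 0.0
delta_skew = bs_call_delta(spot, strike, maturity, rate, 0.17)   # marked on the skew
delta_smile = bs_call_delta(spot, strike, maturity, rate, 0.21)  # marked on a smile
print(f"delta at 17% vol: {delta_skew:.3f}")
print(f"delta at 21% vol: {delta_smile:.3f}")
print(f"extra shares sold per option: {delta_smile - delta_skew:.3f}")
```

The difference between the two deltas is the number of shares per option that the trader over-sells when he marks the upside strike on a smile.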

The way a trader marks his implied volatility surface has a large impact on his delta hedge. By marking upside strikes on a higher implied volatility than ATM implied volatility, the trader sells too many shares, and especially when the spot increases, the trader not only loses money on extra shares he sold, but he also continues to sell too many shares.

It is these dynamics that force a trader to mark his implied volatility surface per maturity with a skew that has a downward sloping shape rather than a smile.

Let us speak a bit more about delta-hedging with the smile.

Since volatility is an important determinant of hedge ratios, incorrect volatility assumptions may lead to incorrect deltas. If volatility is time-varying and correlated with the price changes of the underlying asset, the delta must control not only for the direct impact of the underlying price change on the option price, but also for the indirect simultaneous change in volatility.

Intuitively, an inverse relationship between volatility changes and stock returns would suggest that the BS delta is too large.

\(\frac{dC}{dS} = \frac{\partial C}{\partial S} + \frac{\partial C}{\partial v} \frac{\partial v}{\partial S}\)

  • \(\frac{\partial C}{\partial S}\) is the BS delta assuming constant volatility.
  • \(\frac{\partial C}{\partial v}\) is the BS vega.
  • \(\frac{\partial v}{\partial S}\) describes how the implied volatility of the option moves as the spot price moves.

The second part of the equation is sometimes called ‘shadow delta’.

I usually speak about the whole delta as a smile-adjusted delta.

Since the vega of a long position in an option is always positive, the above equation shows that in the case of a negative correlation between stock returns and volatility changes (the usual downward-sloping equity skew), the delta should be smaller than the BS delta.

Note that the last term of this equation is quite difficult to quantify but could be approximated by the slope of the volatility smile: \(\frac{\partial v}{\partial K}\).

By approximating \(\frac{\partial v}{\partial S}\) by \(\frac{\partial v}{\partial K}\), it is assumed that as S changes by one unit, there is a parallel shift of \(\frac{\partial v}{\partial K}\) units in the volatility smile.

So, part of the smile-adjusted delta is from vega. If one uses a lognormal model for pricing, then one needs to properly incorporate the delta due to vega in one’s delta hedging, otherwise one is not truly delta neutral.

Similarly, one needs to properly handle one’s volatility hedging and recognize that part of the vega is hedged by the delta.
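As a rough numerical illustration of the decomposition above, the smile-adjusted delta can be computed as the BS delta plus the BS vega times an assumed sensitivity of implied volatility to spot. That sensitivity (`dvol_dspot` below) is an input chosen by the user, not a model output, and all names and numbers in the sketch are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def smile_adjusted_delta(spot, strike, maturity, rate, sigma, dvol_dspot):
    """Return (BS delta, BS vega, smile-adjusted delta) for a European call.

    dvol_dspot is the assumed change in implied volatility per unit move in
    spot; it is an input here, not something the model produces.
    """
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma**2) * maturity) / (sigma * sqrt_t)
    bs_delta = norm_cdf(d1)
    bs_vega = spot * norm_pdf(d1) * sqrt_t  # per unit (1.00) change in vol
    return bs_delta, bs_vega, bs_delta + bs_vega * dvol_dspot


# Hypothetical inputs: vol falls by 0.1 vol points (0.001) per 1 unit rise in spot.
bs_delta, bs_vega, adj_delta = smile_adjusted_delta(100.0, 100.0, 1.0, 0.0, 0.20, -0.001)
print(f"BS delta {bs_delta:.4f}, vega {bs_vega:.2f}, smile-adjusted delta {adj_delta:.4f}")
```

With a negative spot-volatility sensitivity the adjusted delta comes out below the BS delta, in line with the argument in the text.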

If you are interested in reading further about this subject, I invite you to read the original article from Sami Vahamaa, ‘Delta hedging with the smile’.

6.2.7 The Smile Curvature

The curvature is the last parameter that is used to mark an IV surface.

For very high strikes, the implied volatility does not decrease any longer but flattens out.

At the same time, very low strikes have an even higher implied volatility than the skew parameter indicates. This is exactly what a trader would want. As very low strikes have very little premium, sellers want to get properly compensated, which means that the skew parameter alone will not be enough and the curvature parameter will ensure these low strikes are marked on a larger implied volatility.

Measuring and Trading the Implied Skew’s Convexity/Curvature

To quantify the skew convexity, one can take the sum of the 90% and the 110% implied volatilities, subtract twice the 100% strike volatility, and divide by the square of the difference in strikes. This makes sense because an approximation of the second derivative of a function f at the point x is indeed given by \(\frac{f(x+h)+f(x-h) - 2f(x)}{h^2}\).

In fact the combination of vanillas with the above strikes is known as a butterfly spread. If we go long a butterfly spread, we are long a 90% and a 110% strike call option and short two 100% strike call options, meaning that we are long the implied volatilities at the two outer strikes. If the implied skew becomes more convex, these two implied volatilities have increased relative to the 100% strike volatility, making the butterfly spread more valuable.
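The convexity measure is a direct application of the second-difference formula. The sketch below uses a strike spacing of 10% of spot; the function name and the two sample smiles are hypothetical.

```python
def skew_convexity(vol_down, vol_atm, vol_up, strike_step=0.10):
    """Second-difference estimate of smile curvature around the 100% strike.

    vol_down, vol_atm, vol_up are implied vols at the 90%, 100% and 110%
    strikes; strike_step is the strike spacing as a fraction of spot.
    """
    return (vol_down + vol_up - 2.0 * vol_atm) / strike_step**2


# Hypothetical implied volatilities, for illustration only.
flat_skew = skew_convexity(0.215, 0.200, 0.185)    # straight line: no curvature
convex_skew = skew_convexity(0.225, 0.200, 0.190)  # wings lifted: positive curvature
print(flat_skew, convex_skew)
```

A purely linear skew has zero convexity by this measure, however steep it is, which is why skew and curvature are marked as separate parameters.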

The implied volatilities of a single stock generally have more curvature than those of an index. The reason for this is that downward jumps have a larger impact on single stocks than they do on an index, and the risk of a single stock crashing completely is greater than that of a whole index doing so. So, although a stock may have less negatively skewed implied volatilities than an index, the former’s implied volatilities are more convex in strike than those of the index.

Smile curvature through time

Smile curvature tends to decrease as maturities increase. Smile curvature decreases as an inverse of time. This property is observed in all equity markets.

This expresses the fact that the risk on short-term volatility is greater than the risk on long-term volatility. To match this observation, variance at all times must be equal. This can only be achieved with a process that has high mean reversion and a high volatility of volatility.

Put vs Call curvature

Put curvature is higher than call curvature. This is certainly due to the fact that puts are used as a protection against default events.

6.2.8 Arbitrage Freedom of the Implied Volatility Surface

In practice we can only observe European option implied volatilities, of a fixed maturity, at a finite set of strikes: \(K_1, K_2, \dots, K_m\).

It is also the case that we can only obtain these skews for a finite set of maturities: \(T_1, T_2, \dots, T_n\).

For an implied volatility surface to be arbitrage free, some criteria must be met:

  1. For all maturities, all call spreads must be positive

\(\boxed{C(K_j, T_i)-C(K_{j+1}, T_i) \geq 0}\)

  2. An additional restriction on such spreads is that if we were to divide by the difference in strikes, we must have:

\(\boxed{\frac{C(K_j, T_i)-C(K_{j+1}, T_i)}{K_{j+1}-K_j} \leq 1}\)

  3. All calendar spreads must be positive

\(\boxed{C(K_j, T_{i+1})-C(K_j, T_i) \geq 0}\)

  4. All butterfly spreads must be positive

\(\boxed{C(K_{j-1}, T_i) - \frac{K_{j+1} - K_{j-1}}{K_{j+1} - K_{j}} C(K_j, T_i) + \frac{K_j - K_{j-1}}{K_{j+1} - K_{j}} C(K_{j+1}, T_i) \geq 0}\)

The set of European options will be arbitrage free if all these conditions are met.
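The four conditions translate directly into a check on a grid of call prices. The following is a minimal sketch: the function name `arbitrage_violations`, the layout `calls[i][j]` (maturity index first, strike index second) and the sample prices are all assumptions made for the illustration.

```python
def arbitrage_violations(strikes, maturities, calls, tol=1e-12):
    """List violations of the four static no-arbitrage conditions.

    calls[i][j] is the call price for maturities[i] and strikes[j];
    strikes and maturities must be sorted in increasing order.
    """
    found = []
    for i, t in enumerate(maturities):
        row = calls[i]
        # Conditions 1 and 2: call spreads and their slope in strike.
        for j in range(len(strikes) - 1):
            spread = row[j] - row[j + 1]
            width = strikes[j + 1] - strikes[j]
            if spread < -tol:
                found.append(f"call spread < 0 at T={t}, K={strikes[j]}")
            if spread / width > 1.0 + tol:
                found.append(f"call spread slope > 1 at T={t}, K={strikes[j]}")
        # Condition 4: butterfly spreads (convexity in strike).
        for j in range(1, len(strikes) - 1):
            k_lo, k_mid, k_hi = strikes[j - 1], strikes[j], strikes[j + 1]
            fly = (row[j - 1]
                   - (k_hi - k_lo) / (k_hi - k_mid) * row[j]
                   + (k_mid - k_lo) / (k_hi - k_mid) * row[j + 1])
            if fly < -tol:
                found.append(f"butterfly < 0 at T={t}, K={k_mid}")
    # Condition 3: calendar spreads at each strike.
    for i in range(len(maturities) - 1):
        for j, k in enumerate(strikes):
            if calls[i + 1][j] - calls[i][j] < -tol:
                found.append(f"calendar spread < 0 at T={maturities[i]}, K={k}")
    return found


# Hypothetical call prices (spot near 100), for illustration only.
strikes = [90.0, 100.0, 110.0]
maturities = [0.5, 1.0]
good = [[12.0, 5.0, 1.5], [14.0, 7.5, 3.5]]
bad = [[12.0, 5.0, 6.0], [14.0, 7.5, 3.5]]  # 110 call dearer than 100 call
print(arbitrage_violations(strikes, maturities, good))
print(arbitrage_violations(strikes, maturities, bad))
```

Returning a list of messages rather than a single boolean makes it easier to see which part of a calibrated surface is at fault.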

We should ensure that any model we use to capture skew satisfies these conditions. The failure of a model’s calibration to meet these conditions is a solid criterion for rejecting that calibration. Any interpolation between the implied volatilities of two consecutive strikes in the above set must also satisfy these conditions to be arbitrage free.

6.2.9 Smile implied probability distribution

Without going too much into detail here, we can actually make some progress without any knowledge of the true dynamic causing the implied volatility smile.

Suppose we are interested in a particular expiry T because we have a European contract only depending on the spot at T, and not on the path the spot takes between now and T.

Assuming this contract has a payout function \(A(S_T)\) so that to price it, we want to calculate: \(\boxed{E[A(S_T)] = \int_0^{\infty}A(S) \ p_T(S) \ dS}\) where \(p_T\) is the probability density function for the spot level at time T under the risk-neutral measure.

Skipping the derivation here, it turns out that when we know the implied volatility smile at a given expiry, we can deduce the risk-neutral probability density function. Then we can easily go back to the above equation and use this density function to price any European contract.

Using vanilla prices this way to determine the probability density is known as the Breeden and Litzenberger approach.
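The relation behind this approach is usually written as \(p_T(K) = e^{rT} \frac{\partial^2 C}{\partial K^2}(K, T)\), where \(r\) is a constant continuously compounded interest rate (notation assumed here, as the text above does not introduce it). A minimal numerical sketch follows. It replaces the second derivative with a finite difference and, in place of real market quotes, uses Black-Scholes prices at a flat 20% volatility so that the recovered density can be sanity-checked; all names and inputs are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, maturity, rate, sigma):
    """Black-Scholes price of a European call (no dividends)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma**2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def implied_density(call_price, strike, maturity, rate, step):
    """Risk-neutral density at `strike` from a second difference in strike.

    call_price is any function K -> call price for the chosen maturity.
    """
    second_diff = (call_price(strike + step) - 2.0 * call_price(strike)
                   + call_price(strike - step)) / step**2
    return math.exp(rate * maturity) * second_diff


# Stand-in for market prices: flat 20% vol, so the density should be lognormal.
spot, maturity, rate, sigma, step = 100.0, 1.0, 0.01, 0.20, 1.0
price_of = lambda k: bs_call(spot, k, maturity, rate, sigma)
grid = [float(k) for k in range(2, 401)]
density = [implied_density(price_of, k, maturity, rate, step) for k in grid]
total_mass = sum(p * step for p in density)
mean_spot = sum(k * p * step for k, p in zip(grid, density))
print(f"total probability ~ {total_mass:.4f}")
print(f"mean of S_T ~ {mean_spot:.2f} vs forward {spot * math.exp(rate * maturity):.2f}")
```

With real quotes, `price_of` would be built from an arbitrage-free interpolation of the observed smile; a second difference amplifies any noise in the prices, so the interpolation matters a great deal.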

6.2.10 Implied Volatility Dynamics

There is a natural order of market data speed:

  • Spot levels change faster than ATM volatility.
  • ATM volatility changes faster than volatility skew.
  • Volatilities are more volatile than dividend forecasts.

As we have seen when speaking about the effect of skew on delta hedging, hedging performance can be improved by assuming a link between different market parameters.

For example, when calculating a price with a new spot, or computing the delta using a spot shift, one may assume that this move is accompanied by a volatility move in the opposite direction or a change in expected dividends in the same direction.

Thus, a delta hedge also hedges part of vega if stock and volatility are correlated. If delta and vega are hedged separately one has to be careful not to double count vega exposure!

This section discusses deterministic smile dynamics, which assume that the implied volatility surface depends on spot only. Thus there is a function of the spot level \(S_t\) that gives the implied volatility surface observed at time t.

There are two important special cases: sticky strike and sticky delta.

Sticky Strike

Sticky strike implies that the volatility associated with a given strike does not change when the spot moves.

The dynamics of vanilla options are thus described by the Black-Scholes model, which is the only complete model with sticky strike dynamics.

The ATM volatility for equities around the ATM strike behaves as a sticky strike movement.

Sticky Delta

Sticky delta implies that the volatility associated with an option with a given delta does not change when the spot changes.

In this case, the dynamics of vanilla options depend on moneyness and term.

The only complete models with sticky delta dynamics are those assuming independent returns.

For currencies, behavior is sticky delta.

Volatility reacts slowly to spot moves. In quiet markets, volatility is quoted by strike and is updated much less frequently than spot. The dynamics resemble sticky strike.

When markets are volatile then implied volatilities will be updated more frequently and dynamics may resemble sticky delta.

Realistic models should exhibit stochastic implied volatility dynamics, in the sense that the smile dynamics may allow for sticky strike and sticky delta dynamics, as well as random changes between the two. To some extent, local stochastic volatility models capture this behavior.
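The difference between the two regimes can be illustrated with a toy smile. In the sketch below, sticky delta is approximated by "sticky moneyness" (the volatility attached to a given K/S is held fixed), which is a simplification chosen for the example rather than an exact equivalence. The linear smile, its parameters and the function names are hypothetical.

```python
def base_smile(strike, spot_ref=100.0, atm_vol=0.20, slope=-0.15):
    """Hypothetical linear skew observed when spot equals spot_ref."""
    return atm_vol + slope * (strike / spot_ref - 1.0)


def sticky_strike_vol(strike, spot, spot_ref=100.0):
    """Vol attached to a strike is unchanged when spot moves."""
    return base_smile(strike, spot_ref)


def sticky_moneyness_vol(strike, spot, spot_ref=100.0):
    """Vol attached to a moneyness K/S is unchanged when spot moves.

    Used here as a simple stand-in for sticky delta.
    """
    return base_smile(strike * spot_ref / spot, spot_ref)


for new_spot in (100.0, 90.0):
    atm_ss = sticky_strike_vol(new_spot, new_spot)
    atm_sd = sticky_moneyness_vol(new_spot, new_spot)
    print(f"spot {new_spot:.0f}: ATM vol sticky strike {atm_ss:.3f}, sticky moneyness {atm_sd:.3f}")
```

Under sticky strike the ATM volatility slides along the existing skew when spot falls, whereas under sticky moneyness the whole smile moves with spot and the ATM volatility stays where it was.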

6.2.11 Stylized Facts and Modeling

Mean Reversion

Implied volatility tends to mean revert around an average level. Model: Ornstein-Uhlenbeck.
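For reference, an Ornstein-Uhlenbeck process for a volatility level \(\sigma_t\) is commonly written as follows. The symbols \(\kappa\) (speed of mean reversion), \(\theta\) (long-term level) and \(\xi\) (volatility of volatility) are notation assumed for this illustration:

\(d\sigma_t = \kappa (\theta - \sigma_t) \; dt + \xi \; dW_t\)

\(E[\sigma_t] = \theta + (\sigma_0 - \theta) e^{-\kappa t}\)

The second line shows the mean reversion explicitly: the expected level decays exponentially from its starting value \(\sigma_0\) towards \(\theta\), at a rate set by \(\kappa\).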

Smile slope decreases as \(≈\frac{1}{\sqrt{T}}\)

Volatility slope behaves as the ATM implied volatility. Model: stochastic volatility with two factors in order to control separately ATM volatility and the skew.

Smile curvature decreases as \(≈\frac{1}{T^α}\)

Volatility curve behaves as the ATM volatility with a different speed of mean reversion. Model: a two factor stochastic volatility enables this type of control.

Put curvature is higher than call curvature

There is dissymmetry between calls and puts. Model: Jump model allows the generation of put prices more expensive than call prices.

Smile dynamics

When the spot moves, the ATM volatility approximately follows the smile (sticky strike). Model: a mixture of local volatility and stochastic volatility models allows this type of behavior.

6.3 Review of Volatility Models

6.3.1 Small Historical Review

The pioneer

The long story of option pricing began in 1900 when Louis Bachelier developed the earliest known analytical valuation for standard options in his PhD thesis, ‘The Theory of Speculation’. Finding that stock price changes looked like a random walk, Bachelier made the quite revolutionary assumption that stock prices follow an arithmetic Brownian motion.

Bachelier discovered the immensity of a world in which randomness exists. After his thesis, he proposed a theory of ‘related probabilities’. A theory about what would, 30 years later, be called Markov processes. Bachelier’s work was the starting point of a major study by Kolmogorov in 1931.

While Bachelier was on the right track, his formula had clear drawbacks:

  • It did not take into account any discounting.
  • It allowed for negative stock prices.
  • It allowed for option prices higher than the prices of the underlying securities.

However, due to the precociousness of his work, it took more than 60 years of research to propose any alternative option pricing models.

Before Black-Scholes

In 1961, Sprenkle was the first to extend the work of Bachelier by switching to a geometric Brownian motion (GBM) for the stock price process. This adaptation did not receive much attention, despite ruling out negative prices by assuming log-normally distributed prices. The reasons often put forth are the considerable number of parameters to estimate and the lack of information about how to do so.

Three years later, Boness improved Sprenkle’s model by considering the time value of money. In 1965, Samuelson observed that an option may carry a different level of risk than the underlying stock, and concluded that Boness’s use of the expected rate of return as a discount rate was wrong.

The Black-Scholes model

In 1973, Black and Scholes developed the first complete equilibrium option pricing model, which was to become the greatest breakthrough in the pricing of stock options. A consequence that proved even more influential was the realization that, by holding stock and risk-less debt, the option position could be hedged completely and dynamically. The Black-Scholes model gave a serious impulse to the worldwide trading of options because it provided a widely applicable option pricing method.

Due to its success, more focus has been put upon the Black-Scholes model and its underlying assumptions.

Even though earlier empirical research had already started rejecting this simple hypothesis, the Black-Scholes model relies on the assumption that stock prices are log-normally distributed, that is, that log-returns are normally distributed.

Indeed, Mandelbrot (1963) and Fama (1965) found that stock returns exhibit excess kurtosis, suggesting that returns have a fat-tailed distribution. Mandelbrot also documented what is commonly known as ‘volatility clustering’: “… large changes tend to be followed by large changes and small changes tend to be followed by small changes …”. This stylized fact clearly violates the independence of returns assumed in the Black-Scholes model.

Furthermore, Fama (1965) and Black (1976) noticed that large downward movements are generally more frequent than their upward counterparts. Statistically, this means that the stock return distribution is negatively skewed. Black further observed the existence of a negative correlation between stock prices and volatility, known as the ‘leverage effect’.

Additional studies, such as Blattberg and Gonedes (1974) and MacBeth and Merville (1979), have also rejected the GBM hypothesis by showing that stock returns are heteroskedastic. In other words, the variance of aggregate stock returns changes over time.

The Black-Scholes formula was questioned even more after Black Monday on Wall Street in 1987, since the probability of such an extreme event under the normal distribution is extremely low (less than \(1.4 \times 10^{-107}\)). Fearing a repeat of this market crash, investors began putting more value on deep OTM put options. Subsequently, those options traded at relatively higher prices than ATM puts, ATM calls and OTM calls. Their implied volatilities were therefore higher, resulting in a ‘volatility smile’ that contradicts the Black-Scholes model, under which the implied volatility surface is flat.

Jump diffusion models

Merton (1976) and Cox and Ross (1976) were the first to allow the stock to jump ‘up’ or ‘down’, engendering a discontinuity in the stock price process. With adequate parameters, Merton’s model was able to generate a wide range of volatility smiles and skews. In particular, choosing a negative mean for the jump process can readily capture short-term skews. However, the model retains the undesirable independence of returns. Numerous studies on jump diffusion models have been undertaken since that time.

Local volatility model


Stochastic volatility models

The heteroskedasticity in stock returns makes it very tempting to express the volatility as a stochastic process.

Based on a body of work on stochastic volatility models (Scott (1987), Hull and White (1987), Stein and Stein (1991)), Heston (1993) advanced the first stochastic volatility model with a closed-form solution. His model captures essential features of stock markets, namely the leverage effect, volatility clustering and the tail behavior of stock returns. However, it cannot yield realistic implied volatilities for short maturities.

Stochastic volatility with jumps in stock price process

Bates (1996) and Scott (1997) have associated a jump diffusion model with stochastic volatility.

By benefiting from the advantages of both the jumps in the stock price process and the stochastic volatility, those models seem better able to match the market facts.

Stochastic volatility with jumps in stock price and volatility processes

Although models nesting both stochastic volatility and jumps have shown some success, Bates (2000) and Pan (2002) indicate that they are still incapable of fully capturing the empirical features of stock index options prices. Actually, the significant volatility smile of index option prices cannot be described solely based on the degree of volatility of volatility.

Several researchers have proposed to incorporate further jumps in the volatility process to correct inaccurate descriptions of a significant volatility smile. Duffie, Pan and Singleton (2000) model volatility as an affine process that can jump up violently and can account for abrupt and lasting market changes with upward movements in volatility. Nonetheless, the volatility in their model cannot subsequently jump down, as is observed in the data.

Research that followed then tried to mimic volatility spikes.

I will only name Professor Zerilli, as she was my teacher back in 2013 and, to be honest, she is the only one I remember :). In 2005, Zerilli proposed Normal jumps in an innovative log-variance process that follows an Ornstein-Uhlenbeck process. Her results revealed that mimicking volatility spikes improved the option pricing model considerably.

In the following sections, we will discuss the main models: Black-Scholes, Dupire’s Local Volatility, Heston and SABR.

We can always speak about more models but choosing means eliminating!

6.3.2 Derivation of Black-Scholes PDE

We have already spoken about the Black-Scholes model but we will further derive the Black-Scholes equation in this section.

In the Black-Scholes world, the evolution of the stock price S is given by:
\(\boxed{\frac{dS_t}{S_t} = \mu \; dt + \sigma \; dW_t}\) for \(\mu, \sigma > 0\).


We also assume that:

  • The stock does not pay any dividends.
  • There are no transaction costs.
  • The continuously compounded interest rate is r > 0 (constant).

The latter assumption implies that the evolution of the risk-free asset \(B_t\) is given by: \(\boxed{\frac{dB_t}{B_t} = r \; dt}\)

It means that the risk-free bank account grows at the continuously compounded rate \(r\) and hence, taking \(B_0 = 1\): \(\boxed{B_t = e^{rt}}\).

We are interested in pricing an option which is a function of the stock price at time T > 0, \(S_T\), a call option for example.

  • The form of the payoff is not so important.
  • The fact that it is a function of the stock price at time T, and only time T, is important.

Under this condition, we can show that the call price is a function of the current time t and the current stock price \(S_t\) only.

To price a derivative in the Black-Scholes world, we must do so under a measure which does not allow arbitrage: the risk-neutral measure. One can show that under this measure, the drift term of the stock price changes so that: \(\boxed{\frac{dS_t}{S_t} = r \; dt + \sigma \; dW_t}\).

In the risk-neutral world, \(\frac{C(t,S_t)}{B_t}\) is a martingale and hence, if we calculate its differential, we know it must have zero drift. A simple explanation is that we expect it to have zero growth: the option price is expected to grow at the same rate as the bank account, and the two growth rates cancel out in the ratio. This is what it means to be a martingale. We do not expect any change over time, so the discounted price has a zero drift term.

Applying Ito’s lemma to \(C(t,S_t)\) gives: \(\boxed{dC(t,S_t) = \frac{\partial C}{\partial t} \; dt + \frac{\partial C}{\partial S_t} dS_t + \frac{1}{2} \frac{\partial^2 C}{\partial S_t^2} (dS_t)^2}\).

Under the risk-neutral dynamics of \(S_t\) and recalling that:

  • \((dW_t)^2 = dt\)
  • \(dW_tdt = dt^2 = 0\)

\(\boxed{dC(t,S_t) = \bigg(\frac{\partial C}{\partial t} + \frac{\partial C}{\partial S_t} rS_t + \frac{1}{2} \frac{\partial^2 C}{\partial S_t^2} \sigma^2S_t^2 \bigg) \; dt + \sigma S_t \frac{\partial C}{\partial S_t} \; dW_t}\)

Using the Ito product rule:

\(\boxed{d\bigg(\frac{C(t,S_t)}{B_t}\bigg) = \frac{1}{B_t} \bigg(\frac{\partial C}{\partial t} + \frac{\partial C}{\partial S_t} rS_t + \frac{1}{2} \frac{\partial^2 C}{\partial S_t^2} \sigma^2S_t^2 - rC \bigg) \; dt + \sigma \frac{S_t}{B_t} \frac{\partial C}{\partial S_t} \; dW_t}\)
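The product-rule step can be spelled out as follows. Because \(B_t\) has no Brownian term, \(d\big(\frac{1}{B_t}\big) = -\frac{r}{B_t} \; dt\) and the cross-variation term vanishes, so that:

\(\begin{aligned} d\bigg(\frac{C(t,S_t)}{B_t}\bigg) &= \frac{1}{B_t} \; dC(t,S_t) + C(t,S_t) \; d\bigg(\frac{1}{B_t}\bigg) + dC(t,S_t) \; d\bigg(\frac{1}{B_t}\bigg) \\[10pt] &= \frac{1}{B_t} \; dC(t,S_t) - \frac{rC(t,S_t)}{B_t} \; dt + 0 \end{aligned}\)

Substituting the expression for \(dC(t,S_t)\) obtained from Ito’s lemma gives the drift and diffusion terms shown in the boxed equation.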

Since \(\frac{C(t,S_t)}{B_t}\) is a martingale and hence must have zero drift: \(\boxed{\frac{\partial C}{\partial t} + \frac{\partial C}{\partial S_t} rS_t + \frac{1}{2} \frac{\partial^2 C}{\partial S_t^2} \sigma^2S_t^2 - rC = 0}\)

This is the Black-Scholes equation. It is a partial differential equation (PDE) describing the evolution of the option price as a function of the current stock price and the current time.

The equation does not change if we vary the payoff function of the derivative. However, the associated boundary conditions, which are required to solve the equation do vary!
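As a sanity check, the sketch below takes the standard closed-form Black-Scholes price of a European call (the solution of the PDE with the call payoff as terminal condition) and evaluates the left-hand side of the equation by central finite differences. The helper names and the parameter values (strike 100, r = 3%, σ = 20%) are illustrative assumptions.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(t, s, strike, maturity, r, sigma):
    """Black-Scholes call price at time t and spot s (no dividends)."""
    tau = maturity - t
    d1 = (log(s / strike) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return s * norm_cdf(d1) - strike * exp(-r * tau) * norm_cdf(d2)

def pde_residual(t, s, strike, maturity, r, sigma, dt=1e-4, ds=1e-2):
    """Evaluate C_t + r S C_S + 0.5 sigma^2 S^2 C_SS - r C by central differences."""
    def c(tt, ss):
        return bs_call(tt, ss, strike, maturity, r, sigma)

    c_t = (c(t + dt, s) - c(t - dt, s)) / (2.0 * dt)
    c_s = (c(t, s + ds) - c(t, s - ds)) / (2.0 * ds)
    c_ss = (c(t, s + ds) - 2.0 * c(t, s) + c(t, s - ds)) / ds**2
    return c_t + r * s * c_s + 0.5 * sigma**2 * s**2 * c_ss - r * c(t, s)

# Hypothetical inputs: the residual should be zero up to discretisation error.
residual = pde_residual(t=0.25, s=105.0, strike=100.0, maturity=1.0, r=0.03, sigma=0.20)
```

Note that the stock drift \(\mu\) appears nowhere in the code: only r and σ enter the equation.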

The above implies that two stocks with the same volatility but different drifts will have the same option prices.

The pricing of any derivative must be done in the risk-neutral measure in order to avoid arbitrage. Under this measure, we have seen that the drift changes and is independent of the drift of the stock.

Financially, this reflects the fact that the hedging strategy ensures that the underlying drift of the stock is balanced against the drift of the option. The drifts are balanced because the drift reflects the risk premium demanded by investors to account for uncertainty, and that uncertainty has been hedged away.

6.3.3 Dupire’s Local Volatility Model

I like the way Colin Bennett provides intuition about local volatility in the appendix of his book ‘Trading Volatility, Correlation, Term Structure and Skew’. I am sharing some of it with you below.

Exotic equity derivatives usually require a more sophisticated model than the Black-Scholes model. The most popular alternative model is a local volatility model (LocVol), which is the only complete consistent volatility model.

Complete: it allows hedging based only on the underlying.

Consistent: it does not contain a contradiction.

LocVol models try to stay close to the Black-Scholes model by introducing more flexibility into the volatility.

Some intuitions

LocVol models offer a way of capturing the implied skew without introducing additional sources of randomness; the only source of which is the underlying asset’s price that is modeled as a random variable. In LocVol models, the volatility is a deterministic function of the asset’s level. In the Black-Scholes model, the asset’s price is modeled as a log-normal random variable, which means that the asset’s log-returns are normally distributed. However, the fact that we have a skew is the market telling us that the asset’s log-returns have an implied distribution that is not Normal.

LocVol is still a one-factor model and it also allows for risk-neutral dynamics, which means that, like Black-Scholes, the model is still preference free from the financial point of view. The LocVol model is the simplest one to account for skew and offers a consistent structure for pricing options.

How does Local Volatility work?

Well, as we said, the presence of skew is the market telling us that the asset’s log-returns are not normally distributed. In fact, the market is implying some distribution.

If we are given a set of vanilla options prices for a fixed maturity across strikes, can we find a distribution that corresponds to these prices? In other words, can we find a distribution for the asset price so that if we used such distribution to price vanilla options on this asset, it would give the same options prices as the ones seen in the market?

YES, theoretically, there is a way to find the distribution (LocVol model) which corresponds exactly to all vanilla prices taken from the skew. In fact, LocVol extends beyond skew and can also capture term structure. It can therefore theoretically supply us with a model that gives the exact same prices for vanillas taken from a whole implied volatility surface.

Local Volatility is instantaneous volatility of underlying

Instantaneous volatility is the volatility of an underlying at any given local point, which we shall call the local volatility. We shall assume the local volatility is fixed and has a normal negative skew. There are many paths from spot to strike and, depending on which path is taken, they will determine how volatile the underlying is during the life of the option.

Black-Scholes volatility is average of local volatilities

It is possible to calculate the LocVol surface from the Black-Scholes implied volatility surface.

This is possible because the Black-Scholes implied volatility of an option is the average of the local volatilities over all the paths between spot and the maturity and strike of the option.

Fig: 6.2 : Black-Scholes volatility as an average of local volatilities

A reasonable approximation is the average of all local volatilities on a direct straight-line path between spot and strike. For a normal relatively flat skew, this is simply the average of two values: the ATM LocVol and the strike LocVol.

ATM volatility is the same for both Black-Scholes and Local Volatility

For ATM implied volatilities, the LocVol at the strike is equal to ATM implied volatility. Hence the average of two identical numbers is simply equal to the ATM implied volatility. For this reason, Black-Scholes implied is equal to LocVol ATM implied.

Black-Scholes skew is half of LocVol skew as it is the average

If the LocVol surface has a 22% implied at the 90% strike and 20% implied at the ATM strike, then the Black-Scholes implied volatility for the 90% strike is 21%.
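The arithmetic behind this rule of thumb can be written down in a few lines. This is only the two-point average approximation described above, not an exact conversion between local and implied volatilities; the function name is a hypothetical one.

```python
def bs_implied_from_local(atm_local_vol, strike_local_vol):
    """Two-point approximation: average the local vols at spot (ATM) and at the strike."""
    return 0.5 * (atm_local_vol + strike_local_vol)

# Numbers from the example: 20% local vol at the ATM strike, 22% at the 90% strike.
bs_vol_90 = bs_implied_from_local(0.20, 0.22)
bs_vol_atm = bs_implied_from_local(0.20, 0.20)

local_skew = 0.22 - 0.20          # 90%-100% skew of the local volatilities
bs_skew = bs_vol_90 - bs_vol_atm  # 90%-100% skew of the Black-Scholes implieds
```

Under this approximation `bs_skew` is always half of `local_skew`, whatever the two local volatility levels are.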

As ATM implied volatilities are identical for both local volatility and Black-Scholes implied volatility, this means that the 90%-100% skew is 2% for LocVol but 1% for Black-Scholes. LocVol skew is therefore twice the Black-Scholes skew.

Deeper look into the model

It is often used to calculate exotic option implied volatilities to ensure the prices for these exotics are consistent with the values of observed vanilla options and hence prevent arbitrage.

In the LocVol model, the only stochastic behavior introduced into the volatility function is a result of it being a function of the underlying asset price (if \(r_t\) and \(q_t\) are deterministic). So there is still just one source of stochasticity, ensuring the completeness of the Black-Scholes model is preserved. Completeness is important, because it guarantees unique prices. This is the stated reason to develop the local volatility model in Dupire’s original paper.

From Implied to Local volatilities

Since we can look in the vanilla market and find prices, or equivalently implied volatilities, for vanilla options at any strike and expiry, can we find the LocVol function \(\sigma_{local}(S_t,t)\) so that if the spot follows \(\boxed{\frac{dS}{S} = \mu \ dt + \sigma_{local}(S_t,t) \ dW}\), then the fair values of vanilla options exactly match the market?

YES, and that gives us a powerful tool to price exotic options in a model that is consistent with the vanilla market. Finding this function \(\sigma_{local}(S_t,t)\) is a process known as calibration. The inputs for these models are not only the current level of the asset, the curve of riskless interest rates, the size and timing of known dividends to come, but also the implied volatility skew (possibly a whole surface). Given the set of implied volatilities of vanilla options, calibration is the process where we search for these volatilities \(\sigma_{local}(S_t,t)\) so that the model matches these prices.

So knowing the market prices of vanilla options, the LocVol function can be derived using the following formula: \(\boxed{\sigma(K,T) = \sqrt{2 \frac{\frac{\partial C}{\partial T} + (r - q)K \frac{\partial C}{\partial K} + qC}{K^2 \frac{\partial^2 C}{\partial K^2}}}}\)

The construction of the LocVol from the implied volatilities constitutes a difficult numerical problem in practice. You can have a look at Gatheral’s book for more details.

Starting from a finite set of listed option prices, a good interpolation in strike and maturities provides a continuum of option prices and we can apply the stripping formula to get the local volatilities.
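A minimal sketch of the stripping step, under simplifying assumptions: instead of interpolated market quotes, the price surface is generated from the Black-Scholes formula with a flat 20% volatility, and the partial derivatives in the formula are approximated by central finite differences. On such a flat surface the formula should return the input volatility. The function names, bump sizes and market parameters are illustrative choices.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(spot, strike, maturity, r, q, sigma):
    """Black-Scholes call price with continuous dividend yield q."""
    d1 = (log(spot / strike) + (r - q + 0.5 * sigma**2) * maturity) / (sigma * sqrt(maturity))
    d2 = d1 - sigma * sqrt(maturity)
    return spot * exp(-q * maturity) * norm_cdf(d1) - strike * exp(-r * maturity) * norm_cdf(d2)

def dupire_local_vol(price, strike, maturity, r, q, dk=0.01, dt=1e-4):
    """Apply the stripping formula to a call price surface price(strike, maturity)."""
    c = price(strike, maturity)
    c_t = (price(strike, maturity + dt) - price(strike, maturity - dt)) / (2.0 * dt)
    c_k = (price(strike + dk, maturity) - price(strike - dk, maturity)) / (2.0 * dk)
    c_kk = (price(strike + dk, maturity) - 2.0 * c + price(strike - dk, maturity)) / dk**2
    numerator = c_t + (r - q) * strike * c_k + q * c
    return sqrt(2.0 * numerator / (strike**2 * c_kk))

# Hypothetical market: spot 100, r = 3%, q = 1%, flat implied volatility of 20%.
def flat_surface(strike, maturity):
    return bs_call(100.0, strike, maturity, r=0.03, q=0.01, sigma=0.20)

local_vol_90 = dupire_local_vol(flat_surface, strike=90.0, maturity=1.0, r=0.03, q=0.01)
```

With real quotes, the difficulty lies in the interpolation: the second derivative in strike sits in the denominator, so small irregularities in the interpolated prices are amplified in the resulting local volatilities.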

Once the local volatilities are obtained, one can price exotic instruments with this calibrated local volatility model. Properly accounting for the market skew can have a massive impact on the price of exotics, for example an up-and-out call.

We can complete the LocVol formulation by deriving the PDE to use for pricing. The derivation is identical to that of the Black-Scholes equation and the result is as follows: \(\boxed{\frac{\partial C}{\partial T} = \frac{\sigma^2(K,T)}{2} \; K^2 \frac{\partial^2 C}{\partial K^2} - (r - q)K \frac{\partial C}{\partial K} - qC}\)

The resulting LocVol surface is fully non-parametric. The LocVol model allows a full fitting of an arbitrage-free implied volatility surface.

From local volatilities to implied volatilities

Given the LocVol model, the computation of the implied volatilities is only approximate. There exist several methods of approximation, one of which is the most likely path. It gives intuition as to how the implied volatility is built from local volatility.

In the Monte Carlo simulation approach, we simulate many paths and keep only the ones that finish around the strike. We obtain a stream of trajectories that start at the initial spot and finish around the strike. We average all these paths on each date and obtain the most likely path. We can also extract the variance around this path. We obtain the implied volatility estimate from it (thanks to the most likely path and the width around it).
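The path-selection step can be sketched as follows. The local volatility function used here is a made-up negatively skewed example, rates and dividends are set to zero, and the last step (turning the path and its width into an implied volatility) is deliberately left out. All names and parameter values are assumptions for illustration.

```python
import numpy as np

def hypothetical_local_vol(spot, t):
    """Made-up local volatility with a negative skew; it does not depend on t here."""
    return np.clip(0.20 - 0.10 * np.log(spot / 100.0), 0.05, 0.60)

def most_likely_path(spot0, strike, maturity, n_steps=50, n_paths=100_000, band=0.01, seed=0):
    """Simulate local-vol paths and average, date by date, those ending near the strike."""
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    paths = np.empty((n_steps + 1, n_paths))
    paths[0] = spot0
    for i in range(n_steps):
        vol = hypothetical_local_vol(paths[i], i * dt)
        z = rng.standard_normal(n_paths)
        # Log-Euler step with zero rates and dividends (an assumption of this sketch).
        paths[i + 1] = paths[i] * np.exp(-0.5 * vol**2 * dt + vol * np.sqrt(dt) * z)
    keep = np.abs(paths[-1] / strike - 1.0) < band  # paths finishing within 1% of the strike
    selected = paths[:, keep]
    return selected.mean(axis=1), selected.std(axis=1), int(keep.sum())

mlp, width, n_kept = most_likely_path(spot0=100.0, strike=90.0, maturity=1.0)
```

Here `mlp` is the date-by-date average of the retained paths and `width` is their dispersion around it, the two ingredients mentioned above.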

This is well explained in Adil Reghai’s book ‘Quantitative Finance: Back to Basic Principles’.

Strengths and Weaknesses

  • Forward skew is smaller than it should be.
  • Volatility of volatility is small.
  • Numerical problems in implementation. There are computational difficulties in finding the LocVol function that will exactly fit all market prices, which is why Dupire’s formula, though theoretically correct, has some practical drawbacks. In particular, fitting all points may lead to unrealistic model dynamics. In practice, there may be more than one LocVol model that fits a set of vanillas, so one must lay down a set of criteria to follow when choosing the model to use. The surface is two dimensional, one in time and one in strike, and the focus on one or both must be determined in order to correctly capture the effect of the volatility surface on certain payoffs.

Despite all of this, 90% of investment banks’ production systems were still using the LocVol model in day-to-day risk management.

It is important to note that traders adjust their prices and their greeks whenever the LocVol model is not an adequate pricing model (for Cliquet options or options on variance, for example).


LocVol models offer a way of capturing the implied skew without introducing additional sources of randomness; the only source of which is the underlying asset’s price that is modeled as a random variable.

LocVol is still a one-factor model and it also allows for risk-neutral dynamics, which means that, like Black-Scholes, the model is still preference free from the financial point of view. The LocVol model is the simplest one to account for skew and offers a consistent structure for pricing options.

The strength of the LocVol model lies in its signature to the product. Only the vega KT map can give precise sensitivities for all vanilla options (K,T). The LocVol model allows a more precise projection of the global vega. It provides a powerful means to find the right strikes and maturities onto which to project the total vega. It also provides a P&L explanation for the varied moves in the implied volatility surface. This tool is used dozens of times every day to explain the impact on the book of movements in the vol surface.

Which products can/cannot be priced using the LocVol model?

LocVol models can be used to price options that have skew dependency yet cannot be broken down into vanillas, assuming the LocVol model is correctly calibrated to the skew (or surface) in a manner consistent with the skew sensitivity of the option.

LocVol models cannot be used to price payoffs that exhibit vega convexities that are not captured in the skew since this model has a too small volatility of volatility and therefore will tend to underestimate those payoffs.

Also, LocVol models cannot be used to price a derivative that has exposure to forward skew, as the latter is underestimated under this model. Although LocVol models can capture the market’s consensus on the prices of vanilla options by matching the volatility surface, the evolution of future volatility implied by these models is not realistic. Forward skews generated by LocVol models flatten out as we go forward in time, even though, in reality, forward skews have no reason to do so. The LocVol model therefore does not provide the correct dynamics for products with sensitivities such as these and will result in wrong prices and wrong subsequent hedge ratios.

6.3.4 Stochastic Volatility Model : Heston Model

The Heston Model commonly stands out among the stochastic volatility models for several reasons:

  • It provides a closed-form solution for European Call options.
  • It allows the stock price to follow a non log-Normal probability distribution.
  • It expresses the volatility as a mean-reverting process.
  • It fits the implied volatility surface of option prices observed in the market fairly well.
  • It takes into account a possible correlation between the stock price and its volatility.

The model

\(\begin{cases} dS_t = \mu S_t \ dt + \sqrt{v_t} \ S_t \ dZ_S \\[10pt] dv_t = \kappa (\theta_v - v_t) \ dt + \sigma_v \ \sqrt{v_t} \ dZ_v \\[10pt] \langle dZ_S, dZ_v \rangle = \rho_{S,v} \ dt \end{cases}\)

\(S(0) = S_0\) where \(S_0\) is the spot stock price.
\(V(0) = V_0\) where \(V_0\) is the spot variance.
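The system above can be simulated with a simple Monte Carlo scheme. The sketch below uses a log-Euler step for the stock and a ‘full truncation’ Euler step for the variance (negative variance values are floored at zero inside the drift and diffusion). This discretisation is one common choice among several and is an assumption of the sketch, as are the function name and the parameter values.

```python
import numpy as np

def simulate_heston(s0, v0, mu, kappa, theta_v, sigma_v, rho, maturity,
                    n_steps=252, n_paths=50_000, seed=0):
    """Simulate terminal stock prices and variances with a full-truncation Euler scheme."""
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    log_s = np.full(n_paths, np.log(s0), dtype=float)
    v = np.full(n_paths, v0, dtype=float)
    for _ in range(n_steps):
        z_s = rng.standard_normal(n_paths)
        # Build a second Brownian increment with correlation rho to the first one.
        z_v = rho * z_s + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        v_pos = np.maximum(v, 0.0)  # full truncation: use max(v, 0) in drift and diffusion
        log_s += (mu - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z_s
        v += kappa * (theta_v - v_pos) * dt + sigma_v * np.sqrt(v_pos * dt) * z_v
    return np.exp(log_s), np.maximum(v, 0.0)

# Hypothetical parameters: 20% initial and long-run volatility, negative correlation.
s_T, v_T = simulate_heston(s0=100.0, v0=0.04, mu=0.05, kappa=2.0, theta_v=0.04,
                           sigma_v=0.3, rho=-0.7, maturity=1.0)
```

With the initial variance equal to its long-run level, the average simulated variance should stay close to that level, and a negative correlation should produce a negatively skewed distribution of log-returns.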

Model’s parameters

Stock price process: S

The first stochastic differential equation (SDE) expresses the fact that the stock price follows a stochastic process with a constant rate of return \(\mu\).

\(\boxed{dS_t = \mu S_t \ dt + \sqrt{v_t} \ S_t \ dZ_S}\)

The presence of the square root in the diffusion coefficient ensures the non-negativity of the variance.

That is, \(v(t) \geqslant 0\) at all times with probability one.

Variance process: \(v\)

It can easily be observed from historical data that volatility changes over time. While volatility is undoubtedly varying over time, it seems to fluctuate around some long-term mean level. Consequently, the variance is represented by a square root mean-reverting process, similar to the one used by Cox, Ingersoll and Ross (1985) for modeling the term structure of interest rates.

\(\boxed{dv_t = \kappa (\theta_v - v_t) \ dt + \sigma_v \ \sqrt{v_t} \ dZ_v}\)

The drift term of the SDE of the variance process indicates a mean reversion when \(\kappa > 0\), with \(\theta_v\) being the long-term mean level of the variance. Basically, whenever \(v_t\) is greater (lesser) than \(\theta_v\), the drift term will push the process value down (up).

\(\kappa\) is the speed of the mean reversion of the variance process and can be thought of as the degree of ‘volatility clustering’.

\(\sigma_v\) is the volatility of variance and influences the kurtosis of the distribution. The larger \(\sigma_v\), the greater the kurtosis, the fatter the tails of the distribution.

Effects of the different parameters on the implied volatility


The effect of the long-term mean reversion level is intuitive since an increase in the long-term mean reversion level of the variance corresponds to an increase of the variance. Consequently, an increase of \(\theta_v\) will be associated with an upward translation of the volatility smile.


Since the kurtosis of the distribution has an effect on the implied volatility, the volatility of variance \(\sigma_v\) indirectly affects the implied volatility. The Heston model has the pleasing ability to mimic the volatility smile observed in the market. The larger \(\sigma_v\), the more pronounced the smile. This is rather intuitive since it increases the probability of extreme movements and thus increases the price of OTM Calls and Puts. Note that the term structure of implied volatility is flat when the volatility is constant (as in Black-Scholes), that is, when the volatility of variance is set equal to zero.


Intuitively, the speed of reversion, \(\kappa\), governs the relative weights of the long-term mean level of the variance, \(\theta_v\), and the initial variance, \(V_0\). It is logical for the impact of \(\kappa\) to depend on both the initial variance and the long-term mean reversion level. We consider the case in which \(V_0\) and \(\theta_v\) are equal, which provides a natural interpretation.

First of all, the impact of the mean reversion speed of the variance on option prices seems to be rather limited.

Then, the effect of \(\kappa\) appears to be different when options are deeply ITM or deeply OTM. This is very intuitive and is closely related to the fact that increasing the speed of the mean reversion of the variance decreases the probability of extreme movements. As a result, when an option is deeply OTM, increasing \(\kappa\) decreases the probability that the option will finish ITM at maturity, therefore decreasing its price. In the same way, when an option is deeply ITM, increasing \(\kappa\) increases the likelihood of the option finishing ITM, therefore increasing its price.

The Feller Condition

While the variance is never negative, it can actually reach zero unless the Feller condition is satisfied: \(\boxed{2 \kappa \theta_v > \sigma_v^2}\)

Intuitively, \(\sigma_v\) cannot be too large and \(\kappa \theta_v\) cannot be too small. In practice, this condition is often not satisfied, so the likelihood of a zero variance is not negligible.
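A calibration routine would typically test this inequality on each candidate parameter set. The helper below is a minimal sketch; the function name and the two parameter sets are made-up examples.

```python
def feller_satisfied(kappa, theta_v, sigma_v):
    """Return True when 2 * kappa * theta_v > sigma_v ** 2."""
    return 2.0 * kappa * theta_v > sigma_v**2

# Hypothetical parameter sets.
mild_vol_of_var = feller_satisfied(kappa=2.0, theta_v=0.04, sigma_v=0.3)   # 0.16 vs 0.09
large_vol_of_var = feller_satisfied(kappa=1.0, theta_v=0.04, sigma_v=0.5)  # 0.08 vs 0.25
```

The second set, with a slower mean reversion and a larger volatility of variance, violates the condition, so the variance can touch zero under those parameters.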

Correlation between stock price and volatility: \(\rho\)

\(\boxed{\langle dZ_S, dZ_v \rangle = \rho_{S,v} \ dt}\)

Empirical studies have documented that the stock price and volatility processes are negatively correlated (Black (1976), Christie (1982), Engle and Ng (1993)). Decreasing stock prices inflate the leverage firms have. It is commonly thought that this gives rise to more uncertainty and hence volatility. Those authors also showed that both the value and the sign of the correlation have an impact on the return distribution, more specifically on its skewness. A negative correlation makes the left tail fatter than the right tail. Such negative skewness is a typical feature of the distribution of returns.

Since the skewness of the distribution has an effect on the implied volatility, the correlation \(\rho\) indirectly affects the implied volatility. It is distinctly observed that the correlation indeed affects the shape of the implied volatility curve. In particular, a negative correlation induces a downward-sloping curve. This coincides with the previous finding that a negative correlation makes the left tail fatter relative to the right tail. Indeed, this logically contributes to a higher price for deep OTM Puts. By Put-Call parity, it is readily found that Calls at the same strikes also inflate in implied volatility.

Which products can/cannot be priced using a Stochastic Volatility model?

In stochastic volatility models, both the asset price and its volatility are assumed to be random processes. In allowing the volatility to be random, stochastic volatility models give rise to implied volatility skews and term structures. SV models can explain in a self-consistent manner the actual features we see in the empirical data from the market. Once such a model is specified, the skews generated by the model are a function of its parameters, and finding the parameters that fit a certain skew (or surface) is again the act of calibration.

Stochastic volatility models go beyond skew and term structure, allowing for vega convexity and forward skew. Any derivative that is sensitive to vega convexity and/or forward skew should be priced using a stochastic volatility model.

A derivative exhibits vega convexity when its sensitivity to volatility is non-linear, meaning that there is a non-zero second-order price sensitivity to a change in volatility. Vanilla options are convex in the underlying’s price, but are they also convex in volatility? Well, ATM vanillas are not, but OTM vanillas do have vega convexity. However, these options are liquidly traded and their prices are obtained by using their implied volatilities in Black-Scholes. These implied volatilities give the market’s consensus of the right price; therefore the cost of vega convexity of OTM vanillas is already included in the skew.

In more complex payoffs, almost all the payoffs will exhibit some form of vega convexity, although in many cases this is captured in the skew and can be correctly priced by getting the skew right (with a LocVol model). Other payoffs exhibit such convexities that are not captured in the skew and a stochastic volatility model must be used. Since volatility is taken to be random, it must have its own volatility, known as the volatility of volatility. This parameter corresponds to the vega convexity term. Note that the second-order sensitivity to volatility is known as volga.
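The statement that ATM vanillas have almost no vega convexity while OTM vanillas do can be checked with the standard Black-Scholes expressions for vega and volga. The sketch assumes zero rates and dividends, and the spot, strikes, maturity and volatility are illustrative values.

```python
from math import erf, exp, log, pi, sqrt

def norm_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def d1_d2(spot, strike, maturity, sigma):
    """Black-Scholes d1 and d2 with zero rates and dividends (an assumption of this sketch)."""
    d1 = (log(spot / strike) + 0.5 * sigma**2 * maturity) / (sigma * sqrt(maturity))
    return d1, d1 - sigma * sqrt(maturity)

def bs_call(spot, strike, maturity, sigma):
    d1, d2 = d1_d2(spot, strike, maturity, sigma)
    return spot * norm_cdf(d1) - strike * norm_cdf(d2)

def vega(spot, strike, maturity, sigma):
    """First-order sensitivity of the price to volatility."""
    d1, _ = d1_d2(spot, strike, maturity, sigma)
    return spot * norm_pdf(d1) * sqrt(maturity)

def volga(spot, strike, maturity, sigma):
    """Second-order sensitivity of the price to volatility: vega * d1 * d2 / sigma."""
    d1, d2 = d1_d2(spot, strike, maturity, sigma)
    return vega(spot, strike, maturity, sigma) * d1 * d2 / sigma

# Hypothetical one-year options on a spot of 100 with 20% volatility.
volga_atm = volga(100.0, 100.0, 1.0, 0.20)
volga_otm = volga(100.0, 130.0, 1.0, 0.20)
```

At the money, \(d_1\) and \(d_2\) are small and of opposite sign, so the product \(d_1 d_2\) and hence volga are close to zero (slightly negative). Away from the money the product is positive and sizeable, which is the convexity the skew has to pay for.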

The second feature of stochastic volatility models is that they can generate forward skews. If a derivative has exposure to forward skew, one must use a model that knows about forward skews in order to get a correct price. The dynamics of stochastic volatility models are more consistent (than LocVol model) with the dynamics observed in the market. By smile dynamics, we refer to the phenomena of how the skew moves as the underlying moves: if the underlying moves in one direction, how should the skew move?

Stochastic volatility models also have their weaknesses, as they have difficulty fitting both ends of the surface, that is, fitting the skew for both short and long maturities at the same time. One remedy for this is to add jumps to a stochastic volatility model. Jumps are able to explain the short-term skew quite well, and we recall that the reason for the existence of the steep short-term skew has to do with jumps. Adding jumps to such a model does not generally affect the long-term skews, which remain relatively flatter; the long-term implied skew is not driven by jumps in the underlying.

6.3.5 Stochastic Volatility Model : SABR Model

The SABR model was introduced by Hagan et al. in 2002. SABR stands for ‘Stochastic Alpha Beta Rho’, after the main parameters of the SABR equations (\(\alpha, \beta, \rho\)). It describes a single forward F, such as a LIBOR forward rate, a forward swap rate, or a forward stock price. The SABR model is widely used by practitioners in the financial industry, especially in the interest rate derivative markets.

It is easier to write down the model in terms of the forward to a fixed expiry rather than in terms of a spot level.

In terms of F, the Black-Scholes process \(\boxed{\frac{dS}{S} = \mu \ dt + \sigma \ dW}\) becomes \(\boxed{\frac{dF}{F} = \sigma \ dW}\), as a simple application of Ito’s lemma shows.
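The step can be sketched as follows, assuming the forward to the fixed expiry \(T\) is defined as \(F_t = S_t \, e^{\mu (T - t)}\). This definition is an assumption made here for illustration: \(\mu\) is read as the drift under the pricing measure. Since \(F\) is linear in \(S\), Ito’s lemma produces no second-order term:

\(\begin{aligned} dF &= e^{\mu (T - t)} \ dS - \mu \, S \, e^{\mu (T - t)} \ dt \\[6pt] &= F \, (\mu \ dt + \sigma \ dW) - \mu \, F \ dt \\[6pt] &= \sigma \, F \ dW \end{aligned}\)

The drift cancels exactly, which is why the forward is the more convenient variable.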

SABR is practically the simplest extension of Black-Scholes to a stochastic volatility model. It has SDEs:

\(\begin{cases} dF = \sigma F^{\beta} \ dW_1 \\[10pt] d\sigma = v \ \sigma \ dW_2 \\[10pt] \langle dW_1, dW_2 \rangle = \rho \ dt \\[10pt] \sigma(0) = \alpha \end{cases}\)

Small comparison with Heston

Like the Heston model, the SABR model has a vol-of-vol parameter \(v\) that controls the convexity of the implied volatility smile, and a correlation parameter \(\rho\) that controls the skew.
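To make the roles of \(v\) and \(\rho\) concrete, the SABR SDEs can be simulated with a simple Euler scheme. The following is a minimal Python sketch under stated assumptions, not a production scheme: the function name `simulate_sabr`, the parameter name `nu` (standing for \(v\)), the default step and path counts, and the absorption of the forward at zero are all choices made here for illustration. The sketch is only meant for \(0 < \beta \leq 1\).

```python
import math
import random


def simulate_sabr(f0, alpha, beta, rho, nu, t, n_steps=100, n_paths=2000, seed=0):
    """Euler sketch of dF = sigma F^beta dW1, dsigma = nu sigma dW2, corr(dW1, dW2) = rho.

    Returns the list of terminal forwards F(t). The forward is absorbed at
    zero, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_bar = math.sqrt(1.0 - rho * rho)
    terminal = []
    for _ in range(n_paths):
        f, sigma = f0, alpha  # sigma(0) = alpha
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)
            # Build a second normal draw with correlation rho to z1.
            z2 = rho * z1 + rho_bar * rng.gauss(0.0, 1.0)
            if f > 0.0:
                # Euler step for the forward, floored (and then held) at zero.
                f = max(f + sigma * f ** beta * sqrt_dt * z1, 0.0)
            # Exact step for the driftless log-normal volatility.
            sigma *= math.exp(nu * sqrt_dt * z2 - 0.5 * nu * nu * dt)
        terminal.append(f)
    return terminal


paths = simulate_sabr(f0=100.0, alpha=0.2, beta=1.0, rho=-0.5, nu=0.5, t=1.0)
# F has no drift, so the sample mean should sit close to f0.
print(sum(paths) / len(paths))
```

Only the standard library is used so that the sketch stays self-contained; a vectorised implementation would be considerably faster. Re-running with a different `rho` or `nu` and comparing the histogram of `paths` gives a feel for how the two parameters tilt and fatten the terminal distribution.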

Unlike the Heston model, there is no mean-reversion. This is a drawback as it means that when the instantaneous volatility follows a path in which it becomes very large, it is likely to stay large.

Beta parameters

\(\beta\) is one of the key parameters and affects many fundamental characteristics of the model. It determines the shape of the distribution of forward rates, the leverage effect and the backbone of the ATM vol. It affects the distribution as follows:

\(\beta = 1\): stochastic log-normal rates.
\(\beta = 0\): stochastic normal rates.
\(\beta = 0.5\): stochastic CIR model (with 0 drift).

Ordinarily, say in equity and FX markets, we choose \(\beta = 1\) so that the process is approximately log-normal in the limit of small vol-of-vol.

On the other hand, interest rate traders are fond of SABR and do make use of the \(\beta\) parameter (usually set to 0.5).

If we use \(\beta \ne 1\), we can rewrite the process as \(\boxed{\frac{dF}{F} = \sigma F^{\beta - 1} \ dW_1}\), so that the instantaneous ‘log-normal volatility’ is \(\sigma F^{\beta - 1}\). This makes the volatility depend on the forward level, and therefore \(\beta\) also impacts the skew.

We can to some extent play off \(\beta\) and \(\rho\) against one another.

However, \(\beta\) will also impact the convexity of the implied volatility smile. To see this, apply Ito’s lemma to the effective volatility \(\sigma F^{\beta - 1}\): the process it follows contains an additional contribution to the vol-of-vol.
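This calculation can be sketched as follows. Write \(\hat{\sigma} = \sigma F^{\beta - 1}\) for the effective volatility (the symbol \(\hat{\sigma}\) is introduced here for convenience only). Using \((dF)^2 = \sigma^2 F^{2\beta} \ dt\) and \(d\sigma \, dF = \rho \, v \, \sigma^2 F^{\beta} \ dt\), Ito’s lemma applied to the SABR SDEs gives:

\(\begin{aligned} \frac{d\hat{\sigma}}{\hat{\sigma}} &= v \ dW_2 + (\beta - 1) \, \hat{\sigma} \ dW_1 \\[6pt] &\quad + \left[ \tfrac{1}{2} (\beta - 1)(\beta - 2) \, \hat{\sigma}^2 + (\beta - 1) \, \rho \, v \, \hat{\sigma} \right] dt \end{aligned}\)

The instantaneous variance of \(d\hat{\sigma} / \hat{\sigma}\) is therefore \(v^2 + 2 \rho \, v \, (\beta - 1) \, \hat{\sigma} + (\beta - 1)^2 \, \hat{\sigma}^2\) per unit time. The last two terms vanish when \(\beta = 1\); for \(\beta \ne 1\) they are the extra vol-of-vol contribution.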

Caution is required in understanding the process when \(\beta \ne 1\). In the special case \(v = 0\), for which the volatility is no longer stochastic, the process is known as constant elasticity of variance (CEV). This process is well studied, and the following results were given by Andersen and Andreasen in 1998:

For \(\beta \geq 0.5\) \(\rightarrow\) the SDE has a unique solution.
For \(0 < \beta < 1\) \(\rightarrow\) the process can reach \(F_t = 0\) but never go negative.
For \(\beta \geq 1\) \(\rightarrow\) the process can never reach \(F_t = 0\).
For \(\beta = 0\) \(\rightarrow\) the process is normal, and therefore \(F_t\) can go negative.
For \(0 < \beta < 0.5\) \(\rightarrow\) the SDE only has a unique solution if one adds a boundary condition at \(F_t = 0\). For the process to be arbitrage free, the boundary condition must be that when \(F_t\) hits zero it stays there.
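The absorbing boundary condition can be illustrated by simulating the CEV process (\(v = 0\)) and holding each path at zero once it gets there. This is a minimal sketch under stated assumptions: the function name `simulate_cev_absorbed` and the parameter values in the usage lines are hypothetical. A plain Euler scheme only detects a crossing at grid times, so it tends to understate the true absorption probability.

```python
import math
import random


def simulate_cev_absorbed(f0, sigma, beta, t, n_steps=200, n_paths=2000, seed=1):
    """Euler sketch of the CEV process dF = sigma F^beta dW, absorbed at zero.

    Returns (terminal_forwards, fraction_of_paths_absorbed).
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sqrt_dt = math.sqrt(dt)
    terminal = []
    absorbed = 0
    for _ in range(n_paths):
        f = f0
        for _ in range(n_steps):
            f += sigma * f ** beta * sqrt_dt * rng.gauss(0.0, 1.0)
            if f <= 0.0:
                # Boundary condition: once the forward hits zero it stays there.
                f = 0.0
                absorbed += 1
                break
        terminal.append(f)
    return terminal, absorbed / n_paths


# Illustrative low-rate example with beta < 0.5 (values chosen arbitrarily).
terminal, frac = simulate_cev_absorbed(f0=0.02, sigma=0.05, beta=0.3, t=5.0)
print(frac)  # share of paths that ended at the absorbing boundary
```

Comparing `frac` across values of `beta` and `f0` shows how much probability mass can collect at zero, which matters when pricing low-strike options.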

SABR is popular because Hagan et al. were able to provide an approximate solution for the implied volatility that is valid in the limit of small time to expiry T.

\(\lambda\)-SABR

For equity purposes, log-normal SABR (the version with \(\beta = 1\)) is perfectly good. Log-normal SABR has a volatility process that cannot hit zero and therefore it does not suffer from the numerical difficulties associated with Heston. Unlike the \(\beta < 1\) version of SABR, log-normal SABR also has a spot that stays positive. Therefore the only drawback with the process itself is the lack of mean reversion, allowing paths in which instantaneous volatilities become very large.

This is not such a problem, as we can simply add a mean-reversion term to obtain the \(\lambda\)-SABR model:

\(\begin{cases} dF = \sigma F^{\beta} \ dW_1 \\[10pt] d\sigma = -\lambda (\sigma - \overline{\sigma}) \ dt + v \ \sigma \ dW_2 \\[10pt] \langle dW_1, dW_2 \rangle = \rho \ dt \\[10pt] \sigma(0) = \alpha \end{cases}\)
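The effect of the extra drift term can be seen by simulating the volatility process alone, with and without mean reversion. The sketch below reads \(\lambda\) as the mean-reversion speed and \(\overline{\sigma}\) as the long-run volatility level; the function names `simulate_vol` and `quantile`, the parameter names and the numerical values are assumptions made for illustration.

```python
import math
import random


def simulate_vol(alpha, nu, lam, sigma_bar, t, n_steps=100, n_paths=2000, seed=2):
    """Euler sketch of dsigma = -lam (sigma - sigma_bar) dt + nu sigma dW2.

    Setting lam = 0 gives the plain SABR volatility process.
    Returns the list of terminal volatilities sigma(t).
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sqrt_dt = math.sqrt(dt)
    out = []
    for _ in range(n_paths):
        sigma = alpha  # sigma(0) = alpha
        for _ in range(n_steps):
            dw = sqrt_dt * rng.gauss(0.0, 1.0)
            sigma += -lam * (sigma - sigma_bar) * dt + nu * sigma * dw
            sigma = max(sigma, 0.0)  # guard against Euler overshoot below zero
        out.append(sigma)
    return out


def quantile(xs, q):
    """Crude empirical quantile: the element at rank floor(q * n) after sorting."""
    s = sorted(xs)
    return s[min(int(q * len(s)), len(s) - 1)]


free = simulate_vol(alpha=0.2, nu=1.0, lam=0.0, sigma_bar=0.2, t=2.0)
pulled = simulate_vol(alpha=0.2, nu=1.0, lam=2.0, sigma_bar=0.2, t=2.0)
# Compare the upper tails of terminal volatility with and without mean reversion.
print(quantile(free, 0.99), quantile(pulled, 0.99))
```

Both runs share the same seed, so the difference in the upper tail comes from the mean-reversion term rather than from sampling noise.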

By including mean reversion, we lose the analytical approximation for the implied volatility smile in terms of our stochastic volatility parameters.

We have seen that the SABR or \(\lambda\)-SABR model itself is attractive.

It is the asymptotic implied volatility formula that causes problems. The approximation is only valid for short expiries and, furthermore, it can imply negative probability densities at strikes that are far away from ATM. These are problems for people who want to use the SABR formula as a method to define or interpolate the implied volatility smile. If you are only interested in smile modeling with the model itself, these issues will not cause you any problems.
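For reference, the asymptotic formula in question is usually quoted as the log-normal (Black) implied volatility expansion of Hagan et al. (2002). The sketch below implements that commonly quoted form; the function name `hagan_lognormal_vol` and the parameter name `nu` (standing for \(v\)) are choices made here, and the expression should be checked against the original paper before it is relied upon.

```python
import math


def hagan_lognormal_vol(f, k, t, alpha, beta, rho, nu):
    """Commonly quoted Hagan et al. (2002) approximation of Black implied vol.

    f: forward, k: strike, t: time to expiry; alpha, beta, rho, nu: SABR parameters.
    """
    one_m_beta = 1.0 - beta
    log_fk = math.log(f / k)
    fk_pow = (f * k) ** (0.5 * one_m_beta)  # (F K)^((1 - beta) / 2)

    # Correction term, first order in time to expiry.
    correction = 1.0 + t * (
        one_m_beta ** 2 / 24.0 * alpha ** 2 / fk_pow ** 2
        + 0.25 * rho * beta * nu * alpha / fk_pow
        + (2.0 - 3.0 * rho ** 2) / 24.0 * nu ** 2
    )
    denom = fk_pow * (
        1.0
        + one_m_beta ** 2 / 24.0 * log_fk ** 2
        + one_m_beta ** 4 / 1920.0 * log_fk ** 4
    )

    z = nu / alpha * fk_pow * log_fk
    if abs(z) < 1e-8:
        z_over_x = 1.0  # limit of z / x(z) at the money or for nu = 0
    else:
        x = math.log((math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))
        z_over_x = z / x
    return alpha / denom * z_over_x * correction


# Print a small smile for an illustrative log-normal SABR parameter set.
for strike in (80.0, 90.0, 100.0, 110.0, 120.0):
    print(strike, hagan_lognormal_vol(100.0, strike, 1.0, 0.2, 1.0, -0.5, 0.5))
```

With \(v = 0\) and \(\beta = 1\) the function returns \(\alpha\) for every strike, which recovers the flat Black-Scholes smile. The correction is only first order in \(T\), which is where the short-expiry restriction comes from.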