Suppose we have data \(x_1,\dots,x_n\) that we believe come from a Normal distribution with mean 0 and variance \(\sigma^2\), where \(\sigma^2\) is unknown. The maximum likelihood estimator for the variance is simply \[ \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n x_i^2 \] which is also unbiased because the mean is known. That is, \[ E\left[\hat{\sigma}^2\right] = \frac{1}{n}\sum_{i=1}^n E\left[x_i^2\right] = \frac{1}{n}\sum_{i=1}^n\sigma^2 = \sigma^2 \] Typically, to estimate the standard deviation \(\sigma\), we just use \(\hat{\sigma}\), the square root of the estimate of \(\sigma^2\).
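As a quick sanity check, here's a small simulation sketch in Python (numpy; the function name `sigma2_hat` and the settings below are just illustrative choices) that averages the estimator over many simulated datasets. By the calculation above, the average should land near the true \(\sigma^2\).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma2_hat(x):
    """Known-mean-zero MLE of the variance: (1/n) * sum(x_i^2)."""
    return np.mean(x ** 2)

# Illustrative settings: true variance 4, sample size 10, many replications
sigma2, n, reps = 4.0, 10, 100_000

# Average the estimator over many simulated samples; since it is unbiased,
# the average should be close to the true variance.
estimates = [sigma2_hat(rng.normal(0.0, np.sqrt(sigma2), n)) for _ in range(reps)]
print(np.mean(estimates))  # roughly 4.0
```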
But is it true that \(E[\hat{\sigma}] = \sigma\)? No, it's not.
If we let \(f(x) = \sqrt{x}\), then we are trying to calculate \(E\left[f\left(\hat{\sigma}^2\right)\right]\). Using a second-order Taylor series expansion of \(f(x)\) around \(\sigma^2\), we can say that \[ f\left(\hat{\sigma}^2\right) \approx f(\sigma^2) + f^\prime(\sigma^2)(\hat{\sigma}^2-\sigma^2) + \frac{1}{2}f^{\prime\prime}(\sigma^2)(\hat{\sigma}^2-\sigma^2)^2 \] The expected value of the middle term is 0 because \(\hat{\sigma}^2\) is unbiased, so we have the third term to deal with. Since \(E\left[(\hat{\sigma}^2-\sigma^2)^2\right]\) is just the variance of \(\hat{\sigma}^2\), which for Normal data is \(2\sigma^4/n\), we get \[\begin{eqnarray*} E\left[f\left(\hat{\sigma}^2\right)\right] & \approx & f(\sigma^2) + 0 + \frac{1}{2}f^{\prime\prime}(\sigma^2)E\left[(\hat{\sigma}^2-\sigma^2)^2\right]\\ & = & f(\sigma^2) + f^{\prime\prime}(\sigma^2)\frac{\sigma^4}{n} \end{eqnarray*}\]The bias of the estimator is \(E\left[f(\hat{\sigma}^2)\right] - f(\sigma^2)\), which from the above is \[ Bias \approx \frac{\sigma^4}{n}f^{\prime\prime}(\sigma^2) \] Two things:
The bias goes to zero as \(n\rightarrow\infty\), but if \(n\) is small, the bias can be non-negligible.
The bias is proportional to the second derivative of \(f\). Because the square root is a concave function, its second derivative is negative, which means that the bias is always downward: \(\hat{\sigma}\) systematically underestimates \(\sigma\).
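To make the second point concrete, plug the derivatives of the square root into the bias formula above: \[ f(x) = \sqrt{x}, \qquad f^{\prime\prime}(x) = -\frac{1}{4}x^{-3/2}, \qquad Bias \approx \frac{\sigma^4}{n}\left(-\frac{1}{4}\sigma^{-3}\right) = -\frac{\sigma}{4n} \] so the bias is negative and shrinks at the rate \(1/n\).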
Also note that I used a Taylor series approximation to \(f\), which is technically only reasonable when \(n\) is large (so that \(\hat{\sigma}^2\) is close to \(\sigma^2\)), but it nevertheless provides some insight into the problem.
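Here is a minimal simulation sketch in Python (numpy; the values of \(\sigma\), \(n\), and the number of replications are arbitrary choices for illustration) that compares the empirical bias of \(\hat{\sigma}\) with the \(-\sigma/(4n)\) approximation derived above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative settings: true sd 2, small sample size, many replications
sigma, n, reps = 2.0, 10, 200_000

# Empirical bias of sigma_hat = sqrt(sigma2_hat) over many simulated samples
x = rng.normal(0.0, sigma, size=(reps, n))
sigma_hat = np.sqrt(np.mean(x ** 2, axis=1))
empirical_bias = np.mean(sigma_hat) - sigma

# Taylor-based approximation derived above: Bias is about -sigma / (4n)
approx_bias = -sigma / (4 * n)

print(empirical_bias, approx_bias)  # both small, negative, and close to each other
```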