# Chapter 9 Partial Differentiation

So far in the course, we have considered functions of a single independent variable (ordinary functions): $f: \quad \mathbb{R} \rightarrow \mathbb{R},$ or, in the case of systems of ODEs: $f: \quad \mathbb{R} \rightarrow \mathbb{R}^n.$ In the third part of the module, we now consider functions of several variables, also known as multivariable or multivariate functions: $f: \quad \mathbb{R}^n \rightarrow \mathbb{R}.$ For every $$n$$-tuple $$\{x_i\}_{i=1}^n$$, where $$x_i \in \mathbb{R}$$, there exists an image in $$\mathbb{R}$$: $f(x_1, \cdots, x_n) \in \mathbb{R}.$ An important example is functions of two variables: $f(x, y) = f(\vec{x}); \quad \vec{x} = \begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{R}^2$

## 9.1 Representation

As seen in Figure 9.1, there are two common representations for functions of two variables.

1. 3D representation where $$f(x,y)$$ is the height.

2. Level curves $$\vec{x}_C = (x, y)_C$$, where $$f(\vec{x}_C) = C$$. For each $$C$$ there will be a set of points that fulfill this condition. This kind of representation is also known as a contour plot.

## 9.2 Limit and continuity

The general notions from calculus can be naturally extended to multivariate functions.

The limit of the function $$f(\vec{x})$$ as $$\vec{x}\rightarrow \vec{x}^*$$ exists and is equal to $$C$$: $\lim\limits_{\vec{x}\rightarrow \vec{x}^*} f(\vec{x}) = C,$ if $$\forall \epsilon>0, \quad \exists \delta >0$$ such that $0< |\vec{x} - \vec{x}^*| < \delta \quad \Rightarrow \quad |f(\vec{x}) - C| < \epsilon.$

The function $$f$$ is continuous at $$\vec{x}^*$$ if: $\lim\limits_{\vec{x}\rightarrow \vec{x}^*} f(\vec{x}) = f(\vec{x}^*).$ For example, $$f(x, y) = xy$$ is continuous at all points in $$\mathbb{R}^2$$. But the following function is not continuous at $$\vec{x} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$: $g(x, y) = \left\{ \begin{array}{rl} xy, & \text{if }\, (x, y) \ne (0, 0),\\ 1, & \text{if }\, x=y = 0. \end{array} \right.$
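This discontinuity can be illustrated numerically. The Python sketch below evaluates $$g$$ along a sequence of points approaching the origin: the values tend to $$0$$, which differs from $$g(0, 0) = 1$$.

```python
def g(x, y):
    # equals xy everywhere except at the origin, where it is set to 1
    return 1.0 if (x, y) == (0.0, 0.0) else x * y

# along a sequence of points approaching the origin, g tends to 0 ...
values = [g(10**-k, 10**-k) for k in range(1, 6)]
assert all(abs(v) < 1e-1 for v in values)

# ... but the value at the origin itself is 1, so g is not continuous there
assert g(0.0, 0.0) == 1.0
```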

## 9.3 Partial and Total Differentiation

Different derivatives are defined for functions of several variables. First we introduce partial differentiation, which is differentiation with respect to one of the variables while the others are held constant: $\frac{\partial f}{\partial x_i} =\lim\limits_{h\rightarrow 0} \dfrac{f(x_1, \cdots, x_i + h, \cdots, x_n) - f(x_1, \cdots, x_n)}{h}$ This is read as the partial derivative of $$f$$ with respect to $$x_i$$; it is also denoted $$f_{x_i}$$, $$f'_{x_i}$$, $$\partial_{x_i} f$$ or $$D_{x_i} f$$.

Higher order partial derivatives can also be defined. For example, consider a function of two variables $$f(x, y)$$. We denote the first partial derivatives as: $g_1(x, y) = \left(\frac{\partial f}{\partial x}\right)_y ; \quad g_2(x, y) = \left( \frac{\partial f}{\partial y}\right)_x$ The subscript $$y$$ or $$x$$ on each of these partial derivatives highlights which variable is held constant. We have the following second order partial derivatives for $$f(x, y)$$: $\left(\frac{\partial g_1}{\partial x}\right)_y = \frac{\partial^2 f}{\partial x^2} ; \quad \left( \frac{\partial g_1}{\partial y}\right)_x = \frac{\partial}{\partial y} \left(\frac{\partial f}{\partial x}\right)$ $\left(\frac{\partial g_2}{\partial y}\right)_x = \frac{\partial^2 f}{\partial y^2} ; \quad \left( \frac{\partial g_2}{\partial x}\right)_y = \frac{\partial}{\partial x} \left(\frac{\partial f}{\partial y}\right)$

Symmetry of mixed derivatives or equality of mixed derivatives: If the second partial derivatives are continuous, the order of differentiation is not important and we therefore have: $\frac{\partial}{\partial y} \left(\frac{\partial f}{\partial x}\right) = \frac{\partial}{\partial x} \left(\frac{\partial f}{\partial y}\right) \quad \Rightarrow \quad \frac{\partial^2 f}{\partial y\partial x} = \frac{\partial^2 f}{\partial x \partial y}.$ This result is known as Schwarz’s theorem, Clairaut’s theorem, or Young’s theorem.

Operationally, calculations are simple. Partial derivatives are obtained by keeping the other variables constant, using the laws of differentiation for functions of single variables.
Example 9.1 (Obtain all the first and second partial derivatives of the following function.) $u(x, y) = x^2 \sin y + y^3.$

We have for the first partial derivatives: $\left(\frac{\partial u}{\partial x}\right)_y = 2x \sin{y},$ $\left(\frac{\partial u}{\partial y}\right)_x = x^2 \cos{y} + 3y^2.$ And for the second partial derivatives, we have: $\frac{\partial^2 u}{\partial x^2} = 2 \sin{y},$ $\frac{\partial^2 u}{\partial x \partial y} = 2x \cos{y},$ $\frac{\partial^2 u}{\partial y^2} = -x^2 \sin{y} + 6y,$ $\frac{\partial^2 u}{\partial y \partial x} = 2x \cos{y}.$ We observe that the symmetry of the mixed derivatives in this example holds.
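These results can be checked numerically with central finite differences. The Python sketch below compares them against the analytic derivatives obtained above; the evaluation point $$(1.3, 0.7)$$ and step sizes are arbitrary choices.

```python
import math

def u(x, y):
    return x**2 * math.sin(y) + y**3

def d(f, x, y, var, h=1e-6):
    # central finite difference with respect to 'x' or 'y'
    if var == 'x':
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x0, y0 = 1.3, 0.7   # arbitrary evaluation point

# first partial derivatives against the analytic results above
assert abs(d(u, x0, y0, 'x') - 2 * x0 * math.sin(y0)) < 1e-6
assert abs(d(u, x0, y0, 'y') - (x0**2 * math.cos(y0) + 3 * y0**2)) < 1e-6

# mixed second derivative: differentiate u_x with respect to y
u_x = lambda x, y: d(u, x, y, 'x')
assert abs(d(u_x, x0, y0, 'y', h=1e-4) - 2 * x0 * math.cos(y0)) < 1e-3
```

The last assertion also illustrates the symmetry of mixed derivatives, since $$2x\cos y$$ was obtained above in both differentiation orders.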

## 9.4 Total differentiation of a function of several variables

The total derivative evaluates the infinitesimal change of $$f(\vec{x})$$ when all the variables are allowed to change infinitesimally, in contrast with partial derivatives, which concern change in only one of the variables. We first consider the case of a function of two variables. We have

\begin{align*} \Delta f & = f(x + \Delta x, y + \Delta y) - f(x, y) \\ & = \left[ f(x + \Delta x, y) - f(x, y) \right] + \left[ f(x + \Delta x, y + \Delta y) - f(x + \Delta x, y) \right] \\ & = \left[\frac{f(x + \Delta x, y) - f(x, y)}{\Delta x}\right]\Delta x + \left[\frac{f(x + \Delta x, y+ \Delta y) - f(x + \Delta x, y)}{\Delta y}\right]\Delta y. \end{align*}

The total derivative is obtained in the limit $$\Delta x, \Delta y \rightarrow 0$$: $df = \lim\limits_{\Delta x, \Delta y \rightarrow 0} \Delta f = \left(\frac{\partial f}{\partial x}\right)_y dx + \left(\frac{\partial f}{\partial y}\right)_x dy.$

For a function of several variables $$f(x_1, \cdots, x_n)$$, total differentiation generalizes to: $df = \sum_{i=1}^n \left(\frac{\partial f}{\partial x_i}\right) dx_i.$
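The total differential is a first-order approximation to the actual change in the function. The sketch below illustrates this using the hypothetical example $$f(x, y) = x e^y$$ (not one of the examples in these notes): for small increments, $$f_x\,dx + f_y\,dy$$ matches the exact change up to second-order terms.

```python
import math

def f(x, y):
    # hypothetical example function
    return x * math.exp(y)

x0, y0 = 2.0, 0.5
dx, dy = 1e-4, -2e-4

# analytic partial derivatives of this particular f
f_x = math.exp(y0)            # (df/dx)_y
f_y = x0 * math.exp(y0)       # (df/dy)_x

exact_change = f(x0 + dx, y0 + dy) - f(x0, y0)
df = f_x * dx + f_y * dy      # the total differential

# df matches the exact change up to second-order terms in dx, dy
assert abs(exact_change - df) < 1e-6
```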

## 9.5 Chain rule for functions of several variables

Given ordinary functions $$u(x)$$ and $$x(t)$$, recall the chain rule for ordinary functions: $\frac{du}{dt} = \frac{du}{dx}\frac{dx}{dt}.$ What is the equivalent for multivariable functions? Consider a function of two variables:

$u = u(x, y) = u(\vec{x}); \quad \vec{x} \in \mathbb{R}^2.$ Now suppose $$x = x(t)$$ and $$y = y(t)$$ with $$t \in \mathbb{R}$$. What is $$\dfrac{du}{dt}$$?

Combining total differentiation and the chain rule for ordinary functions, one can obtain:

\begin{align*} du & = \left(\frac{\partial u}{\partial x}\right)_y dx + \left(\frac{\partial u}{\partial y}\right)_x dy, \\ & = \left(\frac{\partial u}{\partial x}\right)_y \left(\frac{dx}{dt}\right)dt + \left(\frac{\partial u}{\partial y}\right)_x \left(\frac{dy}{dt}\right)dt. \end{align*}

So, we have: $\frac{du}{dt} = \left(\frac{\partial u}{\partial x}\right)_y \left(\frac{dx}{dt}\right) + \left(\frac{\partial u}{\partial y}\right)_x \left(\frac{dy}{dt}\right).$ This generalises to a function of $$n$$ variables $$u(x_1, \cdots, x_n)$$ with $$x_i = x_i(t)$$ and $$t \in \mathbb{R}$$:

$\frac{du}{dt} = \sum_{i=1}^n \left(\frac{\partial u}{\partial x_i}\right) \frac{dx_i}{dt}.$

Example 9.2 (Consider a cylinder whose radius and height expand with time) $r(t) = 2t; \quad \quad h(t) = 1 + t^2.$

Evaluate the rate of change in volume $$\dfrac{dV}{dt}$$.

We have $$V = \pi r^2 h$$, therefore $\dfrac{dV}{dt} = \left(\frac{\partial V}{\partial r}\right)_h \frac{dr}{dt} + \left(\frac{\partial V}{\partial h}\right)_r \frac{dh}{dt} = 2\pi r(2h + rt) = 8 \pi t + 16 \pi t^3.$
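The result of Example 9.2 can be verified by differentiating $$V(t)$$ directly with a finite difference; in the Python sketch below, the evaluation point $$t_0 = 1.5$$ is an arbitrary choice.

```python
import math

def V(t):
    # cylinder volume with r(t) = 2t and h(t) = 1 + t^2
    r, h = 2 * t, 1 + t**2
    return math.pi * r**2 * h

def dVdt(t):
    # analytic result from the chain rule above
    return 8 * math.pi * t + 16 * math.pi * t**3

t0, eps = 1.5, 1e-6
numeric = (V(t0 + eps) - V(t0 - eps)) / (2 * eps)   # central difference
assert abs(numeric - dVdt(t0)) < 1e-4
```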

Another example of the chain rule arises when we have multiple dependencies. Consider $u = u(x, y); \quad \text{with} \quad y = y(t, x).$ To obtain $$\left(\frac{\partial u}{\partial x}\right)_t$$, we combine the total differentials of $$u(x, y)$$ and $$y(t, x)$$. We get: $du = \left(\frac{\partial u}{\partial x}\right)_y dx + \left(\frac{\partial u}{\partial y}\right)_x dy,$ $dy = \left(\frac{\partial y}{\partial x}\right)_t dx + \left(\frac{\partial y}{\partial t}\right)_x dt.$ Now, by plugging $$dy$$ into the expression for $$du$$ and rearranging, we get:

$du = \left[ \left(\frac{\partial u}{\partial x}\right)_y + \left(\frac{\partial u}{\partial y}\right)_x \left(\frac{\partial y}{\partial x}\right)_t \right] dx + \left(\frac{\partial u}{\partial y}\right)_x \left(\frac{\partial y}{\partial t}\right)_x dt.$ Now, thinking of the above expression as the total derivative of $$u(x, t)$$, we obtain: $\left(\frac{\partial u}{\partial x}\right)_t = \left(\frac{\partial u}{\partial x}\right)_y + \left(\frac{\partial u}{\partial y}\right)_x \left(\frac{\partial y}{\partial x}\right)_t,$ $\left(\frac{\partial u}{\partial t}\right)_x = \left(\frac{\partial u}{\partial y}\right)_x \left(\frac{\partial y}{\partial t}\right)_x.$
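A numerical check of the first of these formulas, using the hypothetical choices $$u(x, y) = x^2 y$$ and $$y(t, x) = t + x^3$$ (these are illustrative functions, not taken from the notes):

```python
# hypothetical example: u(x, y) = x^2 * y with y(t, x) = t + x^3
def u(x, y): return x**2 * y
def y_of(t, x): return t + x**3

def U(x, t):
    # u regarded as a function of (x, t)
    return u(x, y_of(t, x))

x0, t0, h = 1.2, 0.4, 1e-6
numeric = (U(x0 + h, t0) - U(x0 - h, t0)) / (2 * h)   # (du/dx)_t

u_x = 2 * x0 * y_of(t0, x0)   # (du/dx)_y
u_y = x0**2                   # (du/dy)_x
y_x = 3 * x0**2               # (dy/dx)_t

# (du/dx)_t = (du/dx)_y + (du/dy)_x (dy/dx)_t
assert abs(numeric - (u_x + u_y * y_x)) < 1e-6
```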

**Dependencies on another set of coordinates**

Let $h = h(x, y) ; \quad \text{with} \quad x = x(u, v) \quad \text{and} \quad y = y(u, v).$ Considering total derivative of $$h(x, y)$$ and substituting total derivatives of $$x(u, v)$$ and $$y(u, v)$$, and rearranging, we obtain: $dh = \left[\left(\frac{\partial h}{\partial x}\right)_y \left(\frac{\partial x}{\partial u}\right)_v+ \left(\frac{\partial h}{\partial y}\right)_x \left(\frac{\partial y}{\partial u}\right)_v \right] du + \left[\left(\frac{\partial h}{\partial x}\right)_y \left(\frac{\partial x}{\partial v}\right)_u+ \left(\frac{\partial h}{\partial y}\right)_x \left(\frac{\partial y}{\partial v}\right)_u \right] dv.$ Now thinking of this expression as total derivative of $$h(u, v)$$, we have: $\left(\frac{\partial h}{\partial u}\right)_v = \left(\frac{\partial h}{\partial x}\right)_y \left(\frac{\partial x}{\partial u}\right)_v+ \left(\frac{\partial h}{\partial y}\right)_x \left(\frac{\partial y}{\partial u}\right)_v,$ $\left(\frac{\partial h}{\partial v}\right)_u = \left(\frac{\partial h}{\partial x}\right)_y \left(\frac{\partial x}{\partial v}\right)_u+ \left(\frac{\partial h}{\partial y}\right)_x \left(\frac{\partial y}{\partial v}\right)_u.$

Note that, strictly speaking, the transformed function $$h(x, y)$$ should be denoted $$h'(u, v)$$, as it is a ‘different’ function of its variables. But very commonly the prime on $$h'(u, v)$$ is omitted. For example, $$h(x, y) = x^2 + y^2$$ in polar coordinates is $$h'(r, \theta) = r^2$$, while the common notation $$h(r, \theta)$$ could imply $$r^2 + \theta^2$$, if one thinks of plugging $$r$$ and $$\theta$$ into the original function $$h(x, y)$$. One should be aware of this notational ambiguity.
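The polar-coordinate example above can be checked numerically. In the sketch below, `h_polar` plays the role of the transformed function $$h'(r, \theta)$$, and the chain rule from this section gives $$(\partial h/\partial r)_\theta = 2x\cos\theta + 2y\sin\theta = 2r$$.

```python
import math

def h(x, y):
    return x**2 + y**2

def h_polar(r, theta):
    # the transformed function h'(r, theta): substitute x(r, theta), y(r, theta)
    return h(r * math.cos(theta), r * math.sin(theta))

r0, th0, eps = 1.7, 0.9, 1e-6
numeric = (h_polar(r0 + eps, th0) - h_polar(r0 - eps, th0)) / (2 * eps)

# chain rule: (dh/dr)_theta = 2x cos(theta) + 2y sin(theta) = 2r
assert abs(numeric - 2 * r0) < 1e-6

# the notational trap discussed above: h'(r, theta) = r^2, not r^2 + theta^2
assert abs(h_polar(r0, th0) - r0**2) < 1e-12
```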

## 9.6 Implicit functions

First, a reminder of the explicit form of an ordinary function: $y = f(x); \quad x \in \mathbb{R}.$ The implicit form of an ordinary function is $F(x, y) = 0.$

Trivially, if we have the explicit form, we also have an implicit form: $F(x, y) = y - f(x) = 0.$ For functions of two variables, we likewise have the explicit form: $z = z(x, y)$ and the implicit form: $F(x, y, z) = 0$

**Differentiation using the implicit form**

Taking the total differential of the implicit form $$F(x, y, z) = 0$$, we obtain: $dF = \left( \dfrac{\partial F}{\partial x}\right)_{y, z} dx + \left(\dfrac{\partial F}{\partial y}\right)_{x, z} dy + \left(\dfrac{\partial F}{\partial z}\right)_{x, y} dz = 0.$ Taking the total differential of the explicit form $$z = z(x, y)$$, we obtain: $dz = \left( \dfrac{\partial z}{\partial x}\right)_{y} dx + \left( \dfrac{\partial z}{\partial y}\right)_{x} dy.$ Now, solving for $$dz$$ in the $$dF$$ equation above and comparing coefficients, we have the following relationship between the derivatives of the implicit and explicit forms:

$\left( \dfrac{\partial z}{\partial x}\right)_{y} = - \frac{\left(\dfrac{\partial F}{\partial x}\right)_{y, z}}{\left(\dfrac{\partial F}{\partial z}\right)_{x, y}},$ $\left( \dfrac{\partial z}{\partial y}\right)_{x} = - \frac{\left(\dfrac{\partial F}{\partial y}\right)_{x, z}}{\left(\dfrac{\partial F}{\partial z}\right)_{x, y}}.$
Example 9.3 (Obtain the partial derivatives of $$z$$ using the explicit and implicit forms) $\text{Let} \quad z(x, y)= x^2 + y^2 - 5.$

We have from the explicit form: $dz = 2x dx + 2y dy.$ Using the implicit form $$F(x, y, z) = x^2 + y^2 - 5 - z$$, we have: $dF = 2x dx + 2y dy - dz = 0,$ which yields the same expression for $$dz$$ that we obtained above from the explicit form.
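The implicit-differentiation formula can also be checked on a genuinely implicit example. The sketch below uses the hypothetical surface $$F(x, y, z) = x^2 + y^2 + z^2 - 14 = 0$$ (taking the branch $$z > 0$$), for which the formula gives $$(\partial z/\partial x)_y = -F_x/F_z = -x/z$$.

```python
import math

# hypothetical implicit surface: F(x, y, z) = x^2 + y^2 + z^2 - 14 = 0
def z_explicit(x, y):
    # the branch z > 0 solved explicitly
    return math.sqrt(14 - x**2 - y**2)

x0, y0 = 1.0, 2.0
z0 = z_explicit(x0, y0)               # z0 = 3

# implicit formula: (dz/dx)_y = -F_x / F_z = -(2x)/(2z) = -x/z
implicit = -x0 / z0

eps = 1e-6
numeric = (z_explicit(x0 + eps, y0) - z_explicit(x0 - eps, y0)) / (2 * eps)
assert abs(numeric - implicit) < 1e-6
```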

## 9.7 Taylor expansion of multivariate functions

Taylor expansion for functions of one variable (reminder): Let $$f(x): \mathbb{R} \rightarrow \mathbb{R}$$ and consider $$x_0 \in \mathbb{R}$$. We saw in the first term of the module that: $f(x_0 + \Delta x) = f(x_0) + \left( \dfrac{df}{dx} \right)_{x_0} \!\!\!\!\! \Delta x + \frac{1}{2} \left( \dfrac{d^2f}{dx^2} \right)_{x_0} \!\!\!\!\!(\Delta x)^2 + \frac{1}{3!} \left( \dfrac{d^3f}{dx^3} \right)_{x_0} \!\!\!\!\!(\Delta x)^3 + \cdots.$ Now let us consider $$f(\vec{x})$$ with $$\vec{x} \in \mathbb{R}^2$$, assuming suitable conditions of differentiability. We can apply the Taylor expansion for ordinary functions first in the $$x$$ direction and then in $$y$$ to obtain the Taylor expansion for $$f(x, y)$$. Up to third order we have: \begin{align*} &f(\vec{x}_0 + \Delta \vec{x}) = f(x_0+\Delta x, y_0 + \Delta y) \\ &= f(x_0, y_0 + \Delta y)+ \left( \dfrac{\partial f}{\partial x} \right)_{x_0, y_0+\Delta y} \Delta x + \frac{1}{2} \left( \dfrac{\partial^2f}{\partial x^2} \right)_{x_0, y_0+\Delta y} (\Delta x)^2 + \frac{1}{3!} \left( \frac{\partial^3f}{\partial x^3} \right)_{x_0, y_0+\Delta y} (\Delta x)^3 + \cdots \\ &= f(x_0, y_0) + \left( \dfrac{\partial f}{\partial y} \right)_{\vec{x}_0} \Delta y + \frac{1}{2} \left( \dfrac{\partial^2f}{\partial y^2} \right)_{\vec{x}_0} (\Delta y)^2 + \frac{1}{3!} \left( \frac{\partial^3f}{\partial y^3} \right)_{\vec{x}_0} (\Delta y)^3 + \cdots \\ & + \Delta x \left[ \left( \dfrac{\partial f}{\partial x} \right)_{\vec{x}_0} + \left( \dfrac{\partial^2f}{\partial y \partial x} \right)_{\vec{x}_0} \Delta y + \frac{1}{2} \left( \dfrac{\partial^3f}{\partial y^2 \partial x} \right)_{\vec{x}_0} (\Delta y)^2 + \cdots\right] \\ & + \frac{1}{2} (\Delta x)^2 \left[ \left( \dfrac{\partial^2 f}{\partial x^2} \right)_{\vec{x}_0} + \left( \dfrac{\partial^3f}{\partial y \partial x^2} \right)_{\vec{x}_0} \Delta y + \cdots\right] + \frac{1}{3!} (\Delta x)^3 \left[ \left( \dfrac{\partial^3f}{\partial x^3} \right)_{\vec{x}_0} \, + \cdots \right] \\ & = 
f(\vec{x}_0) + \left[\left( \dfrac{\partial f}{\partial x} \right)_{\vec{x}_0} \Delta x+ \left( \dfrac{\partial f}{\partial y} \right)_{\vec{x}_0} \Delta y\right] + \\ & \frac{1}{2!}\left[\left( \dfrac{\partial^2f}{\partial x^2} \right)_{\vec{x}_0} (\Delta x)^2 + 2\left( \dfrac{\partial^2f}{\partial x\partial y} \right)_{\vec{x}_0} \Delta x\Delta y + \left( \dfrac{\partial^2f}{\partial y^2} \right)_{\vec{x}_0} (\Delta y)^2\right] \\ & + \frac{1}{3!}\left[\left( \dfrac{\partial^3f}{\partial x^3} \right)_{\vec{x}_0} (\Delta x)^3 + 3\left( \dfrac{\partial^3f}{\partial x^2\partial y} \right)_{\vec{x}_0} (\Delta x)^2\Delta y + 3\left( \dfrac{\partial^3f}{\partial x\partial y^2} \right)_{\vec{x}_0} \Delta x(\Delta y)^2+ \left( \dfrac{\partial^3f}{\partial y^3} \right)_{\vec{x}_0} (\Delta y)^3\right] \\ &+ \cdots. \end{align*}

We can write the Taylor expansion up to second order in a vector-matrix form. We define the gradient of the function $$f$$ evaluated at the point $$\vec{x}_0$$ as: $\vec{\nabla} f_{\vec{x}_0} = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \\ \frac{\partial f}{\partial y} \end{bmatrix}_{\vec{x}_0}.$ The Hessian matrix associated with the function $$f$$, evaluated at the point $$\vec{x}_0$$, is defined as: $H_{ij}(\vec{x}_0) = \left( \dfrac{\partial^2 f}{\partial x_i \partial x_j}\right)_{\vec{x}_0}$

We can write the Taylor expansion up to second order in terms of the gradient and Hessian: $f(\vec{x}_0 + \Delta \vec{x}) = f(\vec{x}_0) + \vec{\nabla} f(\vec{x}_0)^T \Delta \vec{x} + \frac{1}{2} \Delta \vec{x}^T H(\vec{x}_0) \Delta \vec{x} + \cdots$

This generalizes to functions of $$n$$ variables.

Example 9.4 (Approximation) $\text{Let} \quad A(x, y)= xy.$ Expand $$A$$ around $$\vec{x}_0 = (x_0, y_0)$$.

\begin{align*}A(\vec{x}_0 + \Delta \vec{x}) &= A(\vec{x}_0) + \begin{bmatrix}y_0 & x_0\end{bmatrix} \begin{bmatrix}\Delta x \\ \Delta y\end{bmatrix} + \frac{1}{2}\begin{bmatrix}\Delta x & \Delta y\end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix}\Delta x \\ \Delta y\end{bmatrix} + \cdots \\ & = x_0y_0 + (y_0 \Delta x + x_0 \Delta y) + \Delta x\Delta y. \end{align*}
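Since $$A(x, y) = xy$$ is a quadratic polynomial, its second-order Taylor expansion is exact, which makes the gradient-Hessian form easy to check numerically; the base point and increments in the sketch below are arbitrary choices.

```python
def A(x, y):
    return x * y

x0, y0 = 3.0, 4.0      # arbitrary base point
dx, dy = 0.5, -0.25    # arbitrary increments

grad = [y0, x0]                     # gradient of A at (x0, y0)
H = [[0.0, 1.0], [1.0, 0.0]]        # Hessian of A (constant)

linear = grad[0] * dx + grad[1] * dy
quad = 0.5 * (H[0][0] * dx * dx + 2 * H[0][1] * dx * dy + H[1][1] * dy * dy)
taylor2 = A(x0, y0) + linear + quad

# A is quadratic, so the second-order expansion is exact
assert abs(taylor2 - A(x0 + dx, y0 + dy)) < 1e-12
```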

Example 9.5 (Using Taylor expansion for error analysis) What is the maximum error in $$h$$, given errors $$\Delta x$$ and $$\Delta \theta$$ in $$x$$ and $$\theta$$, respectively? $h(x, \theta) = x \tan \theta.$

We have $$x = x_0 \pm \Delta x$$ and $$\theta = \theta_0 \pm \Delta \theta$$, and we are looking for $$\Delta h$$. Using the Taylor expansion of $$h$$ up to first order, we have:

$h(x_0 \pm \Delta x, \theta_0 \pm \Delta \theta) = h(x_0, \theta_0) \pm \left( \dfrac{\partial h}{\partial x} \right)_{\vec{x}_0} \Delta x \pm \left( \dfrac{\partial h}{\partial \theta} \right)_{\vec{x}_0} \Delta \theta + \cdots.$

So for the maximum error we have: $|\Delta h| = |\tan{\theta_0}||\Delta x| + |x_0 \sec^2{\theta_0}||\Delta \theta|.$ For the relative error we have: $\left|\frac{\Delta h}{h(\vec{x}_0)}\right| = \left|\frac{\Delta x}{x_0}\right| + \left|\frac{2\Delta \theta}{\sin{2\theta_0}}\right|.$
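The first-order bound can be compared against a brute-force worst case over the four sign combinations of the errors; the measured values and error magnitudes in the Python sketch below are hypothetical.

```python
import math

def h(x, theta):
    return x * math.tan(theta)

x0, th0 = 10.0, 0.6     # hypothetical measured values
dx, dth = 0.05, 0.01    # hypothetical error magnitudes

# first-order worst-case bound from the expansion above
bound = abs(math.tan(th0)) * dx + abs(x0 / math.cos(th0)**2) * dth

# brute force: largest deviation over the four sign combinations
worst = max(abs(h(x0 + sx * dx, th0 + st * dth) - h(x0, th0))
            for sx in (-1, 1) for st in (-1, 1))

# they agree up to second-order terms in the errors
assert abs(bound - worst) < 5e-3
```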