My notes of the excellent lectures of “Denis Auroux. 18.02 Multivariable Calculus. Fall 2007. Massachusetts Institute of Technology: MIT OpenCourseWare, License: Creative Commons BY-NC-SA.”

Matrices can be used to express linear relations between variables. For example, when we change coordinate systems from \((x_1,x_2,x_3)\) to \((u_1,u_2,u_3)\), where $$ \left\{ \begin{align} u_1 &= 2x_1+3x_2+3x_3 \nonumber \\ u_2 &= 2x_1+4x_2+5x_3 \nonumber \\ u_3 &= x_1+x_2+2x_3 \nonumber \end{align} \right. \label{eq:linear} $$

Expressed as a matrix product $$ \begin{align*} \underbrace{ \left[ \begin{matrix} 2 & 3 & 3 \\ 2 & 4 & 5 \\ 1 & 1 & 2 \end{matrix} \right] }_{A}\; \underbrace{ \left[ \begin{matrix} x_1 \\ x_2 \\ x_3 \end{matrix} \right] }_{X} &= \underbrace{ \left[ \begin{matrix} u_1 \\ u_2 \\ u_3 \end{matrix} \right] }_{U} \\ A X &= U \end{align*} $$ Here \(A\) is a \(3\times 3\) matrix, and \(X\) is a vector or a \(3\times 1\) matrix.

Matrix Multiplication


The entries in \(A X\) are the dot-product between the rows in \(A\) and the columns in \(X\), as shown below
matrix multiplication

For example, with a \(3\times 4\) matrix \(A\) and a \(4\times 2\) matrix \(B\), the top-left entry of \(AB\) is \(1\cdot 0+2\cdot 3+3\cdot 0+4\cdot 2=14\) $$ \left[ \begin{matrix} 1 & 2 & 3 & 4 \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{matrix} \right]\; \left[ \begin{matrix} 0 & \cdot \\ 3 & \cdot \\ 0 & \cdot \\ 2 & \cdot \end{matrix} \right] = \left[ \begin{matrix} 14 & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \end{matrix} \right] $$


  • The width of \(A\) must equal the height of \(B\).
  • The product \(AB\) has the same height as \(A\) and the same width as \(B\).
  • Product \(AB\) represents: do transformation \(B\), then transformation \(A\). Note that the product is read from right to left, similar to \(f(g(x))\), where you first apply \(g\) and then \(f\). The product \(BA\) may not even be defined: it requires the width of \(B\) to equal the height of \(A\). Even when both products are defined, in general \(AB\ne BA\).
  • They are well behaved associative products: \((AB)X =A(BX)\)
  • \(BX\) means we apply transformation \(B\) to \(X\).
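The row-times-column rule and the shape requirement above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library routine: `matmul` is a hypothetical helper, and apart from the first row of `A` and the first column of `B` (taken from the example above) the matrix entries are made up.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("width of A must equal height of B")
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Hypothetical 3x4 and 4x2 matrices (only the first row of A and the
# first column of B come from the worked example).
A = [[1, 2, 3, 4],
     [0, 1, 0, 1],
     [2, 0, 1, 0]]
B = [[0, 1],
     [3, 0],
     [0, 2],
     [2, 1]]

AB = matmul(A, B)
print(AB[0][0])             # 1*0 + 2*3 + 3*0 + 4*2 = 14
print(len(AB), len(AB[0]))  # 3 2: height of A, width of B
```

Calling `matmul(B, A)` raises the `ValueError`, because the width of `B` (2) differs from the height of `A` (3).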

Identity matrix


The identity matrix is a matrix that does no transformation: \(IX=X\)

\(I\) is square, and its width needs to match the height of \(X\). \(I\) has \(1\)’s on the diagonal, and \(0\)’s everywhere else. $$ I_{n\times n} = \left[ \begin{matrix} 1 & & & \ldots & 0 \\ & 1 & & & \vdots \\ & & 1 & & \\ \vdots & & & \ddots & \\ 0 & \ldots & & & 1 \end{matrix} \right] \nonumber $$

For example: $$ I_{3\times3} = \left[ \begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix} \right] \nonumber $$


Matrix \(R\), gives a \(\frac{\pi}{2}\) rotation. $$ R = \left[ \begin{matrix} 0 & -1 \\ 1 & 0 \end{matrix} \right] \nonumber $$

In general $$ R \left[ \begin{matrix} x \\ y \end{matrix} \right] = \left[ \begin{array}{r} -y \\ x \end{array} \right] \nonumber $$

Try multiplying with the unit vectors \(\hat\imath\) and \(\hat\jmath\), or take \(R\) squared $$ \begin{align*} R\; \hat\imath &= \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right] \left[ \begin{matrix} 1 \\ 0 \end{matrix} \right] = \left[ \begin{matrix} 0 \\ 1 \end{matrix} \right] = \hat\jmath \\ R\;\hat\jmath &= \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right] \left[ \begin{matrix} 0 \\ 1 \end{matrix} \right] = \left[ \begin{array}{r} -1 \\ 0 \end{array} \right] = -\hat\imath \\ R^2 &= \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right] \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right] = \left[ \begin{array}{rr} -1 & 0 \\ 0 & -1 \end{array} \right] = -I_{2\times 2} \end{align*} \nonumber $$

Inverse Matrix


The inverse of matrix \(A\) is \(A^{-1}\) such that $$ \shaded{ \left\{ \begin{align*} A\;A^{-1} &= I \\ A^{-1}\;A &= I \end{align*} \right. } \nonumber $$
That implies that \(A\) must be a square matrix (\(n \times n\)).

Referring to the system of equations \(\eqref{eq:linear}\), to express the variables \(x_i\) in terms of the \(u_i\) values, we need to invert the transformation. For instance, in \(AX=B\), let matrices \(A\) and \(B\) be known. What is \(X\)? $$ \begin{align*} AX &= B \Rightarrow \\ A^{-1}(AX) &= A^{-1} B \Rightarrow \\ IX &= A^{-1} B \Rightarrow \\ X &= A^{-1} B \end{align*} $$


The inverse matrix is calculated using the adjugate (classical adjoint) matrix

$$ A^{-1}=\frac{1}{\mathrm{det}(A)}\;\mathrm{adj}(A) \nonumber $$

For this \(3\times 3\) example $$ A=\left[ \begin{matrix} 2 & 3 & 3 \\ 2 & 4 & 5 \\ 1 & 1 & 2 \end{matrix} \right] $$

First, find the determinant of \(A\) $$ \det(A)= \left| \begin{array}{rrr} 2 & 3 & 3 \\ 2 & 4 & 5 \\ 1 & 1 & 2 \end{array} \right| = 3 \nonumber $$

Second, find the minors (matrix of determinants) of matrix \(A\) $$ \mathrm{minors} = \left[\begin{array}{rrr} \left|\begin{array}{rrr} 4 & 5 \\ 1 & 2 \end{array}\right| & \left|\begin{array}{rrr} 2 & 5 \\ 1 & 2 \end{array}\right| & \left|\begin{array}{rrr} 2 & 4 \\ 1 & 1 \end{array}\right| \\ \left|\begin{array}{rrr} 3 & 3 \\ 1 & 2 \end{array}\right| & \left|\begin{array}{rrr} 2 & 3 \\ 1 & 2 \end{array}\right| & \left|\begin{array}{rrr} 2 & 3 \\ 1 & 1 \end{array}\right| \\ \left|\begin{array}{rrr} 3 & 3 \\ 4 & 5 \end{array}\right| & \left|\begin{array}{rrr} 2 & 3 \\ 2 & 5 \end{array}\right| & \left|\begin{array}{rrr} 2 & 3 \\ 2 & 4 \end{array}\right| \end{array}\right] = \left[\begin{array}{rrr} 3 & -1 & -2 \\ 3 & 1 & -1 \\ 3 & 4 & 2 \end{array}\right] \nonumber $$

Third, find the cofactors. Flip the signs in a checkerboard pattern $$ \begin{array}{rrr} + & - & + \\ - & + & - \\ + & - & + \end{array} \nonumber $$ A ‘\(+\)’ means leave it alone. A ‘\(-\)’ means flip the sign. Apply this sign pattern to the minors. $$ \left[\begin{array}{rrr} 3 & 1 & -2 \\ -3 & 1 & 1 \\ 3 & -4 & 2 \end{array}\right] \nonumber $$

Fourth, transpose (switch rows and columns) to find the adjugate matrix \(\mathrm{adj}(A)\). $$ \mathrm{adj}(A) = \left[\begin{array}{rrr} 3 & -3 & 3 \\ 1 & 1 & -4 \\ -2 & 1 & 2 \end{array}\right] \nonumber $$

The inverse matrix \(A^{-1}\) follows as $$ A^{-1} = \frac{1}{\det(A)}\;\mathrm{adj}(A) = \frac{1}{3} \left[\begin{array}{rrr} 3 & -3 & 3 \\ 1 & 1 & -4 \\ -2 & 1 & 2 \end{array}\right] \nonumber $$
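The four steps (determinant, minors, cofactors, transpose) can be checked with a short script. This is a sketch for \(3\times 3\) matrices only; the helper names `minor`, `det3` and `inverse3` are made up for this illustration, and exact fractions are used so that \(\tfrac{1}{3}\) is not rounded.

```python
from fractions import Fraction

def minor(M, i, j):
    """Determinant of the 2x2 matrix left after deleting row i and column j."""
    rows = [r for k, r in enumerate(M) if k != i]
    (a, b), (c, d) = [[x for k, x in enumerate(r) if k != j] for r in rows]
    return a * d - b * c

def det3(M):
    """Expand the determinant along the first row."""
    return sum((-1) ** j * M[0][j] * minor(M, 0, j) for j in range(3))

def inverse3(M):
    d = det3(M)
    if d == 0:
        raise ValueError("matrix is not invertible")
    # Cofactors: minors with the checkerboard sign pattern applied.
    cof = [[(-1) ** (i + j) * minor(M, i, j) for j in range(3)]
           for i in range(3)]
    adj = [list(col) for col in zip(*cof)]  # transpose of the cofactors
    return [[Fraction(x, d) for x in row] for row in adj]

A = [[2, 3, 3], [2, 4, 5], [1, 1, 2]]
A_inv = inverse3(A)
print(det3(A))  # 3
for row in A_inv:
    print([str(x) for x in row])
```

Multiplying `A` by `A_inv` should give the identity matrix, which is a convenient sanity check on hand calculations.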

Equations of planes

An equation of the form \(ax+by+cz=d\), expresses the condition for the point \((x,y,z)\) to be in the plane. It defines a plane.


Plane through the origin

Find the equation of the plane through the origin with normal vector \(\vec N = \left\langle 1, 5, 10 \right\rangle\).

Plane with normal vector

Point \(P=(x,y,z)\) is in the plane when \(\vec{OP}\perp\vec{N}\). Therefore, their dot-product must equal zero (see vectors). $$ \begin{align*} \overrightarrow{OP}\cdot\vec{N} = 0 \\ \Leftrightarrow \left\langle x, y, z \right\rangle \cdot \left\langle 1, 5, 10 \right\rangle = 0 \\ \Leftrightarrow x + 5y + 10z = 0 \end{align*} $$

Plane not through the origin

Find the equation of the plane through \(P_0=(2,1,-1)\) with normal vector \(\vec N = \left\langle 1, 5, 10 \right\rangle\).

The normal vector is the same as in the first example, so this plane is parallel to the first one, shifted so that it passes through \(P_0\).

Shifted plane with normal vector

Point \(P=(x,y,z)\) is in the plane when \(\overrightarrow{P_0P}\perp\overrightarrow{N}\). Therefore, their dot-product must equal zero (see vectors). This vector \(\overrightarrow{P_0P}\) equals \(P-P_0\). $$ \begin{align*} \left\langle x-2, y-1, z+1 \right\rangle \cdot \left\langle 1, 5, 10 \right\rangle &= 0 \\ \Leftrightarrow (x-2)+5(y-1)+10(z+1) &= 0 \\ \Leftrightarrow \underline{1}x+\underline{5}y+\underline{10}z &= -3 \end{align*} $$

In the equation \(ax+by+cz=d\), the coefficients \(\left\langle a,b,c\right\rangle\) form the normal vector \(\vec{N}\). The constant \(d\) determines the offset of the plane from the origin: the distance is \(|d|/|\vec{N}|\).

How could we have found the \(-3\) more quickly?

The first part of the equation is based on the normal vector $$ x + 5y + 10z = d \label{eq:planeequations2a} $$

We know \(P_0\) is in the plane. Substituting \(\left\langle x,y,z\right\rangle=P_0\) in \(\eqref{eq:planeequations2a}\) $$ \begin{align*} 1(2)+5(1)+10(-1) &= d \\ \Leftrightarrow d &= -3 \end{align*} \nonumber $$
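The shortcut amounts to \(d=\vec N\cdot P_0\). A minimal sketch, where `plane_through` is a hypothetical helper name:

```python
def plane_through(normal, point):
    """Return (a, b, c, d) for the plane a*x + b*y + c*z = d."""
    a, b, c = normal
    # The point lies in the plane, so d is the dot product N . P0.
    d = sum(n * p for n, p in zip(normal, point))
    return a, b, c, d

print(plane_through((1, 5, 10), (0, 0, 0)))   # (1, 5, 10, 0)
print(plane_through((1, 5, 10), (2, 1, -1)))  # (1, 5, 10, -3)
```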

Parallel or perpendicular?

Are vector \(\vec{v}=\left\langle 1,2,-1 \right\rangle\) and plane \(x+y+3z=5\) parallel, perpendicular or neither?

Vector \(\vec{v}\) is perpendicular to the plane when \(\vec{v}=s\;\vec{N}\), where \(s\) is a scalar. The normal vector follows from the coefficients of the plane equation $$ \vec{N} = \left\langle 1,1,3 \right\rangle \nonumber $$ Since \(\left\langle 1,2,-1 \right\rangle\) is not a scalar multiple of \(\left\langle 1,1,3 \right\rangle\), \(\vec{v}\) is not perpendicular to the plane.

If \(\vec{v}\) is perpendicular to \(\vec{N}\), it is parallel to the plane. \(\vec{v}\perp\vec{N}\) when the dot-product equals zero (see vectors). $$ \begin{align*} \vec{v}\cdot\vec{N} &= \left\langle 1, 2, -1 \right\rangle \cdot \left\langle 1, 1, 3 \right\rangle \\ &= 1+2-3 = 0 \end{align*} $$ Therefore, \(\vec{v}\) is parallel to the plane.
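Both tests can be combined in one small function. This sketch assumes integer components and a nonzero \(\vec v\); it uses the cross product (zero exactly when two vectors are scalar multiples) for the perpendicular test, which is one possible way to check \(\vec v=s\,\vec N\). The function names are made up.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def relation_to_plane(v, normal):
    """Classify a nonzero vector v against a plane with the given normal."""
    if cross(v, normal) == (0, 0, 0):  # v is a scalar multiple of N
        return "perpendicular"
    if dot(v, normal) == 0:            # v is orthogonal to N
        return "parallel"
    return "neither"

print(relation_to_plane((1, 2, -1), (1, 1, 3)))  # parallel
print(relation_to_plane((2, 2, 6), (1, 1, 3)))   # perpendicular
```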

Solving systems of equations

To solve a system of equations, you try to find a point that is on several planes at the same time.


Find the \(x,y,z\) that satisfies the conditions of the \(3\times 3\) linear system: $$ \left\{ \begin{align*} x+ z = 1 \\ x + y = 2 \\ x + 2y + 3z = 3 \end{align*} \right. $$

The first 2 equations represent two planes that intersect in line \(P_1\cap P_2\). The third plane intersects that line at the point \(P(x,y,z)\), the solution to the linear system.

3 planes – one solution


There is exactly one solution unless:

  • the line \(P_1\cap P_2\) is contained in \(P_3\): there are infinitely many solutions. (Any point on the line is a solution.)
  • the line \(P_1\cap P_2\) is parallel to \(P_3\) and not contained in it: there are no solutions.

3 planes – infinite solutions
3 planes – no solutions

In matrix notation $$ \underbrace{ \left[\begin{array}{rrr} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 2 & 3 \end{array}\right] }_{A}\; \underbrace{ \left[\begin{array}{ccc} x \\ y \\ z \end{array}\right] }_{X} = \underbrace{ \left[\begin{array}{rrr} 1 \\ 2 \\ 3 \end{array}\right] }_{B} \nonumber $$

The solution to \(AX=B\) is given by (see Inverse matrix) $$ X = A^{-1}B \nonumber $$


$$ A^{-1}=\frac{1}{\det (A)}\mathrm{adj}(A) \nonumber $$

This implies that matrix \(A\) is only invertible when $$ \shaded{ \det (A)\ne 0 } \nonumber $$
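As a numerical check of the example system, the sketch below solves \(AX=B\) with Cramer's rule. Note that this is a swapped-in technique: it avoids forming \(A^{-1}\) explicitly but gives the same answer as \(X=A^{-1}B\) whenever \(\det(A)\ne 0\). The helper names are hypothetical.

```python
from fractions import Fraction

def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(A, B):
    """Solve AX = B by Cramer's rule; requires det(A) != 0."""
    d = det3(A)
    if d == 0:
        raise ValueError("det(A) = 0: no unique solution")
    solution = []
    for j in range(3):
        # Replace column j of A by the right-hand side B.
        Aj = [[B[r] if c == j else A[r][c] for c in range(3)]
              for r in range(3)]
        solution.append(Fraction(det3(Aj), d))
    return solution

A = [[1, 0, 1], [1, 1, 0], [1, 2, 3]]
B = [1, 2, 3]
print(det3(A))                         # 4, so A is invertible
print([str(v) for v in solve3(A, B)])  # ['1', '1', '0']: x = 1, y = 1, z = 0
```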


Homogeneous case

Homogeneous means that equations are invariant under scaling. In matrix notation: \(AX=0\).

For example: $$ \left\{ \begin{align*} x + z = 0 \\ x + y = 0 \\ x + 2y + 3z = 0 \end{align*} \right. $$

There is always the trivial solution: \((0,0,0)\).

3 planes – infinite solutions with normal vectors

Depending on the \(\det(A)\):

  • If \(\det (A)\ne 0\): \(A\) can be inverted. \(AX=0 \Leftrightarrow X=A^{-1}\,0=0\). There are no other solutions.
  • If \(\det (A)= 0\): the determinant of \(\vec{N_1},\vec{N_2},\vec{N_3}\) equals \(0\). This implies that the planes’ normal vectors \(\vec{N_1}\), \(\vec{N_2}\) and \(\vec{N_3}\) are coplanar. A line through the origin, perpendicular to the plane of \(\vec{N_1}, \vec{N_2}, \vec{N_3}\), is parallel to all 3 planes and, because each plane passes through the origin, contained in them. Therefore there are infinitely many solutions. To find a nontrivial solution, take the cross-product of two of the normals (provided those two are not parallel).
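The singular case can be illustrated with a made-up system. The normals below are hypothetical (they are not the example above, whose determinant is nonzero) and are chosen so that \(\vec{N_3}=\vec{N_1}+\vec{N_2}\), which forces \(\det(A)=0\).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

# Hypothetical coplanar normals: N3 = N1 + N2, so det(N1, N2, N3) = 0.
N1, N2, N3 = (1, 0, 1), (1, 1, 0), (2, 1, 1)

X = cross(N1, N2)  # perpendicular to N1 and N2, hence also to N3
print(X)                                  # (-1, 1, 1)
print([dot(N, X) for N in (N1, N2, N3)])  # [0, 0, 0]
```

Every scalar multiple of `X` satisfies all three equations as well, which is the line of solutions described above.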

General case

The system $$ AX=B \nonumber $$

Depending on the \(\det(A)\)

  • if \(\det(A)\ne 0\): there is a unique solution \(X=A^{-1}B\)
  • if \(\det(A)=0\): either no solution, or infinitely many solutions. If you solve it by hand and end up with \(0=0\), there are infinitely many solutions; if you end up with a contradiction such as \(1=2\), there are no solutions.




The description uses the plane \(\mathbb{R}^2\) or space \(\mathbb{R}^3\), but the same principles apply to higher dimensions.

Vectors are commonly displayed on the \(xyz\)-axes, with unit vectors \(\hat\imath, \hat\jmath, \hat k\).

\(x,y,z\)-axis and \(\hat\imath,\hat\jmath,\hat k\)-unit vectors

Vectors do not have a start point, but do have a magnitude (length) and direction. They are described in terms of the unit vectors \(\hat\imath, \hat\jmath, \hat k\), or using angle bracket notation. $$ \vec{A} = \hat\imath\;a_1 + \hat\jmath\;a_2 + \hat k\;a_3 = \left\langle \;a_1,\;a_2,\;a_3\; \right\rangle $$

You can find the length of a vector \(|\vec{A}|\), by applying the Pythagorean theorem twice. $$ \shaded{ |\vec{A}| = \sqrt{(a_1)^2 + (a_2)^2 + (a_3)^2} } \nonumber $$


\(\vec{A}\) rotated over \(\tfrac{\pi}{2}\)

Let \(\vec{A}=\left\langle a_1, a_2\right\rangle\), and let \(\vec{A}'\) be \(\vec{A}\) rotated over \(\frac{\pi}{2}\). Then $$ \shaded{ \vec{A}'=\left\langle -a_2, a_1\right\rangle } \label{eq:rotation} $$


Let \(\vec{A}=\left\langle a_1, a_2, a_3\right\rangle\), and \(\vec{B}=\left\langle b_1, b_2, b_3\right\rangle\). Then \(\vec{A}\) plus \(\vec{B}\) is defined as $$ \shaded{ \vec{A}+\vec{B} = \left\langle a_1+b_1, a_2+b_2, a_3+b_3 \right\rangle } \nonumber $$

Geometrically, the sum is the diagonal of the parallelogram spanned by \(\vec{A}\) and \(\vec{B}\).

\(\vec{A} + \vec{B}\)

Scalar product

Let \(s\) be a scalar, and \(\vec{A}=\left\langle a_1, a_2, a_3\right\rangle\). Then the product of the scalar \(s\) and the vector \(\vec{A}\) (scalar multiplication, not to be confused with the dot-product) is defined as $$ \shaded{ s\;\vec{A} = \left\langle s\;a_1, s\;a_2, s\;a_3\right\rangle } \nonumber $$

Geometrically, it makes the vector longer or shorter.



Let \(\vec{A}=\left\langle a_1, a_2, a_3\right\rangle\), and \(\vec{B}=\left\langle b_1, b_2, b_3\right\rangle\). The dot-product of \(\vec{A}\) and \(\vec{B}\) is defined as the scalar $$ \shaded{ \vec{A} \cdot \vec{B} = \sum_i a_i\,b_i = a_1 b_1 + a_2 b_2 + a_3 b_3 } \nonumber $$

For a geometric interpretation, start with the dot-product of \(\vec{A}\) with itself $$ \vec{A}\cdot\vec{A} = |\vec{A}|^2 \cos 0 = |\vec{A}|^2 \label{eq:vecsquare} $$

Let \(\vec{C}=\vec{A}-\vec{B}\), and expand \(|\vec{C}|^2\) by applying \(\eqref{eq:vecsquare}\) $$ \begin{align} |\vec{C}|^2 &= \vec{C}\cdot\vec{C} = \left(\vec{A} - \vec{B} \right) \cdot \left(\vec{A} - \vec{B} \right) \nonumber \\ &= \vec{A}\cdot\vec{A} - \vec{A}\cdot\vec{B} - \vec{B}\cdot\vec{A} + \vec{B}\cdot\vec{B} \nonumber \\ &= |\vec{A}|^2 + |\vec{B}|^2 - 2 \vec{A}\cdot\vec{B} \label{eq:expanded} \end{align} $$

Recall, the law of cosines from geometry.

$$ c^2 = a^2 + b^2 - 2 a b\cos\theta \nonumber $$
Law of cosines

Apply the law of cosines to \(|\vec{A}|\), \(|\vec{B}|\) and \(|\vec{C}|\) $$ |\vec{C}|^2 = |\vec{A}|^2 + |\vec{B}|^2 - 2 |\vec{A}| |\vec{B}|\cos\theta \label{eq:lawofcos} $$

Combining equations \(\eqref{eq:expanded}\) and \(\eqref{eq:lawofcos}\) gives the geometric equation $$ \shaded{ \vec{A}\cdot\vec{B} = |\vec{A}|\,|\vec{B}|\, \cos\theta } \nonumber $$

The dot-product can be used to compute length and angles in \(\mathbb{R}^3\), or find components of \(\vec{A}\) along unit vector \(\hat u\) $$ \shaded{ \vec{A}\cdot \hat u } \nonumber $$
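A short sketch of these uses, with made-up vectors; the helper names `length`, `angle` and `component` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def length(v):
    return math.sqrt(dot(v, v))  # |A|^2 = A . A

def angle(u, v):
    """Angle between u and v in radians, from A . B = |A||B| cos(theta)."""
    return math.acos(dot(u, v) / (length(u) * length(v)))

def component(v, u_hat):
    """Component of v along the unit vector u_hat."""
    return dot(v, u_hat)

A = (1, 0, 0)
B = (1, 1, 0)
print(dot(A, B))                      # 1
print(math.degrees(angle(A, B)))      # about 45 degrees
print(component((3, 4, 0), (1, 0, 0)))  # 3
```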


In 2 dimensions

Let \(\vec{A}=\left\langle a_1, a_2\right\rangle\) and \(\vec{B}=\left\langle b_1, b_2\right\rangle\). The \(\mathbb{R}^2\)-determinant is defined as $$ \shaded{ \begin{align*} \mathrm{det}(\vec{A}, \vec{B}) &= \left|\begin{matrix} a_1 & a_2 \\ b_1 & b_2 \\ \end{matrix}\right| \\ &= a_1b_2-a_2b_1 \end{align*} } \nonumber $$

In 3 dimensions

Let \(\vec{A}=\left\langle a_1, a_2, a_3\right\rangle\), \(\vec{B}=\left\langle b_1, b_2, b_3\right\rangle\) and \(\vec{C}=\left\langle c_1, c_2, c_3\right\rangle\). The \(\mathbb{R}^3\)-determinant is defined as $$ \shaded{ \begin{align*} \mathrm{det}(\vec{A}, \vec{B}, \vec{C}) &= \left|\begin{matrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{matrix}\right| \\ &= a_1 \left|\begin{matrix} b_2 & b_3 \\ c_2 & c_3 \end{matrix}\right| - a_2 \left|\begin{matrix} b_1 & b_3 \\ c_1 & c_3 \end{matrix}\right| + a_3 \left|\begin{matrix} b_1 & b_2 \\ c_1 & c_2 \end{matrix}\right| \end{align*} } \nonumber $$

Area of a parallelogram

Let \(\vec{A}=\left\langle a_1, a_2\right\rangle\), and \(\vec{B}=\left\langle b_1, b_2\right\rangle\).

Area of triangle

The area of the parallelogram shown above is calculated as base \(\times\) height. $$ \mathrm{area} = |\vec{A}| |\vec{B}| \sin\theta \label{eq:triangle} $$

Change from \(\sin\theta\) to \(\cos\theta\) so it fits the dot-product.


Obtain \(\vec{A}'\) by rotating \(\vec{A}\) over \(\frac{\pi}{2}\), see equation \(\eqref{eq:rotation}\). Let \(\theta'\) be the angle between \(\vec{A}'\) and \(\vec{B}\), and apply \(\sin\theta = \cos(\tfrac{\pi}{2}-\theta)\) $$ \left. \begin{array}{l} \theta' = \tfrac{\pi}{2} - \theta \\ \cos(\tfrac{\pi}{2}-\theta) = \sin\theta \end{array} \right\} \Rightarrow \cos(\theta') = \sin(\theta) \label{eq:sincos} $$

Substitute \(\eqref{eq:sincos}\) in \(\eqref{eq:triangle}\), using \(|\vec{A}'|=|\vec{A}|\) $$ \mathrm{area} = |\vec{A}'|\,|\vec{B}| \cos\theta' = \vec{A}'\cdot \vec{B} $$

Expand the dot-product between \(\vec{A}'\) and \(\vec{B}\), and find the determinant $$ \begin{align*} \mathrm{area} &= \left\langle -a_2, a_1 \right\rangle \cdot \left\langle b_1, b_2 \right\rangle \\ &= \left( a_1 b_2 - a_2 b_1 \right) \\ &= \left|\begin{array}{cc} a_1 & a_2 \\ b_1 & b_2 \end{array}\right| \end{align*} $$

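The area of a parallelogram follows $$ \shaded{ \mathrm{area} = \mathrm{det}\left(\vec{A},\vec{B}\right) } \label{eq:area} $$ The determinant is signed: it is negative when the rotation from \(\vec{A}\) to \(\vec{B}\) is clockwise, so take the absolute value when only the size of the area matters.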


Let \(\vec{A}=\left\langle a_1, a_2, a_3\right\rangle\), and \(\vec{B}=\left\langle b_1, b_2, b_3\right\rangle\). The cross product of \(\vec{A}\) and \(\vec{B}\) in \(\mathbb{R}^3\) is defined as the pseudo-determinant vector $$ \shaded{ \begin{align*} \vec{A}\times\vec{B} &= \left| \begin{array}{ccc} \hat\imath & \hat\jmath & \hat k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{array} \right| \\ &= \hat\imath \left| \begin{array}{cc} a_2 & a_3 \\ b_2 & b_3 \end{array} \right| - \hat\jmath \left| \begin{array}{cc} a_1 & a_3 \\ b_1 & b_3 \end{array} \right| + \hat k \left| \begin{array}{cc} a_1 & a_2 \\ b_1 & b_2 \end{array} \right| \end{align*} } \nonumber $$


  • the area of the parallelogram from the vectors \(\vec{A}\) and \(\vec{B}\) is \(|\vec{A}\times\vec{B}|\)
  • the direction of \(\vec{A}\times\vec{B}\) is perpendicular to the plane of the parallelogram.

The direction of the vector \(\vec{A}\times\vec{B}\) is determined by the right-hand rule

Cross-product right-hand rule

For example: \(\hat\imath\times\hat\jmath=\hat k\) $$ \begin{align*} \hat\imath\times\hat\jmath &= \left| \begin{array}{ccc} \hat\imath & \hat\jmath & \hat k \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{array} \right| \\ &= \hat\imath \left| \begin{array}{cc} 0 & 0 \\ 1 & 0 \end{array} \right| - \hat\jmath \left| \begin{array}{cc} 1 & 0 \\ 0 & 0 \end{array} \right| + \hat k \left| \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right| \\ &= \hat k \end{align*} $$

Some properties

The right-hand rule shows that $$ \shaded{ \vec{A}\times\vec{B}=-\vec{B}\times\vec{A} } $$

The parallelogram of \(\vec{A}\times\vec{A}\) has area zero. $$ \vec{A}\times\vec{A}=\vec{0} $$
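The definition and the two properties can be checked with a few lines of code. A minimal sketch; the vectors `A` and `B` are made-up examples.

```python
import math

def cross(a, b):
    """Cross product from the 2x2 determinants of the definition."""
    return (a[1] * b[2] - a[2] * b[1],
            -(a[0] * b[2] - a[2] * b[0]),
            a[0] * b[1] - a[1] * b[0])

i_hat, j_hat = (1, 0, 0), (0, 1, 0)
print(cross(i_hat, j_hat))  # (0, 0, 1), which is k
print(cross(j_hat, i_hat))  # (0, 0, -1): swapping the factors flips the sign

A, B = (2, 0, 0), (1, 3, 0)  # hypothetical vectors in the xy-plane
n = cross(A, B)
print(n)                                  # (0, 0, 6), perpendicular to the xy-plane
print(math.sqrt(sum(c * c for c in n)))   # 6.0, the parallelogram area
print(cross(A, A))                        # (0, 0, 0)
```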

Volume in space

Let \(\vec{A}, \vec{B}, \vec{C}\) in space \(\mathbb{R}^3\).

Volume in space

The volume equals the area of the base times the height. The base is the parallelogram spanned by \(\vec{B}\) and \(\vec{C}\), so its area is \(|\vec{B}\times\vec{C}|\). The height is the component of \(\vec{A}\) that is perpendicular to the base. Call the direction perpendicular to the base unit vector \(\hat n\). $$ \mathrm{volume} = |\vec{B}\times\vec{C}|\;(\vec{A}\cdot\hat n) \label{eq:volume1} $$

The unit vector \(\hat n\) can be derived from the cross-product of \(\vec{B}\) and \(\vec{C}\). To make it a unit vector, we divide by its length. $$ \hat n = \frac{\vec{B}\times\vec{C}}{|\vec{B}\times\vec{C}|} \nonumber $$

Substitute this back in \(\eqref{eq:volume1}\) $$ \begin{align*} \mathrm{volume} &= \bcancel{|\vec{B}\times\vec{C}|}\;\left(\vec{A}\cdot \frac{\left(\vec{B}\times\vec{C}\right)}{\bcancel{|\vec{B}\times\vec{C}|}}\right) \\ &= \vec{A}\ \cdot\ \left(\vec{B}\times\vec{C}\right) \end{align*} $$

This equals the determinant of \(\vec{A}, \vec{B}, \vec{C}\), the so called “triple product” rule $$ \shaded{ \mathrm{det}\left(\vec{A},\vec{B},\vec{C}\right) =\vec{A}\ \cdot\ \left(\vec{B}\times\vec{C}\right) } \label{eq:tripleproduct} $$

Because $$ a_1 \left| \begin{matrix} b_2 & b_3 \\ c_2 & c_3 \end{matrix} \right| - a_2 \left| \begin{matrix} b_1 & b_3 \\ c_1 & c_3 \end{matrix} \right| + a_3 \left| \begin{matrix} b_1 & b_2 \\ c_1 & c_2 \end{matrix} \right| \ = \left\langle a_1, a_2,a_3 \right\rangle \cdot \ \left( \hat\imath \left| \begin{array}{cc} b_2 & b_3 \\ c_2 & c_3 \end{array} \right| - \hat\jmath \left| \begin{array}{cc} b_1 & b_3 \\ c_1 & c_3 \end{array} \right| + \hat k \left| \begin{array}{cc} b_1 & b_2 \\ c_1 & c_2 \end{array} \right| \right) \nonumber $$

The volume in space described by \(\vec{A}\), \(\vec{B}\) and \(\vec{C}\) follows as $$ \shaded{ \mathrm{volume } = \mathrm{det}\left(\vec{A},\vec{B},\vec{C}\right) } \nonumber $$
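The triple product identity is easy to confirm numerically. The sketch below uses three made-up integer vectors; as with the area, the determinant is signed, so the absolute value gives the volume.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def det3(A, B, C):
    """3x3 determinant with rows A, B, C, expanded along the first row."""
    return (A[0] * (B[1] * C[2] - B[2] * C[1])
            - A[1] * (B[0] * C[2] - B[2] * C[0])
            + A[2] * (B[0] * C[1] - B[1] * C[0]))

# Hypothetical edge vectors of a parallelepiped.
A, B, C = (1, 2, 3), (0, 1, 4), (5, 6, 0)

print(det3(A, B, C))          # 1
print(dot(A, cross(B, C)))    # 1, the same value
print(abs(det3(A, B, C)))     # volume: 1
```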

Equation of a plane from points

Find the plane that contains the points \(q\), \(r\) and \(s\).

Point \(p, q, r, s\) in space


Consider \(\overrightarrow{qr}\), \(\overrightarrow{qs}\) and \(\overrightarrow{qp}\) that form a parallelepiped. If these vectors are in the same plane, the parallelepiped will be flat. In other words, it will have no volume.

If \(p\) is in the \(qrs\)-plane, the determinant should be \(0\). $$ \shaded{ \mathrm{det}\left( \overrightarrow{qp}, \overrightarrow{qr}, \overrightarrow{qs} \right) = 0 } \nonumber $$ with \(q\), \(r\) and \(s\) known, and \(p\) unknown, this equation will give the expression in \(x,y,z\) for the plane.

A more intuitive solution

Point \(p, q, r, s\) and \(\vec{n}\) in space

Let a “normal vector” \(\overrightarrow n\) be a vector perpendicular to the plane. Then \(p\) is in the plane when \(\overrightarrow{qp} \perp \overrightarrow n\). Therefore the dot-product $$ \overrightarrow{qp}\cdot \overrightarrow{n} = 0 \label{eq:moreintuitive} $$

\(\overrightarrow{n}\) equals \(\overrightarrow{qr} \times \overrightarrow{qs}\), the cross-product of two vectors in the plane. Substituting this in equation \(\eqref{eq:moreintuitive}\) $$ \overrightarrow{qp} \cdot \left( \overrightarrow{qr} \times \overrightarrow{qs} \right) = 0 \nonumber $$

Applying the triple product equation \(\eqref{eq:tripleproduct}\) gives the same condition as before $$ \shaded{ \mathrm{det}\left( \overrightarrow{qp}, \overrightarrow{qr}, \overrightarrow{qs} \right) = 0 } $$ With \(q\), \(r\) and \(s\) known, and \(p\) unknown, this equation gives the expression in \(x,y,z\) for the plane.
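A sketch of the normal-vector approach, taking the normal as \(\overrightarrow{qr}\times\overrightarrow{qs}\), the cross product of two vectors lying in the plane. The function name and the three sample points are made up for illustration.

```python
def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_through_points(q, r, s):
    """Return (a, b, c, d) with a*x + b*y + c*z = d through q, r and s."""
    n = cross(sub(r, q), sub(s, q))  # normal = qr x qs
    return (*n, dot(n, q))           # qp . n = 0  <=>  n . p = n . q

# Hypothetical points on the three coordinate axes.
q, r, s = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(plane_through_points(q, r, s))  # (1, 1, 1, 1): x + y + z = 1
```

If the three points lie on one line, the cross product is the zero vector and no unique plane exists; the sketch does not guard against that case.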

Gradient field (in plane)


My notes of the excellent lectures 20 and 21 by “Denis Auroux. 18.02 Multivariable Calculus. Fall 2007. Massachusetts Institute of Technology: MIT OpenCourseWare, License: Creative Commons BY-NC-SA.”


When vector field \(\vec F\) is the gradient of a function \(f(x,y)\) (written using the symbolic \(\nabla\)-operator), it is called a gradient field $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \shaded{ \vec F = \nabla f = \left\langle \pdv{x}f, \pdv{y}f \right\rangle = \left\langle f_x, f_y \right\rangle } \nonumber $$

Where \(f(x,y)\) is called the potential.

Fundamental theorem

Recall the fundamental theorem of calculus

If you integrate the derivative, you get back the function. $$ \int_a^b \frac{df(t)}{dt}\,dt=f(b)-f(a) \label{eq:fndcalc} $$

In multivariable calculus, it is the same

If you take the line integral of the gradient of a function, what you get back is the function. $$ \shaded{ \int_C\nabla f\cdot d\vec r = f(P_1) - f(P_0) } \label{eq:fundthm} $$ where \(f(x,y)\) is called the potential.
Work in gradient field

Only when the field is a gradient, and you know the function \(f\), you can simplify the evaluation of the line integral for work. $$ \shaded{ \int_C\nabla f\cdot d\vec r=f(P_1)-f(P_0) } \nonumber $$


In coordinates, the gradient field \(\nabla f\) is expressed as $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \nabla f =\left\langle \pdv{x}f, \pdv{y}f \right\rangle =\left\langle M, N \right\rangle \nonumber $$

Recall: the work integral in differential form
$$ \int_C\vec F\cdot d\vec r = \int_C\left( M\,dx + N\,dy \right) \nonumber $$

Substituting \(M\) and \(N\) from the gradient field into the work integral $$ \renewcommand{dv}[2]{\frac{d #1}{d #2}} \renewcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \begin{align} \int_C \nabla f\cdot d\vec r &= \int_C \pdv{f}{x}dx+\pdv{f}{y}dy \nonumber \\ &= \int_C \underline{ \left(\pdv{f}{\color{red} x}\dv{\color{red}x}{t} + \pdv{f}{\color{blue}y}\dv{\color{blue}y}{t}\right)}\,dt \label{eq:subgrad} \end{align} $$

Recall: the multivariable calculus chain rule

$$ \renewcommand{dv}[2]{\frac{d #1}{d #2}} \renewcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \dv{}{t}\,f\left(\,\color{red}x(t),\,\color{blue}y(t)\,\right) = \pdv{f}{\color{red}x}\frac{d\color{red}x}{dt}\,+\, \pdv{f}{\color{blue}y}\frac{d\color{blue}y}{dt} \nonumber $$

Apply the chain rule in reverse to equation \(\eqref{eq:subgrad}\), and integrate the differential of the function $$ \renewcommand{dv}[2]{\frac{d #1}{d #2}} \renewcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \begin{align*} \int_C \nabla f\cdot d\vec r &= \int_{t_0}^{t_1} \dv{}{t} f\color{grey}{\big(\,x(t),\,y(t)\,\big)}\,dt = \int_{t_0}^{t_1} d\,f\color{grey}{\big(\,x(t),\,y(t)\,\big)} \end{align*} $$

By the fundamental theorem of calculus \(\eqref{eq:fndcalc}\) $$ \renewcommand{dv}[2]{\frac{d #1}{d #2}} \renewcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \begin{align*} \int_C \nabla f\cdot d\vec r &= f\color{grey}{\big(\,x(t_1),\,y(t_1)\,\big)} - f\color{grey}{\big(\,x(t_0),\,y(t_0)\,\big)} \end{align*} $$

With the points $$ \left\{ \begin{align*} P_0 &= \big(\,x(t_0),\,y(t_0)\,\big) \\ P_1 &= \big(\,x(t_1),\,y(t_1)\,\big) \end{align*} \right. $$

So, the work done in gradient field \(f\) can be expressed as the difference in potential $$ \renewcommand{dv}[2]{\frac{d #1}{d #2}} \renewcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \shaded{ \int_C \nabla f\cdot d\vec r = f(P_1)-f(P_0) } $$

Physics (using math notation)

A lot of forces are gradients of potentials such as the electric force and the gravitational force. However, magnetic fields are not gradients.

The work done by the electrical (or gravitational) force, is given by the change of the potential energy from the starting point to the ending point.

Note that: physics potentials are the opposite of mathematical potentials. The force \(\vec F\) will be negative the gradient. So in physics, it would be expressed as $$ \vec F=-\nabla f \nonumber $$


Equivalent properties of the work in a gradient field $$ \rm W = \int_C\nabla f\cdot d\vec r \nonumber $$

  1. Path-independent: the work depends only on the values of the potential at the start and end points, \(f(P_0)\) and \(f(P_1)\).
  2. Conservative: the work is \(0\) along all closed curves. This means a closed loop in a gradient field does not provide energy. Conservativeness means no energy can be extracted from the field for free. The total energy is conserved.
  3. \(Mdx+Ndy\) is an exact differential. That means it can be put in the form \(df\).

See also Curl and Green’s.



Let’s look at the earlier example again: Curve \(C\) starting and ending at \((0,0)\) through vector field \(\vec F\) $$ \begin{array}{l} \vec F = \left\langle y,x \right\rangle \\ C_1: (0,0)\ \mathrm{to}\ (1,0) \\ C_2: \mathrm{unit\ circle\ from}\ (1,0)\ \mathrm{to\ the\ diagonal} \\ C_3: \mathrm{from\ the\ diagonal\ to}\ (0,0) \end{array} \nonumber $$


Try function \(f=xy\). The gradient is $$ \renewcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \begin{align*} \nabla f &= \left\langle \pdv{}{x}xy, \pdv{}{y}xy \right\rangle = \left\langle y,x \right\rangle \end{align*} $$ That means the line integral can just be evaluated by finding the values of \(f\) at the endpoints.


Visualize using a contour plot of \(f=xy\) through gradient field \(\vec F\)

Contour plot of gradient with curve

Along the segments

  • On \(C_1\) the potential stays \(0\).
  • On \(C_2\) $$ \begin{align*} \int_{C_2}\vec F\cdot d\vec r &= f\left(\frac{1}{\sqrt 2},\frac{1}{\sqrt 2}\right) – f(1,0) \\ &= \frac{1}{2}-0=\frac{1}{2} \end{align*} $$
  • On \(C_3\) it decreases back to \(0\).
The sum of the work therefore is \(0\).
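The value on \(C_2\) can be cross-checked by integrating numerically along the arc. This is a rough sketch under stated assumptions: the arc is parametrized as \(x=\cos t\), \(y=\sin t\) for \(t\in[0,\tfrac{\pi}{4}]\), and `line_integral` is a hypothetical midpoint-rule helper, not a library function.

```python
import math

def line_integral(F, r, dr, t0, t1, n=10_000):
    """Midpoint-rule approximation of the work integral of F along r(t)."""
    h = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        (M, N), (dx, dy) = F(*r(t)), dr(t)
        total += (M * dx + N * dy) * h  # M dx + N dy
    return total

def F(x, y):
    return (y, x)  # gradient of f = x*y

def f(x, y):
    return x * y   # the potential

def r(t):
    return (math.cos(t), math.sin(t))    # point on the unit circle

def dr(t):
    return (-math.sin(t), math.cos(t))   # derivative of r with respect to t

work = line_integral(F, r, dr, 0.0, math.pi / 4)
print(work)                                # close to 0.5
print(f(*r(math.pi / 4)) - f(*r(0.0)))     # potential difference, also close to 0.5
```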

When is a vector field a gradient field?

Let vector field \(\vec F=\left\langle M,N\right\rangle\) where \(M\) and \(N\) are functions of \(x\) and \(y\).

When is this a gradient field? $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \vec F = \left\langle M, N \right\rangle \stackrel{?}{=} \nabla f = \left\langle \pdv{}{x}f, \pdv{}{y}f \right\rangle \nonumber $$

If \(\vec F\) is a gradient field, \(\vec F=\nabla f\), then $$ \newcommand{dv}[2]{\frac{d #1}{d #2}} \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \left\{ \begin{align*} M=\pdv{}{x}f = f_x \\ N=\pdv{}{y}f = f_y \end{align*} \right. $$

Take the partial derivatives of \(M\) and \(N\) $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{ppdv}[3]{\frac{\partial^2 #1}{\partial #2\partial #3}} \begin{align} M_y=\pdv{M}{y}=\ppdv{}{\color{red}x}{\color{blue}y}f = f_{xy} \label{eq:proof1} \\ N_x=\pdv{N}{x}=\ppdv{}{\color{blue}y}{\color{red}x}f = f_{yx} \label{eq:proof2} \end{align} $$

Recall: the second partial derivative of function \(f\)

$$ \newcommand{dv}[2]{\frac{d #1}{d #2}} \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{ppdv}[3]{\frac{\partial^2 #1}{\partial #2\partial #3}} \pdv{}{\color{red}x}\left(\pdv{f}{\color{blue}y}\right) =\ppdv{f}{\color{red}x}{\color{blue}y} =\ppdv{f}{\color{blue}y}{\color{red}x} =\pdv{}{\color{blue}y}\left(\pdv{f}{\color{red}x}\right) \nonumber $$

Based on the second partial derivative rule, equations \(\eqref{eq:proof1}\) and \(\eqref{eq:proof2}\) are the same. That implies that a gradient field should have the property $$ \newcommand{dv}[2]{\frac{d #1}{d #2}} \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{ppdv}[3]{\frac{\partial^2 #1}{\partial #2\partial #3}} \left. \begin{align*} M_y = \ppdv{}{\color{red}x}{\color{blue}y}f = f_{xy} \\ N_x = \ppdv{}{\color{blue}y}{\color{red}x}f = f_{yx} \end{align*} \right\} \Rightarrow M_y=N_x $$

Therefore, \(\vec F=\left\langle M,N \right\rangle\), defined and differentiable everywhere, is a gradient field, when $$ \shaded{ M_y = N_x } \nonumber $$ (Also see the Definition of Curl.)

So, \(\vec F=\left\langle M,N\right\rangle\) is a gradient field in a region of the plane

  • \(\Leftrightarrow\) it is conservative: \(\int_C \vec F\cdot d\vec r=0\) for any closed curve. To indicate that the integral runs along a closed curve, we write \(\oint_C\) $$ \oint_C \vec F\cdot d\vec r=0 \nonumber $$
  • \(\Rightarrow\) \(N_x=M_y\) at every point.
  • \(\Leftarrow\) \(N_x=M_y\) at every point, if \(\vec F\) is defined in the entire plane (or, in a simply connected region). (see later)



Is \(\vec F\) a gradient field? $$ \vec F=\underbrace{-y}_{M}\hat\imath+\underbrace{x}_N\hat\jmath=\left\langle -y,x \right\rangle \nonumber $$

\(\vec F\) is not a gradient field, because $$ \newcommand{dv}[2]{\frac{d #1}{d #2}} \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{ppdv}[3]{\frac{\partial^2 #1}{\partial #2\partial #3}} \left. \begin{align*} \pdv{M}{y}&=\pdv{}{y}(-y)=-1 \\ \pdv{N}{x}&=\pdv{}{x}x=1 \end{align*} \right\} \Rightarrow \pdv{M}{y}\neq \pdv{N}{x} $$


For what value of \(a\) is \(\vec F\) a gradient field? $$ \vec F = \underbrace{(4x^2+axy)}_{M}\hat\imath+\underbrace{(3y^2+4x^2)}_N\hat\jmath = \left\langle 4x^2+axy, 3y^2+4x^2\right\rangle \nonumber $$

$$ \newcommand{dv}[2]{\frac{d #1}{d #2}} \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{ppdv}[3]{\frac{\partial^2 #1}{\partial #2\partial #3}} \left. \begin{align*} \pdv{M}{y}&=\pdv{}{y}(4x^2+axy)=ax \\ \pdv{N}{x}&=\pdv{}{x}(3y^2+4x^2)=8x \end{align*} \right\} \Rightarrow a=8 $$ Note that \(x=0\) is not an answer: the condition \(M_y=N_x\) must hold everywhere, not only where \(x=0\).
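The criterion \(M_y=N_x\) can be spot-checked numerically with central differences. This sketch only samples a few arbitrary points, so a `True` result is evidence rather than proof; the helper names and the sample points are assumptions made for this illustration.

```python
def partial(g, var, x, y, h=1e-5):
    """Central-difference estimate of the partial derivative of g(x, y)."""
    if var == "x":
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

def satisfies_my_eq_nx(M, N, points, tol=1e-6):
    """Check M_y = N_x at the given sample points."""
    return all(abs(partial(M, "y", x, y) - partial(N, "x", x, y)) < tol
               for x, y in points)

points = [(0.5, -1.0), (1.0, 2.0), (-2.0, 0.3)]  # arbitrary sample points

def N(x, y):
    return 3 * y**2 + 4 * x**2

# F = <-y, x>: M_y = -1 and N_x = 1, so not a gradient field.
print(satisfies_my_eq_nx(lambda x, y: -y, lambda x, y: x, points))  # False

# F = <4x^2 + a*x*y, 3y^2 + 4x^2> for two values of a.
print(satisfies_my_eq_nx(lambda x, y: 4 * x**2 + 8 * x * y, N, points))  # True
print(satisfies_my_eq_nx(lambda x, y: 4 * x**2 + 1 * x * y, N, points))  # False
```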

Finding the potential

Recall: from earlier

When the field is a gradient field and you know the function \(f\), you can simplify the evaluation of the line integral for work. $$ \shaded{ \int_C\nabla f\cdot d\vec r=f(P_1)-f(P_0) } \nonumber $$ where \(f(x,y)\) is called the potential.

To show the two methods, we will find the potential of the gradient field \(\vec F\) from the previous example, with \(a=8\) $$ \vec F = \left\langle \underbrace{4x^2+8xy}_{=M}, \underbrace{3y^2+4x^2}_{=N} \right\rangle \nonumber $$

Compute line integrals

Apply the fundamental theorem, equation \(\eqref{eq:fundthm}\), to find an expression for the potential at \((x_1,y_1)\) $$ \begin{align} &\int_C\vec F\cdot d\vec r=f(x_1,y_1)-f(0,0) \nonumber \\ \Rightarrow & f(x_1,y_1) = \underbrace{\int_C\vec F\cdot d\vec r}_{\rm{work}} + \underbrace{f(0,0)}_{\mathrm{constant}} \label{eq:method1} \end{align} $$

Apply the work differential, to find the work along \(C\) in gradient field \(\vec F\) $$ \begin{align*} \underline{\int_C\vec F\cdot d\vec r} &= \int_C M\,dx+N\,dy \\ &= \int_C\left(4x^2+8xy\right)dx+\left(3y^2+4x^2\right)dy \end{align*} $$

Work in a gradient field is path independent \(\Longrightarrow\) pick the easiest path

Paths \(C, C_1, C_2\)

The easiest path is $$ \begin{array}{llll} C_1: & x\ \mathrm{from}\ 0\ \mathrm{to}\ x_1 & y=0 &\Rightarrow dy=0 \\ C_2: & x=x_1 & y\ \mathrm{from}\ 0\ \mathrm{to}\ y_1 &\Rightarrow dx=0 \end{array} \nonumber $$

Work along the curves

  • Along \(C_1\) $$ \begin{align*} \int_{C_1}\vec F\cdot d\vec r &= \int_0^{x_1}(4x^2+0)\,dx + 0 \\ &= \left[\frac{4}{3}x^3\right]_0^{x_1} = \frac{4}{3}{x_1}^3 \end{align*} $$
  • Along \(C_2\) $$ \begin{align*} \int_{C_2}\vec F\cdot d\vec r &= \int_0^{y_1}0+(3y^2+4{x_1}^2)\,dy \\ &= \left[y^3+4{x_1}^2y \right]_0^{y_1} = {y_1}^3+4{x_1}^2y_1 \end{align*} $$

The total work $$ \int_C\vec F\cdot d\vec r = \int_{C_1}\ldots + \int_{C_2}\ldots = \frac{4}{3}{x_1}^3 + {y_1}^3+4{x_1}^2y_1 \nonumber $$

Substitute \(\int_{C_1}, \int_{C_2}\) back in \(\eqref{eq:method1}\) $$ \begin{align*} f(x_1,y_1) &= \int_C\vec F\cdot d\vec r + \rm{c} \\ &= \frac{4}{3}{x_1}^3 + {y_1}^3+4{x_1}^2y_1 + \rm{c} \end{align*} $$

Drop the subscripts $$ f(x,y) = \frac{4}{3}x^3 + 4x^2y + y^3\, (+ \rm{c}) \nonumber $$ If you take the gradient of \(f\), you should get \(\vec F\) back.

Compute using antiderivatives

This method needs no integrals, but you have to follow the procedure very carefully. A common pitfall is to treat the second equation like the first one.

For the example, we want to solve $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \left. \begin{align} \pdv{f}{x}&=f_x=4x^2+8xy \label{eq:anti1} \\ \pdv{f}{y}&=f_y=3y^2+4x^2 \label{eq:anti2} \end{align} \right. $$

Integrate equation \(\eqref{eq:anti1}\) with respect to \(x\). The integration constant might depend on \(y\), so we call it \(g(y)\) $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \pdv{f}{x} = 4x^2+8xy \xrightarrow{\int dx} f = \underline{\frac{4}{3}x^3 + 4x^2y + g(y)} \label{eq:anti} $$

To get information about \(g(y)\), we look at the other partial. Take the derivative of \(f\) with respect to \(y\) and compare to \(\eqref{eq:anti2}\) $$ \require{cancel} \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \begin{align*} \pdv{}{y}\left(\frac{4}{3}x^3 + 4x^2y + g(y)\right) &= 3y^2+\bcancel{4x^2} \\ 0 + \bcancel{4x^2}+\pdv{}{y}g(y) &= 3y^2+\bcancel{4x^2} \Rightarrow \pdv{}{y}g(y) = 3y^2\\ \xrightarrow{\int dy} g(y) &= \int \pdv{}{y}g(y)\,dy = \underline{y^3 + c} \end{align*} $$ \(g(y)\) only depends on \(y\), so \(c\) is a true constant.

Plug this back into equation \(\eqref{eq:anti}\), gives the potential \(f(x,y)\) $$ f = \frac{4}{3}x^3 + 4x^2y + \underline{y^3\ (+ \rm{c})} \nonumber $$
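Both methods give the same potential, and this can be cross-checked numerically. The sketch below is an illustration under stated assumptions: the function name `work`, the test point \((1.5,-0.7)\) and the step count are arbitrary choices that do not come from the lecture. It approximates the work along \(C_1\) followed by \(C_2\) with a midpoint rule and compares it with \(f(x_1,y_1)-f(0,0)\).

```python
def M(x, y):
    return 4 * x**2 + 8 * x * y

def N(x, y):
    return 3 * y**2 + 4 * x**2

def f(x, y):
    """Potential found above, with the constant c set to 0."""
    return 4 / 3 * x**3 + 4 * x**2 * y + y**3

def work(x1, y1, n=2000):
    """Midpoint-rule work along C1 (y = 0, dy = 0), then C2 (x = x1, dx = 0)."""
    dx, dy = x1 / n, y1 / n
    along_c1 = sum(M((i + 0.5) * dx, 0.0) for i in range(n)) * dx
    along_c2 = sum(N(x1, (j + 0.5) * dy) for j in range(n)) * dy
    return along_c1 + along_c2

x1, y1 = 1.5, -0.7  # arbitrary end point
print(work(x1, y1), f(x1, y1) - f(0, 0))  # the two values should agree closely
```

Because the field is a gradient field, any other path between the same end points would give the same work.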

Double integrals


My notes of the excellent lectures 16, 17 and 18 by “Denis Auroux. 18.02 Multivariable Calculus. Fall 2007. Massachusetts Institute of Technology: MIT OpenCourseWare, License: Creative Commons BY-NC-SA.”

Recall: the integral of a function of one variable \(f(x)\) corresponds to the area below the graph of \(f\) over \([a,b]\).

$$ \int_a^b f(x)\,dx \nonumber $$

The input domain of \(f(x)\) is \(x\), therefore the region of integration \(R\) is on a line along the \(x\)-axis. Here \(x=a\) is the lower bound, and \(x=b\) is the upper bound.

Single variable function in \(xy\)-plane


For a function of two variables \(f(x,y)\), the region of integration \(R\) is bounded by a curve on the \(xy\)-plane. Using a double integral, you can find the volume between the region and a function \(z=f(x,y)\).

Volume under \(z=f(x,y)\) over region \(R\)

To compute the volume, start with cutting the area of \(R\) in small pieces \(\Delta A=\Delta y\Delta x\)

\(xyz\)-space with region \(R\) and area \(\Delta A\)
\(xy\)-plane with region \(R\) and area \(\Delta A\)

Consider all the pieces, and take the limit \(\Delta A_i\to 0\). $$ \lim_{\Delta A_i\to 0}\sum_i f(x_i,y_i)\,\Delta A_i \nonumber $$

Let \(dA=dy\,dx\) be a tiny piece of area in region \(R\). This gives the definition of the double integral of \(f(x,y)\) over region \(R\). $$ \shaded{ \iint_R f(x,y)\,dA } \nonumber $$

Double integrals are evaluated as two nested integrals, starting with the inner integral $$ \int_{x_{min}}^{x_{max}} \underbrace{ \int_{y_{min}(x)}^{y_{max}(x)} f(x,y)\,dy }_{\text{function of only }x} \,dx \nonumber $$ The bound functions encode the shape of region \(R\).

The bounds of the inner integral may be functions of the outer variable.

In Cartesian coordinates

To compute \(\iint_R f(x,y)\,dA\), we take slices that scan the volume from the back to the front.

Slice for a given \(x_i\) in \(xyz\)-space
Slice for a given \(x\) in \(xy\)-plane

For the outer integral, let \(S(x_i)\) be the area of a slice parallel to the \(yz\)-plane (the area of the thin purple vertical wall in the picture on the left). Then, the volume of each slice is \(S(x_i)\,\Delta x\). The total volume follows as $$ \begin{align} \rm{volume} &= \lim_{\Delta x\to 0}\sum_i S(x_i)\,\Delta x \nonumber \\ &=\int_{x_{min}}^{x_{max}} \underline{S(x)}\,dx \label{eq:doublecomp1} \end{align} $$

For the inner integral, \(x\) is constant and \(y\) is the variable of integration. For the range of \(y\), we go from the far left to the far right on the given slice, as shown in the picture on the right $$ S(x) = \int_{y_{min}(x)}^{y_{max}(x)} f(x,y)\,dy \label{eq:doublecomp2} $$ Note that these inner bounds depend on \(x\).

Substituting equation \(\eqref{eq:doublecomp2}\) in \(\eqref{eq:doublecomp1}\) gives the iterated integral $$ \shaded{ \iint_R f(x,y)\,dA = \int_{x_{min}}^{x_{max}} \left[ \int_{y_{min}(x)}^{y_{max}(x)} f(x,y)\,dy \right] dx } \nonumber $$



Integrate \(z=1-x^2-y^2\) over the region $$ \left\{\begin{align*} 0\leq &x\leq 1 \\ 0\leq &y\leq 1 \end{align*}\right. \nonumber $$


Volume under \(z=f(x,y)\) over region \(R\)

The bounds are trivial $$ \begin{align*} \iint_R z(x,y)\,dA &= \int_0^1\underline{\int_0^1 1-x^2-y^2\,dy}\,dx \end{align*} \nonumber $$

Evaluate the inner integral $$ \begin{align*} \int_0^1 1-x^2-y^2\,dy &= \left[ y-x^2y-\frac{y^3}{3} \right]_{y=0}^1 \\ &= \left(1-x^2-\frac{1}{3}\right) - 0 \\ &= \underline{\frac{2}{3}-x^2} \end{align*} \nonumber $$

Substituted back in the outer integral $$ \begin{align*} \iint_R z(x,y)\,dA &=\int_0^1 \underline{\frac{2}{3}-x^2}\,dx \\ &=\left[\frac{2}{3}x-\frac{x^3}{3}\right]_{x=0}^1 = \frac{1}{3} \end{align*} \nonumber $$
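The slicing procedure translates directly into a nested sum. As a rough numerical illustration (a midpoint-rule sketch with a hypothetical helper `double_integral`, not a method from the lecture), the inner loop computes the slice area \(S(x)\) and the outer loop adds up the slices.

```python
def double_integral(f, x_min, x_max, y_min, y_max, n=400):
    """Midpoint-rule estimate of the iterated integral of f(x, y) dy dx.

    y_min and y_max are functions of x, so they encode the shape of the region.
    """
    dx = (x_max - x_min) / n
    total = 0.0
    for i in range(n):
        x = x_min + (i + 0.5) * dx
        lo, hi = y_min(x), y_max(x)
        dy = (hi - lo) / n
        # Inner integral: area S(x) of the slice at this x.
        slice_area = sum(f(x, lo + (j + 0.5) * dy) for j in range(n)) * dy
        # Outer integral: volume of the slice.
        total += slice_area * dx
    return total

z = lambda x, y: 1 - x**2 - y**2
volume = double_integral(z, 0, 1, lambda x: 0.0, lambda x: 1.0)
print(volume)  # close to 1/3
```

For the unit square the bound functions are constants. For other regions only the two bound functions change.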


Integrate \(z=1-x^2-y^2\) over the quarter unit disk region $$ \left\{\begin{align*} x^2 + y^2 &\leq 1 \\ x &\geq 0 \\ y &\geq 0 \end{align*}\right. \nonumber $$


\(z=f(x,y)\) and region \(R\) in \(xyz\)-space
Region \(R\) in \(xy\)-plane

Find the bounds of integration

  1. For \(\int dy\), the inner integral, express the bounds of \(y\) as a function of \(x\). The lower bound is \(0\). The upper bound lies on the quarter circle with \(x^2+y^2 = 1 \Rightarrow y=\sqrt{1-x^2}\).
  2. For \(\int dx\), the outer integral, the range for \(x\) is \(0\) to \(1\).

Fill in the bounds of the integrals $$ \begin{align*} \iint_R z(x,y)\,dA &= \int_0^1\underline{\int_0^{\sqrt{1-x^2}} 1-x^2-y^2\,dy}\,dx \end{align*} \nonumber $$

Evaluate the inner integral $$ \begin{align*} \int_0^{\sqrt{1-x^2}} 1-x^2-y^2\,dy &= \left[ y-x^2y-\frac{y^3}{3} \right]_{y=0}^{\sqrt{1-x^2}} \\ &= \left(\sqrt{1-x^2}-x^2\sqrt{1-x^2}-\frac{1}{3}(1-x^2)^{3/2}\right) - 0 \\ &= (1-x^2)(1-x^2)^{1/2}-\frac{1}{3}(1-x^2)^{3/2} \\ &= \underline{\frac{2}{3}\left(1-x^2 \right)^{3/2}} \end{align*} \nonumber $$

Substitute back in the outer integral $$ \begin{align*} \iint_R z(x,y)\,dA &= \int_0^1\underline{\frac{2}{3}\left(1-x^2 \right)^{3/2}}\,dx \end{align*} \nonumber $$

To compute the outer integral, substitute \(x=\sin\theta\) and use the double angle formula \(\cos^2\theta=\frac{1}{2}(1+\cos2\theta)\) twice. This eventually leads to the answer \(\frac{\pi}{8}\).

As we will see later, using polar coordinates will be much easier!

Changing the order of integration

We change the order of integration, when it makes it easier to compute the double integral.



When the bounds are numbers, they form a rectangle and we can simply switch the order of integration $$ \int_0^1\int_0^2 dx\,dy = \int_0^2\int_0^1 dy\,dx \nonumber $$


As written, the integral can’t be computed, because \(\frac{e^y}{y}\) has no elementary antiderivative with respect to \(y\). Change the order of integration. $$ \int_0^1\int_x^{\sqrt{x}} \frac{e^y}{y}dy\,dx \nonumber $$

Plot the region based on the existing bounds.


For the new inner integral, \(y\) is constant and \(x\) is the variable of integration. The old upper bound \(y=\sqrt{x} \Rightarrow x=y^2\), and lower bound \(y=x \Rightarrow x=y\) $$ \begin{align*} \int_0^1\int_x^{\sqrt{x}} \frac{e^y}{y}dy\,dx &= \int_0^1 \underline{\int_{y^2}^y \frac{e^y}{y}dx}\,dy \\ \end{align*} \nonumber $$

Evaluate the inner integral $$ \begin{align*} \int_{y^2}^y \frac{e^y}{y}dx &= \left[x\frac{e^y}{y}\right]_{x=y^2}^y \\ &=e^y - e^y y \end{align*} \nonumber $$

Find the antiderivative of \(e^y - e^y y\) (or use integration by parts) $$ \begin{align*} \left(y\,e^y\right)' &= 1\cdot e^y+y\cdot(e^y)'=e^y+y\,e^y \\ \Rightarrow \left(-y\,e^y\right)' &= -e^y-y\,e^y \\ \Rightarrow \left(-y\,e^y+2\,e^y\right)' &= -e^y-y\,e^y + 2e^y \\ &= e^y-y\,e^y \end{align*} \nonumber $$

The outer integral evaluates to $$ \begin{align*} \int_0^1\int_x^{\sqrt{x}} \frac{e^y}{y}dy\,dx &= \int_0^1 (e^y - e^y y)\,dy \\ &=\Big[ -y\,e^y + 2\,e^y \Big]_{y=0}^1 \\ &= (-1\cdot e^1+2e^1)-(0+2\cdot e^0) \\ &= e -2 \end{align*} \nonumber $$
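Although the original order has no closed-form inner integral, both orders can still be approximated numerically and should agree with \(e-2\). The sketch below uses a hypothetical midpoint-rule helper `iterated_integral`, which is an assumption of this example and not part of the lecture. The midpoints never touch \(y=0\), so the integrand stays finite.

```python
import math

def iterated_integral(f, a, b, lo, hi, n=400):
    """Midpoint rule: outer variable t in [a, b], inner variable s in [lo(t), hi(t)].

    f is called as f(outer, inner).
    """
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        ds = (hi(t) - lo(t)) / n
        total += sum(f(t, lo(t) + (j + 0.5) * ds) for j in range(n)) * ds * dt
    return total

# Original order dy dx: outer x in [0, 1], inner y from x to sqrt(x).
dy_dx = iterated_integral(lambda x, y: math.exp(y) / y, 0, 1,
                          lambda x: x, math.sqrt)
# Swapped order dx dy: outer y in [0, 1], inner x from y^2 to y.
dx_dy = iterated_integral(lambda y, x: math.exp(y) / y, 0, 1,
                          lambda y: y * y, lambda y: y)

print(dy_dx, dx_dy, math.e - 2)  # all three should be close to each other
```

The original order converges more slowly because the slice area grows like a logarithm near \(x=0\), which is another hint that the swapped order is the more convenient one.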


Exchange the order of integration to \(dx\,dy\) for $$ \int_0^1\int_x^{2x}f\,dy\,dx \nonumber $$

Plot the region based on the existing bounds.

Simply connected regions
Not simply connected regions

Slicing horizontally, the right-hand boundary changes at \(y=1\), so the integral splits into two terms: \(0\lt y\lt 1\) and \(1\lt y\lt 2\), each with different bounds for \(x\) $$ \int_0^1\int_x^{2x}f\,dy\,dx = \int_{0}^{1}\int_{y/2}^{y} f\,dx\,dy + \int_{1}^{2}\int_{y/2}^{1} f\,dx\,dy \nonumber $$

In polar coordinates

In general, you switch to polar coordinates because the region is easier to set up, or the integrand becomes simpler.

Polar-coordinates vs. \(xy\)-coordinates

Polar coordinates express a point \((x,y)\) in the plane using \(r\) for the distance from the origin, and \(\theta\) for the counterclockwise angle from the positive \(x\)-axis. $$ \shaded{ \begin{align*} x &= r\cos\theta \\ y &= r\sin\theta \end{align*} } \nonumber $$

Area element

The area element \(\Delta A\) is almost rectangular as shown below

Area \(\Delta A\)

One side is \(\Delta r\) and the other side is \(r\,\Delta\theta\). In the limit where \(\Delta r,\Delta\theta\to 0\), the area element becomes $$ \shaded{ dA=r\,dr\,d\theta } \nonumber $$

The double integral in polar coordinates $$ \shaded{ \int_{\theta_{min}}^{\theta_{max}} \int_{r_{min}}^{r_{max}} f(r,\theta)\,r\,dr\,d\theta } \nonumber $$



Redo the earlier problem using polar coordinates: Integrate \(z=1-x^2-y^2\) over the quarter unit disk region $$ \left\{\begin{align*} x^2 + y^2 &\leq 1 \\ x &\geq 0 \\ y &\geq 0 \end{align*}\right. \nonumber $$

Plot of the region

Region \(R\) in polar-coordinates

Set the bounds for the integrals

  1. For \(\int dr\), the inner integral: fix the value of \(\theta\), and let \(r\) vary. For the bounds, ask yourself for what values of \(r\) you are inside the region. In this case, that is \( 0\lt r\lt 1\).
  2. For \(\int d\theta\), the outer integral: let \(\theta\) vary, and ask yourself for what values of \(\theta\) you are inside the region. In this case, that is \(0\lt\theta\lt\frac{\pi}{2}\).

Fill in the bounds of the double integral $$ \int_0^{\pi/2}\int_0^1 f(r,\theta)\,r\,dr\,d\theta \nonumber $$

Instead of just replacing \(x=r\,\cos\theta\) and \(y=r\,\sin\theta\), we can express the function \(f(x,y)\) in polar coordinates using \(r^2=x^2+y^2\) $$ \begin{align*} f(x,y) &= 1-x^2-y^2 \\ &= 1-(x^2+y^2) \\ \Leftrightarrow f(r,\theta) &= 1-r^2 \end{align*} \nonumber $$

Evaluate the double integral $$ \begin{align*} \text{volume} &= \int_0^{\pi/2}\underline{\int_0^1 (1-r^2)\,r\,dr}\,d\theta \\ &= \int_0^{\pi/2}\left[ \frac{r^2}{2}-\frac{r^4}{4} \right]_{r=0}^1 \,d\theta \\ &= \int_0^{\pi/2} \frac{1}{4} \,d\theta = \frac{1}{4}\frac{\pi}{2}=\frac{\pi}{8} \end{align*} \nonumber $$
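The Cartesian and polar set-ups describe the same volume, which can be confirmed numerically. This is a sketch under assumptions: `iterated_integral` is a hypothetical midpoint-rule helper written for this example, and the step count is arbitrary. Note the extra factor \(r\) in the polar integrand, which comes from \(dA=r\,dr\,d\theta\).

```python
import math

def iterated_integral(f, a, b, lo, hi, n=300):
    """Midpoint rule: outer variable t in [a, b], inner variable s in [lo(t), hi(t)]."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        ds = (hi(t) - lo(t)) / n
        total += sum(f(t, lo(t) + (j + 0.5) * ds) for j in range(n)) * ds * dt
    return total

# Cartesian: x in [0, 1], y from 0 to sqrt(1 - x^2).
cartesian = iterated_integral(lambda x, y: 1 - x**2 - y**2, 0, 1,
                              lambda x: 0.0, lambda x: math.sqrt(1 - x**2))
# Polar: theta in [0, pi/2], r in [0, 1], integrand (1 - r^2) times r.
polar = iterated_integral(lambda theta, r: (1 - r**2) * r, 0, math.pi / 2,
                          lambda theta: 0.0, lambda theta: 1.0)

print(cartesian, polar, math.pi / 8)  # all three should be close to each other
```

In polar coordinates both bounds are constants, which is why the hand computation is so much shorter.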


Find the area

Find the area of region \(R\). $$ \shaded{ \text{Area}(R)=\iint_R 1\,dA } \nonumber $$

Or, the mass of a (flat) object with density \(\delta\) = mass per unit area. $$ \shaded{ \begin{align*} \Delta m &= \delta\,\Delta A \\ \Rightarrow \text{Mass}(R) &= \iint_R\delta(x,y)\,dA \end{align*} } \nonumber $$

Find the average value

Average value of \(f\) in \(R\). $$ \shaded{ \bar f = \frac{1}{\text{Area}(R)}\iint_R f(x,y)\,dA } \nonumber $$

Or, the weighted average value of \(f\) in \(R\) with density \(\delta\) $$ \shaded{ \frac{1}{\text{Mass}(R)}\iint_R f(x,y)\,\underbrace{\delta(x,y)\,dA}_{\text{mass element}} } \nonumber $$

Or, the center of mass \((\bar x,\bar y)\) of a (planar) object with density \(\delta\): the weighted averages of \(x\) and \(y\) $$ \shaded{ \left\{ \begin{align*} \bar x &= \frac{1}{\text{Mass}(R)}\iint_R x\,\delta(x,y)\,dA \\ \bar y &= \frac{1}{\text{Mass}(R)}\iint_R y\,\delta(x,y)\,dA \end{align*} \right. } \nonumber $$

Find the moment of inertia

Recall from physics:

The kinetic energy of a point mass equals \(\frac{1}{2}mv^2\)

Mass is how hard it is to impart a translation movement. (to make it move)

Similarly, the moment of inertia about an axis is how hard it is to rotate about that axis (to make it spin).

Linear motion
Circular motion

Let \(\omega\) be the rate of change of angle \(\theta\), \(\omega=\frac{d\theta}{dt}\).

Per unit time, a mass \(m\) at distance \(r\) from the axis, rotating at angular velocity \(\omega\), travels a distance \(r\omega\), so the speed is \(v=r\omega\). The kinetic energy follows as $$ \shaded{ \tfrac{1}{2}m\,v^2=\tfrac{1}{2}\underline{mr^2}\omega^2 } \nonumber $$

The moment of inertia is defined as $$ \shaded{ I = mr^2 } \nonumber $$

For rotation movements, \(I\) replaces the mass \(m\). The rotational kinetic energy is $$ \shaded{ \frac{1}{2}\,I\,\omega^2 } \nonumber $$

Rotation about the origin

A solid with density \(\delta\) rotating about the origin.

Solid rotating around origin

A tiny area \(\Delta A\) with mass \(\Delta m=\delta\,\Delta A\), has a moment of inertia $$ \Delta m\cdot r^2=\delta\cdot\Delta A\cdot r^2 \nonumber $$

Consider all the pieces $$ \shaded{ I_o=\iint_R r^2\,\delta\,dA } \nonumber $$ where \(r^2=x^2+y^2\) in \(xy\)-coordinates.

Rotation about the \(\ x\)-axis

In the \(xyz\)-space, the distance to the \(x\)-axis is \(|y|\).

Solid spinning around \(x\)-axis

Moment of inertia for a solid with density \(\delta\) rotating about the \(x\)-axis $$ \shaded{ I_x=\iint_R y^2\,\delta\,dA } \nonumber $$



Disk of radius \(a\) with uniform density \(\delta=1\) spinning around its center. What is the moment of inertia?

Disk spinning around origin

What is \(r^2\) for any point inside \(R\) in this formula? $$ I_o=\iint_R r^2\,\delta\,dA \nonumber $$

Using polar coordinates, \(r\) will go from \(0\) to \(a\) and \(dA=r\,dr\,d\theta\) $$ \begin{align*} I_o &= \iint_R r^2\cdot 1\cdot dA \\ &= \int_0^{2\pi} \underline{\int_0^a r^2 r\,dr}\,d\theta \\ &= \int_0^{2\pi} \left[ \frac{r^4}{4} \right]_{r=0}^a\,d\theta = \int_0^{2\pi} \frac{a^4}{4}\,d\theta \\ &= \frac{a^4}{4}\Big[\theta\Big]_0^{2\pi} = \frac{1}{2}\pi a^4 \end{align*} \nonumber $$


How much harder is it to spin this disk around a point on its circumference?

Disk spinning around its circumference

The inertia $$ \begin{align*} I_o & =\iint r^2\,dA \\ &= \int_{-\pi/2}^{\pi/2} \underline{\int_0^{2a\cos\theta} r^2 r\,dr}\,d\theta \\ \end{align*} \nonumber $$

Evaluate the inner integral $$ \begin{align*} \int_0^{2a\cos\theta} r^2 r\,dr &=\left[\frac{r^4}{4}\right]_{r=0}^{2a\cos\theta} \\ &= 4a^4\cos^4\theta \end{align*} \nonumber $$

Evaluate the outer integral $$ \begin{align*} I_o & =\iint r^2\,dA \\ &= \int_{-\pi/2}^{\pi/2} 4a^4\cos^4\theta\,d\theta = \dots = \frac{3}{2}\pi a^4 \end{align*} \nonumber $$

It is three times harder to spin a Frisbee about a point on its circumference than around its center.
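The factor of three can be reproduced numerically from the two polar set-ups. The following sketch is illustrative only: the helper `iterated_integral`, the radius `a = 1.0` and the step count are assumptions made for this example.

```python
import math

def iterated_integral(f, a, b, lo, hi, n=300):
    """Midpoint rule: outer variable t in [a, b], inner variable s in [lo(t), hi(t)]."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        ds = (hi(t) - lo(t)) / n
        total += sum(f(t, lo(t) + (j + 0.5) * ds) for j in range(n)) * ds * dt
    return total

a = 1.0  # disk radius (arbitrary choice), density delta = 1
integrand = lambda theta, r: r**2 * r  # r^2 times the area factor r

# Origin at the center: theta in [0, 2 pi], r in [0, a].
center = iterated_integral(integrand, 0, 2 * math.pi,
                           lambda theta: 0.0, lambda theta: a)
# Origin on the circumference: theta in [-pi/2, pi/2], r in [0, 2 a cos(theta)].
edge = iterated_integral(integrand, -math.pi / 2, math.pi / 2,
                         lambda theta: 0.0, lambda theta: 2 * a * math.cos(theta))

print(center, math.pi * a**4 / 2)  # numerical value next to the closed form
print(edge / center)               # close to 3
```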

Change of variables

We change variables, when it simplifies the integrand or bounds, so it becomes easier to compute the double integral.



Determine the area of an ellipse with semi-axes \(a\) and \(b\). $$ \left(\frac{x}{a}\right)^2+\left(\frac{y}{b}\right)^2=1 \nonumber $$

The double integral for the area $$ \rm{Area} = \iint_{\left(\frac{x}{a}\right)^2+\left(\frac{y}{b}\right)^2\lt 1} dx\,dy \nonumber $$

Use substitution to make it look more like a circle $$ \left. \begin{array}{c} \text{set }\frac{x}{a}=u \Rightarrow du = \frac{1}{a}dx \\ \text{set }\frac{y}{b}=v \Rightarrow dv = \frac{1}{b}dy \end{array} \right\} \\ \begin{align*} \Rightarrow du\,dv &= \frac{1}{ab}dx\,dy \\ \Rightarrow dx\,dy &= ab\,du\,dv \end{align*} \nonumber $$

Substitute it back in the double integral $$ \begin{align*} \rm{Area} &= \iint_{u^2+v^2\lt 1} ab\,du\,dv \\ &= ab\underbrace{\iint_{u^2+v^2\lt 1} du\,dv}_{\text{area of unit disk}} = a\,b\,\pi \end{align*} \nonumber $$


To simplify the integrand or bounds, we define a change of variables as $$ \left\{ \begin{align*} u &= 3x-2y \\ v &= x+y \end{align*} \right. \nonumber $$

What is the relation between \(dA=dx\,dy\) and \(dA'=du\,dv\)?

\(\Delta x, \Delta y\)
\(\Delta u, \Delta v\)

The linear transformation changes it to a parallelogram. Because of the linear change of variables, the area scaling factor doesn’t depend on the choice of rectangle. So let’s take the simplest rectangle, the unit square.

Simplest rectangle in \(xy\)

Applying the transformation to the corners

Simplest rectangle in \(uv\)

The area \(A'\) is the determinant of the two vectors from the origin $$ A' = \left| \begin{array}{rr} 3 & 1 \\ -2 & 1 \end{array} \right| = 3+2=5 \nonumber $$

For any other rectangle, area is also multiplied by \(5\) $$ \begin{align*} dA' &= 5\,dA \\ \Rightarrow du\,dv &= 5\,dx\,dy \\ \Rightarrow \iint\ldots\,dx\,dy &= \iint\ldots\,\frac{1}{5}du\,dv \end{align*} \nonumber $$


Changing variables to \(u,v\) means $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \left\{ \begin{align*} u = u(x,y) &\Rightarrow \Delta u\approx \pdv{u}{x}\Delta x+\pdv{u}{y}\Delta y = u_x\Delta x+u_y\Delta y \\ v = v(x,y) &\Rightarrow \Delta v\approx \pdv{v}{x}\Delta x+\pdv{v}{y}\Delta y = v_x\Delta x + v_y\Delta y \end{align*} \right. \nonumber $$

In matrix form $$ \left[ \begin{array}{c} \Delta u \\ \Delta v \end{array} \right] \approx \left[ \begin{array}{cc} u_x & u_y \\ v_x & v_y \end{array} \right] \left[ \begin{array}{c} \Delta x \\ \Delta y \end{array} \right] \nonumber $$

A small rectangle in \(xy\)-coordinates corresponds to a small parallelogram in \(uv\)-coordinates. The sides of the rectangle from \((0,0)\), the vectors \(\left\langle\Delta x,0\right\rangle\) and \(\left\langle 0,\Delta y\right\rangle\), map to the sides of the parallelogram $$ \left\{ \begin{align*} \left\langle\Delta x,0\right\rangle \rightarrow \left\langle\Delta u,\Delta v\right\rangle &\approx \left\langle u_x\Delta x, v_x\Delta x\right\rangle \\ \left\langle 0,\Delta y\right\rangle \rightarrow \left\langle\Delta u,\Delta v\right\rangle &\approx \left\langle u_y\Delta y, v_y\Delta y\right\rangle \end{align*} \right. \nonumber $$

The area \(\text{Area}'\) of the parallelogram is the absolute value of the determinant $$ \text{Area}' = \left| \rm{det} \left( \left[ \begin{array}{cc} u_x & u_y \\ v_x & v_y \end{array} \right] \right) \right| \Delta x\,\Delta y \nonumber $$

For a general change of variables, the ratio between \(du\,dv\) and \(dx\,dy\) is given by the determinant of the matrix of partial derivatives. $$ \rm{det}\left( \left[ \begin{array}{cc} u_x & u_y \\ v_x & v_y \end{array} \right] \right) \nonumber $$

The definition of Jacobian just means the ratio between \(du\,dv\) and \(dx\,dy\). (Not a partial derivative.) Here the vertical bars stand for determinant.

$$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \shaded{ J = \pdv{(u,v)}{(x,y)} = \left| \begin{array}{cc} u_x & u_y \\ v_x & v_y \end{array} \right| } \nonumber $$

Then, because area is always positive $$ \shaded{ du\,dv = |J|\,dx\,dy = \left|\pdv{(u,v)}{(x,y)}\right|\,dx\,dy } \nonumber $$



Switching to polar coordinates $$ \begin{align*} x &= r\cos\theta \\ y &= r\sin\theta \end{align*} \nonumber $$

The Jacobian $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \begin{align*} \pdv{(x,y)}{(r,\theta)} &= \left| \begin{array}{cc} x_r & x_\theta \\ y_r & y_\theta \end{array} \right| \\ &= \left| \begin{array}{cc} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{array} \right| \\ &= r\cos^2\theta - (-r\sin^2\theta) \\ &= r(\cos^2\theta + \sin^2\theta) = r \end{align*} $$

Not a constant, but a function of \(r\), so $$ \shaded{ \begin{align*} dx\,dy &= |r|\,dr\,d\theta \\ &= r\,dr\,d\theta \end{align*} } \nonumber $$

Remark: you can compute whichever Jacobian is easier to compute, because they are the inverse of each other. $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \pdv{(u,v)}{(x,y)} \cdot \pdv{(x,y)}{(u,v)} = 1 \nonumber $$
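The Jacobians above can be estimated with finite differences. A minimal sketch follows; the helper `jacobian`, the lambda names and the sample points are assumptions for illustration. It checks the linear map \(u=3x-2y,\ v=x+y\), the polar map, and the inverse relation between the two polar Jacobians.

```python
import math

def jacobian(F, p, q, h=1e-6):
    """Determinant of the 2x2 matrix of partials of F(p, q) -> (a, b), by central differences."""
    a_p, b_p = [(hi - lo) / (2 * h) for hi, lo in zip(F(p + h, q), F(p - h, q))]
    a_q, b_q = [(hi - lo) / (2 * h) for hi, lo in zip(F(p, q + h), F(p, q - h))]
    return a_p * b_q - a_q * b_p

linear = lambda x, y: (3 * x - 2 * y, x + y)                   # (x, y) -> (u, v)
to_xy = lambda r, t: (r * math.cos(t), r * math.sin(t))        # (r, theta) -> (x, y)
to_polar = lambda x, y: (math.hypot(x, y), math.atan2(y, x))   # (x, y) -> (r, theta)

r, theta = 2.0, 0.6  # arbitrary sample point
x, y = to_xy(r, theta)

print(jacobian(linear, 0.3, -1.2))   # about 5, independent of the point
print(jacobian(to_xy, r, theta))     # about r = 2
print(jacobian(to_xy, r, theta) * jacobian(to_polar, x, y))  # about 1
```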


Compute $$ \int_0^1\int_0^1 x^2y\,dx\,dy \nonumber $$

using change of variables to $$ \left\{ \begin{align*} u &= x \\ v &= xy \end{align*} \right. \nonumber $$

Step 1: Find the area element using the Jacobian $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \begin{align*} \pdv{(u,v)}{(x,y)} &= \left| \begin{array}{cc} u_x & u_y \\ v_x & v_y \end{array} \right| \\ &= \left| \begin{array}{cc} 1 & 0 \\ y & x \end{array} \right| = x \end{align*} $$ With \(x\) positive in the region $$ \begin{align*} du\,dv &= |x|\,dx\,dy \\ &= x\,dx\,dy \end{align*} \nonumber $$

Step 2: Express the integrand in terms of \(u,v\) $$ \begin{align*} x^2y\,dx\,dy &= x^2y\,\frac{1}{x}\,du\,dv = xy\,du\,dv \\ &= u\frac{v}{u}\,du\,dv = v\,du\,dv \end{align*} \nonumber $$ So we need to compute (in the order \(du\,dv\) or \(dv\,du\)) $$ \iint_\ldots v\,du\,dv \nonumber $$

Step 3: Find the bounds for \(u,v\) in the new integral $$ \begin{align*} \int_\ldots^\ldots \underbrace{\int_\ldots^\ldots v\,du}_{u \text{ changes},\\ v\text{ is constant}}\,dv \end{align*} \nonumber $$ \(v=\rm{constant} \rightarrow xy=\rm{constant} \rightarrow y=\frac{\rm{constant}}{x}\)

\(xy\) and \(uv\)-coordinates
What is the value of \(u\) when we enter the region from the top, where \(y=1\)? $$ \begin{align*} y &=1 \\ \Rightarrow y &=\frac{v}{u}=1 \\ \Rightarrow u &= v \end{align*} $$ What is the value of \(u\) when we exit the region, where \(x=1\)? $$ \begin{align*} x &=1 \\ \Rightarrow u &= 1 \end{align*} $$ The smallest value of \((x,y)\) is \((0,0)\), which corresponds to \(v=0\). The largest value of \((x,y)\) is \((1,1)\), which corresponds to \(v=1\).

Step 4: The double integral follows as $$ \begin{align*} \int_0^1 \int_v^1 v\,du\,dv \end{align*} \nonumber $$
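As a sanity check (a worked evaluation added here, not taken from the lecture), both forms of the integral can be evaluated directly and should agree. $$ \begin{align*} \int_0^1 \int_v^1 v\,du\,dv &= \int_0^1 v\,(1-v)\,dv = \left[\frac{v^2}{2}-\frac{v^3}{3}\right]_0^1 = \frac{1}{6} \\ \int_0^1\int_0^1 x^2y\,dx\,dy &= \int_0^1 \frac{y}{3}\,dy = \frac{1}{6} \end{align*} \nonumber $$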

How could we have found the bounds more easily? Draw the picture in \(uv\)-coordinates.




Vector calculus is about differentiation and integration of vector fields, primarily in \(\mathbb{R}^3\) with coordinates \(x,y,z\) and unit vectors \(\hat{\imath},\hat{\jmath},\hat{k}\). Here we will focus on differentiation.

Axis and unity vectors


Parametric curve

A parametric curve is

a function with one-dimensional input and a multi-dimensional output.

Parametric curves may be expressed as a set of equations, such as $$ f(t)= \left\{ \begin{array}{l} f_x(t)=t^3-3t \\ f_y(t)=3t^2 \end{array} \right. \nonumber $$ or as a vector $$ f(t) = \left\langle \;t^3-3t,\; 3t^2\; \right\rangle \label{eq:parmcurve} $$

Multi-variable functions

A multi-variable function is a function with more than one argument. This concept extends the idea of a function of one variable to several variables.

In other words, let \(f\) be a function of variables \(x, y, \cdots\), then function \(f(x,y,\cdots)\) is a multi-variable function.

Scalar field

When a multi-variable function returns a scalar value for each point, it is called a scalar field.

A scalar field maps \(n\)-dimensional space to real numbers. Scalar fields are commonly visualized as values on a grid of points in the plane. For instance, a weather map showing the temperature \(T\) at each point \((x,y)\) on a map.

For example: scalar field \(z=\sin x + \cos y\), can be plotted with the result encoded as color, or on the \(z\)-axis.

Plot of \(z=\sin x + \cos y\)
Plot of \(z=\sin x + \cos y\)

Vector field

When a multi-variable function assigns a vector to each point \((x,y)\), it is called a vector field.

Vector fields are commonly visualized as arrows from a grid of points in the plane. This allows an \(n\)-dimensional input and output to be visualized in an \(n\)-dimensional drawing, where the arrows further give an intuition of e.g. fluid or air flow. An example of a vector field is a weather map where the magnitude and angle of the vectors represent the speed and direction of the wind at each point \((x,y)\).

Hurricane Sandy (2012-10-28)

In other words, let \(M,N,\cdots\) be functions of variables \(x,y,\cdots\). Then the function \(\vec{F}\) defined below, is called a vector field. $$ \vec{F}=\hat\imath M + \hat\jmath N +\;\cdots = \left\langle M, N, \cdots \right\rangle $$

The vectors are drawn starting at input \((x,y)\) where the magnitude and direction is determined by \(\vec{F}(x,y)\). For example, the plots for \(\vec{F}=\left\langle\; x,\; y \right\rangle\), and \(\vec{F}=\left\langle\; -y,\; x \right\rangle\) are shown below.

Plot of \(\vec{F}=\left\langle\; x,\; y \right\rangle\)
Plot of \(\vec{F}=\left\langle\; -y,\; x \right\rangle\)
“uniform rotation at unit angular velocity”


If an object is rotating in two dimensions, you can describe the rotation completely with a single value: the angular velocity, \(\omega=\phi/t\), where a positive value indicates a counter-clockwise rotation.

For an object rotating in three dimensions, the direction can be described using a 3D vector, \(\vec{\omega}\). The magnitude of the vector indicates the angular speed; the direction indicates the axis around which it tends to swirl.

Right-hand rule for rotation

The direction of the angular velocity is determined by the convention called the right-hand rule for rotation:

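Curl the fingers of your right hand in the direction of rotation; your thumb then points in the direction of the angular velocity vector. For an object rotating counter-clockwise when seen from above, the angular velocity points upwards along the axis of rotation.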


Del (\(\nabla\)) is a shorthand form to simplify long mathematical expressions such as the Maxwell equations. Think of this symbol as loosely representing a vector of partial derivative operators $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \nabla = \hat{\imath}\pdv{x} + \hat{\jmath}\pdv{y} + \hat{k}\pdv{z} $$ Or, in vector notation $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \nabla = \left\langle \pdv{x}, \pdv{y}, \pdv{z} \right\rangle $$

Depending on how \(\nabla\) is applied, it may denote the gradient of a scalar field, the divergence of a vector field, or the curl of a vector field. Each of these is described below.


Let \(f\) be a scalar field with variables \(x,y,z\). The vector derivative of the scalar field \(f(x,y,z)\) is defined as the gradient. Denoted as the \(\nabla\) “multiplied” by a scalar field \(f\) $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \begin{align} \nabla f &=\left\langle \pdv{x}, \pdv{y}, \pdv{z} \right\rangle f \\ &= \left\langle \pdv{x}f, \pdv{y}f, \pdv{z}f \right\rangle \end{align} $$

The gradient of \(f\) at point \((x,y)\) is a vector that points in the direction that makes the function \(f\) increase the fastest. The magnitude of the gradient at point \((x,y)\) equals the slope in that direction.


Find the gradient for scalar field \(f(x,y,z)=x+y^2+z^3\) $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \begin{align*} \nabla f &=\left\langle \pdv{x}, \pdv{y}, \pdv{z} \right\rangle f \\ &=\left\langle \pdv{x}f, \pdv{y}f, \pdv{z}f \right\rangle \\ &=\left\langle 1, 2y, 3z^2 \right\rangle \end{align*} $$


For intuition, picture the vector field as a fluid where each vector describes the velocity at that point. Around some points, where all vectors point outward, the fluid just springs into existence, as if there is a source. A positive divergence tells you how much of a source it is. Divergence is also positive if more is flowing out of that point than into it.

Let \(\vec{v}\) be a vector field where \(v_x,v_y,v_z\) are each functions of variables \(x,y,z\). $$ \vec{v} = \left\langle v_x, v_y, v_z \right\rangle $$

The divergence of vector field \(\vec{v}\) is written as a dot-product $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \begin{align} \nabla \cdot \vec{v} &=\left\langle \pdv{x}, \pdv{y}, \pdv{z} \right\rangle \cdot \left\langle v_x, v_y, v_z \right\rangle \\ &=\pdv{x}v_x + \pdv{y}v_y + \pdv{z}v_z \end{align} $$

When the divergence at a point is positive, more is leaving than is coming in at that point, so the density there decreases. For example the electric field of two electric charges

Electric field of charges \(p\) and \(q\)


Find the divergence for vector field \(\vec{v}(x,y,z)=\left\langle xy,yz,xz\right\rangle\) $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \begin{align*} \nabla \cdot \vec{v} &=\left\langle \pdv{x}, \pdv{y}, \pdv{z} \right\rangle \cdot \left\langle xy,yz,xz\right\rangle \\ &= \pdv{x}xy + \pdv{y}yz + \pdv{z}xz \\ &= y + z + x = x + y + z \end{align*} $$ The result is a scalar.


For intuition, think about the vector field as a fluid flow. Imagine placing a tiny paddlewheel into the vector field at a point. Would it spin around? If it spins counter-clockwise, it is said to have positive curl.

Curl in water

Let \(\vec{v}\) be a vector field where \(v_x,v_y,v_z\) are each functions of variables \(x,y,z\). $$ \vec{v} = \left\langle v_x, v_y, v_z \right\rangle $$

The curl (rotation) of vector field \(\vec{v}\) is written as a cross-product. $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \begin{align} \nabla \times \vec{v} &=\left\langle \pdv{x}, \pdv{y}, \pdv{z} \right\rangle \times \left\langle v_x, v_y, v_z \right\rangle \end{align} $$

The cross product can be computed using the pseudo-determinant. $$ \require{color} \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \begin{align} \nabla \times \vec{v} &=\begin{vmatrix} \color{red}{\hat{\imath}} & \color{green}{\hat{\jmath}} & \color{blue}{\hat{k}} \\ \color{red}{\pdv{x}} & \color{green}{\pdv{y}} & \color{blue}{\pdv{z}} \\ \color{red}{v_x} & \color{green}{v_y} & \color{blue}{v_z} \end{vmatrix} \\ &=\color{red}{\hat\imath} \begin{vmatrix} \color{green}{\pdv{y}} & \color{blue}{\pdv{z}} \\ \color{green}{v_y} & \color{blue}{v_z} \end{vmatrix} - \color{green}{\hat\jmath} \begin{vmatrix} \color{red}{\pdv{x}} & \color{blue}{\pdv{z}} \\ \color{red}{v_x} & \color{blue}{v_z} \end{vmatrix} + \color{blue}{\hat k} \begin{vmatrix} \color{red}{\pdv{x}} & \color{green}{\pdv{y}} \\ \color{red}{v_x} & \color{green}{v_y} \end{vmatrix} \\ &=\left\langle \begin{array}{c} \color{green}{\pdv{y}} \color{blue}{v_z} - \color{blue}{\pdv{z}} \color{green}{v_y} \\ \color{blue}{\pdv{z}} \color{red}{v_x} - \color{red}{\pdv{x}} \color{blue}{v_z} \\ \color{red}{\pdv{x}} \color{green}{v_y} - \color{green}{\pdv{y}} \color{red}{v_x} \end{array} \right\rangle \end{align} $$


Find the curl for vector field \(\vec{v}(x,y,z)=\left\langle xy,yz,xz\right\rangle\) $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \begin{align*} \nabla \times \vec{v} &= \left\langle \pdv{x}, \pdv{y}, \pdv{z} \right\rangle \times \left\langle xy,yz,xz\right\rangle \\ &= \begin{vmatrix} \hat\imath & \hat\jmath & \hat k \\ \pdv{x} & \pdv{y} & \pdv{z} \\ xy & yz & xz \end{vmatrix} \\ &= \left\langle \pdv{y}xz - \pdv{z}yz, -\left(\pdv{x}xz - \pdv{z}xy\right), \pdv{x}yz - \pdv{y}xy \right\rangle \\ &= \left\langle 0 - y, -(z - 0), 0 - x \right\rangle \\ &=\left\langle -y, -z, -x \right\rangle \end{align*} $$ The result is a vector.
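The divergence and curl of \(\vec{v}=\left\langle xy,yz,xz\right\rangle\) can be cross-checked with finite differences. This is a rough sketch in which the helper names `d`, `divergence` and `curl` and the sample point are illustrative assumptions. Each partial derivative is approximated by a central difference and then combined exactly as in the component formulas.

```python
def v(x, y, z):
    """Example vector field <xy, yz, xz>."""
    return (x * y, y * z, x * z)

def d(field, i, axis, p, h=1e-5):
    """Central difference of component i of field with respect to coordinate axis."""
    hi, lo = list(p), list(p)
    hi[axis] += h
    lo[axis] -= h
    return (field(*hi)[i] - field(*lo)[i]) / (2 * h)

def divergence(field, p):
    # d(v_x)/dx + d(v_y)/dy + d(v_z)/dz
    return sum(d(field, i, i, p) for i in range(3))

def curl(field, p):
    return (d(field, 2, 1, p) - d(field, 1, 2, p),   # d(v_z)/dy - d(v_y)/dz
            d(field, 0, 2, p) - d(field, 2, 0, p),   # d(v_x)/dz - d(v_z)/dx
            d(field, 1, 0, p) - d(field, 0, 1, p))   # d(v_y)/dx - d(v_x)/dy

p = (1.0, 2.0, 3.0)  # arbitrary sample point (x, y, z)
print(divergence(v, p))  # about x + y + z = 6
print(curl(v, p))        # about (-y, -z, -x) = (-2, -3, -1)
```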


If you prefer a visual explanation of divergence and curl, refer to YouTube



The length of a curve is called the arc length.

Arc length


The arc length of function graphs is explained using two examples.


Let \(f\) be a function of variable \(x\). $$ y = f(x) $$

We can approximate the length of a small segment \(\Delta s\) using the Pythagorean theorem. $$ \Delta s=\sqrt{(\Delta x)^2 + (\Delta y)^2} \nonumber $$

Adding all the segments gives us the approximate length of the curve $$ \sum\sqrt{(\Delta x)^2 + (\Delta y)^2} \nonumber $$

As the segments shrink, \(\Delta s\rightarrow 0\), the approximation becomes exact and the sum turns into an integral.

The arc length of a function graph follows as $$ \shaded{ s=\int\sqrt{(dx)^2 + (dy)^2} } \label{eq:functionint} $$ Note that the limits are omitted for now. The examples show how to add these.


Consider function \(f(x)\) between \(x=-1\) and \(x=1\) $$ y = f(x) = x^2 \label{eq:functiongraph} $$

Express \(dy\) in terms of \(dx\) in the example equation \(\eqref{eq:functiongraph}\) $$ \newcommand{dv}[1]{\tfrac{d}{d #1}} \begin{align*} y &= x^2 \\ \dv{x}y &= 2x \\ dy &= 2x\;dx \end{align*} $$

Substitute \(dy\) in the integral \(\eqref{eq:functionint}\), and place the bounds \(x=-1\) to \(1\) to find the curve length $$ \begin{align*} s &= \int_{-1}^{1}\sqrt{(dx)^2 + (2x\;dx)^2} \\ &=\int_{-1}^{1}\sqrt{1+4x^2}\;dx \end{align*} $$

Solving this integral with WolframAlpha returns approximately \(2.9579\).
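The integral \(\int_{-1}^{1}\sqrt{1+4x^2}\,dx\) can also be evaluated without an online tool. The sketch below is a minimal illustration, not the method used in the lecture: the `simpson` helper and its number of subintervals are assumptions. It compares the numerical value with the closed form \(\sqrt{5}+\tfrac{1}{2}\sinh^{-1}(2)\), which follows from the standard antiderivative \(\tfrac{x}{2}\sqrt{1+4x^2}+\tfrac{1}{4}\sinh^{-1}(2x)\).

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Integrand sqrt(1 + (dy/dx)^2) with dy/dx = 2x for y = x^2
s = simpson(lambda x: math.sqrt(1 + 4 * x * x), -1.0, 1.0)

# Closed form of the same integral
closed = math.sqrt(5) + math.asinh(2) / 2
print(s, closed)
```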


Curves such as a circle on the \((x,y)\) plane are more naturally described using polar coordinates. Consider the polar description of a half circle, with \(\theta\) between \(0\) and \(\pi\): $$ r=1 \;\land\; \theta \in \left[0,\pi\right) $$

Since the radius is 1, a small change \(\Delta\theta\) in radians equals the arc length \(\Delta s\). Bringing \(\Delta \theta\rightarrow 0\), we find the arc length by summing all the tiny segments: $$ s=\int d\theta \label{eq:polarint} $$

The arc length is found by placing the bounds \(\theta=0\) to \(\pi\) in integral \(\eqref{eq:polarint}\) $$ s = \int_{0}^{\pi}d\theta = \left[ \theta \right]_{0}^{\pi} = \pi \nonumber $$

Parametric curve

A parametric curve is a function with one-dimensional input and a multi-dimensional output.

Determining the length of a parametric curve is best described using an example:

Consider parametric curve \(f(t)\) $$ f(t) = \left\{ \begin{array}{l} f_x(t)=t^3-3t \\ f_y(t)=3t^2 \end{array} \right. \nonumber $$

Abbreviated using vector notation $$ f(t) = \left\langle\; t^3-3t,\; 3t^2\; \right\rangle \label{eq:parmcurve} $$

What is the length of the curve between \(t=-2\) and \(t=2\)?

We find the arc length similar to function graphs using the integral \(\int\sqrt{(dx)^2+(dy)^2}\) where \(dx\) and \(dy\) represent the tiny change in \(x\) and \(y\) values from the start to the end of the line.

With parametric curves, since \(x\) and \(y\) are given as functions of \(t\), we write \(dx\) and \(dy\) in terms of \(dt\) by taking the derivative of these two functions. $$ \left\{ \begin{array}{ c l l } x=t^3-3t & \Rightarrow \frac{d}{dt}x = 3t^2-3 & \Rightarrow dx=(3t^2-3)\;dt \\ y=3t^2 & \Rightarrow \frac{d}{dt}y=6t & \Rightarrow dy=6t\;dt \end{array} \right. \nonumber $$

Putting these into the integral $$ \begin{align} \int\sqrt{(dx)^2+(dy)^2} &= \int\sqrt{((3t^2-3)\,dt)^2 + (6t\,dt)^2} \nonumber \\ &= \int\sqrt{(3t^2-3)^2 + (6t)^2} \;dt \nonumber \\ &= \int\sqrt{9t^4+18t^2+9} \;dt \nonumber \\ &= 3\int \left(t^2+1\right) \;dt \label{eq:parametricfnc} \end{align} $$

Now everything is written in terms of \(t\). Place the bounds on the integral equation \(\eqref{eq:parametricfnc}\) $$ \begin{align*} 3\int_{-2}^{2} \left(t^2+1\right) \;dt &= \left[ t^3+3t \right]_{-2}^{2} \\ &= \left(2^3+3(2)\right) - \left((-2)^3+3(-2)\right) \\ &= 14-(-14) \\ &= 28 \end{align*} $$
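As a cross-check, the arc length can be estimated directly from the derivatives \(x'(t)=3t^2-3\) and \(y'(t)=6t\), without simplifying the square root first. The sketch below assumes a simple midpoint rule; the function name `arc_length` and the number of steps are illustrative choices. It uses the bounds \(t=-2\) to \(t=2\) from the integral above.

```python
import math

def arc_length(dx, dy, t0, t1, n=100_000):
    """Midpoint-rule estimate of the integral of sqrt(x'(t)^2 + y'(t)^2)."""
    h = (t1 - t0) / n
    return sum(
        math.hypot(dx(t0 + (i + 0.5) * h), dy(t0 + (i + 0.5) * h)) * h
        for i in range(n)
    )

# Derivatives of x = t^3 - 3t and y = 3t^2
length = arc_length(lambda t: 3 * t * t - 3, lambda t: 6 * t, -2.0, 2.0)
print(length)
```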

Linear differential equations

For a linear non-homogeneous differential equation with constant coefficients \(a_1\ldots a_n\) in the form

$$ \begin{align} \frac{\text{d}^nf(t)}{\text{d}t^n} + a_1\frac{\text{d}^{n-1}f(t)}{\text{d}t^{n-1}} + \cdots + a_{n-1}\frac{\text{d}f(t)}{\text{d}t} + a_nf(t)&=g(t)\nonumber\\[10mu] \overset{abbrev}{\Rightarrow}\quad f^{(n)}(t)+a_1f^{(n-1)}(t)+\cdots+a_{n-1}f'(t)+a_nf(t)&=g(t)\label{eq:bDV} \end{align} $$

the solution is the superposition of the natural response and the forced response of the system. In math speak, these are called the homogeneous solution \(f_h(t)\) and the particular solution \(f_p(t)\)

$$ \shaded{ f(t)=f_h(t)+f_p(t) } $$

To solve the linear non-homogeneous differential equation, we

  1. set the force \(g(t)=0\) and solve the natural response \(f_h(t)\),
  2. set the initial conditions \(f(0)=f^\prime(0)=f^{\prime\prime}(0)=\ldots=0\) and solve the forced response \(f_p(t)\),
  3. add the forced response to the natural response to get the total response,
  4. use the initial conditions to resolve any constants.

Natural (homogeneous solution)

The natural response, \(f_h(t)\), is the behavior of a circuit due to initial conditions, but without input force. We suppress the input force \(g(t)=0\) and solve just the circuit itself. This makes the non-homogeneous differential equation \(\eqref{eq:bDV}\) into a homogeneous differential equation.

$$ {f_h}^{(n)}(t)+a_1{f_h}^{(n-1)}(t)+\cdots+a_{n-1}{f_h}^\prime(t)+a_nf_h(t) = 0 \label{eq:bDVh} $$

Leonhard Euler, in E10 (1728) and E62 (1739), realized that general homogeneous solutions have the form

$$ \shaded{ f_{h,i}=e^{pt} } $$

where \(p\in \mathbb{C}\). Substituting \(f_h(t)=\mathrm{e}^{pt}\) in homogeneous differential equation \(\eqref{eq:bDVh}\) gives the so-called characteristic equation

$$ \begin{align} p^n\mathrm{e}^{pt}+ a_1 p^{n-1}\mathrm{e}^{pt} + \cdots + a_n\mathrm{e}^{pt}&=0,&\div \mathrm{e}^{pt} \nonumber \\ \Rightarrow\quad p^n+ a_1 p^{n-1} + \cdots + a_n&=0 \end{align} $$

Substituting any of the roots of the polynomial \(p_1,p_2,\ldots p_n\) in \(\mathrm{e}^{pt}\) results in a solution base \(f_i(t)=\mathrm{e}^{p_it}\).

Given that homogeneous linear differential equations obey the superposition principle, any linear combination of these functions also satisfies the differential equation. Therefore, combining the \(n\) linearly independent solutions \(f_1(t), f_2(t),\ldots,f_n(t)\) leads to the homogeneous solution with arbitrary real constants \(c_1,c_2,\ldots,c_n\)

$$ \shaded{ f_h(t)=c_1f_1(t)+c_2f_2(t)+\cdots+c_nf_n(t) } $$


  1. For double roots, or in general when a root \(p_i\) has multiplicity \(m\), the solution base is \(f(t)=t^{k}\,\mathrm{e}^{p_it}\) where \(k\in \{0,1,\ldots,m-1\}\).
  2. If the differential equation \(\eqref{eq:bDV}\) has real coefficients \(a_i\), complex solutions for \(p\) will only occur in complex conjugate pairs. The real-valued solutions are obtained by replacing each pair \(f_1, f_2\) with the real-valued linear combinations \(\Re(f_1)\) and \(\Im(f_1)\), as in
    $$ \begin{align} \Re(f_1)=\frac{f_1+f_2}{2}\\ \Im(f_1)=\frac{f_1-f_2}{2j} \end{align} $$
    Applying Euler’s trigonometry identities, these solutions turn into \(\cos\) and \(\sin\) terms.

The example below shows how a homogeneous linear differential equation is solved.


Assume a homogeneous linear differential equation

$$ \DeclareMathOperator*{\dprime}{\prime\prime} \DeclareMathOperator*{\tprime}{\prime\prime\prime} \DeclareMathOperator*{\qprime}{\prime\prime\prime\prime} f^{\qprime}(t) - 2f^{\tprime}(t) + 2f^{\dprime}(t) - 2f^{\prime}(t) + f(t) = 0 $$

The characteristic equation and its factorized form follow as

$$ \begin{align} p^4-2p^3+2p^2-2p+1&=0\nonumber \\[6mu] \Rightarrow\quad (p-j)\,(p+j)\,(p-1)^2& = 0 \end{align} $$

the solution basis becomes

$$ \begin{align} f_1 &= \mathrm{e}^{jt}&\text{based on }p_1=j \nonumber \\ f_2 &= \mathrm{e}^{-jt}&\text{based on }p_2=-j \nonumber \\ f_3 &= \mathrm{e}^{t}&\text{based on }p_3=p_4=1 \nonumber \\ f_4 &= t\mathrm{e}^{t}&\text{based on }p_3=p_4=1 \nonumber \end{align} $$

Using Euler’s trigonometry identities

$$ \begin{align} \Re(f_1) &= \frac{f_1+f_2}{2}=\frac{\mathrm{e}^{jt}+\mathrm{e}^{-jt}}{2}=\cos(t) \nonumber \\ \Im(f_1) &= \frac{f_1-f_2}{2j}=\frac{\mathrm{e}^{jt}-\mathrm{e}^{-jt}}{2j}=\sin(t) \nonumber \end{align} $$

simplifies the solution basis to

$$ \begin{align} f_1 &= \cos(t)&\text{based on }p_1={p_2}^*=j\nonumber\\ f_2 &= \sin(t)&\text{based on }p_1={p_2}^*=j\nonumber\\ f_3 &= \mathrm{e}^{t}&\text{based on }p_3=p_4=1\nonumber\\ f_4 &= t\,\mathrm{e}^{t}&\text{based on }p_3=p_4=1\nonumber \end{align} $$

The homogeneous solution follows as a linear combination

$$ f_h(t) = c_1\cos (t) + c_2\sin(t) + c_3\,\mathrm{e}^{t} + c_4\,t\,\mathrm{e}^{t} $$
where the constants \(c_{1,\ldots,4}\) follow from the initial conditions.
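The factorization in this example can be verified with a few lines of complex arithmetic. The sketch below is illustrative only; the function names are assumptions. It evaluates the characteristic polynomial at the roots \(j\), \(-j\) and \(1\), and checks that the derivative of the polynomial also vanishes at \(p=1\), which is what makes it a double root.

```python
def char_poly(p):
    """Characteristic polynomial p^4 - 2p^3 + 2p^2 - 2p + 1."""
    return p**4 - 2 * p**3 + 2 * p**2 - 2 * p + 1

def char_poly_derivative(p):
    """Derivative 4p^3 - 6p^2 + 4p - 2; zero at a repeated root."""
    return 4 * p**3 - 6 * p**2 + 4 * p - 2

roots = [1j, -1j, 1]                       # Python writes j as 1j
residuals = [abs(char_poly(p)) for p in roots]
double_root_check = abs(char_poly_derivative(1))
print(residuals, double_root_check)
```

The same idea extends to any constant-coefficient equation: a root of multiplicity \(m\) makes the polynomial and its first \(m-1\) derivatives vanish.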

Forced (particular solution)

The forced response, \(f_p(t)\), is the part of the response caused directly by the input force, assuming all initial conditions are zero.

$$ {f_p}^{(n)}(t)+a_1{f_p}^{(n-1)}(t)+\cdots+a_{n-1}{f_p}^\prime(t)+a_nf_p(t)=g(t)\label{eq:bDVp} $$

The solution for the forced response is usually a scaled version of the input. In the examples below we will show two methods of finding the particular solution. As you will learn, using a complex forcing function is the easiest way of obtaining the particular solution. In all other cases, we find \(f_p(t)\) by either the method of undetermined coefficients or the variation of parameters method.

The particular solution is typically found using trigonometry identities, as shown in the examples in RC Low-pass Filter Appendix B. Even for these first-order linear systems this is a fairly painstaking process. Here we explain a less involved method to find the response to a sinusoidal forcing function.

Complex superposition

The superposition property:
When two signals are added together and forced on a linear system, the system response is the same as if one had forced each signal through the system separately and then added the responses.

The linearity of the system implies that if we use an input of the form \(\hat{u}\cos(\omega t)\) then the output will have the same frequency but with a different phase and amplitude. As shown in the table below, if we scale the input by a factor \(k\) then the output will be scaled by the same factor. This applies even when that factor is the imaginary unit \(j\).

Linear system
$$ \begin{array}{ccc} \text{input} & & \text{output} \\ \hline \hat{u}\cos(\omega t+\theta) & \longrightarrow & A\hat{u}\cos(\omega t+\phi) \\ \color{olive}{k}\hat{u}\cos(\omega t+\theta) & \longrightarrow & \color{olive}{k}A\hat{u}\cos(\omega t+\phi) \\ \color{blue}{j}\hat{u}\cos(\omega t+\theta) & \longrightarrow & \color{blue}{j}A\hat{u}\cos(\omega t+\phi) \end{array} \nonumber $$

By the superposition principle of linear systems, a forcing function that sums \(\cos\) and \(j\sin\) will produce a scaled response of \(\cos\) and \(j\sin\)

$$ \hat{u}\cos(\omega t) + \color{green}{j}\,\hat{u}\sin(\omega t) \longrightarrow A\hat{u}\cos(\omega t+\phi) + \color{green}{j}A\hat{u}\sin(\omega t+\phi) $$

By applying Euler’s formula

$$ \mathrm{e}^{j\varphi}=\cos\varphi+j\sin\varphi\nonumber $$

the complex input \(\underline{u}(t)\) and output \(\underline{f}(t)\) can be expressed as

$$ \underline{u}(t)=\hat{u}\,\mathrm{e}^{j\omega t} \longrightarrow \underline{f}(t)=A\hat{u}\,\mathrm{e}^{j(\omega t+\phi)} $$

Even if the forcing function is only the real-part of \(\underline{u}\), to derive the system response we may assume that it is the mathematically more convenient \(\underline{u}(t)\) even though that also includes an imaginary part, for as long as we ignore the imaginary part of the response.

In other words: if the forcing function is \(\hat{u}\cos(\omega t)\), we may pretend that the forcing function is \(\underline{u}(t)=\hat{u}\cos(\omega t)+\color{green}{j}\,\hat{u}\sin(\omega t)=\hat{u}\,\mathrm{e}^{j\omega t}\), derive the response and then consider only the real part of the complex solution.
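To make this concrete, the sketch below applies the complex forcing trick to a hypothetical first-order system \(f^\prime(t)+a\,f(t)=\hat{u}\cos(\omega t)\). The system, the values of `a`, `u_hat` and `omega`, and all variable names are assumptions chosen for illustration; they do not come from the referenced appendix. Substituting \(\underline{f}(t)=F\,\mathrm{e}^{j\omega t}\) reduces the differential equation to the algebraic equation \((j\omega+a)F=\hat{u}\), and the real part of the result is checked against the original equation.

```python
import cmath
import math

a, u_hat, omega = 2.0, 1.5, 3.0        # hypothetical system and input values

# With the complex input u_hat * e^{j omega t} and f = F e^{j omega t},
# the equation f' + a f = u reduces to (j omega + a) F = u_hat.
F = u_hat / (a + 1j * omega)
A, phi = abs(F) / u_hat, cmath.phase(F)    # gain and phase shift

def f_p(t):
    """Particular solution: real part of the complex response."""
    return (F * cmath.exp(1j * omega * t)).real

def f_p_prime(t):
    """Its derivative: real part of j omega F e^{j omega t}."""
    return (1j * omega * F * cmath.exp(1j * omega * t)).real

t = 0.7                                    # arbitrary test time
residual = f_p_prime(t) + a * f_p(t) - u_hat * math.cos(omega * t)
same = f_p(t) - A * u_hat * math.cos(omega * t + phi)
print(residual, same)
```

Both printed values should be zero up to rounding: the real part satisfies the real-valued equation, and it equals the input cosine scaled by the gain \(A\) and shifted by the phase \(\phi\).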

An example can be found under the heading “complex arithmetic method” in the examples in RC Low-pass Filter Appendix B. The main section of that article describes the use of an even more convenient method using a Laplace Transform.

Laurent series

The Laurent series is a representation of a complex function \(f(z)\). It is named after Pierre Alphonse Laurent, a French mathematician and military officer, who published the series in 1843.

Unlike the Taylor series which expresses \(f(z)\) as a series of terms with non-negative powers of \(z\), a Laurent series includes terms with negative powers. Therefore, a Laurent series may be used in cases where a Taylor expansion is not possible.

$$ f(z)=\sum _{n=-\infty }^{\infty }a_{n}(z-c)^{n} $$

where \(c\) is a constant and the coefficients \(a_n\) are defined by

$$ a_{n}={\frac {1}{2\pi i}}\oint _C{\frac {f(z)\,\mathrm {d} z}{(z-c)^{n+1}}} $$

The contour \(C\) is a closed counterclockwise path enclosing \(c\) and lying in an annulus \(A\) in which \(f(z)\) is analytic.

To calculate a Laurent series, use the standard and modified geometric series

$$ \frac{1}{1-z}= \left\{ \begin{align} \sum_{n=0}^{\infty}&\ z^n,&&|z|\lt1\nonumber\\ -\sum_{n=1}^{\infty}&\ z^{-n},&&|z|\gt1\nonumber \end{align}\nonumber \right.\nonumber $$

Here \(f(z)=\frac{1}{1-z}\) is analytic everywhere apart from the singularity at \(z=1\). Above are the expansions for \(f(z)\) in the regions inside and outside the unit circle, centered on \(z=0\), where \(|z|\lt1\) is the region inside the circle and \(|z|\gt1\) is the region outside the circle.
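Both expansions can be tried numerically with partial sums. The sketch below is a minimal illustration; the function names, the number of terms and the two test points are assumptions. It evaluates the first series at a point inside the unit circle and the second at a point outside it, and compares each with \(\frac{1}{1-z}\).

```python
def inside(z, terms=200):
    """Partial sum of sum_{n>=0} z^n, valid for |z| < 1."""
    return sum(z**n for n in range(terms))

def outside(z, terms=200):
    """Partial sum of -sum_{n>=1} z^(-n), valid for |z| > 1."""
    return -sum(z ** (-n) for n in range(1, terms + 1))

def f(z):
    return 1 / (1 - z)

z_in, z_out = 0.5 + 0.25j, 2.0 - 1.0j      # |z_in| < 1 < |z_out|
err_in = abs(inside(z_in) - f(z_in))
err_out = abs(outside(z_out) - f(z_out))
print(err_in, err_out)
```

Swapping the two test points makes the partial sums grow without bound, which shows that each expansion only holds in its own region.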

Binomial theorem and series


Consider the function

$$ f(x)=(a+x)^r\label{eq:axr} $$

Recall the Maclaurin series

$$ \begin{align} f(x)&=\sum _{k=0}^{\infty }{\frac {f^{(k)}(0)}{k!}}\,x^{k}\nonumber \end{align} \nonumber $$

The \(k\)th derivative of equation \(\eqref{eq:axr}\)

$$ f^{(k)}(x) = r\,(r-1)\cdots(r-k+1)(a+x)^{r-k} $$

Substitute \(x=0\) to find the derivatives at \(0\)

$$ f^{(k)}(0)=r\,(r-1)\cdots(r-k+1)\,a^{r-k} $$

Apply the Maclaurin series to equation \(\eqref{eq:axr}\)

$$ \begin{align} f(x)=(a+x)^r&=\sum _{k=0}^{\infty }\frac{r\,(r-1)\cdots(r-k+1)}{k!}\,a^{r-k}\,x^k \end{align} $$

This is Isaac Newton’s generalized binomial theorem, for \(r\in\mathbb{C}\)

$$ \shaded{ (a+x)^r=\sum_{k=0}^{\infty}{r \choose k}\,a^{r-k}\,x^k,\quad\text{where }{r \choose k}=\frac{r\,(r-1)\cdots(r-k+1)}{k!} } \label{eq:newton} $$

The binomial coefficient \({r \choose k}\)

$$ \begin{align} \frac{(r)_k}{k!}&=\frac{r(r-1)(r-2)\cdots (r-k+1)}{k(k-1)(k-2)\cdots1} \nonumber \\ &= \prod _{i=1}^{k}\frac{r-(i-1)}{i} = \prod _{i=0}^{k-1}\frac{r-i}{i+1} \end{align} $$

The series is finite when \(r\) is a non-negative integer (\(r\in\mathbb{N}\)); otherwise it converges for \(|x|\lt|a|\)

$$ \begin{align} (a+x)^r&=\sum_{k=0}^{\infty}{r \choose k}\,a^{r-k}\,x^k\nonumber\\ &=a^r+r\,a^{r-1}\,x+\frac{r(r-1)}{2!}\,a^{r-2}\,x^2+\frac{r(r-1)(r-2)}{3!}\,a^{r-3}\,x^3+\cdots \end{align} $$
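The generalized binomial coefficient and the series can be evaluated for a non-integer exponent. In the sketch below the function names `binom` and `binomial_series`, the number of terms and the sample values \(a=2\), \(x=0.5\), \(r=\tfrac12\) are assumptions for illustration. The coefficient is built with the product \(\prod_{i=0}^{k-1}\frac{r-i}{i+1}\).

```python
def binom(r, k):
    """Generalized binomial coefficient r(r-1)...(r-k+1)/k! for real r."""
    result = 1.0
    for i in range(k):
        result *= (r - i) / (i + 1)
    return result

def binomial_series(a, x, r, terms=60):
    """Partial sum of sum_k binom(r, k) a^(r-k) x^k, for |x| < |a|."""
    return sum(binom(r, k) * a ** (r - k) * x**k for k in range(terms))

approx = binomial_series(2.0, 0.5, 0.5)    # (2 + 0.5)^(1/2)
exact = 2.5 ** 0.5
print(approx, exact)
```

For a non-negative integer \(r\) the coefficient becomes zero once \(k\gt r\), so the sum reduces to the ordinary binomial theorem.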

Binomial series

Setting \(a=1\) in equation \(\eqref{eq:newton}\) gives the binomial series

$$ \shaded{ (1+x)^r=\sum_{k=0}^{\infty}{r \choose k}\,x^k,\quad\text{where }{r \choose k}=\frac{r\,(r-1)\cdots(r-k+1)}{k!} } $$

The convergence depends on \(|x|\)

  • \(|x|\lt1\), the series converges absolutely for any complex number \(r\).
  • \(|x|\gt1\), the series converges only when \(r\) is a non-negative integer, which makes the series finite.

Special cases

1) where \(a=1\), converges for \(|x|\lt1\)

$$ \begin{align} (1+x)^{r} &= \sum_{k=0}^{\infty}\frac{(r)_k}{k!}\,x^k \nonumber \\ &= 1+r\,x+\frac{r(r-1)}{2!}\,x^2+\frac{r(r-1)(r-2)}{3!}\,x^3+\cdots \end{align} $$

2) the negative binomial series, converges for \(|x|\lt1\)

Apply the Negated Upper Index of Binomial Coefficient identity \({r \choose k}=(-1)^k{k-r-1 \choose k}\)

$$ \begin{align} (a+x)^r&=\sum_{k=0}^{\infty}{r \choose k}\,a^{r-k}\,x^k\nonumber\\ &=\sum_{k=0}^{\infty}{k-r-1 \choose k}(-1)^k\,\,a^{r-k}\,x^k \end{align} $$

Substitute \(x\to -x\) and \(r\to -r\)

$$ \begin{align} (a-x)^{-r}&=\sum_{k=0}^{\infty}{k+r-1 \choose k}(-1)^k\,\,a^{-r-k}\,(-x)^k\nonumber\\ &=\sum_{k=0}^{\infty}{k+r-1 \choose k}\cancel{(-1)^k}\,\,a^{-r-k}\,\cancel{(-1)^k}\,x^k\nonumber\\ &=\sum_{k=0}^{\infty}{k+r-1 \choose k}\,a^{-r-k}\,x^k \end{align} $$

For \(a=1\)

$$ \begin{align} (1-x)^{-r}&=\sum_{k=0}^{\infty}{k+r-1 \choose k}\,x^k\nonumber\\ \end{align} $$
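For a positive integer \(r\) the coefficients \({k+r-1 \choose k}\) are ordinary binomial coefficients, so the negative binomial series is easy to check. The sketch below assumes Python 3.8 or later for `math.comb`; the function name, the number of terms and the sample values are illustrative.

```python
from math import comb

def neg_binomial_series(x, r, terms=200):
    """Partial sum of sum_k C(k+r-1, k) x^k for positive integer r, |x| < 1."""
    return sum(comb(k + r - 1, k) * x**k for k in range(terms))

x, r = 0.3, 4
approx = neg_binomial_series(x, r)
exact = (1 - x) ** (-r)
print(approx, exact)
```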

Geometric series

Rhind Papyrus, problem 79
Geometric series are commonly attributed to the philosopher and mathematician Pythagoras of Samos. However, they already appeared in one of the oldest Egyptian mathematical documents, the Rhind Papyrus, around 1550 BC.

Summation of geometric sequence


A sequence is a list of numbers or terms. In a geometric sequence, each term is found by multiplying the previous term by a constant non-zero number. An example is the geometric sequence \(\{2, 6, 18, 54, \ldots\}\), where each term is three times the previous one. The general form of a geometric sequence is

$$ \{a, ar, ar^2, ar^3, ar^4, \ldots\} $$

The sum of all the terms is called the summation of the sequence. The summation of an infinite sequence of values is called a series.

Historian Moritz Cantor translated problem 79 from the Rhind Papyrus as

An estate consisted of seven houses; each house had seven cats; each cat ate seven mice; each mouse ate seven heads of wheat; and each head of wheat was capable of yielding seven hekat measures of grain. Houses, cats, mice, heads of wheat, and hekat measures of grain, how many of these in all were in the estate?

scribe Ahmes, Rhind Papyrus problem #79, translated by Moritz Cantor

The solution in the left column of the Papyrus suggests scribe Ahmes’ understanding of geometric sequences.

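Problem 79 as a power series
$$ \begin{array}{lrl} \text{object} & \text{count} & \text{power of 7} \\ \hline \text{Houses} & 7 & 7^1 \\ \text{Cats} & 49 & 7^2 \\ \text{Mice} & 343 & 7^3 \\ \text{Heads of wheat} & 2{,}301 & 7^4 \\ \text{Hekat measures} & 16{,}807 & 7^5 \\ \hline \text{sum} & 19{,}607 & 19{,}607 \end{array} \nonumber $$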

To find the sum of the sequence \(7, 49, 343, 2401, 16807\), Ahmes approached it as \(7(1+7+49+343+2401)\). Since the sum of the terms inside the parentheses is \(2801\), he only had to multiply this number by \(7\), thinking of \(7\) as \(1+2+4\) so he could use repeated addition to do the multiplication.

Since the first term of the geometric sequence \(7\) is equal to the common ratio of multiplication, the finite geometric series can be reduced to multiplications involving the finite series having one less term. In modern notation:

$$ \sum_{k=1}^n7^k = 7\left(1+\sum_{k=1}^{n-1}7^k\right) $$
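The arithmetic of problem 79 can be replayed in a few lines. The sketch below only illustrates the calculation described above; the variable names are assumptions. It sums the powers of 7 directly and then follows the shortcut of multiplying \(2801\) by \(7=1+2+4\) using doubling and addition.

```python
terms = [7**k for k in range(1, 6)]     # 7, 49, 343, 2401, 16807
total = sum(terms)

# Shortcut: 7 * (1 + 7 + 49 + 343 + 2401)
inner = 1 + sum(terms[:-1])

# Multiply by 7 = 1 + 2 + 4 through doubling and adding
by_doubling = inner + 2 * inner + 4 * inner
print(total, inner, by_doubling)
```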

Leonardo Fibonacci (1170-1250 AD) described a similarly amusing problem:

There are seven old women on the road to Rome. Each woman has seven mules; each mule carries seven sacks; each sack contains seven loaves; with each loaf are seven knives; and each knife is in seven sheaths. Women, mules, sacks, loaves, knives and sheaths, how many are there in all on the road to Rome?

Leonardo Fibonacci, Liber Abaci, 1202 AD

Even more recently:

As I was going to St. Ives, I met a man with seven wives; each wife had seven sacks, each sack had seven cats, each cat had seven kits. Kits, cats, sacks, and wives. How many were there going to St. Ives?

Traditional nursery rhyme, 1730 AD

In general, a geometric sequence is written as shown below, where \(a\) is the first term, \(r\) is the common ratio and \(m\) is the total number of terms.

$$ \{a,ar,ar^2,ar^3,\ldots, ar^{m-2},ar^{m-1}\} $$

The summation of that geometric sequence is

$$ S\triangleq a+ar+ar^2+ar^3+\ldots + ar^{m-2}+ar^{m-1}=\sum_{n=0}^{m-1} ar^n,\ \ \forall_{|r|>0}\label{eq:finite1} $$

To find the summation, multiply \(\eqref{eq:finite1}\) by \(r\)

$$ Sr=ar+ar^2+ar^3+ar^4+\ldots + ar^{m-1}+ar^{m} \label{eq:finite2} $$

subtract \(\eqref{eq:finite2}\) from \(\eqref{eq:finite1}\), so that all middle terms cancel out

$$ \begin{align} S-Sr &= a-ar^{m}\nonumber\\ S(1-r) &=a(1-r^m)\nonumber\\ S &=a\left(\frac{1-r^{m}}{1-r}\right) \end{align} $$

The summation of the geometric sequence, for \(r\neq1\), follows as

$$ \shaded{ \sum_{n=0}^{m-1}ar^n=a\left(\frac{1-r^{m}}{1-r}\right) } \label{eq:finitegreometricseries} $$
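A quick numerical comparison of the closed form with a term-by-term sum is sketched below. The function name `geometric_sum` and the sample values are assumptions for illustration; the example uses the sequence \(2, 6, 18, 54\) and the Rhind Papyrus sequence with \(a=r=7\).

```python
def geometric_sum(a, r, m):
    """Closed form a(1 - r^m)/(1 - r); requires r != 1."""
    return a * (1 - r**m) / (1 - r)

a, r, m = 2, 3, 4                        # the sequence 2, 6, 18, 54
direct = sum(a * r**n for n in range(m))
closed = geometric_sum(a, r, m)
rhind = geometric_sum(7, 7, 5)           # 7 + 49 + 343 + 2401 + 16807
print(direct, closed, rhind)
```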

Power series

A power series is the sum of an infinite sequence of the form

$$ \sum_{n=0}^\infty a_n(r-c)^n=a_0+a_1(r-c)^1+a_2(r-c)^2+\ldots $$

where the \(a_n\) are coefficients independent of \(r\), and \(c\) is a constant.

In many situations \(c=0\) and the coefficients are the same (\(a_n=a\)), so that power series takes the form

$$ \sum_{n=0}^\infty a r^n=a+ar^1+ar^2+\ldots \label{eq:power0} $$

Equation \(\eqref{eq:power0}\) resembles \(\eqref{eq:finitegreometricseries}\) for \(m\to\infty\)

$$ \begin{align} \lim_{m\to\infty}\sum_{n=0}^{m-1}ar^n=\lim_{m\to\infty}a\left(\frac{1-r^{m}}{1-r}\right)\nonumber\\ a\sum_{n=0}^{\infty}r^n=a\underbrace{\lim_{m\to\infty}\left(\frac{1-r^{m}}{1-r}\right)} \end{align} $$

The value of \(r\) in the right term determines when the series converges

$$ \lim_{m\to\infty}\left(\frac{1-r^{m}}{1-r}\right)= \begin{cases} \text{doesn’t exist}&r\leq-1\\ \frac{1}{1-r} & -1\lt r\lt 1\\ \infty & r\geq1 \end{cases} $$

The series converges for \(|r|\lt1\), and the formula for the infinite geometric series follows

$$ \begin{align} \shaded{\frac{1}{1-r}=\sum_{n=0}^{\infty}r^n},&&|r|\lt1 \label{eq:geoseries} \end{align} $$

What if \(|r|\gt 1\)?

$$ \begin{align} \frac{1}{1-r} &=\frac{r^{-1}}{r^{-1}-1},&|r|\gt1\nonumber\\ &=-r^{-1}\frac{1}{1-r^{-1}},&|r|\gt1 \label{eq:gt0} \end{align} $$

Substitute \(r=a^{-1}\) in \(\eqref{eq:geoseries}\), which converges for \(|a^{-1}|\lt1\)

$$ \begin{align} \sum_{n=0}^{\infty}a^{-n}&=\frac{1}{1-a^{-1}},&|a|\gt1\label{eq:gt1} \end{align} $$

Applying \(\eqref{eq:gt1}\) with \(a=r\) to \(\eqref{eq:gt0}\) gives the converging series for \(|r|\gt1\), the so-called modified geometric series

$$ \begin{align} \shaded{ \frac{1}{1-r}=-r^{-1}\sum_{n=0}^{\infty}r^{-n}=-\sum_{n=1}^{\infty}r^{-n} },&&|r|\gt1\\ \end{align} $$

Exponential function

Power series are often the result of a Taylor series expansion. A Taylor series represents a function as an infinite sum of terms that are calculated from the function’s derivatives at one point.

$$ f(x)=\frac{f(a)}{0!}+\frac{f^\prime(a)}{1!}(x-a)+\frac{f^{\prime\prime}(a)}{2!}(x-a)^2+\ldots=\sum_{n=0}^{\infty}\frac{f^{(n)}(a)}{n!}(x-a)^n $$

To do Taylor’s expansion of the function \(\mathrm{e}^x\), we start with its defining property: it is its own derivative

$$ \left(\mathrm{e}^x\right)^\prime=\frac{\text{d}}{\text{d}x}\mathrm{e}^x=\mathrm{e}^x $$

Taylor expansion of \(\mathrm{e}^x\) at \(a=0\) using \(\mathrm{e}^x=\left(\mathrm{e}^x\right)^{\prime}=\left(\mathrm{e}^x\right)^{\prime\prime}=\ldots=\left(\mathrm{e}^x\right)^{(n)}\) and \(\mathrm{e}^0=1\)

$$ \begin{align} \mathrm{e}^x&=\frac{\mathrm{e}^0}{0!}+\frac{\mathrm{e}^0}{1!}x+\frac{\mathrm{e}^0}{2!}x^2+\frac{\mathrm{e}^0}{3!}x^3+\ldots\nonumber\\ &=\frac{1}{0!}+\frac{x}{1!}+\frac{x^2}{2!}+\frac{x^3}{3!}+\ldots \end{align} $$


$$ \mathrm{e}^x = \sum_{n=0}^{\infty}\frac{x^n}{n!} $$

Therefore, the constant \(\mathrm{e}=\mathrm{e}^1\) is

$$ \shaded{ \mathrm{e}=\sum_{n=0}^{\infty}\frac{1}{n!} } \approx 2.71828 $$
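The series converges quickly, which a short partial sum shows. In the sketch below the function name `exp_series` and the number of terms are assumptions; each term is obtained from the previous one by multiplying with \(\frac{x}{n+1}\), which avoids computing factorials explicitly.

```python
import math

def exp_series(x, terms=30):
    """Partial sum of sum_n x^n / n!."""
    total, term = 0.0, 1.0               # term holds x^n / n!
    for n in range(terms):
        total += term
        term *= x / (n + 1)
    return total

e_approx = exp_series(1.0)
print(e_approx, math.e)
```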


Another common power series arises from the Taylor expansion of \(\sin(x)\) at \(x=0\). To expand this, we need to examine the \(n\)th derivative of \(\sin(x)\) at \(x=0\)

$$ \sin^{(n)}(0) = \begin{cases} 0 & n\bmod4=0\\ 1 & n\bmod4=1\\ 0 & n\bmod4=2\\ -1 & n\bmod4=3 \end{cases} $$


$$ \sin(x) = \sum_{n=0}^\infty \frac{(-1)^n}{(2n+1)!}x^{2n+1}=x-\frac{x^3}{3!}+\frac{x^5}{5!}-\frac{x^7}{7!}+\ldots $$
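The sine series can be summed the same way as the exponential series. The sketch below is an illustration with an assumed function name and number of terms; consecutive terms differ by the factor \(\frac{-x^2}{(2n+2)(2n+3)}\).

```python
import math

def sin_series(x, terms=15):
    """Partial sum of sum_n (-1)^n x^(2n+1) / (2n+1)!."""
    total, term = 0.0, x                 # term holds (-1)^n x^(2n+1)/(2n+1)!
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

approx = sin_series(1.0)
print(approx, math.sin(1.0))
```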

Copyright © 1996-2022 Coert Vonk, All Rights Reserved