My notes of the excellent lectures of “Denis Auroux. 18.02 Multivariable Calculus. Fall 2007. Massachusetts Institute of Technology: MIT OpenCourseWare, License: Creative Commons BY-NC-SA.”

Matrices can be used to express linear relations between variables. For example, when we change coordinate systems from \((x_1,x_2,x_3)\) to \((u_1,u_2,u_3)\), where $$ \left\{ \begin{align} u_1 &= 2x_1+3x_2+3x_3 \nonumber \\ u_2 &= 2x_1+4x_2+5x_3 \nonumber \\ u_3 &= x_1+x_2+2x_3 \nonumber \end{align} \right. \label{eq:linear} $$

Expressed as a matrix product $$ \begin{align*} \underbrace{ \left[ \begin{matrix} 2 & 3 & 3 \\ 2 & 4 & 5 \\ 1 & 1 & 2 \end{matrix} \right] }_{A}\; \underbrace{ \left[ \begin{matrix} x_1 \\ x_2 \\ x_3 \end{matrix} \right] }_{X} &= \underbrace{ \left[ \begin{matrix} u_1 \\ u_2 \\ u_3 \end{matrix} \right] }_{U} \\ A X &= U \end{align*} $$ Here \(A\) is a \(3\times 3\) matrix, and \(X\) is a vector or a \(3\times 1\) matrix.

Matrix Multiplication


The entries in \(A X\) are the dot-product between the rows in \(A\) and the columns in \(X\), as shown below
matrix multiplication

For example, the entries of \(AB\) are $$ \left[ \begin{matrix} 1 & 2 & 3 & 4 \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{matrix} \right]\; \left[ \begin{matrix} 0 & \cdot \\ 3 & \cdot \\ 0 & \cdot \\ 2 & \cdot \end{matrix} \right] = \left[ \begin{matrix} 14 & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \end{matrix} \right] $$ Here the top-left entry is \(1\cdot 0+2\cdot 3+3\cdot 0+4\cdot 2=14\).


  • The width of \(A\) must equal the height of \(B\).
  • The product \(AB\) has the same height as \(A\) and the same width as \(B\).
  • Product \(AB\) represents: do transformation \(B\), then transformation \(A\). Note that you read the product from right to left, similar to \(f(g(x))\), where you first apply \(g\) and then \(f\). The product \(BA\) is not even defined when the width of \(B\) is not equal to the height of \(A\). In general, \(AB\ne BA\).
  • They are well behaved associative products: \((AB)X =A(BX)\)
  • \(BX\) means we apply transformation \(B\) to \(X\).
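The row-times-column rule and the remarks on ordering can be tried out in a few lines of Python. This is a minimal sketch using plain lists; the helper name `matmul`, the column vector `X` and the matrices `R` and `S` are illustrative assumptions, not part of the lecture.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("width of A must equal height of B")
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[2, 3, 3], [2, 4, 5], [1, 1, 2]]  # matrix A from the text
X = [[1], [0], [2]]                    # hypothetical 3x1 column vector
U = matmul(A, X)
print(U)  # [[8], [12], [5]]

# Two hypothetical 2x2 transformations to illustrate that order matters.
R = [[0, -1], [1, 0]]   # quarter-turn rotation
S = [[1, 0], [0, -1]]   # mirror in the x-axis
print(matmul(R, S))     # [[0, 1], [1, 0]]
print(matmul(S, R))     # [[0, -1], [-1, 0]]
```

Both products exist here because the matrices are square, yet they differ, which is the generic situation.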

Identity matrix


The identity matrix is a matrix that does no transformation: \(IX=X\)

The width of \(I\) needs to match the height of \(X\). \(I\) has \(1\)’s on the diagonal, and \(0\)’s everywhere else. $$ I_{n\times n} = \left[ \begin{matrix} 1 & & & \ldots & 0 \\ & 1 & & & \vdots \\ & & 1 & & \\ \vdots & & & \ddots & \\ 0 & \ldots & & & 1 \end{matrix} \right] \nonumber $$

For example: $$ I_{3\times3} = \left[ \begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix} \right] \nonumber $$


Matrix \(R\), gives a \(\frac{\pi}{2}\) rotation. $$ R = \left[ \begin{matrix} 0 & -1 \\ 1 & 0 \end{matrix} \right] \nonumber $$

In general $$ R \left[ \begin{matrix} x \\ y \end{matrix} \right] = \left[ \begin{array}{r} -y \\ x \end{array} \right] \nonumber $$

Try multiplying with the unit vectors \(\hat\imath\) and \(\hat\jmath\), or take \(R\) squared $$ \begin{align*} R\; \hat\imath &= \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right] \left[ \begin{matrix} 1 \\ 0 \end{matrix} \right] = \left[ \begin{matrix} 0 \\ 1 \end{matrix} \right] = \hat\jmath \\ R\;\hat\jmath &= \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right] \left[ \begin{matrix} 0 \\ 1 \end{matrix} \right] = \left[ \begin{array}{r} -1 \\ 0 \end{array} \right] = -\hat\imath \\ R^2 &= \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right] \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right] = \left[ \begin{array}{rr} -1 & 0 \\ 0 & -1 \end{array} \right] = -I_{2\times 2} \end{align*} \nonumber $$
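As a quick check of the rotation, here is a small Python sketch that applies the rule \(\left\langle x,y\right\rangle \mapsto \left\langle -y,x\right\rangle\) directly. The function name `rotate_quarter` and the test vector are assumptions made for this illustration.

```python
def rotate_quarter(v):
    """Rotate a 2D vector by pi/2, i.e. multiply it by R = [[0, -1], [1, 0]]."""
    x, y = v
    return (-y, x)

i_hat, j_hat = (1, 0), (0, 1)
print(rotate_quarter(i_hat))   # (0, 1), which is j-hat
print(rotate_quarter(j_hat))   # (-1, 0), which is minus i-hat

v = (3, 4)                     # hypothetical test vector
twice = rotate_quarter(rotate_quarter(v))
print(twice)                   # (-3, -4): applying R twice negates the vector
```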

Inverse Matrix


The inverse of matrix \(A\) is \(A^{-1}\) such that $$ \shaded{ \left\{ \begin{align*} A\;A^{-1} &= I \\ A^{-1}\;A &= I \end{align*} \right. } \nonumber $$
That implies that \(A\) must be a square matrix (\(n \times n\)).

Referring to the system of equations \(\eqref{eq:linear}\), to express the variables \(x_i\) in terms of the \(u_i\) values, we need to invert the transformation. More generally, in \(AX=B\), let matrices \(A\) and \(B\) be known. What is \(X\)? $$ \begin{align*} AX &= B \Rightarrow \\ A^{-1}(AX) &= A^{-1} B \Rightarrow \\ IX &= A^{-1} B \Rightarrow \\ X &= A^{-1} B \end{align*} $$


The inverse matrix is calculated using the adjugate matrix (also called the classical adjoint)

$$ A^{-1}=\frac{1}{\mathrm{det}(A)}\;\mathrm{adj}(A) \nonumber $$

For this \(3\times 3\) example $$ A=\left[ \begin{matrix} 2 & 3 & 3 \\ 2 & 4 & 5 \\ 1 & 1 & 2 \end{matrix} \right] $$

First, find the determinant of \(A\) $$ \det(A)= \left| \begin{array}{rrr} 2 & 3 & 3 \\ 2 & 4 & 5 \\ 1 & 1 & 2 \end{array} \right| = 3 \nonumber $$

Second, find the minors (matrix of determinants) of matrix \(A\) $$ \mathrm{minors} = \left[\begin{array}{rrr} \left|\begin{array}{rrr} 4 & 5 \\ 1 & 2 \end{array}\right| & \left|\begin{array}{rrr} 2 & 5 \\ 1 & 2 \end{array}\right| & \left|\begin{array}{rrr} 2 & 4 \\ 1 & 1 \end{array}\right| \\ \left|\begin{array}{rrr} 3 & 3 \\ 1 & 2 \end{array}\right| & \left|\begin{array}{rrr} 2 & 3 \\ 1 & 2 \end{array}\right| & \left|\begin{array}{rrr} 2 & 3 \\ 1 & 1 \end{array}\right| \\ \left|\begin{array}{rrr} 3 & 3 \\ 4 & 5 \end{array}\right| & \left|\begin{array}{rrr} 2 & 3 \\ 2 & 5 \end{array}\right| & \left|\begin{array}{rrr} 2 & 3 \\ 2 & 4 \end{array}\right| \end{array}\right] = \left[\begin{array}{rrr} 3 & -1 & -2 \\ 3 & 1 & -1 \\ 3 & 4 & 2 \end{array}\right] \nonumber $$

Third, find the cofactors by flipping the signs in a checkerboard pattern $$ \begin{array}{rrr} + & - & + \\ - & + & - \\ + & - & + \end{array} \nonumber $$ A ‘\(+\)’ means leave it alone. A ‘\(-\)’ means flip the sign. Apply this sign pattern to the minors. $$ \left[\begin{array}{rrr} 3 & 1 & -2 \\ -3 & 1 & 1 \\ 3 & -4 & 2 \end{array}\right] \nonumber $$

Fourth, transpose (switch rows and columns) to find the adjugate matrix \(\mathrm{adj}(A)\). $$ \mathrm{adj}(A) = \left[\begin{array}{rrr} 3 & -3 & 3 \\ 1 & 1 & -4 \\ -2 & 1 & 2 \end{array}\right] \nonumber $$

The inverse matrix \(A^{-1}\) follows as $$ A^{-1} = \frac{1}{\det(A)}\;\mathrm{adj}(A) = \frac{1}{3} \left[\begin{array}{rrr} 3 & -3 & 3 \\ 1 & 1 & -4 \\ -2 & 1 & 2 \end{array}\right] \nonumber $$
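The four steps (determinant, minors, cofactors, transpose) can be mirrored in code. The following is a minimal sketch with exact fractions; the helper names `minor`, `det` and `inverse` are assumptions for this illustration, and the cofactor approach is only practical for small matrices.

```python
from fractions import Fraction

def minor(M, i, j):
    """Matrix M with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j))
               for j in range(len(M)))

def inverse(M):
    """Inverse as adj(M) / det(M), for a square matrix with n >= 2."""
    d = det(M)
    if d == 0:
        raise ValueError("matrix is not invertible")
    n = len(M)
    # Cofactors: determinants of the minors with checkerboard signs.
    cof = [[(-1) ** (i + j) * det(minor(M, i, j)) for j in range(n)]
           for i in range(n)]
    # The adjugate is the transpose of the cofactor matrix.
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]

A = [[2, 3, 3], [2, 4, 5], [1, 1, 2]]
A_inv = inverse(A)
print(det(A))  # 3
for row in A_inv:
    print([str(entry) for entry in row])
```

Multiplying `A` by `A_inv` entry by entry gives the identity matrix, which is a useful sanity check after doing the steps by hand.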

Equations of planes

An equation of the form \(ax+by+cz=d\), expresses the condition for the point \((x,y,z)\) to be in the plane. It defines a plane.


Plane through the origin

Find the equation of the plane through the origin with normal vector \(\vec N = \left\langle 1, 5, 10 \right\rangle\).

Plane with normal vector

Point \(P=(x,y,z)\) is in the plane when \(\vec{OP}\perp\vec{N}\). Therefore, their dot-product must equal zero (see vectors). $$ \begin{align*} \overrightarrow{OP}\cdot\vec{N} = 0 \\ \Leftrightarrow \left\langle x, y, z \right\rangle \cdot \left\langle 1, 5, 10 \right\rangle = 0 \\ \Leftrightarrow x + 5y + 10z = 0 \end{align*} $$

Plane not through the origin

Find the equation of the plane through \(P_0=(2,1,-1)\) with normal vector \(\vec N = \left\langle 1, 5, 10 \right\rangle\).

The normal vector is the same as in the first example, therefore it will be the same plane, but shifted so that it passes through \(P_0\).

Shifted plane with normal vector

Point \(P=(x,y,z)\) is in the plane when \(\overrightarrow{P_0P}\perp\overrightarrow{N}\). Therefore, their dot-product must equal zero (see vectors). This vector \(\overrightarrow{P_0P}\) equals \(P-P_0\). $$ \begin{align*} \left\langle x-2, y-1, z+1 \right\rangle \cdot \left\langle 1, 5, 10 \right\rangle &= 0 \\ \Leftrightarrow (x-2)+5(y-1)+10(z+1) &= 0 \\ \Leftrightarrow \underline{1}x+\underline{5}y+\underline{10}z &= -3 \end{align*} $$

In the equation \(ax+by+cz=d\), the coefficients \(\left\langle a,b,c\right\rangle\) form the normal vector \(\vec{N}\). The constant \(d\) indicates how far the plane is shifted from the origin: the signed distance is \(d/|\vec{N}|\).

How could we have found the \(-3\) more quickly?

The first part of the equation is based on the normal vector $$ x + 5y + 10z = d \label{eq:planeequations2a} $$

We know \(P_0\) is in the plane. Substituting \(\left\langle x,y,z\right\rangle=P_0\) in \(\eqref{eq:planeequations2a}\) $$ \begin{align*} 1(2)+5(1)+10(-1) &= d \\ \Leftrightarrow d &= -3 \end{align*} \nonumber $$
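The shortcut amounts to \(d=\vec N\cdot P_0\). A minimal Python sketch, where the function name `plane_through` is an assumption for this illustration:

```python
def plane_through(normal, point):
    """Return (a, b, c, d) of the plane a*x + b*y + c*z = d."""
    a, b, c = normal
    # The point lies in the plane, so d is the dot product normal . point.
    d = sum(n * p for n, p in zip(normal, point))
    return (a, b, c, d)

print(plane_through((1, 5, 10), (0, 0, 0)))   # (1, 5, 10, 0)
print(plane_through((1, 5, 10), (2, 1, -1)))  # (1, 5, 10, -3)
```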

Parallel or perpendicular?

Are vector \(\vec{v}=\left\langle 1,2,-1 \right\rangle\) and plane \(x+y+3z=5\) parallel, perpendicular or neither?

Vector \(\vec{v}\) is perpendicular to the plane when \(\vec{v}=s\;\vec{N}\), where \(s\) is a scalar. The normal vector follows from the coefficients of the plane equation $$ \vec{N} = \left\langle 1,1,3 \right\rangle \nonumber $$ No scalar \(s\) gives \(\left\langle 1,2,-1 \right\rangle = s\left\langle 1,1,3 \right\rangle\), therefore \(\vec{v}\) is not perpendicular to the plane.

If \(\vec{v}\) is perpendicular to \(\vec{N}\), it is parallel to the plane. \(\vec{v}\perp\vec{N}\) when the dot-product equals zero. (see vectors) $$ \begin{align*} \vec{v}\cdot\vec{N} &= \left\langle 1, 2, -1 \right\rangle \cdot \left\langle 1, 1, 3 \right\rangle \\ &= 1+2-3 = 0 \end{align*} $$ Therefore, \(\vec{v}\) is parallel to the plane.
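Both tests can be combined in one small routine. This sketch assumes nonzero vectors with integer components, and it uses the cross-product (zero exactly when \(\vec v\) is a multiple of \(\vec N\)) as a stand-in for the "scalar multiple" test; the names `dot`, `cross` and `classify` are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def classify(v, normal):
    """Relation between vector v and the plane with the given normal."""
    if cross(v, normal) == (0, 0, 0):   # v is a multiple of the normal
        return "perpendicular"
    if dot(v, normal) == 0:             # v is perpendicular to the normal
        return "parallel"
    return "neither"

N = (1, 1, 3)                   # normal of the plane x + y + 3z = 5
print(classify((1, 2, -1), N))  # parallel
print(classify((2, 2, 6), N))   # perpendicular
print(classify((1, 0, 0), N))   # neither
```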

Solving systems of equations

To solve a system of equations, you try to find a point that is on several planes at the same time.


Find the \(x,y,z\) that satisfies the conditions of the \(3\times 3\) linear system: $$ \left\{ \begin{align*} x+ z = 1 \\ x + y = 2 \\ x + 2y + 3z = 3 \end{align*} \right. $$

The first 2 equations represent two planes that intersect in line \(P_1\cap P_2\). The third plane intersects that line at the point \(P(x,y,z)\), the solution to the linear system.

3 planes – one solution


  • if the line \(P_1\cap P_2\) is contained in \(P_3\), there are infinitely many solutions. (Any point on the line is a solution.)
  • if the line \(P_1\cap P_2\) is parallel to \(P_3\), then there are no solutions.

3 planes – infinite solutions
3 planes – no solutions

In matrix notation $$ \underbrace{ \left[\begin{array}{rrr} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 2 & 3 \end{array}\right] }_{A}\; \underbrace{ \left[\begin{array}{ccc} x \\ y \\ z \end{array}\right] }_{X} = \underbrace{ \left[\begin{array}{rrr} 1 \\ 2 \\ 3 \end{array}\right] }_{B} \nonumber $$

The solution to \(AX=B\) is given by (see Inverse matrix) $$ X = A^{-1}B \nonumber $$


$$ A^{-1}=\frac{1}{\det (A)}\mathrm{adj}(A) \nonumber $$

This implies that matrix \(A\) is only invertible when $$ \shaded{ \det (A)\ne 0 } \nonumber $$
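For the example system, the solution can be computed in a few lines. Note that this sketch uses Cramer's rule rather than forming \(A^{-1}\) explicitly; it gives the same \(X=A^{-1}B\) and also requires \(\det(A)\ne 0\). The helper names `det3` and `solve3` are assumptions for this illustration.

```python
from fractions import Fraction

def det3(M):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(A, B):
    """Solve A X = B with Cramer's rule; needs det(A) != 0."""
    d = det3(A)
    if d == 0:
        raise ValueError("det(A) = 0: no unique solution")
    solution = []
    for col in range(3):
        # Replace one column of A by B and take the ratio of determinants.
        Ak = [row[:col] + [B[r]] + row[col + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det3(Ak), d))
    return solution

A = [[1, 0, 1], [1, 1, 0], [1, 2, 3]]
B = [1, 2, 3]
x, y, z = solve3(A, B)
print(det3(A))   # 4
print(x, y, z)   # 1 1 0
```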


Homogeneous case

Homogeneous means that equations are invariant under scaling. In matrix notation: \(AX=0\).

For example: $$ \left\{ \begin{align*} x + z = 0 \\ x + y = 0 \\ x + 2y + 3z = 0 \end{align*} \right. $$

There is always the trivial solution: \((0,0,0)\).

3 planes – infinite solutions with normal vectors

Depending on the \(\det(A)\):

  • If \(\det (A)\ne 0\): \(A\) can be inverted. \(AX=0 \Leftrightarrow X=A^{-1}\,0=0\). There are no other solutions.
  • If \(\det (A)= 0\): the determinant of \(\vec{N_1},\vec{N_2},\vec{N_3}\) equals \(0\). This implies that the planes’ normal vectors \(\vec{N_1}\), \(\vec{N_2}\) and \(\vec{N_3}\) are coplanar. A line through the origin, perpendicular to the plane of \(\vec{N_1}, \vec{N_2}, \vec{N_3}\), is parallel to all 3 planes and contained in them. Therefore there are infinitely many solutions. To find a nontrivial solution, one can take the cross-product of two (non-parallel) normals.
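To illustrate the \(\det(A)=0\) case, the sketch below uses a hypothetical system (not the example above, whose determinant is nonzero): the first two normals are taken from the example, and the third is chosen as their sum so that the three normals are coplanar. The helper names are assumptions as well.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

N1 = (1, 0, 1)   # x + z = 0
N2 = (1, 1, 0)   # x + y = 0
N3 = (2, 1, 1)   # hypothetical third plane 2x + y + z = 0, with N3 = N1 + N2

X = cross(N1, N2)
print(X)                                   # (-1, 1, 1)
print([dot(N, X) for N in (N1, N2, N3)])   # [0, 0, 0]
```

Every scalar multiple of `X` solves the system as well, which gives the line of solutions through the origin.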

General case

The system $$ AX=B \nonumber $$

Depending on the \(\det(A)\)

  • if \(\det {A}\ne 0\): there is a unique solution \(X=A^{-1}B\)
  • if \(\det {A}=0\): either no solution, or infinitely many solutions. If you solve it by hand and end up with \(0=0\), there are infinitely many solutions; if you end up with \(1=2\), there are no solutions.




The description uses the plane \(\mathbb{R}^2\) or space \(\mathbb{R}^3\), but the same principles apply to higher dimensions.

Vectors are commonly displayed on the \(x,y,z\)-axes, with unit vectors \(\hat\imath, \hat\jmath, \hat k\).

\(x,y,z\)-axis and \(\hat\imath,\hat\jmath,\hat k\)-unit vectors

Vectors do not have a start point, but do have a magnitude (length) and direction. They are described in terms of the unit vectors \(\hat\imath, \hat\jmath, \hat k\), or using angle-bracket notation. $$ \vec{A} = \hat\imath\;a_1 + \hat\jmath\;a_2 + \hat k\;a_3 = \left\langle \;a_1,\;a_2,\;a_3\; \right\rangle $$

You can find the length of a vector \(|\vec{A}|\), by applying the Pythagorean theorem twice. $$ \shaded{ |\vec{A}| = \sqrt{(a_1)^2 + (a_2)^2 + (a_3)^2} } \nonumber $$


\(\vec{A}\) rotated over \(\tfrac{\pi}{2}\)

Let \(\vec{A}=\left\langle a_1, a_2\right\rangle\), and let \(\vec{A}'\) be \(\vec{A}\) rotated over \(\frac{\pi}{2}\). Then $$ \shaded{ \vec{A}'=\left\langle -a_2, a_1\right\rangle } \label{eq:rotation} $$


Let \(\vec{A}=\left\langle a_1, a_2, a_3\right\rangle\), and \(\vec{B}=\left\langle b_1, b_2, b_3\right\rangle\). Then \(\vec{A}\) plus \(\vec{B}\) is defined as $$ \shaded{ \vec{A}+\vec{B} = \left\langle a_1+b_1, a_2+b_2, a_3+b_3 \right\rangle } \nonumber $$

Geometrically, the sum is the vector to the opposite corner of the parallelogram spanned by \(\vec{A}\) and \(\vec{B}\).

\(\vec{A} + \vec{B}\)

Scalar product

Let \(s\) be a scalar, and \(\vec{A}=\left\langle a_1, a_2, a_3\right\rangle\). Then the scalar product of \(s\) and \(\vec{A}\) is defined as $$ \shaded{ s\;\vec{A} = \left\langle s\;a_1, s\;a_2, s\;a_3\right\rangle } \nonumber $$

Geometrically, it makes the vector longer or shorter.



Let \(\vec{A}=\left\langle a_1, a_2, a_3\right\rangle\), and \(\vec{B}=\left\langle b_1, b_2, b_3\right\rangle\). The dot-product of \(\vec{A}\) and \(\vec{B}\) is defined as the scalar $$ \shaded{ \vec{A} \cdot \vec{B} = \sum_i a_i\,b_i = a_1 b_1 + a_2 b_2 + a_3 b_3 } \nonumber $$

For a geometric interpretation, start with the dot-product of \(\vec{A}\) with itself $$ \vec{A}\cdot\vec{A} = |\vec{A}|^2 \cos 0 = |\vec{A}|^2 \label{eq:vecsquare} $$

Let \(\vec{C}=\vec{A}-\vec{B}\), and expand \(|\vec{C}|^2\) by applying \(\eqref{eq:vecsquare}\) $$ \begin{align} |\vec{C}|^2 &= \vec{C}\cdot\vec{C} = \left(\vec{A} - \vec{B} \right) \cdot \left(\vec{A} - \vec{B} \right) \nonumber \\ &= \vec{A}\cdot\vec{A} - \vec{A}\cdot\vec{B} - \vec{B}\cdot\vec{A} + \vec{B}\cdot\vec{B} \nonumber \\ &= |\vec{A}|^2 + |\vec{B}|^2 - 2 \vec{A}\cdot\vec{B} \label{eq:expanded} \end{align} $$

Recall, the law of cosines from geometry.

$$ c^2 = a^2 + b^2 - 2 a b\cos\theta \nonumber $$
Law of cosines

Apply the law of cosines to \(|\vec{A}|\), \(|\vec{B}|\) and \(|\vec{C}|\) $$ |\vec{C}|^2 = |\vec{A}|^2 + |\vec{B}|^2 - 2 |\vec{A}| |\vec{B}|\cos\theta \label{eq:lawofcos} $$

Combining equations \(\eqref{eq:expanded}\) and \(\eqref{eq:lawofcos}\) gives the geometric equation $$ \shaded{ \vec{A}\cdot\vec{B} = |\vec{A}|\,|\vec{B}|\, \cos\theta } \nonumber $$

The dot-product can be used to compute length and angles in \(\mathbb{R}^3\), or find components of \(\vec{A}\) along unit vector \(\hat u\) $$ \shaded{ \vec{A}\cdot \hat u } \nonumber $$
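A short numeric sketch of these uses of the dot-product: length, angle and component along a unit vector. The function names and the sample vectors are assumptions for this illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def length(v):
    # |A| = sqrt(A . A)
    return math.sqrt(dot(v, v))

def angle(u, v):
    # From A . B = |A| |B| cos(theta)
    return math.acos(dot(u, v) / (length(u) * length(v)))

A = (1, 0, 0)
B = (1, 1, 0)
theta = angle(A, B)
print(math.degrees(theta))        # approximately 45

C = (3, 4, 0)
u_hat = (0, 1, 0)                 # unit vector along the y-axis
print(length(C))                  # 5.0
print(dot(C, u_hat))              # 4, the component of C along u_hat
```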


In 2 dimensions

Let \(\vec{A}=\left\langle a_1, a_2\right\rangle\) and \(\vec{B}=\left\langle b_1, b_2\right\rangle\). The \(\mathbb{R}^2\)-determinant is defined as $$ \shaded{ \begin{align*} \mathrm{det}(\vec{A}, \vec{B}) &= \left|\begin{matrix} a_1 & a_2 \\ b_1 & b_2 \\ \end{matrix}\right| \\ &= a_1b_2-a_2b_1 \end{align*} } \nonumber $$

In 3 dimensions

Let \(\vec{A}=\left\langle a_1, a_2, a_3\right\rangle\), \(\vec{B}=\left\langle b_1, b_2, b_3\right\rangle\) and \(\vec{C}=\left\langle c_1, c_2, c_3\right\rangle\). The \(\mathbb{R}^3\)-determinant is defined as $$ \shaded{ \begin{align*} \mathrm{det}(\vec{A}, \vec{B}, \vec{C}) &= \left|\begin{matrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{matrix}\right| \\ &= a_1 \left|\begin{matrix} b_2 & b_3 \\ c_2 & c_3 \end{matrix}\right| - a_2 \left|\begin{matrix} b_1 & b_3 \\ c_1 & c_3 \end{matrix}\right| + a_3 \left|\begin{matrix} b_1 & b_2 \\ c_1 & c_2 \end{matrix}\right| \end{align*} } \nonumber $$

Area of a parallelogram

Let \(\vec{A}=\left\langle a_1, a_2\right\rangle\), and \(\vec{B}=\left\langle b_1, b_2\right\rangle\).

Area of triangle

The area of the parallelogram shown above is calculated as base \(\times\) height. With base \(|\vec{A}|\) and height \(|\vec{B}|\sin\theta\) $$ \mathrm{area} = |\vec{A}| |\vec{B}| \sin\theta \label{eq:triangle} $$

Change from \(\sin\theta\) to \(\cos\theta\) so it fits the dot-product.


Obtain \(\vec{A}'\) by rotating \(\vec{A}\) over \(\frac{\pi}{2}\), see equation \(\eqref{eq:rotation}\). Let \(\theta'\) be the angle between \(\vec{A}'\) and \(\vec{B}\), and apply \(\sin\theta = \cos(\tfrac{\pi}{2}-\theta)\) $$ \left. \begin{array}{l} \theta' = \tfrac{\pi}{2} - \theta \\ \cos(\tfrac{\pi}{2}-\theta) = \sin\theta \end{array} \right\} \Rightarrow \cos(\theta') = \sin(\theta) \label{eq:sincos} $$

Substitute \(\eqref{eq:sincos}\) in \(\eqref{eq:triangle}\), using \(|\vec{A}'|=|\vec{A}|\) $$ \mathrm{area} = |\vec{A}'|\,|\vec{B}| \cos\theta' = \vec{A}'\cdot \vec{B} $$

Expand the dot-product between \(\vec{A}'\) and \(\vec{B}\), and find the determinant $$ \begin{align*} \mathrm{area} &= \left\langle -a_2, a_1 \right\rangle \cdot \left\langle b_1, b_2 \right\rangle \\ &= \left( a_1 b_2 - a_2 b_1 \right) \\ &= \left|\begin{array}{cc} a_1 & a_2 \\ b_1 & b_2 \end{array}\right| \end{align*} $$

The area of a parallelogram follows $$ \shaded{ \mathrm{area} = \mathrm{det}\left(\vec{A},\vec{B}\right) } \label{eq:area} $$
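A small numeric check of this result, comparing the determinant with the dot-product of the rotated vector. The vectors are hypothetical, and the absolute value is taken because the determinant changes sign when \(\vec A\) and \(\vec B\) are swapped.

```python
def det2(A, B):
    """2x2 determinant with rows A and B."""
    return A[0] * B[1] - A[1] * B[0]

def rotate_quarter(A):
    """A rotated over pi/2."""
    return (-A[1], A[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = (3, 0)   # hypothetical base vector
B = (1, 2)   # hypothetical side vector; the height above A is 2

print(det2(A, B))                 # 6
print(dot(rotate_quarter(A), B))  # 6, the same number via A' . B
print(det2(B, A))                 # -6: swapping the vectors flips the sign
area = abs(det2(A, B))
```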


Let \(\vec{A}=\left\langle a_1, a_2, a_3\right\rangle\), and \(\vec{B}=\left\langle b_1, b_2, b_3\right\rangle\). The cross product of \(\vec{A}\) and \(\vec{B}\) in \(\mathbb{R}^3\) is defined as the pseudo-determinant vector $$ \shaded{ \begin{align*} \vec{A}\times\vec{B} &= \left| \begin{array}{ccc} \hat\imath & \hat\jmath & \hat k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{array} \right| \\ &= \hat\imath \left| \begin{array}{cc} a_2 & a_3 \\ b_2 & b_3 \end{array} \right| - \hat\jmath \left| \begin{array}{cc} a_1 & a_3 \\ b_1 & b_3 \end{array} \right| + \hat k \left| \begin{array}{cc} a_1 & a_2 \\ b_1 & b_2 \end{array} \right| \end{align*} } \nonumber $$


  • the area of the parallelogram from the vectors \(\vec{A}\) and \(\vec{B}\) is \(|\vec{A}\times\vec{B}|\)
  • the direction of \(\vec{A}\times\vec{B}\) is perpendicular to the plane of the parallelogram.

The direction of the vector \(\vec{A}\times\vec{B}\) is determined by the right-hand rule

Cross-product right-hand rule

For example: \(\hat\imath\times\hat\jmath=\hat k\) $$ \begin{align*} \hat\imath\times\hat\jmath &= \left| \begin{array}{ccc} \hat\imath & \hat\jmath & \hat k \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{array} \right| \\ &= \hat\imath \left| \begin{array}{cc} 0 & 0 \\ 1 & 0 \end{array} \right| - \hat\jmath \left| \begin{array}{cc} 1 & 0 \\ 0 & 0 \end{array} \right| + \hat k \left| \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right| \\ &= \hat k \end{align*} $$

Some properties

The right-hand rule shows that $$ \shaded{ \vec{A}\times\vec{B}=-\vec{B}\times\vec{A} } $$

The parallelogram of \(\vec{A}\times\vec{A}\) has area zero. $$ \vec{A}\times\vec{A}=\vec{0} $$
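The definition and these properties are easy to verify numerically. A minimal sketch, where the helper names and the sample vectors `A` and `B` are assumptions for this illustration:

```python
def cross(u, v):
    """Cross product, expanded from the pseudo-determinant."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

i_hat, j_hat, k_hat = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(cross(i_hat, j_hat))   # (0, 0, 1), which is k-hat

A, B = (1, 2, 3), (4, 5, 6)  # hypothetical vectors
AxB = cross(A, B)
print(AxB)                   # (-3, 6, -3)
print(cross(B, A))           # (3, -6, 3): the sign flips
print(cross(A, A))           # (0, 0, 0)
print(dot(A, AxB), dot(B, AxB))  # 0 0: perpendicular to both A and B
```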

Volume in space

Let \(\vec{A}, \vec{B}, \vec{C}\) in space \(\mathbb{R}^3\).

Volume in space

The volume equals the area of the base times the height. The area of the base is the area of the parallelogram spanned by \(\vec{B}\) and \(\vec{C}\), which is \(|\vec{B}\times\vec{C}|\). The height is the component of \(\vec{A}\) that is perpendicular to the base. Call the direction perpendicular to the base unit vector \(\hat n\). $$ \mathrm{volume} = |\vec{B}\times\vec{C}|\;(\vec{A}\cdot\hat n) \label{eq:volume1} $$

The unit vector \(\hat n\) can be derived from the cross-product of \(\vec{B}\) and \(\vec{C}\). To make it a unit vector, we divide by its length. $$ \hat n = \frac{\vec{B}\times\vec{C}}{|\vec{B}\times\vec{C}|} \nonumber $$

Substitute this back in \(\eqref{eq:volume1}\) $$ \begin{align*} \mathrm{volume} &= \bcancel{|\vec{B}\times\vec{C}|}\;\left(\vec{A}\cdot \frac{\left(\vec{B}\times\vec{C}\right)}{\bcancel{|\vec{B}\times\vec{C}|}}\right) \\ &= \vec{A}\ \cdot\ \left(\vec{B}\times\vec{C}\right) \end{align*} $$

This equals the determinant of \(\vec{A}, \vec{B}, \vec{C}\), the so called “triple product” rule $$ \shaded{ \mathrm{det}\left(\vec{A},\vec{B},\vec{C}\right) =\vec{A}\ \cdot\ \left(\vec{B}\times\vec{C}\right) } \label{eq:tripleproduct} $$

Because $$ a_1 \left| \begin{matrix} b_2 & b_3 \\ c_2 & c_3 \end{matrix} \right| - a_2 \left| \begin{matrix} b_1 & b_3 \\ c_1 & c_3 \end{matrix} \right| + a_3 \left| \begin{matrix} b_1 & b_2 \\ c_1 & c_2 \end{matrix} \right| \ = \left\langle a_1, a_2,a_3 \right\rangle \cdot \ \left( \hat\imath \left| \begin{array}{cc} b_2 & b_3 \\ c_2 & c_3 \end{array} \right| - \hat\jmath \left| \begin{array}{cc} b_1 & b_3 \\ c_1 & c_3 \end{array} \right| + \hat k \left| \begin{array}{cc} b_1 & b_2 \\ c_1 & c_2 \end{array} \right| \right) \nonumber $$

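The volume in space described by \(\vec{A}\), \(\vec{B}\) and \(\vec{C}\) follows as $$ \shaded{ \mathrm{volume } = \left| \mathrm{det}\left(\vec{A},\vec{B},\vec{C}\right) \right| } \nonumber $$ The absolute value accounts for the orientation of the vectors, which can make the determinant negative.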
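The triple product rule can be checked numerically by computing the determinant and \(\vec{A}\cdot(\vec{B}\times\vec{C})\) separately. The vectors below are hypothetical and the helper names are assumptions for this sketch.

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def det3(A, B, C):
    """3x3 determinant with rows A, B, C, expanded along the first row."""
    return (A[0] * (B[1] * C[2] - B[2] * C[1])
            - A[1] * (B[0] * C[2] - B[2] * C[0])
            + A[2] * (B[0] * C[1] - B[1] * C[0]))

A, B, C = (1, 2, 3), (0, 1, 4), (5, 6, 0)   # hypothetical vectors
print(det3(A, B, C))          # 1
print(dot(A, cross(B, C)))    # 1, the same value

# A box with perpendicular edges of length 1, 2 and 3 has volume 6.
print(abs(det3((1, 0, 0), (0, 2, 0), (0, 0, 3))))   # 6
```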

Equation of a plane from points

Find the plane that contains the points \(q\), \(r\) and \(s\).

Point \(p, q, r, s\) in space


Consider the vectors \(\overrightarrow{qr}\), \(\overrightarrow{qs}\) and \(\overrightarrow{qp}\) that form a parallelepiped. If these vectors are in the same plane, the parallelepiped will be flat. In other words, it will have no volume.

If \(p\) is in the \(qrs\)-plane, the determinant should be \(0\). $$ \shaded{ \mathrm{det}\left( \overrightarrow{qp}, \overrightarrow{qr}, \overrightarrow{qs} \right) = 0 } \nonumber $$ with \(q\), \(r\) and \(s\) known, and \(p\) unknown, this equation will give the expression in \(x,y,z\) for the plane.

A more intuitive solution

Point \(p, q, r, s\) and \(\vec{n}\) in space

Let a “normal vector” \(\overrightarrow n\) be a vector perpendicular to the plane. Then \(p\) is in the plane when \(\overrightarrow{qp} \perp \overrightarrow n\). Therefore the dot-product $$ \overrightarrow{qp}\cdot \overrightarrow{n} = 0 \label{eq:moreintuitive} $$

\(\overrightarrow{n}\) equals \(\overrightarrow{qr} \times \overrightarrow{qs}\). Substituting this in equation \(\eqref{eq:moreintuitive}\) $$ \overrightarrow{qp} \cdot \left( \overrightarrow{qr} \times \overrightarrow{qs} \right) = 0 \nonumber $$

Applying the triple product equation \(\eqref{eq:tripleproduct}\) gives the condition $$ \shaded{ \mathrm{det}\left( \overrightarrow{qp}, \overrightarrow{qr}, \overrightarrow{qs} \right) = 0 } $$ With \(q\), \(r\) and \(s\) known, and \(p\) unknown, this equation gives the expression in \(x,y,z\) for the plane.
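The normal-vector approach translates directly into code: take \(\overrightarrow{n}=\overrightarrow{qr}\times\overrightarrow{qs}\) and read off \(d=\overrightarrow{n}\cdot q\). The three points below are hypothetical, as are the helper names.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_from_points(q, r, s):
    """Return (a, b, c, d) with a*x + b*y + c*z = d through q, r and s."""
    n = cross(sub(r, q), sub(s, q))   # normal vector qr x qs
    return n + (dot(n, q),)

q, r, s = (1, 0, 0), (0, 1, 0), (0, 0, 1)   # hypothetical points
print(plane_from_points(q, r, s))            # (1, 1, 1, 1), i.e. x + y + z = 1

# A point p is in the plane when qp . (qr x qs) = 0.
p = (2, -1, 0)
print(dot(sub(p, q), cross(sub(r, q), sub(s, q))))   # 0
```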

Gradient field (in plane)


My notes of the excellent lectures 20 and 21 by “Denis Auroux. 18.02 Multivariable Calculus. Fall 2007. Massachusetts Institute of Technology: MIT OpenCourseWare, License: Creative Commons BY-NC-SA.”


When vector field \(\vec F\) is the gradient of a function \(f(x,y)\) (written using the symbolic \(\nabla\)-operator), it is called a gradient field $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \shaded{ \vec F = \nabla f = \left\langle \pdv{x}f, \pdv{y}f \right\rangle = \left\langle f_x, f_y \right\rangle } \nonumber $$

Where \(f(x,y)\) is called the potential.

Fundamental theorem

Recall the fundamental theorem of calculus

If you integrate the derivative, you get back the function. $$ \int_a^b \frac{df(t)}{dt}\,dt=f(b)-f(a) \label{eq:fndcalc} $$

In multivariable calculus, it is the same

If you take the line integral of the gradient of a function, what you get back is the function. $$ \shaded{ \int_C\nabla f\cdot d\vec r = f(P_1) - f(P_0) } \label{eq:fundthm} $$ where \(f(x,y)\) is called the potential.
Work in gradient field

Only when the field is a gradient, and you know the function \(f\), can you simplify the evaluation of the line integral for work. $$ \shaded{ \int_C\nabla f\cdot d\vec r=f(P_1)-f(P_0) } \nonumber $$


In coordinates, the gradient field \(\nabla f\) is expressed as $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \nabla f =\left\langle \pdv{x}f, \pdv{y}f \right\rangle =\left\langle M, N \right\rangle \nonumber $$

Recall: the work integral in differential form
$$ \int_C\vec F\cdot d\vec r = \int_C\left( M\,dx + N\,dy \right) \nonumber $$

Substituting \(M\) and \(N\) from the gradient field into the work integral $$ \renewcommand{dv}[2]{\frac{d #1}{d #2}} \renewcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \begin{align} \int_C \nabla f\cdot d\vec r &= \int_C \pdv{f}{x}dx+\pdv{f}{y}dy \nonumber \\ &= \int_C \underline{ \left(\pdv{f}{\color{red} x}\dv{\color{red}x}{t} + \pdv{f}{\color{blue}y}\dv{\color{blue}y}{t}\right)}\,dt \label{eq:subgrad} \end{align} $$

Recall: the multivariable calculus chain rule

$$ \renewcommand{dv}[2]{\frac{d #1}{d #2}} \renewcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \dv{}{t}\,f\left(\,\color{red}x(t),\,\color{blue}y(t)\,\right) = \pdv{f}{\color{red}x}\frac{d\color{red}x}{dt}\,+\, \pdv{f}{\color{blue}y}\frac{d\color{blue}y}{dt} \nonumber $$

Apply the chain rule in reverse to equation \(\eqref{eq:subgrad}\), and integrate the differential of the function $$ \renewcommand{dv}[2]{\frac{d #1}{d #2}} \renewcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \begin{align*} \int_C \nabla f\cdot d\vec r &= \int_{t_0}^{t_1} \dv{}{t} f\color{grey}{\big(\,x(t),\,y(t)\,\big)}\,dt = \int_{t_0}^{t_1} d f\color{grey}{\big(\,x(t),\,y(t)\,\big)} \end{align*} $$

By the fundamental theorem of calculus \(\eqref{eq:fndcalc}\) $$ \renewcommand{dv}[2]{\frac{d #1}{d #2}} \renewcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \begin{align*} \int_C \nabla f\cdot d\vec r &= f\color{grey}{\big(\,x(t_1),\,y(t_1)\,\big)} - f\color{grey}{\big(\,x(t_0),\,y(t_0)\,\big)} \end{align*} $$

With the points $$ \left\{ \begin{align*} P_0 &= \big(\,x(t_0),\,y(t_0)\,\big) \\ P_1 &= \big(\,x(t_1),\,y(t_1)\,\big) \end{align*} \right. $$

So, the work done in gradient field \(\nabla f\) can be expressed as the difference in potential $$ \renewcommand{dv}[2]{\frac{d #1}{d #2}} \renewcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \shaded{ \int_C \nabla f\cdot d\vec r = f(P_1)-f(P_0) } $$

Physics (using math notation)

A lot of forces are gradients of potentials such as the electric force and the gravitational force. However, magnetic fields are not gradients.

The work done by the electrical (or gravitational) force, is given by the change of the potential energy from the starting point to the ending point.

Note that: physics potentials are the opposite of mathematical potentials. The force \(\vec F\) will be negative the gradient. So in physics, it would be expressed as $$ \vec F=-\nabla f \nonumber $$


Equivalent properties of the work in a gradient field $$ W = \int_C\nabla f\cdot d\vec r \nonumber $$

  1. Path-independent: the work only depends on the values of the potential at the start and end points, \(f(P_0)\) and \(f(P_1)\).
  2. Conservative: the work is \(0\) along all closed curves. This means a closed loop in a gradient field does not provide energy. Conservativeness means no energy can be extracted from the field for free. The total energy is conserved.
  3. \(Mdx+Ndy\) is an exact differential. That means it can be put in the form \(df\).

See also Curl and Green’s.



Let’s look at the earlier example again: Curve \(C\) starting and ending at \((0,0)\) through vector field \(\vec F\) $$ \begin{array}{l} \vec F = \left\langle y,x \right\rangle \\ C_1: (0,0)\ \mathrm{to}\ (1,0) \\ C_2: \mathrm{unit\ circle\ from}\ (1,0)\ \mathrm{to\ the\ diagonal} \\ C_3: \mathrm{from\ the\ diagonal\ to}\ (0,0) \end{array} \nonumber $$


Try function \(f=xy\). The gradient is $$ \renewcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \begin{align*} \nabla f &= \left\langle \pdv{}{x}xy, \pdv{}{y}xy \right\rangle = \left\langle y,x \right\rangle \end{align*} $$ That means the line integral can just be evaluated by finding the values of \(f\) at the endpoints.


Visualize using a contour plot of \(f=xy\) through gradient field \(\vec F\)

Contour plot of gradient with curve

Along the segments

  • On \(C_1\) the potential stays \(0\).
  • On \(C_2\) $$ \begin{align*} \int_{C_2}\vec F\cdot d\vec r &= f\left(\frac{1}{\sqrt 2},\frac{1}{\sqrt 2}\right) - f(1,0) \\ &= \frac{1}{2}-0=\frac{1}{2} \end{align*} $$
  • On \(C_3\) it decreases back to \(0\).
The sum of the work therefore is \(0\).
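The values along the three segments can be cross-checked by evaluating the line integrals numerically. This is a rough sketch using a midpoint rule; the parametrizations `C1`, `C2`, `C3` over \(t\in[0,1]\), the helper name `work` and the step count are assumptions made for this illustration.

```python
import math

def F(x, y):
    """The vector field <y, x>."""
    return (y, x)

def work(field, path, n=2000):
    """Approximate the integral of M dx + N dy along path(t), t in [0, 1]."""
    total = 0.0
    for k in range(n):
        x0, y0 = path(k / n)
        x1, y1 = path((k + 1) / n)
        # Evaluate the field at the midpoint of each small step.
        M, N = field((x0 + x1) / 2, (y0 + y1) / 2)
        total += M * (x1 - x0) + N * (y1 - y0)
    return total

s = 1 / math.sqrt(2)
C1 = lambda t: (t, 0.0)                                    # (0,0) to (1,0)
C2 = lambda t: (math.cos(t * math.pi / 4),
                math.sin(t * math.pi / 4))                 # arc to the diagonal
C3 = lambda t: (s * (1 - t), s * (1 - t))                  # back to (0,0)

works = [work(F, C) for C in (C1, C2, C3)]
print(works)        # approximately [0, 0.5, -0.5]
print(sum(works))   # approximately 0
```

The numbers agree with the differences in the potential \(f=xy\) between the end points of each segment.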

When is a vector field a gradient field?

Let vector field \(\vec F=\left\langle M,N\right\rangle\) where \(M\) and \(N\) are functions of \(x\) and \(y\).

When is this a gradient field? $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \vec F = \left\langle M, N \right\rangle \stackrel{?}{=} \nabla f = \left\langle \pdv{}{x}f, \pdv{}{y}f \right\rangle \nonumber $$

If \(\vec F\) is a gradient field, \(\vec F=\nabla f\), then $$ \newcommand{dv}[2]{\frac{d #1}{d #2}} \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \left\{ \begin{align*} M=\pdv{}{x}f = f_x \\ N=\pdv{}{y}f = f_y \end{align*} \right. $$

Take the partial derivatives of \(M\) and \(N\) $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{ppdv}[3]{\frac{\partial^2 #1}{\partial #2\partial #3}} \begin{align} M_y=\pdv{M}{y}=\ppdv{}{\color{red}x}{\color{blue}y}f = f_{xy} \label{eq:proof1} \\ N_x=\pdv{N}{x}=\ppdv{}{\color{blue}y}{\color{red}x}f = f_{yx} \label{eq:proof2} \end{align} $$

Recall: the second partial derivative of function \(f\)

$$ \newcommand{dv}[2]{\frac{d #1}{d #2}} \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{ppdv}[3]{\frac{\partial^2 #1}{\partial #2\partial #3}} \pdv{}{\color{red}x}\left(\pdv{f}{\color{blue}y}\right) =\ppdv{f}{\color{red}x}{\color{blue}y} =\ppdv{f}{\color{blue}y}{\color{red}x} =\pdv{}{\color{blue}y}\left(\pdv{f}{\color{red}x}\right) \nonumber $$

Based on the second partial derivative rule, equations \(\eqref{eq:proof1}\) and \(\eqref{eq:proof2}\) are the same. That implies that a gradient field should have the property $$ \newcommand{dv}[2]{\frac{d #1}{d #2}} \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{ppdv}[3]{\frac{\partial^2 #1}{\partial #2\partial #3}} \left. \begin{align*} M_y = \ppdv{}{\color{red}x}{\color{blue}y}f = f_{xy} \\ N_x = \ppdv{}{\color{blue}y}{\color{red}x}f = f_{yx} \end{align*} \right\} \Rightarrow M_y=N_x $$

Therefore, \(\vec F=\left\langle M,N \right\rangle\), defined and differentiable everywhere, is a gradient field, when $$ \shaded{ M_y = N_x } \nonumber $$ (Also see the Definition of Curl.)

So, for \(\vec F=\left\langle M,N\right\rangle\), being a gradient field in a region of the plane relates to the following properties:

  • \(\Leftrightarrow\) Conservative: \(\int_C \vec F\cdot d\vec r=0\) for any closed curve. To indicate that the integral is along a closed curve, we write it as \(\oint_C\) $$ \oint_C \vec F\cdot d\vec r=0 \nonumber $$
  • \(\Rightarrow\) \(N_x=M_y\) at every point.
  • \(\Leftarrow\) \(N_x=M_y\) at every point, if \(\vec F\) is defined in the entire plane (or, in a simply connected region). (see later)



Is \(\vec F\) a gradient field? $$ \vec F=\underbrace{-y}_{M}\hat\imath+\underbrace{x}_N\hat\jmath=\left\langle -y,x \right\rangle \nonumber $$

\(\vec F\) is not a gradient field, because $$ \newcommand{dv}[2]{\frac{d #1}{d #2}} \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{ppdv}[3]{\frac{\partial^2 #1}{\partial #2\partial #3}} \left. \begin{align*} \pdv{M}{y}&=\pdv{}{y}(-y)=-1 \\ \pdv{N}{x}&=\pdv{}{x}x=1 \end{align*} \right\} \Rightarrow \pdv{M}{y}\neq \pdv{N}{x} $$


For what value of \(a\) is \(\vec F\) a gradient field? $$ \vec F = \underbrace{(4x^2+axy)}_{M}\hat\imath+\underbrace{(3y^2+4x^2)}_N\hat\jmath = \left\langle 4x^2+axy, 3y^2+4x^2\right\rangle \nonumber $$

$$ \newcommand{dv}[2]{\frac{d #1}{d #2}} \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{ppdv}[3]{\frac{\partial^2 #1}{\partial #2\partial #3}} \left. \begin{align*} \pdv{M}{y}&=\pdv{}{y}(4x^2+axy)=ax \\ \pdv{N}{x}&=\pdv{}{x}(3y^2+4x^2)=8x \end{align*} \right\} \Rightarrow a=8 $$ Note that \(x=0\) also satisfies \(ax=8x\), but only on the line \(x=0\), not everywhere, so it is not an answer.
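The criterion \(M_y=N_x\) can also be spot-checked numerically with central differences. This is only a sketch: it tests a handful of hypothetical sample points, so it can disprove the criterion but never prove it, and the helper names, step size and tolerance are assumptions.

```python
def d_dx(g, x, y, h=1e-5):
    """Central-difference estimate of the partial derivative in x."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, x, y, h=1e-5):
    """Central-difference estimate of the partial derivative in y."""
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

def satisfies_criterion(M, N, points, tol=1e-6):
    """True when M_y and N_x agree at every sample point."""
    return all(abs(d_dy(M, x, y) - d_dx(N, x, y)) < tol for x, y in points)

def field(a):
    """Components M and N of the example field for a given a."""
    M = lambda x, y: 4 * x**2 + a * x * y
    N = lambda x, y: 3 * y**2 + 4 * x**2
    return M, N

points = [(0.5, 1.0), (-2.0, 3.0), (1.5, -0.5)]   # hypothetical sample points
print(satisfies_criterion(*field(8), points))          # True
print(satisfies_criterion(*field(3), points))          # False
print(satisfies_criterion(*field(3), [(0.0, 1.0)]))    # True, but only on x = 0
```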

Finding the potential

Recall: from earlier

When the field is a gradient, and you know the function \(f\), you can simplify the evaluation of the line integral for work. $$ \shaded{ \int_C\nabla f\cdot d\vec r=f(P_1)-f(P_0) } \nonumber $$ where \(f(x,y)\) is called the potential

To show the two methods, we will find the potential of the gradient field \(\vec F\) from the previous example, with \(a=8\) $$ \vec F = \left\langle \underbrace{4x^2+8xy}_{=M}, \underbrace{3y^2+4x^2}_{=N} \right\rangle \nonumber $$

Compute line integrals

Apply the fundamental theorem, equation \(\eqref{eq:fundthm}\), to find an expression for the potential at \((x_1,y_1)\) $$ \begin{align} &\int_C\vec F\cdot d\vec r=f(x_1,y_1)-f(0,0) \nonumber \\ \Rightarrow & f(x_1,y_1) = \underbrace{\int_C\vec F\cdot d\vec r}_{\rm{work}} + \underbrace{f(0,0)}_{\mathrm{constant}} \label{eq:method1} \end{align} $$

Apply the work differential, to find the work along \(C\) in gradient field \(\vec F\) $$ \begin{align*} \underline{\int_C\vec F\cdot d\vec r} &= \int_C M\,dx+N\,dy \\ &= \int_C\left(4x^2+8xy\right)dx+\left(3y^2+4x^2\right)dy \end{align*} $$

The work in a gradient field is path independent \(\Longrightarrow\) find the easiest path

Paths \(C, C_1, C_2\)

The easiest path is $$ \begin{array}{lll} C_1: & x\ \mathrm{from}\ 0\ \mathrm{to}\ x_1 & y=0 &\Rightarrow dy=0 \\ C_2: & x=x_1 & y\ \mathrm{from}\ 0\ \mathrm{to}\ y_1 &\Rightarrow dx=0 \end{array} \nonumber $$

Work along the curves

  • Along \(C_1\) $$ \begin{align*} \int_{C_1}\vec F\cdot d\vec r &= \int_0^{x_1}(4x^2+0)\,dx + 0 \\ &= \left[\frac{4}{3}x^3\right]_0^{x_1} = \frac{4}{3}{x_1}^3 \end{align*} $$
  • Along \(C_2\) $$ \begin{align*} \int_{C_2}\vec F\cdot d\vec r &= \int_0^{y_1}0+(3y^2+4{x_1}^2)\,dy \\ &= \left[y^3+4{x_1}^2y \right]_0^{y_1} = {y_1}^3+4{x_1}^2y_1 \end{align*} $$

The total work $$ \int_C\vec F\cdot d\vec r = \int_{C_1}\ldots + \int_{C_2}\ldots = \frac{4}{3}{x_1}^3 + {y_1}^3+4{x_1}^2y_1 \nonumber $$

Substitute \(\int_{C_1}, \int_{C_2}\) back in \(\eqref{eq:method1}\) $$ \begin{align*} f(x_1,y_1) &= \int_C\vec F\cdot d\vec r + \rm{c} \\ &= \frac{4}{3}{x_1}^3 + {y_1}^3+4{x_1}^2y_1 + \rm{c} \end{align*} $$

Drop the subscripts $$ f(x,y) = \frac{4}{3}x^3 + 4x^2y + y^3\, (+ \rm{c}) \nonumber $$ If you take the gradient, you should get \(\vec F\) back.
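The claim that the gradient gives \(\vec F\) back, and that the work only depends on the end points, can be checked numerically. The following is a minimal sketch under stated assumptions: the function names, the test point \((1.2,-0.7)\) and the straight path from \((0,0)\) to \((1,2)\) are arbitrary choices made here, and the constant \(c\) is taken as \(0\).

```python
def potential(x, y):
    """Candidate potential f(x, y), with the constant c taken as 0."""
    return 4 / 3 * x**3 + 4 * x**2 * y + y**3


def field(x, y):
    """The gradient field F = <M, N> with a = 8."""
    return 4 * x**2 + 8 * x * y, 3 * y**2 + 4 * x**2


def numerical_gradient(f, x, y, h=1e-5):
    """Central-difference estimate of <f_x, f_y>."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx, fy


def work_on_segment(F, p0, p1, n=2000):
    """Midpoint-rule estimate of the work of F along the segment p0 -> p1."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        M, N = F(x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        total += M * dx + N * dy  # M dx + N dy on this small step
    return total


grad = numerical_gradient(potential, 1.2, -0.7)
work = work_on_segment(field, (0.0, 0.0), (1.0, 2.0))
print(grad, field(1.2, -0.7))
print(work, potential(1.0, 2.0) - potential(0.0, 0.0))
```

The straight diagonal path is deliberately different from \(C_1+C_2\); for a gradient field both should give the same work.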

Compute using antiderivatives

No integrals, but you have to follow the procedure very carefully. A common pitfall is to treat the second equation like the first one.

For the example, we want to solve $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \left. \begin{align} \pdv{f}{x}&=f_x=4x^2+8xy \label{eq:anti1} \\ \pdv{f}{y}&=f_y=3y^2+4x^2 \label{eq:anti2} \end{align} \right. $$

Integrate equation \(\eqref{eq:anti1}\) with respect to \(x\). The integration constant might depend on \(y\), so we call it \(g(y)\) $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \pdv{f}{x} = 4x^2+8xy \xrightarrow{\int dx} f = \underline{\frac{4}{3}x^3 + 4x^2y + g(y)} \label{eq:anti} $$

To get information about \(g(y)\), we look at the other partial. Take the derivative of \(\eqref{eq:anti}\) with respect to \(y\) and compare it to \(\eqref{eq:anti2}\) $$ \require{cancel} \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \begin{align*} \pdv{}{y}\left(\frac{4}{3}x^3 + 4x^2y + g(y)\right) &= 3y^2+\bcancel{4x^2} \\ 0 + \bcancel{4x^2}+\pdv{}{y}g(y) &= 3y^2+\bcancel{4x^2} \Rightarrow \pdv{}{y}g(y) = 3y^2\\ \xrightarrow{\int dy} g(y) &= \int \pdv{}{y}g(y)\,dy = \underline{y^3 + c} \end{align*} $$ \(g(y)\) only depends on \(y\), so \(c\) is a true constant.

Plug this back into equation \(\eqref{eq:anti}\), gives the potential \(f(x,y)\) $$ f = \frac{4}{3}x^3 + 4x^2y + \underline{y^3\ (+ \rm{c})} \nonumber $$

Double integrals


My notes of the excellent lectures 16, 17 and 18 by “Denis Auroux. 18.02 Multivariable Calculus. Fall 2007. Massachusetts Institute of Technology: MIT OpenCourseWare, License: Creative Commons BY-NC-SA.”

Recall: the integral of a function of one variable \(f(x)\) corresponds to the area below the graph of \(f\) over \([a,b]\).

$$ \int_a^b f(x)\,dx \nonumber $$

The input domain of \(f(x)\) is \(x\), therefore the region of integration \(R\) is on a line along the \(x\)-axis. Here \(x=a\) is the lower bound, and \(x=b\) is the upper bound.

Single variable function in \(xy\)-plane


For a function of two variables \(f(x,y)\), the region of integration \(R\) is bounded by a curve on the \(xy\)-plane. Using a double integral, you can find the volume between the region and a function \(z=f(x,y)\).

Volume under \(z=f(x,y)\) over region \(R\)

To compute the volume, start with cutting the area of \(R\) in small pieces \(\Delta A=\Delta y\Delta x\)

\(xyz\)-space with region \(R\) and area \(\Delta A\)
\(xy\)-plane with region \(R\) and area \(\Delta A\)

Consider all the pieces, and take the limit \(\Delta A_i\to 0\). $$ \lim_{\Delta A_i\to 0}\sum_i f(x_i,y_i)\,\Delta A_i \nonumber $$

Let \(dA=dy\,dx\) be a tiny piece of area in region \(R\). This gives the definition of the double integral of \(f(x,y)\) over region \(R\). $$ \shaded{ \iint_R f(x,y)\,dA } \nonumber $$

Double integrals are evaluated as two nested integrals, starting with the inner integral $$ \int_{x_{min}}^{x_{max}} \underbrace{ \int_{y_{min}(x)}^{y_{max}(x)} f(x,y)\,dy }_{\text{function of only }x} \,dx \nonumber $$ The bound functions encode the shape of region \(R\).

The bounds of the inner integral might be functions of the outer variables.

In Cartesian coordinates

To compute \(\iint_R f(x,y)\,dA\), we take slices that scan the volume from the back to the front.

Slice for a given \(x_i\) in \(xyz\)-space
Slice for a given \(x\) in \(xy\)-plane

For the outer integral, let \(S(x_i)\) be the area of a slice parallel to the \(yz\)-plane (the area of the thin purple vertical wall in the picture on the left). Then, the volume of each slice is \(S(x_i)\,\Delta x\). The total volume follows as $$ \begin{align} \rm{volume} &= \lim_{\Delta x\to 0}\sum_i S(x_i)\,\Delta x \nonumber \\ &=\int_{x_{min}}^{x_{max}} \underline{S(x)}\,dx \label{eq:doublecomp1} \end{align} $$

For the inner integral, \(x\) is constant and \(y\) is the variable of integration. For the range of \(y\), we go from the far left to the far right on the given slice, as shown in the picture on the right $$ S(x) = \int_{y_{min}(x)}^{y_{max}(x)} f(x,y)\,dy \label{eq:doublecomp2} $$ Note that these inner bounds depend on \(x\).

Substituting equation \(\eqref{eq:doublecomp2}\) in \(\eqref{eq:doublecomp1}\) gives the iterated integral $$ \shaded{ \iint_R f(x,y)\,dA = \int_{x_{min}}^{x_{max}} \left[ \int_{y_{min}(x)}^{y_{max}(x)} f(x,y)\,dy \right] dx } \nonumber $$



Integrate \(z=1-x^2-y^2\) over the region $$ \left\{\begin{align*} 0\leq &x\leq 1 \\ 0\leq &y\leq 1 \end{align*}\right. \nonumber $$


Volume under \(z=f(x,y)\) over region \(R\)

The bounds are trivial $$ \begin{align*} \iint_R z(x,y)\,dA &= \int_0^1\underline{\int_0^1 1-x^2-y^2\,dy}\,dx \end{align*} \nonumber $$

Evaluate the inner integral $$ \begin{align*} \int_0^1 1-x^2-y^2\,dy &= \left[ y-x^2y-\frac{y^3}{3} \right]_{y=0}^1 \\ &= (1-x^2-\frac{1}{3}) - 0 \\ &= \underline{\frac{2}{3}-x^2} \end{align*} \nonumber $$

Substituted back in the outer integral $$ \begin{align*} \iint_R z(x,y)\,dA &=\int_0^1 \underline{\frac{2}{3}-x^2}\,dx \\ &=\left[\frac{2}{3}x-\frac{x^3}{3}\right]_{x=0}^1 = \frac{1}{3} \end{align*} \nonumber $$
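The slicing procedure translates directly into a nested sum. Below is a minimal sketch that approximates the iterated integral with the midpoint rule; the function name `iterated_integral` and the grid size `n` are assumptions made for this illustration. The inner bounds are passed as functions of \(x\), so the same helper also covers regions that are not rectangles.

```python
def iterated_integral(f, x_min, x_max, y_min, y_max, n=400):
    """Midpoint-rule estimate of the iterated integral of f(x, y),
    with y from y_min(x) to y_max(x) and x from x_min to x_max."""
    dx = (x_max - x_min) / n
    total = 0.0
    for i in range(n):
        x = x_min + (i + 0.5) * dx
        lo, hi = y_min(x), y_max(x)
        dy = (hi - lo) / n
        # Inner integral S(x): x is held constant while y varies.
        slice_area = sum(f(x, lo + (j + 0.5) * dy) for j in range(n)) * dy
        # Outer integral: add the volume S(x) * dx of this slice.
        total += slice_area * dx
    return total


volume = iterated_integral(
    lambda x, y: 1 - x**2 - y**2,
    0.0, 1.0,
    lambda x: 0.0, lambda x: 1.0,
)
print(volume)  # should be close to 1/3
```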


Integrate \(z=1-x^2-y^2\) over the quarter unit disk region $$ \left\{\begin{align*} x^2 + y^2 &\leq 1 \\ x &\geq 0 \\ y &\geq 0 \end{align*}\right. \nonumber $$


\(z=f(x,y)\) and region \(R\) in \(xyz\)-space
Region \(R\) in \(xy\)-plane

Find the bounds of integration

  1. For \(\int dy\), the inner integral, express the bounds of \(y\) as a function of \(x\). The lower bound is \(0\). The upper bound lies on the quarter circle \(x^2+y^2 = 1 \Rightarrow y=\sqrt{1-x^2}\).
  2. For \(\int dx\), the outer integral, the range for \(x\) is \(0\) to \(1\).

Fill in the bounds of the integrals $$ \begin{align*} \iint_R z(x,y)\,dA &= \int_0^1\underline{\int_0^{\sqrt{1-x^2}} 1-x^2-y^2\,dy}\,dx \end{align*} \nonumber $$

Evaluate the inner integral $$ \begin{align*} \int_0^{\sqrt{1-x^2}} 1-x^2-y^2\,dy &= \left[ y-x^2y-\frac{y^3}{3} \right]_{y=0}^{\sqrt{1-x^2}} \\ &= \left(\sqrt{1-x^2}-x^2\sqrt{1-x^2}-\frac{1}{3}(1-x^2)^{3/2}\right) - 0 \\ &= (1-x^2)(1-x^2)^{1/2}-\frac{1}{3}(1-x^2)^{3/2} \\ &= \underline{\frac{2}{3}\left(1-x^2 \right)^{3/2}} \end{align*} \nonumber $$

Substitute back in the outer integral $$ \begin{align*} \iint_R z(x,y)\,dA &= \int_0^1\underline{\frac{2}{3}\left(1-x^2 \right)^{3/2}}\,dx \end{align*} \nonumber $$

To compute the outer integral, substitute \(x=\sin\theta\) and use the double angle formula \(\cos^2\theta=\frac{1}{2}(1+\cos2\theta)\) twice. This will eventually lead to the answer \(\frac{\pi}{8}\).

As we will see later, using polar coordinates will be much easier!
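The skipped outer integral can be checked numerically. The sketch below is an illustration under assumptions (the helper `midpoint` and the number of steps are choices made here): it evaluates the outer integral as written, and again after the substitution \(x=\sin\theta\), where \(dx=\cos\theta\,d\theta\) turns the integrand into \(\frac{2}{3}\cos^4\theta\) on \([0,\pi/2]\).

```python
import math


def midpoint(g, a, b, n=20000):
    """Midpoint-rule estimate of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


# Outer integral as written: (2/3) (1 - x^2)^(3/2) for x in [0, 1].
outer = midpoint(lambda x: 2 / 3 * (1 - x * x) ** 1.5, 0.0, 1.0)

# After x = sin(theta): (2/3) cos^4(theta) for theta in [0, pi/2].
substituted = midpoint(lambda t: 2 / 3 * math.cos(t) ** 4, 0.0, math.pi / 2)

print(outer, substituted, math.pi / 8)
```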

Changing the order of integration

We change the order of integration, when it makes it easier to compute the double integral.



When the bounds are numbers, they form a rectangle and we can simply switch the order of integration $$ \int_0^1\int_0^2 dx\,dy = \int_0^2\int_0^1 dy\,dx \nonumber $$


The integral as written cannot be computed, because \(\frac{e^y}{y}\) has no elementary antiderivative. Change the order of integration. $$ \int_0^1\int_x^{\sqrt{x}} \frac{e^y}{y}dy\,dx \nonumber $$

Plot the region based on the existing bounds.


For the new inner integral, \(y\) is constant and \(x\) is the variable of integration. The old upper bound \(y=\sqrt{x} \Rightarrow x=y^2\), and lower bound \(y=x \Rightarrow x=y\) $$ \begin{align*} \int_0^1\int_x^{\sqrt{x}} \frac{e^y}{y}dy\,dx &= \int_0^1 \underline{\int_{y^2}^y \frac{e^y}{y}dx}\,dy \\ \end{align*} \nonumber $$

Evaluate the inner integral $$ \begin{align*} \int_{y^2}^y \frac{e^y}{y}dx &= \left[x\frac{e^y}{y}\right]_{x=y^2}^y \\ &=e^y - e^y y \end{align*} \nonumber $$

Find the antiderivative of \(e^y - e^y y\) (or use integration by parts) $$ \begin{align*} \left(y\,e^y\right)' &= 1\cdot e^y+y\cdot(e^y)'=e^y+y\,e^y \\ \Rightarrow \left(-y\,e^y\right)' &= -e^y-y\,e^y \\ \Rightarrow \left(-y\,e^y+2\,e^y\right)' &= -e^y-y\,e^y + 2e^y \\ &= e^y-y\,e^y \end{align*} \nonumber $$

The outer integral evaluates to $$ \begin{align*} \int_0^1\int_x^{\sqrt{x}} \frac{e^y}{y}dy\,dx &= \int_0^1 (e^y - e^y y)\,dy \\ &=\Big[ -y\,e^y + 2\,e^y \Big]_{y=0}^1 \\ &= (-1\cdot e^1+2e^1)-(0+2\cdot e^0) \\ &= e -2 \end{align*} \nonumber $$
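Even though the original order has no closed-form inner integral, both orders can be approximated numerically and compared with \(e-2\). This is a rough sketch, assuming a simple midpoint rule (the names `midpoint` and `inner_dy` are made up here). The original order converges slowly because the inner integral grows like \(-\frac{1}{2}\ln x\) near \(x=0\), so only a coarse agreement should be expected there.

```python
import math


def midpoint(g, a, b, n=400):
    """Midpoint-rule estimate of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


def inner_dy(x):
    """Original order: integrate e^y / y over y from x to sqrt(x)."""
    return midpoint(lambda y: math.exp(y) / y, x, math.sqrt(x))


# Original order: dy first, then dx.
original_order = midpoint(inner_dy, 0.0, 1.0)

# Swapped order: the inner dx integral is already done by hand.
swapped_order = midpoint(lambda y: math.exp(y) - y * math.exp(y), 0.0, 1.0)

closed_form = math.e - 2
print(original_order, swapped_order, closed_form)
```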


Exchange the order of integration to \(dx\,dy\) for $$ \int_0^1\int_x^{2x}f\,dy\,dx \nonumber $$

Plot the region based on the existing bounds.

Simply connected regions
Not simply connected regions

Because the right-hand bound for \(x\) changes at \(y=1\), the region has to be split into two parts: \(0\lt y\lt 1\) and \(1\lt y\lt 2\), each with different bounds for \(x\) $$ \int_0^1\int_x^{2x}f\,dy\,dx = \int_{0}^{1}\int_{y/2}^{y} f\,dx\,dy + \int_{1}^{2}\int_{y/2}^{1} f\,dx\,dy \nonumber $$

In polar coordinates

In general, you switch to polar coordinates because the region is easier to set up, or the integrand becomes simpler.

Polar-coordinates vs. \(xy\)-coordinates

Polar coordinates express point \((x,y)\) in the plane, using \(r\) for the distance from the origin, and \(\theta\) for the counterclockwise angle from the positive \(x\)-axis. $$ \shaded{ \begin{align*} x &= r\cos\theta \\ y &= r\sin\theta \end{align*} } \nonumber $$

Area element

The area element \(\Delta A\) is almost rectangular as shown below

Area \(\Delta A\)

One side is \(\Delta r\) and the other side is \(r\,\Delta\theta\). In the limit where \(\Delta r,\Delta\theta\to 0\), the area element becomes $$ \shaded{ dA=r\,dr\,d\theta } \nonumber $$

The double integral in polar coordinates $$ \shaded{ \int_{\theta_{min}}^{\theta_{max}} \int_{r_{min}}^{r_{max}} f(r,\theta)\,r\,dr\,d\theta } \nonumber $$



Redo the earlier problem using polar coordinates: Integrate \(z=1-x^2-y^2\) over the quarter unit disk region $$ \left\{\begin{align*} x^2 + y^2 &\leq 1 \\ x &\geq 0 \\ y &\geq 0 \end{align*}\right. \nonumber $$

Plot of the region

Region \(R\) in polar-coordinates

Set the bounds for the integrals

  1. For \(\int dr\), the inner integral: fix the value of \(\theta\), and let \(r\) vary. For the bounds, ask yourself: for what values of \(r\) will I be inside my region? In this case, that is \( 0\lt r\lt 1\).
  2. For \(\int d\theta\), the outer integral: let \(\theta\) vary, and ask yourself: for what values of \(\theta\) will I be inside my region? In this case, that is \(0\lt\theta\lt\frac{\pi}{2}\).

Fill in the bounds of the double integral $$ \int_0^{\pi/2}\int_0^1 f(r,\theta)\,r\,dr\,d\theta \nonumber $$

Instead of just replacing \(x=r\,\cos\theta\) and \(y=r\,\sin\theta\), we can express the function \(f(x,y)\) in polar coordinates using \(r^2=x^2+y^2\) $$ \begin{align*} f(x,y) &= 1-x^2-y^2 \\ &= 1-(x^2+y^2) \\ \Leftrightarrow f(r,\theta) &= 1-r^2 \end{align*} \nonumber $$

Evaluate the double integral $$ \begin{align*} \text{volume} &= \int_0^{\pi/2}\underline{\int_0^1 (1-r^2)\,r\,dr}\,d\theta \\ &= \int_0^{\pi/2}\left[ \frac{r^2}{2}-\frac{r^4}{4} \right]_{r=0}^1 \,d\theta \\ &= \int_0^{\pi/2} \frac{1}{4} \,d\theta = \frac{1}{4}\frac{\pi}{2}=\frac{\pi}{8} \end{align*} \nonumber $$
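The same volume can be approximated by summing over a polar grid, which makes the role of the extra factor \(r\) in \(dA=r\,dr\,d\theta\) visible. A minimal sketch, assuming a midpoint rule in both \(r\) and \(\theta\); the name `polar_integral` and the grid size are illustrative choices.

```python
import math


def polar_integral(f, theta_min, theta_max, r_min, r_max, n=400):
    """Midpoint-rule estimate of the double integral of f(r, theta) r dr dtheta.
    The bounds r_min and r_max are functions of theta."""
    dtheta = (theta_max - theta_min) / n
    total = 0.0
    for i in range(n):
        theta = theta_min + (i + 0.5) * dtheta
        lo, hi = r_min(theta), r_max(theta)
        dr = (hi - lo) / n
        for j in range(n):
            r = lo + (j + 0.5) * dr
            # Area element dA = r dr dtheta: note the factor r.
            total += f(r, theta) * r * dr * dtheta
    return total


volume = polar_integral(
    lambda r, theta: 1 - r * r,
    0.0, math.pi / 2,
    lambda theta: 0.0, lambda theta: 1.0,
)
print(volume, math.pi / 8)
```

Leaving out the factor `r` in the sum would give a different, incorrect value, which is a quick way to convince yourself that it belongs there.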


Find the area

Find the area of region \(R\). $$ \shaded{ \text{Area}(R)=\iint_R 1\,dA } \nonumber $$

Or, the mass of a (flat) object with density \(\delta\) = mass per unit area. $$ \shaded{ \begin{align*} \Delta m &= \delta .\Delta A \\ \Rightarrow \text{Mass}(R) &= \iint_R\delta(x,y)\,dA \end{align*} } \nonumber $$

Find the average value

Average value of \(f\) in \(R\). $$ \shaded{ \bar f = \frac{1}{\text{Area}(R)}\iint_R f(x,y)\,dA } \nonumber $$

Or, the weighted average value of \(f\) in \(R\) with density \(\delta\) $$ \shaded{ \frac{1}{\text{Mass}(R)}\iint_R f(x,y)\,\underbrace{\delta(x,y)\,dA}_{\text{mass element}} } \nonumber $$

Or, the center of mass \((\bar x,\bar y)\) of a (planar) object with density \(\delta\). The weighted averages of \(x\) and \(y\) $$ \shaded{ \left\{ \begin{align*} \bar x &= \frac{1}{\text{Mass}(R)}\iint_R x\,\delta(x,y)\,dA \\ \bar y &= \frac{1}{\text{Mass}(R)}\iint_R y\,\delta(x,y)\,dA \end{align*} \right. } \nonumber $$

Find the moment of inertia

Recall from physics:

The kinetic energy of a point mass equals \(\frac{1}{2}mv^2\)

Mass is how hard it is to impart a translation movement. (to make it move)

Similarly, the moment of inertia about an axis is how hard it is to rotate about that axis (to make it spin).

Linear motion
Circular motion

Let \(\omega\) be the rate of change of angle \(\theta\), \(\omega=\frac{d\theta}{dt}\).

In unit time, a mass \(m\) rotating at angular velocity \(\omega\) travels a distance of \(r\omega\), so the speed is \(v=r\omega\). The kinetic energy follows as $$ \shaded{ \tfrac{1}{2}m\,v^2=\tfrac{1}{2}\underline{mr^2}\omega^2 } \nonumber $$

The moment of inertia is defined as $$ \shaded{ I = mr^2 } \nonumber $$

For rotation movements, \(I\) replaces the mass \(m\). The rotational kinetic energy is $$ \shaded{ \frac{1}{2}\,I\,\omega^2 } \nonumber $$

Rotation about the origin

A solid with density \(\delta\) rotating about the origin.

Solid rotating around origin

A tiny area \(\Delta A\) with mass \(\Delta m=\delta\,\Delta A\), has a moment of inertia $$ \Delta m\cdot r^2=\delta\cdot\Delta A\cdot r^2 \nonumber $$

Consider all the pieces $$ \shaded{ I_o=\iint_R r^2\,\delta\,dA } \nonumber $$ where \(r^2=x^2+y^2\) in \(xy\)-coordinates.

Rotation about the \(x\)-axis

In the \(xyz\)-space, the distance to the \(x\)-axis is \(|y|\).

Solid spinning around \(x\)-axis

Moment of inertia for a solid with density \(\delta\) rotating about the \(x\)-axis $$ \shaded{ I_x=\iint_R y^2\,\delta\,dA } \nonumber $$



Disk of radius \(a\) with uniform density \(\delta=1\) spinning around its center. What is the moment of inertia?

Disk spinning around origin

What is \(r^2\) for any point inside \(R\) in this formula? $$ I_o=\iint_R r^2\,\delta\,dA \nonumber $$

Using polar coordinates, \(r\) will go from \(0\) to \(a\) and \(dA=r\,dr\,d\theta\) $$ \begin{align*} I_o &= \iint_R r^2\cdot 1\cdot dA \\ &= \int_0^{2\pi} \underline{\int_0^a r^2 r\,dr}\,d\theta \\ &= \int_0^{2\pi} \left[ \frac{r^4}{4} \right]_{r=0}^a\,d\theta = \int_0^{2\pi} \frac{a^4}{4}\,d\theta \\ &= \frac{a^4}{4}\Big[\theta\Big]_0^{2\pi} = \frac{1}{2}\pi a^4 \end{align*} \nonumber $$


How much harder is it to spin this disk around a point on its circumference?

Disk spinning around its circumference

The inertia $$ \begin{align*} I_o & =\iint r^2\,dA \\ &= \int_{-\pi/2}^{\pi/2} \underline{\int_0^{2a\cos\theta} r^2 r\,dr}\,d\theta \\ \end{align*} \nonumber $$

Evaluate the inner integral $$ \begin{align*} \int_0^{2a\cos\theta} r^2 r\,dr &=\left[\frac{r^4}{4}\right]_{r=0}^{2a\cos\theta} \\ &= 4a^4\cos^4\theta \end{align*} \nonumber $$

Evaluate the outer integral $$ \begin{align*} I_o & =\iint r^2\,dA \\ &= \int_{-\pi/2}^{\pi/2} 4a^4\cos^4\theta\,d\theta = \dots = \frac{3}{2}\pi a^4 \end{align*} \nonumber $$

It is three times harder to spin a Frisbee about a point on its circumference than around its center.
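The factor of three can be cross-checked without polar coordinates. The sketch below is a brute-force grid sum in Cartesian coordinates, offered as an illustration only: the function name `moment_of_inertia`, the grid size and the choice \(a=1\) are assumptions made here. It keeps the disk centered at the origin and moves the pivot instead, which describes the same situation as moving the disk.

```python
import math


def moment_of_inertia(a, pivot, n=600):
    """Grid estimate of the integral of (distance to pivot)^2 dA over the disk
    of radius a centered at the origin, with uniform density 1."""
    h = 2 * a / n
    px, py = pivot
    total = 0.0
    for i in range(n):
        x = -a + (i + 0.5) * h
        for j in range(n):
            y = -a + (j + 0.5) * h
            if x * x + y * y <= a * a:  # only cells inside the disk count
                total += ((x - px) ** 2 + (y - py) ** 2) * h * h
    return total


a = 1.0
about_center = moment_of_inertia(a, (0.0, 0.0))
about_rim = moment_of_inertia(a, (a, 0.0))
expected_center = math.pi * a**4 / 2
print(about_center, expected_center, about_rim / about_center)
```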

Change of variables

We change variables, when it simplifies the integrand or bounds, so it becomes easier to compute the double integral.



Determine the area of an ellipse with semi-axes \(a\) and \(b\). $$ \left(\frac{x}{a}\right)^2+\left(\frac{y}{b}\right)^2=1 \nonumber $$

The double integral for the area $$ \rm{Area} = \iint_{\left(\frac{x}{a}\right)^2+\left(\frac{y}{b}\right)^2\lt 1} dx\,dy \nonumber $$

Use substitution to make it look more like a circle $$ \left. \begin{array}{c} \text{set }\frac{x}{a}=u \Rightarrow du = \frac{1}{a}dx \\ \text{set }\frac{y}{b}=v \Rightarrow dv = \frac{1}{b}dy \end{array} \right\} \\ \begin{align*} \Rightarrow du\,dv &= \frac{1}{ab}dx\,dy \\ \Rightarrow dx\,dy &= ab\,du\,dv \end{align*} \nonumber $$

Substitute it back in the double integral $$ \begin{align*} \rm{Area} &= \iint_{u^2+v^2\lt 1} ab\,du\,dv \\ &= ab\underbrace{\iint_{u^2+v^2\lt 1} du\,dv}_{\text{area of unit disk}} = a\,b\,\pi \end{align*} \nonumber $$


To simplify the integrand or bounds, we set a change of variables as $$ \left\{ \begin{align*} u &= 3x-2y \\ v &= x+y \end{align*} \right. \nonumber $$

What is the relation between \(dA=dx\,dy\) and \(dA'=du\,dv\)?

\(\Delta x, \Delta y\)
\(\Delta u, \Delta v\)

The linear transformation changes it to a parallelogram. Because of the linear change of variables, the area scaling factor doesn’t depend on the choice of rectangle. So let’s take the simplest rectangle, the unit square.

Simplest rectangle in \(xy\)

Applying the transformation to the corners

Simplest rectangle in \(uv\)

The area \(A'\) is the determinant of the two vectors from the origin $$ A' = \left| \begin{array}{rr} 3 & 1 \\ -2 & 1 \end{array} \right| = 3+2=5 \nonumber $$

For any other rectangle, area is also multiplied by \(5\) $$ \begin{align*} dA' &= 5\,dA \\ \Rightarrow du\,dv &= 5\,dx\,dy \\ \Rightarrow \iint\ldots\,dx\,dy &= \iint\ldots\,\frac{1}{5}du\,dv \end{align*} \nonumber $$


Changing variables to \(u,v\) means $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \left\{ \begin{align*} u = u(x,y) &\Rightarrow \Delta u\approx \pdv{u}{x}\Delta x+\pdv{u}{y}\Delta y = u_x\Delta x+u_y\Delta y \\ v = v(x,y) &\Rightarrow \Delta v\approx \pdv{v}{x}\Delta x+\pdv{v}{y}\Delta y = v_x\Delta x + v_y\Delta y \end{align*} \right. \nonumber $$

In matrix form $$ \left[ \begin{array}{c} \Delta u \\ \Delta v \end{array} \right] \approx \left[ \begin{array}{cc} u_x & u_y \\ v_x & v_y \end{array} \right] \left[ \begin{array}{c} \Delta x \\ \Delta y \end{array} \right] \nonumber $$

A small rectangle in \(xy\)-coordinates corresponds to a small parallelogram in \(uv\)-coordinates. The sides of the rectangle from \((0,0)\) are the vectors \(\left\langle\Delta x,0\right\rangle\) and \(\left\langle 0,\Delta y\right\rangle\), which map to the sides of the parallelogram $$ \left\{ \begin{align*} \left\langle\Delta x,0\right\rangle \rightarrow \left\langle\Delta u,\Delta v\right\rangle &\approx \left\langle u_x\Delta x, v_x\Delta x\right\rangle \\ \left\langle 0,\Delta y\right\rangle \rightarrow \left\langle\Delta u,\Delta v\right\rangle &\approx \left\langle u_y\Delta y, v_y\Delta y\right\rangle \end{align*} \right. \nonumber $$

The area \(\text{Area}'\) of the parallelogram is the determinant $$ \text{Area}' = \rm{det} \left( \left[ \begin{array}{cc} u_x & u_y \\ v_x & v_y \end{array} \right] \right) \Delta x\,\Delta y \nonumber $$

When you have a general change of variables, \(du\,dv\) versus \(dx\,dy\) is given by the determinant of the matrix of partial derivatives. $$ \rm{det}\left( \left[ \begin{array}{cc} u_x & u_y \\ v_x & v_y \end{array} \right] \right) \nonumber $$

The definition of Jacobian just means the ratio between \(du\,dv\) and \(dx\,dy\). (Not a partial derivative.) Here the vertical bars stand for determinant.

$$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \shaded{ J = \pdv{(u,v)}{(x,y)} = \left| \begin{array}{cc} u_x & u_y \\ v_x & v_y \end{array} \right| } \nonumber $$

Then, because area is always positive $$ \shaded{ du\,dv = |J|\,dx\,dy = \left|\pdv{(u,v)}{(x,y)}\right|\,dx\,dy } \nonumber $$



Switching to polar coordinates $$ \begin{align*} x &= r\cos\theta \\ y &= r\sin\theta \end{align*} \nonumber $$

The Jacobian $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \begin{align*} \pdv{(x,y)}{(r,\theta)} &= \left| \begin{array}{cc} x_r & x_\theta \\ y_r & y_\theta \end{array} \right| \\ &= \left| \begin{array}{cc} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{array} \right| \\ &= r\cos^2\theta - (-r\sin^2\theta) \\ &= r(\cos^2\theta + \sin^2\theta) = r \end{align*} $$

Not a constant, but a function of \(r\), so $$ \shaded{ \begin{align*} dx\,dy &= |r|\,dr\,d\theta \\ &= r\,dr\,d\theta \end{align*} } \nonumber $$

Remark: you can compute whichever one is easier to compute, because they are the inverse of each other. $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \pdv{(u,v)}{(x,y)} \cdot \pdv{(x,y)}{(u,v)} = 1 \nonumber $$
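The Jacobians above can be estimated numerically from the definition as a determinant of partial derivatives. This is a hedged sketch: the function `jacobian`, the evaluation points and the step size are assumptions for illustration, and the inverse map \(x=\frac{u+2v}{5}\), \(y=\frac{3v-u}{5}\) was obtained here by solving \(u=3x-2y\), \(v=x+y\) for \(x\) and \(y\).

```python
import math


def jacobian(u, v, x, y, h=1e-6):
    """Central-difference estimate of the determinant d(u,v)/d(x,y) at (x, y)."""
    u_x = (u(x + h, y) - u(x - h, y)) / (2 * h)
    u_y = (u(x, y + h) - u(x, y - h)) / (2 * h)
    v_x = (v(x + h, y) - v(x - h, y)) / (2 * h)
    v_y = (v(x, y + h) - v(x, y - h)) / (2 * h)
    return u_x * v_y - u_y * v_x


# Linear change of variables u = 3x - 2y, v = x + y.
forward = jacobian(lambda x, y: 3 * x - 2 * y, lambda x, y: x + y, 0.4, -1.3)

# Its inverse, x = (u + 2v)/5 and y = (3v - u)/5.
inverse = jacobian(
    lambda u, v: (u + 2 * v) / 5, lambda u, v: (3 * v - u) / 5, 2.0, 1.0
)

# Polar coordinates x = r cos(theta), y = r sin(theta), evaluated at r = 2.
polar = jacobian(
    lambda r, t: r * math.cos(t), lambda r, t: r * math.sin(t), 2.0, 0.7
)

print(forward, forward * inverse, polar)
```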


Compute $$ \int_0^1\int_0^1 x^2y\,dx\,dy \nonumber $$

using change of variables to $$ \left\{ \begin{align*} u &= x \\ v &= xy \end{align*} \right. \nonumber $$

Step 1: Find the area element using the Jacobian $$ \newcommand{pdv}[2]{\frac{\partial #1}{\partial #2}} \begin{align*} \pdv{(u,v)}{(x,y)} &= \left| \begin{array}{cc} u_x & u_y \\ v_x & v_y \end{array} \right| \\ &= \left| \begin{array}{cc} 1 & 0 \\ y & x \end{array} \right| = x \end{align*} $$ With \(x\) positive in the region $$ \begin{align*} du\,dv &= |x|\,dx\,dy \\ &= x\,dx\,dy \end{align*} \nonumber $$

Step 2: Express the integrand in terms of \(u,v\) $$ \begin{align*} x^2y\,dx\,dy &= x^2y\,\frac{1}{x}\,du\,dv = xy\,du\,dv \\ &= u\frac{v}{u}\,du\,dv = v\,du\,dv \end{align*} \nonumber $$ So we need to compute (in the order \(du\,dv\) or \(dv\,du\)) $$ \iint_\ldots v\,du\,dv \nonumber $$

Step 3: Find the bounds for \(u,v\) in the new integral $$ \begin{align*} \int_\ldots^\ldots \underbrace{\int_\ldots^\ldots v\,du}_{u \text{ changes},\\ v\text{ is constant}}\,dv \end{align*} \nonumber $$ \(v=\rm{constant} \rightarrow xy=\rm{constant} \rightarrow y=\frac{\rm{constant}}{x}\)

\(xy\) and \(uv\)-coordinates
What is the value of \(u\) when we enter the region from the top, where \(y=1\)? $$ \begin{align*} y &=1 \\ \Rightarrow y &=\frac{v}{u}=1 \\ \Rightarrow u &= v \end{align*} $$ What is the value of \(u\) when we exit the region, where \(x=1\)? $$ \begin{align*} x &=1 \\ \Rightarrow u &= 1 \end{align*} $$ The smallest value of \((x,y)\) is \((0,0)\), which corresponds to \(v=0\). The largest value of \((x,y)\) is \((1,1)\), which corresponds to \(v=1\).

Step 4: The double integral follows as $$ \begin{align*} \int_0^1 \int_v^1 v\,du\,dv \end{align*} \nonumber $$

How could we have found the bounds more easily? Draw the picture in \(uv\)-coordinates.
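To see that the change of variables preserved the value of the integral, both forms can be approximated and compared. A minimal sketch, assuming a midpoint rule; the helper `double_midpoint` takes the integrand as a function of (outer, inner) variable, which is a convention chosen here for illustration.

```python
def double_midpoint(f, outer_lo, outer_hi, inner_lo, inner_hi, n=300):
    """Midpoint-rule estimate of an iterated integral of f(outer, inner).
    The inner bounds are functions of the outer variable."""
    d_outer = (outer_hi - outer_lo) / n
    total = 0.0
    for i in range(n):
        s = outer_lo + (i + 0.5) * d_outer
        lo, hi = inner_lo(s), inner_hi(s)
        d_inner = (hi - lo) / n
        inner = sum(f(s, lo + (j + 0.5) * d_inner) for j in range(n)) * d_inner
        total += inner * d_outer
    return total


# Original integral: x^2 y, with x (inner) and y (outer) both from 0 to 1.
in_xy = double_midpoint(
    lambda y, x: x**2 * y, 0.0, 1.0, lambda y: 0.0, lambda y: 1.0
)

# After u = x, v = x y: integrand v, with u (inner) from v to 1, v from 0 to 1.
in_uv = double_midpoint(
    lambda v, u: v, 0.0, 1.0, lambda v: v, lambda v: 1.0
)

print(in_xy, in_uv)  # both should be close to 1/6
```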




Vector calculus is about differentiation and integration of vector fields, primarily in \(\mathbb{R}^3\) with coordinates \(x,y,z\) and unit vectors \(\hat{\imath},\hat{\jmath},\hat{k}\). Here we will focus on differentiation.

Axes and unit vectors


Parametric curve

A parametric curve is

a function with one-dimensional input and a multi-dimensional output.

Parametric curves may be expressed as a set of equations, such as $$ f(t)= \left\{ \begin{array}{l} f_x(t)=t^3-3t \\ f_y(t)=3t^2 \end{array} \right. \nonumber $$ or as a vector $$ f(t) = \left\langle \;t^3-3t,\; 3t^2\; \right\rangle \label{eq:parmcurve} $$

Multi-variable functions

A multi-variable function is a function with more than one argument. This concept extends the idea of a function of one variable to several variables.

In other words, let \(f\) be a function of variables \(x, y, \cdots\), then \(f(x,y,\cdots)\) is a multi-variable function.

Scalar field

When a multi-variable function returns a scalar value for each point, it is called a scalar field.

A scalar field maps \(n\)-dimensional space to real numbers. Scalar fields are commonly visualized as values on a grid of points in the plane. For instance, a weather map showing the temperature \(T\) at each point \((x,y)\) on a map.

For example: scalar field \(z=\sin x + \cos y\), can be plotted with the result encoded as color, or on the \(z\)-axis.

Plot of \(z=\sin x + \cos y\)
Plot of \(z=\sin x + \cos y\)

Vector field

When a multi-variable function assigns a vector to each point \((x,y)\), it is called a vector field.

Vector fields are commonly visualized as arrows from a grid of points in the plane. This allows an \(n\)-dimensional input and output to be visualized in an \(n\)-dimensional drawing, where the arrows further give an intuition of e.g. fluid or air flow. An example of a vector field is a weather map where the magnitude and angle of the vectors represent the speed and direction of the wind at each point \((x,y)\).

Hurricane Sandy (2012-10-28)

In other words, let \(M,N,\cdots\) be functions of variables \(x,y,\cdots\). Then the function \(\vec{F}\) defined below, is called a vector field. $$ \vec{F}=\hat\imath M + \hat\jmath N +\;\cdots = \left\langle M, N, \cdots \right\rangle $$

The vectors are drawn starting at input \((x,y)\) where the magnitude and direction is determined by \(\vec{F}(x,y)\). For example, the plots for \(\vec{F}=\left\langle\; x,\; y \right\rangle\), and \(\vec{F}=\left\langle\; -y,\; x \right\rangle\) are shown below.

Plot of \(\vec{F}=\left\langle\; x,\; y \right\rangle\)
Plot of \(\vec{F}=\left\langle\; -y,\; x \right\rangle\)
“uniform rotation at unit angular velocity”


If an object is rotating in two dimensions, you can describe the rotation completely with a single value: the angular velocity, \(\omega=\phi/t\), where a positive value indicates a counter-clockwise rotation.

For an object rotating in three dimensions, the direction can be described using a 3D vector, \(\vec{\omega}\). The magnitude of the vector indicates the angular speed; the direction indicates the axis around which it tends to swirl.

Right-hand rule for rotation

The direction of the angular velocity is determined by the convention called the right-hand rule for rotation:

When the object is rotating counter-clockwise (curl the fingers of your right hand in the direction of rotation), the angular velocity vector points along the axis of rotation, directed upwards (the direction of your thumb).


Del (\(\nabla\)) is a shorthand form to simplify long mathematical expressions such as the Maxwell equations. Think of this symbol as loosely representing a vector of partial derivative operators $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \nabla = \hat{\imath}\pdv{x} + \hat{\jmath}\pdv{y} + \hat{k}\pdv{z} $$ Or, in vector notation $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \nabla = \left\langle \pdv{x}, \pdv{y}, \pdv{z} \right\rangle $$

Depending on how \(\nabla\) is applied, it may denote: the gradient of a scalar field; the divergence of a vector field; or the curl of a vector field. Each of these is described below.


Let \(f\) be a scalar field with variables \(x,y,z\). The vector derivative of the scalar field \(f(x,y,z)\) is defined as the gradient. Denoted as the \(\nabla\) “multiplied” by a scalar field \(f\) $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \begin{align} \nabla f &=\left\langle \pdv{x}, \pdv{y}, \pdv{z} \right\rangle f \\ &= \left\langle \pdv{x}f, \pdv{y}f, \pdv{z}f \right\rangle \end{align} $$

The gradient of \(f\) at point \((x,y)\) is a vector that points in the direction that makes the function \(f\) increase the fastest. The magnitude of the gradient at point \((x,y)\) equals the slope in that direction.


Find the gradient for scalar field \(f(x,y,z)=x+y^2+z^3\) $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \begin{align*} \nabla f &=\left\langle \pdv{x}, \pdv{y}, \pdv{z} \right\rangle f \\ &=\left\langle \pdv{x}f, \pdv{y}f, \pdv{z}f \right\rangle \\ &=\left\langle 1, 2y, 3z^2 \right\rangle \end{align*} $$
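The result can be checked at a sample point with finite differences. The sketch below is an illustration under assumptions: the function `gradient`, the step size and the point \((1,2,3)\) are arbitrary choices. At that point the formula \(\left\langle 1, 2y, 3z^2\right\rangle\) gives \(\left\langle 1, 4, 27\right\rangle\).

```python
def gradient(f, p, h=1e-5):
    """Central-difference estimate of the gradient of f at the point p."""
    grad = []
    for i in range(len(p)):
        forward, backward = list(p), list(p)
        forward[i] += h
        backward[i] -= h
        # Partial derivative with respect to the i-th variable.
        grad.append((f(*forward) - f(*backward)) / (2 * h))
    return tuple(grad)


def f(x, y, z):
    return x + y**2 + z**3


grad = gradient(f, (1.0, 2.0, 3.0))
print(grad)  # compare with <1, 2y, 3z^2> at (1, 2, 3)
```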


For intuition, picture the vector field as a fluid where each vector describes the velocity at that point. Around some points, where all vectors point outward, the fluid just springs into existence, as if there is a source. A positive divergence tells you how much of a source it is. Divergence is also positive if more is flowing out of than into that point.

Let \(\vec{v}\) be a vector field where \(v_x,v_y,v_z\) are each functions of variables \(x,y,z\). $$ \vec{v} = \left\langle v_x, v_y, v_z \right\rangle $$

The divergence of vector field \(\vec{v}\) is written as a dot-product $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \begin{align} \nabla \cdot \vec{v} &=\left\langle \pdv{x}, \pdv{y}, \pdv{z} \right\rangle \cdot \left\langle v_x, v_y, v_z \right\rangle \\ &=\pdv{x}v_x + \pdv{y}v_y + \pdv{z}v_z \end{align} $$

When the divergence at point \((x,y)\) is positive, the density decreases. In other words, more is leaving than is coming in at that point. For example the electric field of two electric charges

Electric field of charges \(p\) and \(q\)


Find the divergence for vector field \(\vec{v}(x,y,z)=\left\langle xy,yz,xz\right\rangle\) $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \begin{align*} \nabla \cdot \vec{v} &=\left\langle \pdv{x}, \pdv{y}, \pdv{z} \right\rangle \cdot \left\langle xy,yz,xz\right\rangle \\ &= \pdv{x}xy + \pdv{y}yz + \pdv{z}xz \\ &= y + z + x = x + y + z \end{align*} $$ The result is a scalar.
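A finite-difference check of this divergence, as a sketch only: the function `divergence` and the sample point \((1,2,3)\) are assumptions made for illustration. At that point \(x+y+z=6\).

```python
def divergence(F, p, h=1e-5):
    """Central-difference estimate of the divergence of F at the point p.
    F takes (x, y, z) and returns a tuple of three components."""
    total = 0.0
    for i in range(3):
        forward, backward = list(p), list(p)
        forward[i] += h
        backward[i] -= h
        # Derivative of the i-th component with respect to the i-th variable.
        total += (F(*forward)[i] - F(*backward)[i]) / (2 * h)
    return total


def v(x, y, z):
    return x * y, y * z, x * z


div = divergence(v, (1.0, 2.0, 3.0))
print(div)  # compare with x + y + z at (1, 2, 3)
```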


For intuition, think about the vector field as a fluid flow. Imagine placing a tiny paddlewheel into the vector field at a point. Would it spin around? If it spins counter-clockwise, it is said to have positive curl.

Curl in water

Let \(\vec{v}\) be a vector field where \(v_x,v_y,v_z\) are each functions of variables \(x,y,z\). $$ \vec{v} = \left\langle v_x, v_y, v_z \right\rangle $$

The curl (rotation) of vector field \(\vec{v}\) is written as a cross-product. $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \begin{align} \nabla \times \vec{v} &=\left\langle \pdv{x}, \pdv{y}, \pdv{z} \right\rangle \times \left\langle v_x, v_y, v_z \right\rangle \end{align} $$

The cross product can be computed using the pseudo-determinant. $$ \require{color} \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \begin{align} \nabla \times \vec{v} &=\begin{vmatrix} \color{red}{\hat{\imath}} & \color{green}{\hat{\jmath}} & \color{blue}{\hat{k}} \\ \color{red}{\pdv{x}} & \color{green}{\pdv{y}} & \color{blue}{\pdv{z}} \\ \color{red}{v_x} & \color{green}{v_y} & \color{blue}{v_z} \end{vmatrix} \\ &=\color{red}{\hat\imath} \begin{vmatrix} \color{green}{\pdv{y}} & \color{blue}{\pdv{z}} \\ \color{green}{v_y} & \color{blue}{v_z} \end{vmatrix} - \color{green}{\hat\jmath} \begin{vmatrix} \color{red}{\pdv{x}} & \color{blue}{\pdv{z}} \\ \color{red}{v_x} & \color{blue}{v_z} \end{vmatrix} + \color{blue}{\hat k} \begin{vmatrix} \color{red}{\pdv{x}} & \color{green}{\pdv{y}} \\ \color{red}{v_x} & \color{green}{v_y} \end{vmatrix} \\ &=\left\langle \begin{array}{c} \color{green}{\pdv{y}} \color{blue}{v_z} - \color{blue}{\pdv{z}} \color{green}{v_y} \\ \color{blue}{\pdv{z}} \color{red}{v_x} - \color{red}{\pdv{x}} \color{blue}{v_z} \\ \color{red}{\pdv{x}} \color{green}{v_y} - \color{green}{\pdv{y}} \color{red}{v_x} \end{array} \right\rangle \end{align} $$


Find the curl for vector field \(\vec{v}(x,y,z)=\left\langle xy,yz,xz\right\rangle\) $$ \newcommand{pdv}[1]{\tfrac{\partial}{\partial #1}} \begin{align*} \nabla \times \vec{v} &= \left\langle \pdv{x}, \pdv{y}, \pdv{z} \right\rangle \times \left\langle xy,yz,xz\right\rangle \\ &= \begin{vmatrix} \hat\imath & \hat\jmath & \hat k \\ \pdv{x} & \pdv{y} & \pdv{z} \\ xy & yz & xz \end{vmatrix} \\ &= \left\langle \pdv{y}xz - \pdv{z}yz, -\left(\pdv{x}xz - \pdv{z}xy\right), \pdv{x}yz - \pdv{y}xy \right\rangle \\ &= \left\langle 0 - y, -(z - 0), 0 - x \right\rangle \\ &=\left\langle -y, -z, -x \right\rangle \end{align*} $$ The result is a vector.
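The three components of the curl can be estimated with finite differences, following the component formula of the pseudo-determinant. This is a sketch under assumptions: the helper names `partial` and `curl` and the sample point \((1,2,3)\) are chosen here for illustration. At that point \(\left\langle -y,-z,-x\right\rangle=\left\langle -2,-3,-1\right\rangle\).

```python
def partial(F, component, var, p, h=1e-5):
    """Central-difference derivative of one component of F
    with respect to one variable (both given as index 0, 1 or 2)."""
    forward, backward = list(p), list(p)
    forward[var] += h
    backward[var] -= h
    return (F(*forward)[component] - F(*backward)[component]) / (2 * h)


def curl(F, p, h=1e-5):
    """Estimate of the curl of F at the point p = (x, y, z)."""
    X, Y, Z = 0, 1, 2
    return (
        partial(F, Z, Y, p, h) - partial(F, Y, Z, p, h),  # d/dy v_z - d/dz v_y
        partial(F, X, Z, p, h) - partial(F, Z, X, p, h),  # d/dz v_x - d/dx v_z
        partial(F, Y, X, p, h) - partial(F, X, Y, p, h),  # d/dx v_y - d/dy v_x
    )


def v(x, y, z):
    return x * y, y * z, x * z


rot = curl(v, (1.0, 2.0, 3.0))
print(rot)  # compare with <-y, -z, -x> at (1, 2, 3)
```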


If you prefer a visual explanation of divergence and curl, refer to YouTube



The length of a curve is called the arc length.

Arc length


The arc length of function graphs is explained using two examples.


Let \(f\) be a function of variable \(x\). $$ y = f(x) $$

We can approximate the length of a small segment \(\Delta s\) using the Pythagorean theorem. $$ \Delta s=\sqrt{(\Delta x)^2 + (\Delta y)^2} \nonumber $$

Adding all the segments gives us the approximate length of the curve $$ \sum\sqrt{(\Delta x)^2 + (\Delta y)^2} \nonumber $$

When we let the segment length \(\Delta s\rightarrow 0\), the approximation becomes exact. To find the arc length \(s\), we sum all these infinitesimal segments, which turns the sum into an integral.

The arc length of function graph follows as $$ \shaded{ s=\int\sqrt{(dx)^2 + (dy)^2} } \label{eq:functionint} $$ Note that the limits are conveniently omitted for now. The examples show how to add these.


Consider function \(f(x)\) between \(x=-1\) and \(x=1\) $$ y = f(x) = x^2 \label{eq:functiongraph} $$

Express \(dy\) in terms of \(dx\) in the example equation \(\eqref{eq:functiongraph}\) $$ \newcommand{dv}[1]{\tfrac{d}{d #1}} \begin{align*} y &= x^2 \\ \dv{x}y &= 2x \\ dy &= 2x\;dx \end{align*} $$

Substitute \(dy\) in the integral \(\eqref{eq:functionint}\), and place the bounds \(x=-1\) to \(1\) to find the curve length $$ \begin{align*} s &= \int_{-1}^{1}\sqrt{(dx)^2 + (2x\;dx)^2} \\ &=\int_{-1}^{1}\sqrt{1+4x^2}\;dx \end{align*} $$

Evaluating this integral, for example with WolframAlpha, gives approximately \(2.9579\).
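
The integral \(\int_{-1}^{1}\sqrt{1+4x^2}\;dx\) can also be evaluated without an online tool. The sketch below assumes composite Simpson's rule as the numerical method and compares it with the closed form \(\sqrt{5}+\tfrac{1}{2}\operatorname{asinh}(2)\), which follows from the standard antiderivative \(\tfrac{x}{2}\sqrt{1+4x^2}+\tfrac{1}{4}\operatorname{asinh}(2x)\). The function names are illustrative.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def integrand(x):
    """sqrt(1 + (dy/dx)^2) for y = x^2, where dy/dx = 2x."""
    return math.sqrt(1 + 4 * x * x)

numeric = simpson(integrand, -1.0, 1.0)
closed_form = math.sqrt(5) + math.asinh(2) / 2
print(round(numeric, 4), round(closed_form, 4))  # both round to 2.9579
```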


Curves such as a circle in the \((x,y)\) plane are more naturally described using polar coordinates. Consider the polar function of a circle between \(0\) and \(\pi\): $$ r=1 \;\land\; \theta \in \left[0,\pi\right) $$

Since the radius is 1, the value of \(\theta\) reflects the arc length \(\Delta s\) in radians. Bringing \(\Delta \theta\rightarrow 0\), we find the arc length by summing all the tiny segments: $$ s=\int d\theta \label{eq:polarint} $$

The arc length is found by placing the bounds \(\theta=0\) to \(\pi\) in integral \(\eqref{eq:polarint}\). $$ s = \int_{0}^{\pi}d\theta = \left[ \theta \right]_{0}^{\pi} = \pi \nonumber $$

Parametric curve

A parametric curve is a function with one-dimensional input and a multi-dimensional output.

Determining the length of a parametric curve is best described using an example:

Consider parametric curve \(f(t)\) $$ f(t) = \left\{ \begin{array}{l} f_x(t)=t^3-3t \\ f_y(t)=3t^2 \end{array} \right. \nonumber $$

Abbreviated using vector notation $$ f(t) = \left\langle\; t^3-3t,\; 3t^2\; \right\rangle \label{eq:parmcurve} $$

What is the length of the curve between \(t=-2\) and \(t=2\)?

We find the arc length in the same way as for function graphs, using the integral \(\int\sqrt{(dx)^2+(dy)^2}\), where \(dx\) and \(dy\) represent the tiny changes in the \(x\) and \(y\) values from the start to the end of each small line segment.

With parametric curves, since \(x\) and \(y\) are given as functions of \(t\), we write \(dx\) and \(dy\) in terms of \(dt\) by taking the derivative of these two functions. $$ \left\{ \begin{array}{ c l l } x=t^3-3t & \Rightarrow \frac{d}{dt}x = 3t^2-3 & \Rightarrow dx=(3t^2-3)\;dt \\ y=3t^2 & \Rightarrow \frac{d}{dt}y=6t & \Rightarrow dy=6t\;dt \end{array} \right. \nonumber $$

Putting these into the integral $$ \begin{align} \int\sqrt{(dx)^2+(dy)^2} &= \int\sqrt{((3t^2-3)\,dt)^2 + (6t\,dt)^2} \nonumber \\ &= \int\sqrt{(3t^2-3)^2 + (6t)^2} \;dt \nonumber \\ &= \int\sqrt{9t^4+18t^2+9} \;dt \nonumber \\ &= 3\int (t^2+1) \;dt \label{eq:parametricfnc} \end{align} $$

Now everything is written in terms of \(t\). Place the bounds on the integral equation \(\eqref{eq:parametricfnc}\) $$ \begin{align*} 3\int_{-2}^{2} (t^2+1) \;dt &= \left[ t^3+3t \right]_{-2}^{2} \\ &= \left(2^3+3(2)\right) - \left((-2)^3+3(-2)\right) \\ &= 14-(-14) \\ &= 28 \end{align*} $$
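
As a sanity check, the same length can be approximated by summing the speed \(\sqrt{(dx/dt)^2+(dy/dt)^2}\) over small steps in \(t\). This is a minimal sketch that assumes the midpoint rule and the bounds \(-2\) and \(2\) used in the integral above; `speed` and `arc_length` are illustrative names.

```python
import math

def speed(t):
    """Length of the derivative vector <3t^2 - 3, 6t> of the curve."""
    dx_dt = 3 * t ** 2 - 3
    dy_dt = 6 * t
    return math.hypot(dx_dt, dy_dt)

def arc_length(a, b, n=2000):
    """Midpoint-rule approximation of the integral of speed(t) from a to b."""
    h = (b - a) / n
    return sum(speed(a + (k + 0.5) * h) for k in range(n)) * h

length = arc_length(-2.0, 2.0)
print(length)  # close to 28
```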

Quadratic equations

Eukleides of Alexandria (Euclid), a Greek mathematician, produced a general method to solve quadratic equations around 300 BC.

Quadratic equations are problems with squares, or “quadratus” in Latin. We derive a method to solve quadratic equations. In 2000 BC, the Babylonians developed an approach to solve problems which, in current notation, would be a quadratic equation [wiki]. An arbitrary example of this is:

“A field has a perimeter of 40 and an area of 96. What are the dimensions of this field?”

Around 300 BC, the Greek mathematician Euclid produced a general method to solve quadratic equations.


Before we introduce quadratic polynomials, let’s start by multiplying two linear functions (lines).

Multiplying linear functions

Let \(g(x)\) and \(h(x)\) be linear functions

$$ \left\{ \begin{array}{c} g(x)=x-s\\[5pt] h(x)=x-t \end{array} \right. $$

where \(x\) is a variable and \(s\) and \(t\) are constants.

The function value equals \(0\) at the \(x\)-intercepts.

$$ \left\{ \begin{array}{lll} g(x_0)\equiv 0 \Rightarrow & x_0-s\equiv 0 \Rightarrow & x_0=s\\[5pt] h(x_0)\equiv 0 \Rightarrow & x_0-t\equiv 0 \Rightarrow & x_0=t \end{array} \right. $$

First, we define function \(f(x)\) as a scalar \(a\) multiplied with the functions \(g(x)\) and \(h(x)\)

$$ f(x)\equiv a\,g(x)\,h(x)=a(x-s)(x-t) \label{eq:GxHxFactorized} $$

This equation results in the parabola \(f(x)\) that shares \(x\)-intercepts with the two lines.

The interactive graph above visualizes the zero product property:

If the product of two quantities is equal to zero, then at least one of the quantities must be equal to zero. zero-product property


The expression \((x-s)(x-t)\) from function \(\eqref{eq:GxHxFactorized}\) can be expanded geometrically by representing the quantities \(x, -s, -t\) as line segments, and representing the products of two quantities by the area of a rectangle.

Geometric representation of \((x-s)(x-t)=x^2-(s+t)x+st\)

Now we multiply this expanded form with scalar \(a\):

$$ ax^2-a(s+t)x+a\,s\,t\label{eq:exanded} $$

The equation now fits in the standard form for a single variable quadratic polynomial expression.

$$ \begin{array}{c} ax^2+bx+c \\ \text{where}\ \ b=-a(s+t),\ \ c=a\,s\,t \end{array} \label{eq:quadraticExp} $$

This is called a quadratic expression, or a second order polynomial since the greatest power in the equation is two.

Solutions to Quadratic Equation

In the previous section we showed that multiplying two linear functions creates a quadratic function. Here we will do the opposite and bring the standard form quadratic \(\eqref{eq:quadraticExp}\) back to its factorized form.

$$ ax^2 + bx + c \equiv a(x-r_1)(x-r_2) \label{eq:quadratic} $$

We replaced the constants \(s\) and \(t\) with \(r_1\) and \(r_2\) to clearly denote them as the roots.

According to \(\eqref{eq:quadraticExp}\), these roots should add up to \(-\frac{b}{a}\), while their product equals \(\frac{c}{a}\)

$$ r_1+r_2=-\frac{b}{a}\quad\land\quad r_1\,r_2=\frac{c}{a} \label{eq:factorize} $$
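
As a hedged illustration, the sum and product relations can be applied to the field problem quoted earlier (perimeter 40, area 96). The symbols \(\ell\) and \(w\) for the two side lengths are names introduced here for the example.

$$ \begin{align*} 2(\ell+w) = 40 \ &\Rightarrow\ \ell+w=20, \qquad \ell\,w = 96 \\ \Rightarrow\quad x^2-20x+96 &= (x-8)(x-12) \end{align*} $$

With \(a=1\), the roots must add up to \(20\) and multiply to \(96\), so the field measures \(8\) by \(12\).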

In the following section, we will derive a general formula for the roots using a method called “completing the square”.

Derive the Quadratic Formula

In the factorized form, \(r_1\) and \(r_2\) are the values of \(x\) for which the expression equals \(0\).

$$ a(x-r_1)(x-r_2)=0\ \Rightarrow\ \left| \begin{array}{l} x=r_1 \\ x=r_2 \end{array} \right. $$

This implies that the expanded form \(\eqref{eq:quadratic}\) must equal \(0\) for the same values of \(x\).

$$ ax^2+bx+c\equiv 0\label{eq:derive0} $$

Now we solve the equation \(\eqref{eq:derive0}\) for \(x\). First, move \(c\) to the right-hand side and divide both sides by \(a\), which is non-zero for a quadratic

$$ x^2+\frac{b}{a}x = -\frac{c}{a} \label{eq:derive1} $$

The variable \(x\) occurs twice, which makes the equation hard to solve. We can find the solutions by working towards the identity:

$$ \color{green}{p}^2+2\color{green}{p}\color{purple}{q}+\color{purple}{q}^2=(\color{green}{p}+\color{purple}q)^2\nonumber $$

Then we multiply equation \(\eqref{eq:derive1}\) by \(4a^2\) and add \(\color{purple}b^2\) to both sides.

$$ \begin{equation} \begin{split} x^2+\frac{b}{a}x &=-\frac{c}{a} & \times 4a^2 \nonumber\\ \Leftrightarrow\quad(\color{green}{2ax})^2+4abx &=-4ac & +b^2 \nonumber\\ \Leftrightarrow\quad(\color{green}{\underline{2ax}})^2+2(\color{green}{\underline{2ax}})\color{purple}{\underline{b}}+\color{purple}{\underline{b}}^2&=-4ac+\color{purple}{b}^2 \end{split} \label{eq:derive2} \end{equation} $$

The left side now fits the format of the identity, where \(\color{green}p=\color{green}{2ax}\) and \(\color{purple}q=\color{purple}b\). Now we apply the identity to equation \(\eqref{eq:derive2}\)

$$ \begin{equation} \begin{split} (\color{green}{2ax}+\color{purple}{b})^2&=b^2-4ac & \text{take the }\sqrt{\color{white}{1}}\nonumber\\ 2ax+b &= \pm\sqrt{b^2-4ac} & \text{solve for }x\nonumber\\ 2ax &= -b\pm\sqrt{b^2-4ac}\nonumber\\ x&=\frac{-b\pm\sqrt{b^2-4ac}}{2a} \end{split}\label{eq:roots} \end{equation} $$

The roots found by this equation are the values of \(x\) that are solutions to \(\eqref{eq:derive0}\). This implies that the expression \(ax^2+bx+c\) can be factorized as:

$$ \shaded{ \begin{split} ax^2+bx+c &=a(x-r_1)(x-r_2) \nonumber \\ \text{where}\ \ r_{1,2} &=\frac{-b\pm\sqrt{b^2-4ac}}{2a} \nonumber \end{split} } \label{eq:quadroots} $$
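
The quadratic formula translates directly into a few lines of code. The sketch below is one possible implementation, assuming Python's `cmath.sqrt` so that a negative value under the square root yields complex roots instead of an error; the function name `quadratic_roots` is illustrative.

```python
import cmath

def quadratic_roots(a, b, c):
    """Return both roots of a*x^2 + b*x + c = 0 as complex numbers."""
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    sqrt_d = cmath.sqrt(b * b - 4 * a * c)  # square root of b^2 - 4ac
    return (-b + sqrt_d) / (2 * a), (-b - sqrt_d) / (2 * a)

print(quadratic_roots(1, -4, 3))    # two distinct real roots: 3 and 1
print(quadratic_roots(1, -4, 4))    # double real root: 2
print(quadratic_roots(1, -4, 8))    # complex pair: 2+2j and 2-2j
print(quadratic_roots(1, -20, 96))  # field with perimeter 40, area 96: 12 and 8
```

The last call treats the two sides of the field from the introduction as the roots of \(x^2-20x+96\), since they add up to 20 and multiply to 96.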


A polynomial of the second power always has two roots, though they may or may not be distinct or real.

The expression under the square root sign in \(\eqref{eq:quadroots}\) is called the discriminant, or \(D\).

$$ D = b^2-4ac $$

The discriminant determines the nature of the roots.

  • when \(D>0\), there are two distinct real roots.
  • when \(D=0\), there is a double real root.
  • when \(D\lt0\), there are two complex roots.

The unit Complex Numbers defined the imaginary unit \(j\) (written \(i\) in many texts) as \(j^2=-1\) and showed that each complex number \(z\) consists of a real part \(x\) and an imaginary part \(jy\), where \(x\) and \(y\) are real numbers. Using \(j\) as the imaginary unit, we can denote any complex number \(z\) as:

$$ z = x+jy\label{eq:zdef} $$

These complex numbers extend the number line to a two-dimensional \(\mathbb{C}\)-plane as shown below.

z on the C-plane
Point \(z\) on the \(\mathbb{C}\)-plane

Real roots

The distinct or double real roots can be easily visualized by plotting the function and finding the \(x\)-intercepts. These intercepts are the roots.


In the figures above the function depicted in the graph on the left has real roots at \(1\) and \(3\), and the function on the right has a double root at \(2\).

Where are the imaginary roots?

When the roots of a function are complex, the quadratic function \(\eqref{eq:quadratic}\) has a graph that doesn’t intersect the \(x\)-axis as shown below. So where are those roots?


The quadratic formula \(\eqref{eq:quadroots}\) tells us that the roots for the function depicted in the graph above are \(2+2j\) and \(2-2j\).

$$ x^2-4x+8=(x-2-2j)(x-2+2j)\label{eq:zquadratic} $$

To find these complex roots visually, we need to broaden our perspective and allow the independent variable \(x\) to have complex values. After all, these roots are complex values. From here on we will name the variable \(z\) instead of \(x\) to emphasize that \(z\in\mathbb{C}\). We will reuse \(x\) for the real part of \(z\).

Evaluating the quadratic expression \(\eqref{eq:quadratic}\) with variable \(z\) gives

$$ \begin{equation} \begin{split} f(z) &= az^2+bz+c &\forall_{z\in\mathbb{C}}\\ \text{where}\quad z &\equiv x+jy &\forall_{x,y\in\mathbb{R}}\\ \text{and}\ \ f(z) &\equiv u+vj \end{split} \label{eq:fzdef} \end{equation} $$

With a complex function argument \(x+jy\) \(\eqref{eq:zdef}\) and a complex function value \(u+jv\) \(\eqref{eq:fzdef}\), we need four mutually perpendicular axes \(x,y,u,v\) to graph the function. The catch: we can’t draw a four-dimensional graph.

To reduce the graph to three dimensions, either

  1. Only consider variables \(z\) for which the function value is a real number, and graph the function value on a vertical axis perpendicular to the \(\mathbb{C}\)-plane. The roots will be where the graph intersects the \(\mathbb{C}\)-plane.
  2. Consider all variables \(z\in\mathbb{C}\), but take only the modulus of the function value \(|f(z)|\).

The following sections describe each of these two visualization techniques. [1]

Plotting real function values

Now we graph the function \(\eqref{eq:zquadratic}\), considering only values of \(z\) for which the function value \(f(z)=u+jv\) is real (\(v=0\)).

We can let the variable be \(z=x+jy\) and split the function value \(\eqref{eq:fzdef}\) into real and imaginary parts

$$ \begin{equation} \begin{split} f(z)&=az^2+bz+c & z\equiv x+jy\nonumber\\ &=a(x+jy)^2+b(x+jy)+c &\text{expand}\nonumber\\ &=ax^2+2axjy-ay^2+bx+jby+c &\text{split Re/Im}\nonumber\\ &=\underbrace{(ax^2-ay^2+bx+c)}_{\text{real part}} + \underbrace{y(2ax+b)j}_{\text{imaginary part}} \end{split} \end{equation} $$

The imaginary part of the function value must be \(0\).

$$ y(2ax+b)=0 \ \Rightarrow\ \left| \begin{array}{l} y=0 \\ x=-\frac{b}{2a} \end{array} \right. $$

Substitute the value \(y=0\) in \eqref{eq:zdef}

$$ \begin{equation} \begin{split} z_r&=x+jy &\text{subst }y=0 \nonumber\\ &=x+0j &\forall_{x\in\mathbb{R}} \end{split}\label{eq:zr} \end{equation} $$

Do the same for \(x=-\tfrac{b}{2a}\)

$$ \begin{equation} \begin{split} z_c &=x+yj &\text{subst }x=-\tfrac{b}{2a} \nonumber \\ &=-\tfrac{b}{2a}+yj &\forall_{y\in\mathbb{R}} \end{split}\label{eq:zc} \end{equation} $$

This implies that the function \(\eqref{eq:zquadratic}\) has a real value

  1. when evaluated for \(z=x\) where \(x\in\mathbb{R}\), or
  2. when evaluated for \(z=-\frac{b}{2a}+yj\) where \(y\in\mathbb{R}\)

In the first case, evaluating for \(z=x\) where \(x\in\mathbb{R}\), means evaluating along the familiar \(x\)-axis. This is how we visualized the real roots.

$$ \begin{equation} \begin{split} f(z_r) &= a{z_r}^2+bz_r+c &\text{subst }z_r=x\text{ from }\eqref{eq:zr}\nonumber\\ \Rightarrow\quad f(x) &= ax^2+bx+c \end{split}\label{eq:fx} \end{equation} $$

In the second case, evaluating for \(z=-\tfrac{b}{2a}+yj\) where \(y\in\mathbb{R}\), means evaluating along a line that passes through the point \(\left(-\frac{b}{2a}+0j\right)\) and runs parallel to the imaginary \(y\)-axis as shown in the figure below.

The lines \(x\) and \(-\frac{b}{2a}+yj\)

Substituting \(-\frac{b}{2a}+yj\) for \(z\) in \(\eqref{eq:fzdef}\) gives

$$ \newcommand\ccancel[2][black]{\color{#1}{\cancel{\color{black}{#2}}}} \newcommand\cbcancel[2][black]{\color{#1}{\bcancel{\color{black}{#2}}}} \newcommand\ccancelto[3][black]{\color{#1}{\cancelto{#2}{\color{black}{#3}}}} \begin{equation} \begin{split} f(z) &= az^2+bz+c \quad\quad\quad\text{subst }z=-\frac{b}{2a}+yj\text{ from }\eqref{eq:zc}\\ f\left(-\frac{b}{2a}+yj\right) &= a\left(-\frac{b}{2a}+yj\right)^2+b\left(-\frac{b}{2a}+yj\right)+c\nonumber\\ &= {\ccancel[red]{a}}\frac{b^2}{4a^{\ccancel[red]{2}}}-{\ccancel[red]{2a}}\frac{\cbcancel[blue]{b}}{\ccancel[red]{2a}}{\cbcancel[blue]{yj}}-ay^2-\frac{b^2}{2a}+{\cbcancel[blue]{byj}}+c \end{split}\label{eq:fy0} \end{equation} $$

This makes \(\eqref{eq:fy0}\) a real-valued function of \(y\), because the imaginary terms cancel and \(a\), \(b\) and \(c\) are constants.

$$ \shaded{ f(y)=-ay^2-\frac{b^2}{4a}+c\quad\forall_{y\in\mathbb{R}} } \label{eq:fy} $$

Similar to how the graph for \(f(x)\) intersects the \(\mathbb{C}\)-plane at the real roots, the graph for \eqref{eq:fy} intersects the \(\mathbb{C}\)-plane at the complex roots of the function.

The interactive graph shown below visualizes this concept. Click Interact and wait for the model to load.

Interactive graph of quadratic equation showing complex roots
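
The claim that the function value is real along the vertical line \(z=-\tfrac{b}{2a}+yj\) can be checked for the example \(x^2-4x+8\). The sketch below assumes Python's built-in complex type and compares the evaluated value with \(-ay^2-\tfrac{b^2}{4a}+c\); the function names are illustrative.

```python
def f(z, a, b, c):
    """Quadratic az^2 + bz + c evaluated for a complex argument z."""
    return a * z * z + b * z + c

def f_on_vertical_line(y, a, b, c):
    """Closed form -a*y^2 - b^2/(4a) + c for z = -b/(2a) + y*j."""
    return -a * y * y - b * b / (4 * a) + c

a, b, c = 1, -4, 8
x0 = -b / (2 * a)  # real part shared by all points on the line

for y in (-3, -2, 0, 2, 3):
    value = f(complex(x0, y), a, b, c)
    print(y, value, f_on_vertical_line(y, a, b, c))
# The imaginary part of value is 0 for every y, and value is 0 at y = -2
# and y = 2, which are the complex roots 2-2j and 2+2j.
```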

Plotting the modulus of the function values

Here we graph the function \(\eqref{eq:zquadratic}\) by considering all values of \(z\), but only plotting the modulus of the function value \(f(z)\). With the function value written as \(f(x+jy)=u+jv\) \(\eqref{eq:fzdef}\), the modulus is defined as:

$$ |f(x+jy)|\triangleq\sqrt{u^2+v^2}\nonumber $$

To find the modulus of the function value, we can apply the definition \(\eqref{eq:zdef}\) to the quadratic equation \(\eqref{eq:zquadratic}\).

$$ \begin{equation} \begin{split} f(z) &= az^2+bz+c\nonumber\\\ f(x+jy) &= (ax^2-ay^2+bx+c) +y(2ax+b)j\nonumber\\ \Rightarrow\quad |f(x+jy)| &= \sqrt{(ax^2-ay^2+bx+c)^2 +y^2(2ax+b)^2}\\ \text{where}\quad x &\in\mathbb{R}\ \land\ y\in\mathbb{R}\nonumber \end{split} \end{equation} $$

This equation implies that the function arguments are two independent variables. In the graph we depict them as a horizontal complex plane with points \(z=x+jy\). The vertical axis of the graph is used for the modulus of the function value. In this so-called modulus surface, color is used to show the angle of the function value.

While this surface is well suited to depict real and complex values, it has some drawbacks. The most obvious one is that it shows the modulus of the function where \(|f(z)|\geq0\). As a consequence, the parabolic shape is harder to recognize.

In the graph below, we added the real function values \(f(z_c)\) from the previous visualization technique that showed negative values. In the surface plot, these same function values are represented on the positive \(z\)-axis but shown in red to signify an angle of \(\varphi=\pi\), where \(\mathrm{e}^{j\pi}=-1\).

To draw a modulus surface, you can either use the model below, or you can use the code shown in Appendix A. Click Interact, wait for the model to load and click on \(|f(z)|\).

Interactive graph of quadratic equation showing complex roots

Appendix A

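% Modulus surface of f(z) = z^2 - 4z + 8, colored by the angle of f(z)
[x,y] = meshgrid(-10: 0.1: 10);   % grid of real (x) and imaginary (y) parts
z = x + 1i*y;                     % complex argument; 1i is safe even if i is reused as a variable
fz = z.^2 - 4.*z + 8;             % function values on the grid
surf(x,y,abs(fz), angle(fz));     % height is |f(z)|, color is the angle of f(z)
xlabel("x"); ylabel("y"); zlabel ("|f(x+jy)|");
shading interp;                   % smooth the color transitions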


[1] The Complex Roots of a Quadratic Equation: A Visualization Carmen Q Artino, Professor in Mathematics at The College of Saint Rose, Albany, NY. Parabola Volume 45, Issue 3 (2009)

Functions of complex numbers


This article introduces functions with complex arguments. The article Complex Numbers introduced a 2-dimensional number space called the complex plane (\(\mathbb{C}\)-plane). The arithmetic functions that we have studied since first grade extend gracefully from the one-dimensional number line onto this new \(\mathbb{C}\)-plane. Here we will introduce functions that operate on these complex numbers.

We refer to the imaginary unit as “\(j\)”, to avoid confusion in electrical engineering, where the variable \(i\) is already used for current.

An overview of the functions is given for reference. We will prove some of these formulas in subsequent paragraphs.


Consider a complex number \(z\) expressed in either notation style

$$ z = x+jy=r\,(\cos\varphi+j\sin\varphi)=r\,\mathrm{e}^{j\varphi}\nonumber $$

As you wish

$$ \newcommand{\parallelsum}{\mathbin{\!/\mkern-5mu/\!}} \begin{align} z_1+z_2&=(x _1+x _2)+j\,(y _1+y _2) \\[6mu] z_1-z_2&=(x _1-x _2)+j\,(y _1-y _2) \\[6mu] z_1\,z_2&=r_1r_2\ \mathrm{e}^{j \cdot (\varphi_1+\varphi_2)} \\[6mu] \tfrac{1}{z} &= \tfrac{1}{r}\,\mathrm{e}^{-j\varphi}\\[6mu] \frac{z_1}{z_2} &= \frac{r_1}{r_2}\,\mathrm{e}^{j(\varphi _1-\varphi_2)}\\[6mu] z_1\parallelsum z_2 &= \frac{z_1\, z_2}{z_1+z_2}\\[6mu] \mathrm{e}^z &=\mathrm{e}^x\cos y + j\,\mathrm{e}^x\sin y\\[6mu] \ln z&= \ln r+j\,\varphi\\[6mu] {z_1}^{z_2} &= {r_1}^{x _2}\,\mathrm{e}^{-y_2\,\varphi_1}\,\mathrm{e}^{j \cdot (x _2\,\varphi_1+\,y_2\ln r_1)} \\[6mu] \sqrt[n]{z} &= \sqrt[n]{r}\,\mathrm{e}^{j\varphi/n} \end{align} $$

Circular based trigonometry

$$ \begin{align} \sin z &= \sin x\cosh y + j\,\cos x\sinh y \\[6mu] \cos z &= \cos x\cosh y - j\,\sin x\sinh y \\[6mu] \tan z &= \frac{\sin(2 x)}{\cosh(2 y) + \cos(2 x)} + j\,\frac{\sinh(2 y)}{\cosh(2 y) + \cos (2 x)} \\[6mu] \csc z &= {(\sin z)}^{-1} \\[6mu] \sec z &= {(\cos z)}^{-1} \\[6mu] \cot z &= {(\tan z)}^{-1} \\[6mu] \end{align} $$

Inverse circular based trigonometry

$$ \DeclareMathOperator{\asin}{asin} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\acos}{acos} \DeclareMathOperator{\atan}{atan} \DeclareMathOperator{\acsc}{acsc} \DeclareMathOperator{\asec}{asec} \DeclareMathOperator{\acot}{acot} \begin{align} \asin z &= \asin b +j\,\sgn(y)\ln\left(a + \sqrt{a^2-1}\right), \quad a\geq1 \land b \text{ in } [\mathrm{rad}]\\[6mu] \acos z &= \acos b -j \sgn(y) \ln\left(a + \sqrt{a^2-1}\right),\quad a\geq1 \land b \text{ in } [\mathrm{rad}]\\[6mu] \text{where}\quad a &= \tfrac{1}{2} \left( \sqrt{(x +1)^{2} + y ^{2} } + \sqrt{ (x -1)^{2} + y^{2}} \right),\nonumber \\[6mu] b &= \tfrac{1}{2} \left( \sqrt{(x +1)^{2} + y ^{2} } - \sqrt{ (x -1)^{2} + y^{2}} \right),\nonumber \\[6mu] \sgn(y) &= \begin{cases}-1 & y \lt 0\\[6mu]1 & y \geq 0\end{cases} \nonumber \\[6mu] \atan z &= \tfrac{1}{2}\left(\pi - \atan\left(\frac{1+ y}{x}\right) -\atan\left(\frac{1-y}{x}\right)\right) \\ &\quad +j\,\tfrac{1}{4}\,\ln\left( \frac{\left(\frac{1+y}{x}\right)^2 +1}{\left(\frac{1-y}{x}\right)^2 +1} \right) \\[6mu] \acsc z &= \asin(z^{-1}) \\[6mu] \asec z &= \acos(z^{-1}) \\[6mu] \acot z &= \atan(z^{-1}) \\[6mu] \end{align} $$

Hyperbolic based trigonometry

$$ \DeclareMathOperator{\csch}{csch} \DeclareMathOperator{\sech}{sech} \begin{align} \sinh z &= \cos y \sinh x + j\,\sin y\cosh x \\[6mu] \cosh z &= \cos y \cosh x + j\,\sin y\sinh x \\[6mu] \tanh z &= \frac{\sinh(2x)}{\cosh(2x) + \cos(2y)} +j\,\frac{\sin(2 y)}{\cosh(2 x) + \cos(2y)}\\[6mu] \csch z &= {(\sinh z)}^{-1} \\[6mu] \sech z &= {(\cosh z)}^{-1} \\[6mu] \coth z &= {(\tanh z)}^{-1} \\[6mu] \end{align} $$

Inverse hyperbolic based trigonometry

$$ \DeclareMathOperator{\asin}{asin} \DeclareMathOperator{\acos}{acos} \DeclareMathOperator{\atan}{atan} \DeclareMathOperator{\acsc}{acsc} \DeclareMathOperator{\asec}{asec} \DeclareMathOperator{\acot}{acot} \DeclareMathOperator{\csch}{csch} \DeclareMathOperator{\asinh}{asinh} \DeclareMathOperator{\acosh}{acosh} \DeclareMathOperator{\atanh}{atanh} \DeclareMathOperator{\acsch}{acsch} \DeclareMathOperator{\asech}{asech} \DeclareMathOperator{\acoth}{acoth} \begin{align} \asinh z &= -j \asin(jz) \\[6mu] \acosh z &= j \acos z \\[6mu] \atanh z &= -j \atan(jz) \\[6mu] \acsch z &= j \acsc(jz) \\[6mu] \asech z &= -j \asec z \\[6mu] \acoth z &= j \acot(jz) \\[6mu] \end{align} $$

This table formed the basis of software like Complex Arithmetic for HP-41cv/cx.
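
Several of the Cartesian expansions can be spot-checked against Python's `cmath` module. The sketch below writes out the standard identities for \(\mathrm{e}^z\), \(\sin z\), \(\cos z\), \(\sinh z\), \(\cosh z\) and \(\tanh z\) in terms of \(x\) and \(y\) and compares each with the library function; the sample point and the `checks` dictionary are arbitrary illustrative choices.

```python
import cmath
import math

z = complex(0.7, -1.3)  # arbitrary sample point
x, y = z.real, z.imag
tanh_den = math.cosh(2 * x) + math.cos(2 * y)

# name -> (value from the x/y expansion, value from cmath)
checks = {
    "exp": (complex(math.exp(x) * math.cos(y), math.exp(x) * math.sin(y)),
            cmath.exp(z)),
    "sin": (complex(math.sin(x) * math.cosh(y), math.cos(x) * math.sinh(y)),
            cmath.sin(z)),
    "cos": (complex(math.cos(x) * math.cosh(y), -math.sin(x) * math.sinh(y)),
            cmath.cos(z)),
    "sinh": (complex(math.sinh(x) * math.cos(y), math.cosh(x) * math.sin(y)),
             cmath.sinh(z)),
    "cosh": (complex(math.cosh(x) * math.cos(y), math.sinh(x) * math.sin(y)),
             cmath.cosh(z)),
    "tanh": (complex(math.sinh(2 * x) / tanh_den, math.sin(2 * y) / tanh_den),
             cmath.tanh(z)),
}

for name, (formula, library) in checks.items():
    print(name, abs(formula - library) < 1e-12)
```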


Without further ado, we introduce the proofs for some common complex functions. The proofs write the imaginary unit as \(i\); it is the same quantity as \(j\) in the overview.


Consider adding the numbers \(z_1\) and \(z_2\) in cartesian form

$$ z_1+z_2 = (x_1+i\,y_1)+(x_2+i\,y_2) $$


$$ \shaded{z_1+z_2=(x_1+x_2)+i(y_1+y_2)} $$

This can be visualized similar to adding real numbers by putting the vectors head to tail

Visualization of complex addition


Consider subtracting the numbers \(z_1\) and \(z_2\) in cartesian form

$$ z_1-z_2 = (x_1+i\,y_1) – (x_2+i\,y_2) $$

So that

$$ \shaded{ z_1-z_2=(x_1-x_2)+i(y_1-y_2) } $$

This can be visualized similar to the subtraction of real numbers by rotating the subtrahend by \(\pi\) and then putting the vectors head to tail

Visualization of complex subtraction


Consider the product \(z_1\,z_2\) in polar form, using the trig identities

$$ \begin{align} \cos\alpha\cos\beta-\sin\alpha\sin\beta&=\cos(\alpha+\beta) \nonumber \\ \sin\alpha\cos\beta+\cos\alpha\sin\beta&=\sin(\alpha+\beta) \nonumber \end{align}\nonumber $$

$$ \require{enclose} \begin{align} z_1\,z_2&=r_1\enclose{phasorangle}{\small\varphi_1}\ r_2\enclose{phasorangle}{\small\varphi_2}\\ &=r_1 (\cos\varphi_1+i\sin\varphi_1)\ r_2 (\cos\varphi_2+i\sin\varphi_2)\nonumber\\ &=r_1r_2\,((\cos\varphi_1\cos\varphi_2-\sin\varphi_1\sin\varphi_2)+i\,(\cos\varphi_1\sin\varphi_2+\sin\varphi_1\cos\varphi_2)) \end{align} $$

From which it follows that

$$ \shaded{ z_1\,z_2=r_1r_2\,(\cos(\varphi_1+\varphi_2)+i\,\sin(\varphi_1+\varphi_2)) } $$


$$ \begin{align} |z_1\,z_2|&=|z_1|\,|z_2|\\[6mu] \angle(z_1\,z_2)&=\angle z_1+\angle z_2 \end{align} $$

This can be visualized as adding the angles and multiplying the lengths of the vectors

Visualization of complex multiplication


Consider the quotient \(\frac{z_1}{z_2}\) in polar form

$$ \require{enclose} \begin{align} \frac{z_1}{z_2}&=\frac{r_1\enclose{phasorangle}{\small\varphi_1}}{r_2\enclose{phasorangle}{\small\varphi_2}} =\frac{r_1 \left({\cos \varphi_1 + i \sin \varphi_1}\right)} {r_2 \left({\cos \varphi_2 + i \sin \varphi_2}\right)}\nonumber\\[6mu] &=\frac{r_1}{r_2}\,\frac{\cos\varphi_1+i\sin\varphi_1}{\cos\varphi_2+i\sin\varphi_2}\ \frac{\cos \varphi_2 - i \sin \varphi_2}{\cos \varphi_2 - i \sin \varphi_2},&\text{product rule}\nonumber\\[6mu] &=\frac{r_1}{r_2}\,\frac{\cos \left({\varphi_1 - \varphi_2}\right) + i \sin \left({\varphi_1 - \varphi_2}\right)}{\cos \left({\varphi_2 - \varphi_2}\right) + i \sin \left({\varphi_2 - \varphi_2}\right)}\nonumber\\[6mu] &=\frac{r_1}{r_2}\,\frac{\cos \left({\varphi_1 - \varphi_2}\right) + i \sin \left({\varphi_1 - \varphi_2}\right)} {\cos 0 + i \sin 0}\\[6mu] \end{align} $$

So that

$$ \shaded{ \frac{z_1}{z_2}=\frac{r_1}{r_2}{\Large(}{\cos \left({\varphi_1 - \varphi_2}\right) + i \sin \left({\varphi_1 - \varphi_2}\right)}{\Large)} } $$


$$ \begin{align} \left|\frac{z_1}{z_2}\right|&=\frac{|z_1|}{|z_2|}\\[6mu] \angle\frac{z_1}{z_2}&=\angle z_1-\angle z_2 \end{align} $$

This can be visualized as subtracting the angles and dividing the lengths of the vectors

Visualization of complex division
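
The product and quotient rules derived above can be checked numerically. This is a minimal sketch, assuming `cmath.rect` and `cmath.polar` for converting between polar and Cartesian form; the sample lengths and angles are arbitrary and chosen so that the resulting angles stay inside \((-\pi,\pi]\), the range that `cmath.polar` reports.

```python
import cmath

def polar_multiply(r1, phi1, r2, phi2):
    """Product in polar form: lengths multiply, angles add."""
    return r1 * r2, phi1 + phi2

def polar_divide(r1, phi1, r2, phi2):
    """Quotient in polar form: lengths divide, angles subtract."""
    return r1 / r2, phi1 - phi2

r1, phi1 = 2.0, 0.5
r2, phi2 = 1.5, 1.2
z1 = cmath.rect(r1, phi1)  # r1 * (cos(phi1) + i sin(phi1))
z2 = cmath.rect(r2, phi2)

print(cmath.polar(z1 * z2), polar_multiply(r1, phi1, r2, phi2))
print(cmath.polar(z1 / z2), polar_divide(r1, phi1, r2, phi2))
```

Each line prints the length and angle obtained from the Cartesian arithmetic next to the pair predicted by the polar rule; the two agree up to rounding.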

\(n\)th power

Consider the power \(z^n\) in polar form, where \(n\in\mathbb{Z}^+\)

$$ z^n=(r(\cos\varphi+i\sin\varphi))^n\nonumber\\ $$

Using Euler’s formula

$$ \cos\phi+i\sin\phi=\mathrm{e}^{i\phi}\nonumber $$

$$ \require{enclose} \begin{align} z^n&=\left(r\enclose{phasorangle}{\small\varphi}\right)^n\\ &=\left(r\,\mathrm{e}^{i\varphi}\right)^n\nonumber\\ &=r^n\,\mathrm{e}^{in\varphi},&\text{Euler’s formula} \end{align} $$

So that

$$ \shaded{ z^n=r^n\,(\cos(n\varphi)+i\sin(n\varphi)) } $$


$$ \begin{align} |z^n| &= {|z|}^n\\[6mu] \angle(z^n) &= n\angle z \end{align} $$

This can be visualized as multiplying the angle by \(n\) and taking the \(n\)th power of the length of the vector

Visualization of complex power with real exponent

\(n\)th root

Consider the \(n\)th root \(\sqrt[n]{z}\) in polar form, where \(n\in\mathbb{Z}^+\)

$$ \require{enclose} \begin{align} \sqrt[n]{z}&=\sqrt[n]{r\enclose{phasorangle}{\small\varphi}} =\left(r\,\mathrm{e}^{i\varphi}\right)^{\frac{1}{n}},&\text{Euler’s formula}\nonumber\\ &=r^{\frac{1}{n}}\,\mathrm{e}^{i(\varphi+2k\pi)/n},\quad k\in\mathbb{Z},&\text{Euler’s formula} \end{align} $$


$$ \shaded{ \sqrt[n]{z}=\sqrt[n]{r}\,\left(\cos\frac{\varphi+2k\pi}{n}+i\sin\frac{\varphi+2k\pi}{n}\right),\quad k\in\mathbb{Z} } \label{eq:root} $$


$$ \begin{align} |\sqrt[n]{z}|&=\sqrt[n]{|z|},\\[6mu] \angle\,\sqrt[n]{z}&=\frac{\angle z+2k\pi}{n},\quad k\in\mathbb{Z} \end{align} $$

This can be visualized as dividing the angle by \(n\) and taking the \(n\)th root of the length of the vector. The \(n\) distinct roots are separated by \(\frac{2\pi}{n}\) radians.

Visualization of complex root with real exponent
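
The root formula \(\eqref{eq:root}\) can be turned into a short routine that lists all \(n\) distinct roots. The sketch below assumes `cmath.polar` for the principal angle and lets \(k\) run from \(0\) to \(n-1\), since larger \(k\) only repeats earlier roots; the function name `nth_roots` is illustrative.

```python
import cmath
import math

def nth_roots(z, n):
    """All n distinct n-th roots of z, for k = 0 .. n-1."""
    r, phi = cmath.polar(z)  # phi is the principal angle in (-pi, pi]
    root_r = r ** (1.0 / n)  # n-th root of the length
    return [cmath.rect(root_r, (phi + 2 * math.pi * k) / n) for k in range(n)]

for root in nth_roots(1, 3):
    print(root, root ** 3)  # each cube is 1 up to rounding
```

For \(z=1\) and \(n=3\) the routine returns \(1\) and the two complex roots \(-\tfrac{1}{2}\pm\tfrac{1}{2}\sqrt{3}\,i\), each \(\tfrac{2\pi}{3}\) apart.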

Wait a minute

Depending on how we measure the angle \(\varphi\), we get different answers? Correct, because adding \(2k\pi\) to \(\varphi\) still maps to the same complex number, but may give a different function value.

In comparison, the functions described earlier do not produce different results when extra rotations are added to the angle. Other multivalued functions are \(\log{z}\), \(\mathrm{arcsin}z\) and \(\mathrm{arccos}z\).

In general:

the \(n\)th root has \(n\) values,
because when we add \(2k\pi\) to the angle \(\varphi\), for \(k\in\mathbb{Z}\), we may get different results.

The big question becomes: how do we define the angle \(\varphi\)?

Different ways of measuring \(\varphi\)


Even real valued functions can have multiple values. Remember \(\sqrt{1}=\{-1,1\}\)? Using equation \(\eqref{eq:root}\), we find the function values that we are familiar with.

$$ \begin{align} \sqrt{1}&=\sqrt{\cos\varphi+i\sin\varphi},&\text{polar notation}\nonumber\\ &=\cos\frac{\varphi+2k\pi}{2}+i\sin\frac{\varphi+2k\pi}{2},&\text{equation }\eqref{eq:root}\nonumber\\ &=\cos\frac{\varphi+2k\pi}{2},&k\in\mathbb{Z}\nonumber\\ &=\left\{1,-1\right\} \end{align} $$

Both roots have magnitude \(1\), but their angles \(\varphi\) are \(\pi\) apart.

Similarly, the cube root \(\sqrt[3]{1}\) has three roots, two of which are complex. All roots have magnitude \(1\), but their angles \(\varphi\) are \(\frac{2\pi}{3}\) apart.

$$ \require{enclose} \begin{align} y_1&=1\,\enclose{phasorangle}{0}=\cos0+i\sin0=1\nonumber\\[8mu] y_2&=1\,\enclose{phasorangle}{\small\tfrac{2\pi}{3}}=\cos\frac{2\pi}{3}+i\sin\frac{2\pi}{3}=-\tfrac{1}{2}+\tfrac{1}{2}\sqrt{3}i\nonumber\\ y_3&=1\,\enclose{phasorangle}{\small\tfrac{4\pi}{3}}=\cos\frac{4\pi}{3}+i\sin\frac{4\pi}{3}=-\tfrac{1}{2}-\tfrac{1}{2}\sqrt{3}i\nonumber \end{align}\nonumber $$

Making it single-valued

For real valued arguments, we conventionally choose \(\varphi\) in the range \([0,2\pi)\) where the function is single-valued and where we find a positive function value. This default single-value is called the principal value.

Apart from not being differentiable at \(0\), the function \(\sqrt[n]{z}\) has no discontinuities. To make the function single-valued, we can limit the range of \(\varphi\) similar to what we usually do for real valued arguments. The boundary of that range is called a branch cut. We then only express \(\varphi\) so that it doesn’t cross the branch cut. Some common branch cuts are shown in the table below. In the table \(\mathbb{R}^-\) stands for the negative real axis and \(\mathbb{R}^+\) for the positive real axis.

Example branch cuts and their effect on the function value
Branch cut | Range for \(\varphi\) | Effect | Consistent with
just under \(\mathbb{R}^-\) | \((-\pi,\pi]\) | \(\Re(\sqrt{z})\geq0\) | Sqrt of real numbers
just under \(\mathbb{R}^+\) | \([0,2\pi)\) | \(\Im(\sqrt{z})\geq0\) | Phase shift in waves

No matter where you define the branch cut, when \(z\) approaches a point on the branch cut from opposite sides, either the real or imaginary part of the function value abruptly changes sign. In practice, the best place for the branch cut depends on the application. For instance, if there is already a discontinuity at the point \(-1\), you may as well put the branch cut just under \(\mathbb{R}^-\).

Real and Imaginary part of \(\sqrt{z}\) as function of \(\varphi\)
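
The jump across a branch cut can be observed numerically. The sketch below assumes that Python's `cmath.sqrt` places its branch cut along the negative real axis, and adds a hypothetical helper `sqrt_branch_0_2pi` that measures \(\varphi\) in \([0,2\pi)\) instead, which moves the cut to the positive real axis.

```python
import cmath
import math

def sqrt_branch_0_2pi(z):
    """Square root with the angle measured in [0, 2*pi) (illustrative helper)."""
    r, phi = cmath.polar(z)  # phi in (-pi, pi]
    if phi < 0:
        phi += 2 * math.pi   # remap the angle to [0, 2*pi)
    return cmath.rect(math.sqrt(r), phi / 2)

eps = 1e-9

# Cut along the negative real axis: the imaginary part flips sign,
# while the real part stays non-negative.
above = cmath.sqrt(complex(-1, eps))
below = cmath.sqrt(complex(-1, -eps))
print(above, below)

# Cut along the positive real axis: the real part flips sign,
# while the imaginary part stays non-negative.
above_pos = sqrt_branch_0_2pi(complex(1, eps))
below_pos = sqrt_branch_0_2pi(complex(1, -eps))
print(above_pos, below_pos)
```

In both cases the two results are square roots of nearly the same number, yet they differ by a sign because the two arguments sit on opposite sides of the cut.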

We will use the \(\mathbb{C}\)-plane extensively as we explore the physics fields of electronics and domain transforms.

Complex numbers


Instead of projecting the future merits of complex numbers, we will introduce them in an intuitive way. We draw a parallel to negative numbers, which became universally accepted around the same time.

We start this writing with a review of concepts that should be evident. Nevertheless, I encourage you to read through them, as we build on these concepts while introducing complex numbers.


Arithmetic gives us tools to manipulate numbers. It allows us to transform one number into another using transformations such as negation, addition, subtraction, multiplication and division.

Positive numbers

In first grade, we learned the concept of the number line and how numbers can be represented by vectors starting at \(0\). We visualized addition by putting these vectors head to tail, where the net length and direction is the answer.

Number line animation for \(5+3=8\)

Soon thereafter, we learned how to subtract numbers by rotating the subtrahend (the value that you subtract) vector and then putting the head to tail.

Number line animation for \(5-3=2\)

We will expand on this as we discuss negative numbers and imaginary numbers. Before we introduce such numbers, let’s also refresh on the concept of equations with squares and square roots.

Square and square root

When we solve the equation \(2x^2=8\), we look for a transformation (\(\times x\)) that, when applied twice, turns the number \(2\) into \(8\).

$$ 2 \times x \times x = 8 $$

As shown in the animation below, the two solutions \(x=2\) and \(x=-2\), both satisfy the equation \(2x^2=8\).

Number line animation for \(2 \times 2 \times 2=8\)

Negative numbers

Negative numbers have lingered around since 200 BC, but with mathematics based on geometrical ideas such as length and count, there was little place for negative numbers. After all, how can a pillar be less than nothing in height? How could you own something less than nothing?

Even a hundred years after the invention of algebra in 1637, the answer to \(4=4x+20\) would be thought of as absurd, as illustrated by these quotes:

Negative numbers darken the very whole doctrines of the equations and make dark of the things which are in their nature excessively obvious and simple. Francis Maseres, British mathematician (1757)
To really obtain an isolated negative quantity, it would be necessary to cut off an effective quantity from zero, to remove something of nothing: impossible operation. Lazare Carnot, French mathematician (1803)


Some mathematicians in the 17th century discovered that negative numbers did have their use in solving cubic and quadratic equations, provided they didn’t worry about the meaning of these negative numbers. While the intermediate steps of their calculations may involve negative numbers, the solutions were typically real positive numbers.

Only in the 19th century were negative numbers truly accepted when mathematicians started to approach mathematics in terms of logical definitions.

Physical meaning has given way to algebraic use.

The English mathematician John Wallis (1616 – 1703) is credited with giving meaning to negative numbers by inventing the number line. Even today, the number line helps students as they actively construct mathematical meaning, number sense and an understanding of number relationships. We learned to use negative numbers without thinking about the thousands of years it took to develop the principle.

You were probably introduced to the concepts of absolute value and direction through the geometric representation on this number line. With negative numbers so embedded in our mathematics, we accept the solution to \(3-5\) without a second thought.

Number line animation for \(3-5=-2\)

In general, we learned that the negative symbol represents the “opposite” of a number. To change the sign of a number, we rotate its vector 180 degrees (\(\pi\)) around the point \(0\). We will expand on this important concept as we introduce imaginary numbers.

Imaginary numbers

First off, imaginary numbers are called such because for a long time they were poorly understood and regarded by some as fictitious. Even Euler, who used them extensively, once wrote:

Because all conceivable numbers are either greater than zero or less than 0 or equal to 0, then it is clear that the square roots of negative numbers cannot be included among the possible numbers [real numbers]. Consequently we must say that these are impossible numbers. And this circumstance leads us to the concept of such number, which by their nature are impossible, and ordinarily are called imaginary or fancied numbers, because they exist only in imagination. Leonhard Euler, Swiss mathematician and physicist, Introduction to Algebra pg.594


Around the same time that mathematicians were struggling with the concept of negative numbers, they also came across square roots of negative numbers.

Girolamo Cardano, an Italian mathematician who published the solution to cubic and quartic equations, studied a problem that in modern algebra would be expressed as \(x^2-10x+40=0\,\land\,y=10-x\)

Find two numbers whose sum is equal to 10 and whose product is equal to 40. Translation of Ars Magna chapter 37, pg 219 (1545) by Girolamo Cardano

Cardano states that it is impossible to find the two numbers. Nonetheless, he says, we will proceed. He goes on to provide two objects that satisfy the given condition. Cardano found a sensible answer by working through his algorithm, but he called these numbers “fictitious” because not only did they disappear during the calculation, but they did not seem to have any real meaning.

This subtlety results from arithmetic of which this final point is as I have said as subtle as it is useless. Translation of Ars Magna chapter 37, pg 219 (1545) by Girolamo Cardano


In 1849, Carl Friedrich Gauß produced a rigorous proof for complex numbers, which gave a big boost to the acceptance of these numbers in the mathematical community.

With this historic perspective, let’s see how these imaginary numbers fit into what we know about number theory.

We learned that the negative symbol represents the “opposite” of a number. On the number line we can represent this by rotating the vector that represents the number 180° (\(\pi\)) around the origin \(0\).

Let’s dive straight in and consider the equation

$$ \begin{align} x^2&=-1\\ \Rightarrow\quad 1\times x\times x&=-1\label{eq:1tomin1} \end{align} $$

What multiplication with \(x\), when applied twice, turns \(1\) into \(-1\)? Multiplying twice by a positive number gives a positive result. The same holds for a negative number.

Time to take a step back: we said that a negation represents a rotation of \(\pi\) around the origin. What if we rotate the vector \(1\) by \(\frac{\pi}{2}\) twice, and worry about its meaning later?

Number line animation multiplying \(1\) twice so that we get \(-1\)

Indeed, twice rotating the vector \(1\) around the origin by \(\frac{\pi}{2}\) gives us \(-1\). All that we have left to do, is to find a name for the vector where \(1\) is rotated by \(\frac{\pi}{2}\). To credit Euler’s “imaginary or fancied numbers”, we call it \(i\). The first multiplication turns \(1\) into \(i\), and the second multiplication turns \(i\) into \(-1\).

Moreover, rotating in the opposite direction works as well. The first multiplication turns \(1\) into \(-i\), and the second multiplication turns \(-i\) into \(-1\). So there are two square roots of \(-1\): \(i\) and \(-i\). We have two solutions to \(x^2=-1\)!

Alternate number line animation multiplying \(1\) twice so that we get \(-1\)

Interpretation of “\(i\)”

What is the meaning of this mysterious value \(i\)? The first rotation turned \(1\) into \(i\), so the rotation is a visualization of multiplying with \(i\). Rotating \(i\) once more turns \(i\) into \(-1\).

Number line animation for \(1 \times i \times i=-1\)

Substituting \(x=i\) in \(\eqref{eq:1tomin1}\) implies that \(i\times i=-1\) or written as

$$ \shaded{ i^2=-1 } $$
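As a quick sanity check, the two quarter turns can be reproduced with Python's built-in complex type. This is only a sketch for illustration; note that Python writes the imaginary unit as `1j` rather than \(i\), and the variable names are my own.

```python
# Multiplying by the imaginary unit rotates a number a quarter turn around 0.
i = 1j  # Python's spelling of the imaginary unit

once = 1 * i       # the vector 1 rotated by pi/2
twice = once * i   # rotated by pi/2 once more

print(once)          # 1j
print(twice)         # (-1+0j)
print((-i) * (-i))   # (-1+0j), rotating the other way also ends at -1
```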

By introducing an axis perpendicular to the number line, we have extended our number space to a two-dimensional plane called \(\mathbb{C}\). This \(\mathbb{C}\)-plane includes the real numbers from the real number line, along with imaginary numbers on the \(i\)-axis and every combination thereof. We call this new number set “Complex Numbers”.


By introducing \(i\), we have added a dimension to the number line. With that come three new notation forms that each have their own use case. We will express the complex number \(z\) as shown below in these forms.

Point in complex plane as cartesian, polar and exponential form
Relation between cartesian, polar and exponential forms

Cartesian form

Each complex number has a real part \(x\) and an imaginary part \(y\), where \(x\) and \(y\) are real numbers. Using \(i\) as the imaginary unit, we can denote a complex number \(z\) as

$$ \shaded{z=x+iy} $$

The point \(z\) can be specified by its rectangular coordinates \((x,y)\), where \(x\) and \(y\) are the signed distances to the imaginary \(y\) and real \(x\)-axis. This \(xy\)-plane is commonly called the complex plane \(\mathbb{C}\).

This cartesian form is a logical extension of the number line and will prove useful when adding or subtracting complex numbers.
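A minimal sketch of why the cartesian form suits addition and subtraction: the real and imaginary parts are simply handled separately. The values below are arbitrary, and Python prints the imaginary unit as `j`.

```python
# Adding or subtracting in cartesian form works part by part.
z1 = complex(3, 2)    # 3 + 2i
z2 = complex(1, -5)   # 1 - 5i

total = z1 + z2        # (3 + 1) + (2 - 5)i
difference = z1 - z2   # (3 - 1) + (2 + 5)i

print(total)       # (4-3j)
print(difference)  # (2+7j)
```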

Polar form

The same point \(z\) can be specified by its polar coordinates \((r,\varphi)\), where \(r\) is the distance to the origin and \(\varphi\) is the angle of the vector, in radians, with the positive \(x\)-axis. With \(r\in\mathbb{R}^+\) and \(\varphi\in\mathbb{R}\), we can describe point \(z\) as

$$ \require{enclose} \shaded{ r\enclose{phasorangle}{\small\varphi}=r\,(\cos\varphi+i\sin\varphi) } $$

Here \(r\) corresponds to the modulus \(|z|\), and \(\varphi\) is called the argument. The value \(z=0\) was excluded because the angle \(\varphi\) is not defined at that point.

The polar form simplifies the arithmetic when used in multiplication or powers of complex numbers.

From the illustration, it is clear how to convert from polar to cartesian form

$$ \shaded{ \begin{align} x&=r\cos\varphi\nonumber\\ y&=r\sin\varphi\nonumber \end{align}} $$ and back $$ \shaded{\begin{align} r&=\sqrt{x^2+y^2}\nonumber\\[6mu] \varphi&=\mathrm{atan2}(y,x)\nonumber\\[10mu] \end{align} } $$

Here \(\mathrm{atan2}(y,x)\) takes the signs of \(x\) and \(y\) separately, which prevents negative signs in \(\arctan\frac{y}{x}\) from canceling each other out. Otherwise, we would not be able to distinguish \(\varphi\) in the 1st from that in the 3rd quadrant, or \(\varphi\) in the 2nd from that in the 4th.

$$ \begin{align} \mathrm{atan2}(y,x) &= \begin{cases} \arctan\left(\frac{y}{x}\right) & x\gt0\nonumber\\ \arctan\left(\frac{y}{x}\right)+\pi & x\lt0 \land y\geq0\nonumber\\ \arctan\left(\frac{y}{x}\right)-\pi & x\lt0 \land y\lt0\nonumber\\ \frac{\pi}{2} & x= 0 \land y\gt0\nonumber\\ -\frac{\pi}{2} & x= 0 \land y\lt0\nonumber\\ \text{undefined} & x= 0 \land y = 0\nonumber \end{cases} \end{align} $$
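The conversions can be sketched in Python with the standard `math` module. The helper names `to_polar` and `to_cartesian` are made up for this illustration. The last two lines show the quadrant problem that \(\mathrm{atan2}\) solves.

```python
import math

def to_polar(x, y):
    """Convert cartesian (x, y) to polar (r, phi), with phi in radians."""
    r = math.hypot(x, y)     # sqrt(x^2 + y^2)
    phi = math.atan2(y, x)   # quadrant-aware angle
    return r, phi

def to_cartesian(r, phi):
    """Convert polar (r, phi) back to cartesian (x, y)."""
    return r * math.cos(phi), r * math.sin(phi)

# 1 + i (1st quadrant) and -1 - i (3rd quadrant) share the same ratio y/x,
# so a plain arctan cannot tell them apart, while atan2 can.
print(math.atan(1 / 1), math.atan(-1 / -1))     # both equal pi/4
print(to_polar(1, 1)[1], to_polar(-1, -1)[1])   # pi/4 and -3*pi/4
```

One difference from the definition above: Python's `math.atan2(0, 0)` returns `0.0` instead of flagging the undefined case, so the origin needs separate handling if that matters.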

Consider a complex number \(z\) with angle \(\varphi\). If you make any integer number of full rotations around the origin, you will be back at your starting point. Since a full rotation corresponds to an angle of \(2\pi\), the same point \(z\) can be described as

$$ \require{enclose} r\enclose{phasorangle}{\small\varphi+2k\pi}=r\,(\cos(\varphi+2k\pi)+i\sin(\varphi+2k\pi)),\quad k\in\mathbb{Z} $$

We will explore the effects of this further when discussing multi-valued functions such as the square root.
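To illustrate that adding full turns does not move the point, here is a small sketch using `cmath.rect` from the standard library. The modulus and angle are arbitrary example values.

```python
import cmath
import math

r, phi = 2.0, math.pi / 3   # arbitrary modulus and angle

# The same point, described with k extra full turns.
points = [cmath.rect(r, phi + 2 * k * math.pi) for k in (-2, -1, 0, 1, 2)]

for z in points:
    # Every entry matches the k = 0 point up to floating-point rounding.
    print(cmath.isclose(z, points[2], rel_tol=1e-9))
```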

Exponential form

Euler’s formula was introduced in a separate write-up as

$$ \mathrm{e}^{i\varphi} = \cos\varphi+i\sin\varphi \nonumber $$

Using Euler’s formula we can rewrite the polar form of a complex number into its exponential form

$$ \shaded{ z=r\,\mathrm{e}^{i\varphi} } $$

Similar to the polar form, the angle can be expressed in infinitely many ways

$$ \begin{align} z &= r\,\mathrm{e}^{i(\varphi+2k\pi)}, & k\in\mathbb{Z} \end{align} $$

This exponential form is often preferred over the polar form, because it reduces the need for trigonometry.
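A short numerical sketch of both points: Euler's formula itself, and how a product in exponential form reduces to multiplying the moduli and adding the angles. The angles and moduli are arbitrary example values.

```python
import cmath
import math

phi = 0.8  # arbitrary angle in radians

# Euler's formula: e^{i phi} = cos(phi) + i sin(phi)
lhs = cmath.exp(1j * phi)
rhs = complex(math.cos(phi), math.sin(phi))
print(cmath.isclose(lhs, rhs))  # True

# Multiplying in exponential form: moduli multiply, angles add.
z1 = 2.0 * cmath.exp(1j * math.pi / 6)
z2 = 3.0 * cmath.exp(1j * math.pi / 3)
r, angle = cmath.polar(z1 * z2)
print(r, angle)  # close to 6 and pi/2
```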

What next?

My follow-up article Complex Functions introduces functions that operate on complex numbers. Such functions range from addition, subtraction and multiplication to the most obscure trigonometric functions.

We will use the \(\mathbb{C}\)-plane extensively as we explore electronics and domain transforms. From here on, we will refer to the imaginary unit as “\(j\)”, to avoid confusion with electronics where the variable \(i\) is already used for electrical current.

The same Leonhard Euler who once said these numbers “only exist in our imagination” also used imaginary numbers to unite trigonometry and analysis in his most beautiful formula.

Linear differential equations

For a linear non-homogeneous differential equation with constant coefficients \(a_1\ldots a_n\) in the form

$$ \begin{align} \frac{\text{d}^nf(t)}{\text{d}t^n} + a_1\frac{\text{d}^{n-1}f(t)}{\text{d}t^{n-1}} + \cdots + a_{n-1}\frac{\text{d}f(t)}{\text{d}t} + a_nf(t)&=g(t)\nonumber\\[10mu] \overset{abbrev}{\Rightarrow}\quad f^{(n)}(t)+a_1f^{(n-1)}(t)+\cdots+a_{n-1}f'(t)+a_nf(t)&=g(t)\label{eq:bDV} \end{align} $$

the solution is the superposition of the natural response and the forced response of the system. In math speak, these are called the homogeneous solution \(f_h(t)\) and the particular solution \(f_p(t)\)

$$ \shaded{ f(t)=f_h(t)+f_p(t) } $$

To solve the linear non-homogeneous differential equation, we

  1. set the force \(g(t)=0\) and solve the natural response \(f_h(t)\),
  2. set the initial conditions \(f(0)=f^\prime(0)=f^{\prime\prime}(0)=\ldots=0\) and solve the forced response \(f_p(t)\),
  3. add the forced response to the natural response to get the total response,
  4. use the initial conditions to resolve any constants.

Natural (homogeneous solution)

The natural response, \(f_h(t)\), is the behavior of a circuit due to initial conditions, but without input force. We suppress the input force \(g(t)=0\) and solve just the circuit itself. This makes the non-homogeneous differential equation \(\eqref{eq:bDV}\) into a homogeneous differential equation.

$$ {f_h}^{(n)}(t)+a_1{f_h}^{(n-1)}(t)+\cdots+a_{n-1}{f_h}^\prime(t)+a_nf_h(t) = 0 \label{eq:bDVh} $$

Leonhard Euler, in E10 (1728) and E62 (1739), realized that general homogeneous solutions have the form

$$ \shaded{ f_{h,i}(t)=\mathrm{e}^{pt} } $$

where \(p\in \mathbb{C}\). Substituting \(f_h(t)=\mathrm{e}^{pt}\) in homogeneous differential equation \(\eqref{eq:bDVh}\) gives the so-called characteristic equation

$$ \begin{align} p^n\mathrm{e}^{pt}+ a_1 p^{n-1}\mathrm{e}^{pt} + \cdots + a_n\mathrm{e}^{pt}&=0,&\div \mathrm{e}^{pt} \nonumber \\ \Rightarrow\quad p^n+ a_1 p^{n-1} + \cdots + a_n&=0 \end{align} $$

Substituting any of the roots of the polynomial \(p_1,p_2,\ldots p_n\) in \(\mathrm{e}^{pt}\) results in a solution base \(f_i(t)=\mathrm{e}^{p_it}\).

Given that homogeneous linear differential equations obey the superposition principle, any linear combination of these functions also satisfies the differential equation. Therefore, combining the \(n\) linearly independent solutions \(f_1(t), f_2(t),\ldots,f_n(t)\) leads to the homogeneous solution with arbitrary real constants \(c_1,c_2,\ldots,c_n\)

$$ \shaded{ f_h(t)=c_1f_1(t)+c_2f_2(t)+\cdots+c_nf_n(t) } $$


  1. For double roots, or in general when a root \(p_i\) has multiplicity \(m\), the solution base is \(f(t)=t^{k}\,\mathrm{e}^{p_it}\) where \(k\in\{0,1,\ldots,m-1\}\).
  2. If the differential equation \(\eqref{eq:bDV}\) has real coefficients \(a_i\), complex solutions for \(p\) will only occur in complex conjugate pairs. The real-valued solutions are obtained by replacing each pair \(f_1,f_2\) with their real-valued linear combinations \(\Re(f_1)\) and \(\Im(f_1)\), as in
    $$ \begin{align} \Re(f_1)=\frac{f_1+f_2}{2}\\ \Im(f_1)=\frac{f_1-f_2}{2j} \end{align} $$
    Applying Euler’s trigonometry identities, these solutions turn into \(\cos\) and \(\sin\) terms.
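The second remark can be checked numerically. The sketch below assumes, purely for illustration, the conjugate pair \(p=\pm j\) and an arbitrary sample time; it confirms that the two combinations equal \(\cos(t)\) and \(\sin(t)\).

```python
import cmath
import math

t = 0.7  # arbitrary sample time, chosen for illustration

f1 = cmath.exp(1j * t)    # solution for the root p = j
f2 = cmath.exp(-1j * t)   # solution for the conjugate root p = -j

real_combo = (f1 + f2) / 2    # Re(f1)
imag_combo = (f1 - f2) / 2j   # Im(f1)

print(cmath.isclose(real_combo, math.cos(t)))  # True
print(cmath.isclose(imag_combo, math.sin(t)))  # True
```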

The example below shows how a homogeneous linear differential equation is solved.


Assume a homogeneous linear differential equation

$$ \DeclareMathOperator*{\dprime}{\prime\prime} \DeclareMathOperator*{\tprime}{\prime\prime\prime} \DeclareMathOperator*{\qprime}{\prime\prime\prime\prime} f^{\qprime}(t) - 2f^{\tprime}(t) + 2f^{\dprime}(t) - 2f^{\prime}(t) + f(t) = 0 $$

The characteristic equation and its factorized form follow as

$$ \begin{align} p^4-2p^3+2p^2-2p+1&=0\nonumber \\[6mu] \Rightarrow\quad (p-j)\,(p+j)\,(p-1)^2& = 0 \end{align} $$

The solution basis becomes

$$ \begin{align} f_1 &= \mathrm{e}^{jt}&\text{based on }p_1=j \nonumber \\ f_2 &= \mathrm{e}^{-jt}&\text{based on }p_2=-j \nonumber \\ f_3 &= \mathrm{e}^{t}&\text{based on }p_3=p_4=1 \nonumber \\ f_4 &= t\mathrm{e}^{t}&\text{based on }p_3=p_4=1 \nonumber \end{align} $$
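The roots used above can be double-checked by evaluating the characteristic polynomial directly. This is a sketch with a hand-rolled Horner evaluation (the helper name `horner` is my own). A root that also zeroes the derivative of the polynomial has multiplicity of at least two.

```python
def horner(coeffs, p):
    """Evaluate a polynomial, coefficients in descending order, at p."""
    result = 0
    for c in coeffs:
        result = result * p + c
    return result

char_poly = [1, -2, 2, -2, 1]   # p^4 - 2p^3 + 2p^2 - 2p + 1
derivative = [4, -6, 4, -2]     # 4p^3 - 6p^2 + 4p - 2

for root in (1j, -1j, 1):
    # The first value is zero for every root; the second is zero only for
    # the repeated root p = 1.
    print(root, horner(char_poly, root), horner(derivative, root))
```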

Using Euler’s trigonometry identities

$$ \begin{align} \Re(f_1) &= \frac{f_1+f_2}{2}=\frac{\mathrm{e}^{jt}+\mathrm{e}^{-jt}}{2}=\cos(t) \nonumber \\ \Im(f_1) &= \frac{f_1-f_2}{2j}=\frac{\mathrm{e}^{jt}-\mathrm{e}^{-jt}}{2j}=\sin(t) \nonumber \end{align} $$

simplifies the solution basis to

$$ \begin{align} f_1 &= \cos(t)&\text{based on }p_1={p_2}^*=j\nonumber\\ f_2 &= \sin(t)&\text{based on }p_1={p_2}^*=j\nonumber\\ f_3 &= \mathrm{e}^{t}&\text{based on }p_3=p_4=1\nonumber\\ f_4 &= t\,\mathrm{e}^{t}&\text{based on }p_3=p_4=1\nonumber \end{align} $$

The homogeneous solution follows as a linear combination

$$ f_h(t) = c_1\cos (t) + c_2\sin(t) + c_3\,\mathrm{e}^{t} + c_4\,t\,\mathrm{e}^{t} $$
where the constants \(c_{1,\ldots,4}\) follow from the initial conditions.
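As a check on this result, the sketch below plugs the homogeneous solution back into the differential equation, using the known derivatives of each basis function. The constants and the sample time are arbitrary values picked for illustration.

```python
import math

def derivs_cos(t):
    """cos(t) and its first four derivatives."""
    return [math.cos(t), -math.sin(t), -math.cos(t), math.sin(t), math.cos(t)]

def derivs_sin(t):
    """sin(t) and its first four derivatives."""
    return [math.sin(t), math.cos(t), -math.sin(t), -math.cos(t), math.sin(t)]

def derivs_exp(t):
    """e^t and its first four derivatives (all equal to e^t)."""
    return [math.exp(t)] * 5

def derivs_t_exp(t):
    """t e^t and its derivatives: the k-th derivative is (t + k) e^t."""
    return [(t + k) * math.exp(t) for k in range(5)]

def residual(d):
    """Left-hand side f'''' - 2f''' + 2f'' - 2f' + f, given [f, f', ..., f'''']."""
    return d[4] - 2 * d[3] + 2 * d[2] - 2 * d[1] + d[0]

constants = (1.0, -2.0, 0.5, 3.0)   # arbitrary c1..c4
t = 0.3                             # arbitrary sample time
basis = (derivs_cos(t), derivs_sin(t), derivs_exp(t), derivs_t_exp(t))

# Derivatives of f_h are the same linear combination of the basis derivatives.
f_h = [sum(c * d[k] for c, d in zip(constants, basis)) for k in range(5)]
print(abs(residual(f_h)) < 1e-9)  # True
```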

Forced (particular solution)

The forced response \(f_p(t)\), is the part of the response caused directly by the input force assuming all initial conditions are zero.

$$ {f_p}^{(n)}(t)+a_1{f_p}^{(n-1)}(t)+\cdots+a_{n-1}{f_p}^\prime(t)+a_nf_p(t)=g(t)\label{eq:bDVp} $$

The solution for the forced response is usually a scaled version of the input. In the examples below we will show two methods of finding the particular solution. As you will learn, using a complex forcing function is the easiest way of obtaining the particular solution. In all other cases, we find \(f_p(t)\) by either the method of undetermined coefficients or the variation of parameters method.

The particular solution is typically found using trigonometry identities, as shown in the examples in RC Low-pass Filter Appendix B. Even for these first-order linear systems, this is a fairly painstaking process. Here we explain a less involved method to find the response to a sinusoidal forcing function.

Complex superposition

The superposition property:
When two signals are added together and forced on a linear system, the system response is the same as if one had forced each signal through the system separately and then added the responses.

The linearity of the system implies that if we use an input of the form \(u(t)=\hat{u}\cos(\omega t)\) then the output will have the same frequency but with a different phase and amplitude. As shown in the table below, if we scale the input by a factor \(k\) then the output will be scaled by the same factor. This applies even when that factor is the imaginary number \(j\).

Linear system
$$ \begin{array}{ccc} \text{input} & & \text{output} \\ \hline \hat{u}\cos(\omega t+\theta) & \longrightarrow & A\hat{u}\cos(\omega t+\phi) \\ \color{olive}{k}\hat{u}\cos(\omega t+\theta) & \longrightarrow & \color{olive}{k}A\hat{u}\cos(\omega t+\phi) \\ \color{blue}{j}\hat{u}\cos(\omega t+\theta) & \longrightarrow & \color{blue}{j}A\hat{u}\cos(\omega t+\phi) \end{array} $$

By the superposition principle of linear systems, a forcing function that sums a \(\cos\) and a \(j\sin\) produces a response that sums the scaled and phase-shifted \(\cos\) and \(j\sin\)

$$ \hat{u}\cos(\omega t) + \color{green}{j}\,\hat{u}\sin(\omega t) \longrightarrow A\hat{u}\cos(\omega t+\phi) + \color{green}{j}A\hat{u}\sin(\omega t+\phi) $$

By applying Euler’s formula

$$ \mathrm{e}^{j\varphi}=\cos\varphi+j\sin\varphi\nonumber $$

the complex input \(\underline{u}(t)\) and output \(\underline{f}(t)\) can be expressed as

$$ \underline{u}(t)=\hat{u}\,\mathrm{e}^{j\omega t} \longrightarrow \underline{f}(t)=A\hat{u}\,\mathrm{e}^{j(\omega t+\phi)} $$

Even if the forcing function is only the real part of \(\underline{u}(t)\), we may derive the system response as if the input were the mathematically more convenient \(\underline{u}(t)\), which also includes an imaginary part, as long as we ignore the imaginary part of the response.

In other words: if the forcing function is \(\hat{u}\cos(\omega t)\), we may pretend that the forcing function is \(\underline{u}(t)=\hat{u}\cos(\omega t)+\color{green}{j}\,\hat{u}\sin(\omega t)=\hat{u}\,\mathrm{e}^{j\omega t}\), derive the response and then consider only the real part of the complex solution.
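To make this concrete, here is a minimal sketch for a hypothetical first-order system \(f'(t)+a\,f(t)=\hat{u}\cos(\omega t)\). The equation and the values of `a`, `u_hat` and `omega` are assumptions chosen for illustration and do not come from the text above. Pretending the input is \(\hat{u}\,\mathrm{e}^{j\omega t}\) and trying \(\underline{f}(t)=F\,\mathrm{e}^{j\omega t}\) turns the differential equation into the algebraic equation \((j\omega+a)F=\hat{u}\).

```python
import cmath
import math

a = 2.0       # assumed coefficient in f'(t) + a f(t) = g(t)
u_hat = 1.5   # assumed amplitude of the forcing function
omega = 3.0   # assumed angular frequency of the forcing function

# Complex amplitude of the response, from (j omega + a) F = u_hat.
F = u_hat / (a + 1j * omega)

def f_p(t):
    """Particular solution: real part of the complex response."""
    return (F * cmath.exp(1j * omega * t)).real

def f_p_prime(t):
    """Its derivative, obtained by differentiating the complex response."""
    return (1j * omega * F * cmath.exp(1j * omega * t)).real

# Gain and phase shift, the A and phi of the text.
gain, phase = cmath.polar(F / u_hat)

t = 0.4  # arbitrary sample time
print(f_p_prime(t) + a * f_p(t) - u_hat * math.cos(omega * t))   # close to 0
print(f_p(t) - gain * u_hat * math.cos(omega * t + phase))       # close to 0
```

The first printed value confirms that the real part satisfies the original equation with the real forcing function. The second confirms that the response is the input scaled by `gain` and shifted by `phase`.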

An example can be found under the heading “complex arithmetic method” in the examples in RC Low-pass Filter Appendix B. The main section of that article describes the use of an even more convenient method using a Laplace Transform.

Copyright © 1996-2022 Coert Vonk, All Rights Reserved