Calculus for Vector-Valued Functions - Part I (Quick Start)
Basic Object | $\mathbb{R}^{n}$ |
Basic Map | Differentiable functions $f:\mathbb{R}^{n}\to\mathbb{R}^{m}$ |
Basic Goal | Inverse Function Theorem |
Note: This article is compiled mainly from essays and textbooks, with [1] providing the structure.
Prerequisites
- Basic calculus, at the level of a course for non-mathematics students
- A little linear algebra
1. Vector-Valued Functions
A vector-valued function is a function $f:\mathbb{R}^{n}\to\mathbb{R}^{m}$: for any vector $x$ in $\mathbb{R}^{n}$, the image $f(x)$ is a vector in $\mathbb{R}^{m}$.
In this section we simply introduce a notation for vector-valued functions, in the language of matrices and vectors, so that we can define calculus for this kind of function.
Let $(x_{1},\cdots,x_{n})$ be a coordinate system for $\mathbb{R}^{n}$. Then $f$ can be described in terms of $m$ real-valued functions:
$$f(x_{1},\cdots,x_{n})=
\begin{pmatrix} f_{1}(x_{1},\cdots,x_{n}) \\ \vdots \\ \vdots \\ f_{m}(x_{1},\cdots,x_{n}) \end{pmatrix}$$
Example:
- $f:\mathbb{R}\to\mathbb{R}^{2}$, with coordinate $t\in\mathbb{R}$:
$$f(t) = \begin{pmatrix} \cos{(t)}\\\sin{(t)}\end{pmatrix}$$
(also $x=\cos{(t)}$ and $y=\sin{(t)}$)
This is the unit circle, parameterized by the angle it makes with the $x$-axis.
- $f:\mathbb{R}^{2}\to\mathbb{R}^{3}$, with coordinates $x_{1},x_{2}\in\mathbb{R}$:
$$f(x_{1},x_{2}) = \begin{pmatrix} \cos{x_{1}}\\\sin{x_{1}}\\x_{2}\end{pmatrix}$$
This function $f$ maps the $(x_{1},x_{2})$ plane to a cylinder in space.
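The two examples above can be evaluated directly. The following is a minimal sketch in plain Python; the function names `circle` and `cylinder` are chosen here for illustration and do not come from the source text.

```python
import math

def circle(t):
    """f: R -> R^2, the unit circle parameterized by the angle t."""
    return (math.cos(t), math.sin(t))

def cylinder(x1, x2):
    """f: R^2 -> R^3, wrapping the (x1, x2) plane onto a cylinder."""
    return (math.cos(x1), math.sin(x1), x2)

# Every image point of `circle` satisfies x^2 + y^2 = 1 (up to rounding).
x, y = circle(0.7)
print(x * x + y * y)

# Every image point of `cylinder` lies at distance 1 from the third axis,
# while the third coordinate is simply x2.
p = cylinder(0.7, 2.5)
print(p[0] ** 2 + p[1] ** 2, p[2])
```

Each function returns a tuple whose length is the dimension $m$ of the target space, which is all that the column-vector notation expresses.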
2. Limits and Continuity of Vector-Valued Functions
The definitions of limit and continuity both rely on a notion of distance. That is, each choice of norm (or distance) comes with its own definitions of limit and continuity.
Fortunately, the Pythagorean theorem gives a natural way to measure distance in $\mathbb{R}^{n}$.
[Definition - Distance] Let $a = (a_{1},\cdots,a_{n})$ and $b = (b_{1},\cdots,b_{n})$ be two points in $\mathbb{R}^{n}$. Then the distance between $a$ and $b$, denoted by $|a-b|$, is
$$|a-b| = \sqrt{(a_{1}-b_{1})^{2}+(a_{2}-b_{2})^{2}+\cdots+(a_{n}-b_{n})^{2}}$$
We can think of the point $a$ in $\mathbb{R}^{n}$ as a vector from the origin $O$ to the point. Thus the length of $a$, as a vector, is defined as the distance between $a$ and $O$:
$$|a|=\sqrt{a_{1}^{2}+\cdots+a_{n}^{2}}$$
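As a small illustration, the distance and length formulas translate almost literally into code. This is a sketch in plain Python; the names `distance` and `length` are assumptions made here, not notation from the source.

```python
import math

def distance(a, b):
    """Euclidean distance |a - b| between two points of R^n."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def length(a):
    """Length |a|: the distance from the point a to the origin."""
    return distance(a, [0.0] * len(a))

print(distance((1.0, 2.0), (4.0, 6.0)))  # 3-4-5 right triangle: 5.0
print(length((3.0, 4.0)))                # 5.0
```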
Now that we have this standard notion of distance, we can apply it in the $\epsilon$-$\delta$ style of real analysis to obtain a reasonable definition of limit:
[Definition - Limit] The function $f:\mathbb{R}^{n}\to\mathbb{R}^{m}$ has limit
$$L=(L_{1},\cdots,L_{m})\in \mathbb{R}^m$$
at the point $a=(a_{1},\cdots,a_{n})\in\mathbb{R}^{n}$ if,
given any $\epsilon>0$, there exists $\delta>0$ such that, for all $x\in\mathbb{R}^{n}$, if
$$0<|x-a|<\delta$$
then
$$|f(x)-L|<\epsilon$$
This is denoted by
$$\lim_{x\to a}f(x)=L$$
or by
$$f(x)\to L\ \text{ as }\ x\to a$$
With the definition of limit in hand, continuity can now be defined:
[Definition - Continuity] The function $f:\mathbb{R}^{n}\to\mathbb{R}^{m}$ is continuous at a point $a$ in $\mathbb{R}^{n}$ if $$\lim_{x\to a}f(x)=f(a)$$
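The $\epsilon$-$\delta$ condition can be probed numerically. The sketch below samples points $x$ with $0<|x-a|<\delta$ and tests whether $|f(x)-f(a)|<\epsilon$ for the cylinder map from Section 1. Sampling can only refute a choice of $\delta$, never prove it. The helper name `delta_works` and the sampling scheme are assumptions made for this illustration.

```python
import math
import random

def norm(v):
    """Euclidean length of a vector given as a sequence of floats."""
    return math.sqrt(sum(c * c for c in v))

def cylinder(x):
    """The map (x1, x2) -> (cos x1, sin x1, x2) from the earlier example."""
    return (math.cos(x[0]), math.sin(x[0]), x[1])

def delta_works(f, a, eps, delta, trials=1000, seed=0):
    """Return False if some sampled x with 0 < |x - a| < delta
    has |f(x) - f(a)| >= eps; return True if no sample fails."""
    rng = random.Random(seed)
    fa = f(a)
    for _ in range(trials):
        # Random direction, random radius strictly inside (0, delta).
        direction = [rng.gauss(0.0, 1.0) for _ in a]
        radius = delta * rng.uniform(0.01, 0.99)
        scale = radius / norm(direction)
        x = [ai + scale * di for ai, di in zip(a, direction)]
        gap = norm([p - q for p, q in zip(f(x), fa)])
        if gap >= eps:
            return False
    return True

a = (0.3, -1.0)
for eps in (1e-1, 1e-3, 1e-6):
    print(eps, delta_works(cylinder, a, eps, delta=eps))
```

For this particular map the choice $\delta=\epsilon$ is enough, because a chord of the unit circle is never longer than the corresponding arc, so $|f(x)-f(a)|\le|x-a|$.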
3. Differentiation and Jacobians
We want the derivative of a vector-valued function to be a tool for finding the best linear approximation to the function, just as the derivative of a single-variable function is the slope of the tangent line.
We will first give the definition for the vector-valued derivative, and then discuss the intuitions behind it.
[Definition - Differentiable Vector-Valued Function] A function $f:\mathbb{R}^{n}\to\mathbb{R}^{m}$ is differentiable at $a\in\mathbb{R}^{n}$ if there is an $m\times n$ matrix $A:\mathbb{R}^{n}\to\mathbb{R}^{m}$ such that
$$\lim_{x\to a}\frac{|f(x)-f(a)-A\cdot(x-a)|}{|x-a|}=0.$$
If such a matrix exists, it is denoted by $Df(a)$ and is called the Jacobian of $f$ at $a$.
(If the Jacobian $Df(a)$ exists, it is unique up to a change of bases for $\mathbb{R}^{n}$ and $\mathbb{R}^{m}$.)
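To make the defining limit concrete, the sketch below evaluates the quotient $|f(x)-f(a)-A\cdot(x-a)|/|x-a|$ for the cylinder map $(x_{1},x_{2})\mapsto(\cos x_{1},\sin x_{1},x_{2})$ along a shrinking displacement. The candidate matrix `A` is written down by hand, and the helper names are assumptions made for this illustration.

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in A]

def cylinder(x):
    return [math.cos(x[0]), math.sin(x[0]), x[1]]

def ratio(f, A, a, h):
    """|f(a + h) - f(a) - A.h| / |h| for a nonzero displacement h."""
    x = [ai + hi for ai, hi in zip(a, h)]
    linear_part = matvec(A, h)
    err = [fx - fa - lp for fx, fa, lp in zip(f(x), f(a), linear_part)]
    return norm(err) / norm(h)

a = [0.5, 2.0]
# Candidate 3 x 2 matrix at the point a (assumed, then tested below).
A = [[-math.sin(a[0]), 0.0],
     [math.cos(a[0]), 0.0],
     [0.0, 1.0]]

for k in range(1, 6):
    t = 10.0 ** (-k)
    print(t, ratio(cylinder, A, a, [t, -t]))
```

The printed quotient shrinks roughly in proportion to `t`, which is the behaviour the definition demands. Replacing `A` with any other matrix leaves a quotient that does not tend to zero along some direction.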
To see where this definition comes from, start with the usual one-variable definition of the derivative.
Recall that for a function $f:\mathbb{R}\to\mathbb{R}$, the derivative $f'(a)$ was defined to be the limit
$$f'(a)=\lim_{x\to a}\frac{f(x)-f(a)}{x-a}$$
Since we cannot divide vectors, this one-variable definition is nonsensical for a vector-valued function $f:\mathbb{R}^{n}\to\mathbb{R}^{m}$. However, if we manipulate the above one-variable limit algebraically, we get a statement that generalizes naturally to functions $f:\mathbb{R}^{n}\to\mathbb{R}^{m}$.
Notice that, for $f:\mathbb{R}\to\mathbb{R}$
$$f'(a)=\lim_{x\to a}\frac{f(x)-f(a)}{x-a}$$
is true, if and only if
$$0 = \lim_{x\to a}\left(\frac{f(x)-f(a)}{x-a}-f'(a)\right)$$
which is equivalent to
$$0 = \lim_{x\to a}\frac{f(x)-f(a)-f'(a)(x-a)}{x-a}$$
or
$$0 = \lim_{x\to a}\frac{|f(x)-f(a)-f'(a)(x-a)|}{|x-a|}$$
The last statement
$$0 = \lim_{x\to a}\frac{|f(x)-f(a)-f'(a)(x-a)|}{|x-a|}$$
at least formally, makes sense for functions $f:\mathbb{R}^{n}\to\mathbb{R}^{m}$, provided we replace $f'(a)$ by an $m\times n$ matrix, namely the Jacobian $Df(a)$.
This is possible, because
$$|f(x)-f(a)-A\cdot (x-a)|$$
is the length of a vector in $\mathbb{R}^{m}$, and likewise,
$$|x-a|$$ is the length of a vector in $\mathbb{R}^{n}$.
The theorem below gives a straightforward method for computing the derivative (the Jacobian) without resorting to actually taking a limit.
[Theorem - Calculation of Jacobian]
Let the function $f:\mathbb{R}^{n}\to\mathbb{R}^{m}$ be given by the $m$ differentiable functions $f_{1}(x_{1},\cdots,x_{n}),\cdots,f_{m}(x_{1},\cdots,x_{n})$, so that
$$f(x_{1},\cdots,x_{n})=
\begin{pmatrix} f_{1}(x_{1},\cdots,x_{n}) \\ \vdots \\ \vdots \\ f_{m}(x_{1},\cdots,x_{n}) \end{pmatrix}$$
Then $f$ is differentiable, and the Jacobian is
$$Df(x)=\begin{pmatrix}
\frac{\partial f_{1}}{\partial x_{1}} & \cdots & \frac{\partial f_{1}}{\partial x_{n}} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_{m}}{\partial x_{1}} & \cdots & \frac{\partial f_{m}}{\partial x_{n}} \\
\end{pmatrix}$$
The proof, which can be found in most books on vector calculus, is omitted here.
Example: consider the earlier example of the function
$f:\mathbb{R}^{2}\to\mathbb{R}^{3}$, with coordinates $x_{1},x_{2}\in\mathbb{R}$, given by
$$f(x_{1},x_{2}) = \begin{pmatrix} \cos{x_{1}}\\\sin{x_{1}}\\x_{2}\end{pmatrix}$$
which maps the $(x_{1},x_{2})$ plane to a cylinder in space.
Then the Jacobian, the derivative of this vector-valued function, will be
$$Df(x_{1},x_{2})=\begin{pmatrix}
\frac{\partial\cos{x_{1}}}{\partial x_{1}} & \frac{\partial\cos{x_{1}}}{\partial x_{2}} \\
\frac{\partial\sin{x_{1}}}{\partial x_{1}} & \frac{\partial\sin{x_{1}}}{\partial x_{2}} \\
\frac{\partial x_{2}}{\partial x_{1}} &
\frac{\partial x_{2}}{\partial x_{2}} \\
\end{pmatrix}
=
\begin{pmatrix}
-\sin{x_{1}} & 0 \\
\cos{x_{1}} & 0 \\
0 & 1
\end{pmatrix}$$
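A quick sanity check of this matrix is to approximate each partial derivative by a central difference and compare. This is a sketch only: the step size `h=1e-6` and the helper name `numerical_jacobian` are assumptions made here, and finite differences are used as a check, not as the method the theorem describes.

```python
import math

def cylinder(x):
    return [math.cos(x[0]), math.sin(x[0]), x[1]]

def jacobian_cylinder(x):
    """The analytic Jacobian computed in the text."""
    return [[-math.sin(x[0]), 0.0],
            [math.cos(x[0]), 0.0],
            [0.0, 1.0]]

def numerical_jacobian(f, x, h=1e-6):
    """Approximate Df(x) column by column with central differences."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            # Entry (i, j) approximates the partial of f_i with respect to x_j.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return J

x = [0.5, 2.0]
for exact_row, approx_row in zip(jacobian_cylinder(x), numerical_jacobian(cylinder, x)):
    print(exact_row, approx_row)
```

Row $i$ of the result holds the partial derivatives of $f_{i}$, and column $j$ holds the derivatives with respect to $x_{j}$, matching the $m\times n$ layout of $Df(x)$.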
For vector-valued functions, the chain rule can be easily stated. It relates the derivative of the composition of functions with the derivatives of each component part, namely:
[Theorem - Chain rule] Let $f:\mathbb{R}^{n}\to\mathbb{R}^{m}$ and $g:\mathbb{R}^{m}\to\mathbb{R}^{l}$ be differentiable functions. Then the composition
$$g\circ f:\mathbb{R}^{n}\to\mathbb{R}^{l}$$
is also differentiable, and its derivative is given as follows:
if $f(a)=b$, then
$$D(g\circ f)(a) = Dg(b)\cdot Df(a)$$
i.e.
$$D(g\circ f)(a) = Dg(f(a))\cdot Df(a)$$
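The chain rule can be checked on a small case. In the sketch below, $f$ is the cylinder map from the earlier example, while $g:\mathbb{R}^{3}\to\mathbb{R}^{2}$, $g(y_{1},y_{2},y_{3})=(y_{1}^{2}+y_{2}^{2},\ y_{3}y_{1})$, is a hypothetical map chosen only for illustration. The composite is $g(f(x))=(1,\ x_{2}\cos x_{1})$, whose Jacobian is written out by hand in `D_composite`.

```python
import math

def matmul(A, B):
    """Product of an l x m matrix A and an m x n matrix B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def f(x):
    return [math.cos(x[0]), math.sin(x[0]), x[1]]

def Df(x):
    return [[-math.sin(x[0]), 0.0],
            [math.cos(x[0]), 0.0],
            [0.0, 1.0]]

def g(y):
    # Hypothetical map R^3 -> R^2, chosen only for illustration.
    return [y[0] ** 2 + y[1] ** 2, y[2] * y[0]]

def Dg(y):
    return [[2.0 * y[0], 2.0 * y[1], 0.0],
            [y[2], 0.0, y[0]]]

def D_composite(x):
    # g(f(x)) = (1, x2 * cos x1), differentiated by hand.
    return [[0.0, 0.0],
            [-x[1] * math.sin(x[0]), math.cos(x[0])]]

a = [0.5, 2.0]
left = D_composite(a)             # D(g o f)(a), a 2 x 2 matrix
right = matmul(Dg(f(a)), Df(a))   # Dg(f(a)) . Df(a): (2 x 3) times (3 x 2)
print(left)
print(right)
```

Note the order of the factors: $Dg(b)$ is $l\times m$ and $Df(a)$ is $m\times n$, so only the product $Dg(b)\cdot Df(a)$ has compatible shapes.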
Let's take a look at the intuitions behind the derivative.
For a function of one variable, $f'(a)$ is the slope of the tangent line to the curve $y=f(x)$ at the point $(a,f(a))$ in the plane $\mathbb{R}^{2}$.
The tangent line through $(a,f(a))$ has the equation
$$y=f(a)+f'(a)(x-a)$$
This line is the closest linear approximation to the function $f(x)$ at $x=a$.
We should be able to use the derivative of $f:\mathbb{R}^{n}\to\mathbb{R}^{m}$ to find a linear approximation to the geometric object $y=f(x)$, which lies in the space $\mathbb{R}^{n+m}$ (note that $y$ has $m$ components and $x$ has $n$). This is precisely what the definition does:
$$\lim_{x\to a}\frac{|f(x)-f(a)-Df(a)(x-a)|}{|x-a|}=0$$
namely $f$ is approximately equal to the linear function
$$f(a)+Df(a)\cdot (x-a)$$
As an $m\times n$ matrix, $Df(a)$ is a linear map from $\mathbb{R}^{n}$ to $\mathbb{R}^{m}$, and $f(a)$, as an element of $\mathbb{R}^{m}$, is a translation.
Thus the vector $y=f(x)$ can be approximated by
$$y\approx f(a)+Df(a)\cdot(x-a)$$
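As an illustration of this approximation, the sketch below builds the affine map $x\mapsto f(a)+Df(a)\cdot(x-a)$ for the cylinder map $(x_{1},x_{2})\mapsto(\cos x_{1},\sin x_{1},x_{2})$ and compares it with the true value at a nearby point. The helper name `linear_approximation` and the sample points are assumptions made for this example.

```python
import math

def cylinder(x):
    return [math.cos(x[0]), math.sin(x[0]), x[1]]

def jacobian_cylinder(x):
    return [[-math.sin(x[0]), 0.0],
            [math.cos(x[0]), 0.0],
            [0.0, 1.0]]

def linear_approximation(f, Df, a):
    """Return the affine map x -> f(a) + Df(a).(x - a)."""
    fa, J = f(a), Df(a)

    def approx(x):
        d = [xi - ai for xi, ai in zip(x, a)]
        return [fa_i + sum(J_ij * d_j for J_ij, d_j in zip(row, d))
                for fa_i, row in zip(fa, J)]

    return approx

a = [0.5, 2.0]
approx = linear_approximation(cylinder, jacobian_cylinder, a)

x = [0.51, 2.3]          # a point near a
print(cylinder(x))       # true value f(x)
print(approx(x))         # value predicted by the linear approximation
```

The two printed vectors agree in the third component, since the map is already linear in $x_{2}$, and differ only slightly in the first two, where the circle curves away from its tangent line.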
4. The Inverse Function Theorem
5. The Implicit Function Theorem
Vocabulary
Word | Pronunciation | Meaning |
---|---|---|
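Pythagorean | - | adj. of or relating to Pythagoras, his followers, or his philosophy |
Theorem | ['θiərəm] | n. a proven mathematical statement |
notation | [noʊ'teɪʃ(ə)n] | n. a system of symbols or signs |
coordinate | [koʊ'ɔrdɪ.neɪt] | n. one of the numbers that locate a point |
norm | [nɔrm] | n. a measure of the size of a vector (used to define distance) |
plane | [pleɪn] | n. a flat two-dimensional surface |
axis | ['æksɪs] | n. a coordinate axis |
parameter | [pə'ræmɪtər] | n. a variable used to describe a curve or surface |
parameterize | - | v. to describe in terms of parameters |
continuity | [.kɑntɪ'nuəti] | n. the property of being continuous |
denote | [dɪ'noʊt] | v. to represent; to stand for |
derivative | [dɪ'rɪvətɪv] | n. the rate of change of a function |
slope | [sloʊp] | n. the steepness of a line |
tangent | ['tændʒənt] | n. a tangent line |
jacobian | [dʒə'kəʊbɪən] | n. the Jacobian matrix |
differentiable | [ˌdɪfə'renʃɪəbl] | adj. having a derivative |
nonsensical | [nɑn'sensɪk(ə)l] | adj. absurd; meaningless |
manipulate | [mə'nɪpjə.leɪt] | v. to operate on; to handle |
algebraically | [ældʒəb'reɪklɪ] | adv. by means of algebra |
equivalent | [ɪ'kwɪvələnt] | adj. having the same meaning or value |
formally | ['fɔːməli] | adv. in form; as a matter of form |
resort | [rɪ'zɔrt] | v. to turn to (for help) |
omit | [oʊ'mɪt] | v. to leave out |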
References
[1] Garrity, T. A. All the Mathematics You Missed: But Need to Know for Graduate School [M]. Beijing: Tsinghua University Press, 2004: 47.