== Differentiation of functions of several variables ==
{{main|Vector calculus|Multivariable calculus}}
Differential calculus, which describes the local behaviour of a function of one real variable, carries over almost verbatim to functions of several variables once suitable notation is adopted.

=== Derivatives of vector-valued functions ===
A [[vector-valued function]] {{math|'''y'''(''t'')}} of a real variable {{mvar|t}} sends real numbers to vectors in some [[vector space]] {{math|'''R'''{{msup|''n''}}}}. It can be split up into its coordinate functions, written {{math|1='''y'''(''t'') = (''y''{{ind|1}}(''t''), …, ''y''{{ind|''n''}}(''t''))}}; [[parametric curve]]s in {{math|'''R'''{{msup|2}}}} or {{math|'''R'''{{msup|3}}}}, for example, are vector-valued functions. The coordinate functions are real-valued, so they can be differentiated in the sense described above. The derivative of {{math|'''y'''(''t'')}},
: <math>\mathbf{y}'(t)=\lim_{h\to 0}\frac{\mathbf{y}(t+h) - \mathbf{y}(t)}{h} = (y'_1(t), \ldots, y'_n(t)),</math>
exists if and only if each coordinate function has a derivative at {{mvar|t}}, and it defines a vector called the [[tangent vector]] at {{mvar|t}}. If the derivative of {{math|'''y'''}} exists for every value of {{mvar|t}}, then the derivative {{math|'''y'''{{'}}}} is itself a vector-valued function.

Equivalently, if {{math|'''e'''{{ind|1}}, …, '''e'''{{ind|''n''}}}} denotes the standard basis of {{math|'''R'''{{msup|''n''}}}}, then {{math|'''y'''(''t'')}} can also be written as {{math|1='''y'''(''t'') = ''y''{{ind|1}}(''t'')'''e'''{{ind|1}} + … + ''y''{{ind|''n''}}(''t'')'''e'''{{ind|''n''}}}}. If the derivative of a vector-valued function is to retain the [[linearity of differentiation|linearity]] of the one-variable derivative, then, because each basis vector is a constant vector, the only possibility is
: <math>\mathbf{y}'(t) = y'_1(t)\mathbf{e}_1 + \cdots + y'_n(t)\mathbf{e}_n,</math>
which agrees with the componentwise result above.
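The componentwise rule lends itself to a direct numerical check. The following Python sketch is only an illustration added for concreteness (the helper name <code>tangent_vector</code> and the step size are arbitrary choices): it approximates the tangent vector of a helix in {{math|'''R'''{{msup|3}}}} by applying a one-variable central difference to each coordinate function.

<syntaxhighlight lang="python">
# Minimal sketch: differentiate a vector-valued function componentwise.
import math

def tangent_vector(y, t, h=1e-6):
    """Approximate y'(t) = (y_1'(t), ..., y_n'(t)) by central differences."""
    yp, ym = y(t + h), y(t - h)
    return tuple((p - m) / (2 * h) for p, m in zip(yp, ym))

# The helix t -> (cos t, sin t, t) in R^3 has exact tangent (-sin t, cos t, 1).
helix = lambda t: (math.cos(t), math.sin(t), t)
print(tangent_vector(helix, 1.0))            # approx (-0.8415, 0.5403, 1.0)
print((-math.sin(1.0), math.cos(1.0), 1.0))  # exact values for comparison
</syntaxhighlight>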
 
=== Partial derivatives ===
{{Main|Partial derivative}}

Suppose that {{mvar|f}} is a function that depends on more than one variable. It can be reinterpreted as a family of functions of one variable indexed by the other variables: for instance, {{math|1=''f''(''x'',''y'') = ''f''{{ind|''x''}}(''y'') = ''x''{{exp|2}} + ''xy'' + ''y''{{exp|2}}}} means that every value of {{mvar|x}} determines a one-variable function {{mvar|f{{ind|x}}}}. Once a value such as {{math|1=''x'' = ''a''}} is chosen, {{mvar|a}} is a constant rather than a variable, and the two-variable function {{math|''f''(''x'', ''y'')}} determines the one-variable function {{math|1=''f''{{ind|''a''}}(''y'') = ''a''{{exp|2}} + ''ay'' + ''y''{{exp|2}}}}. One-variable differentiation gives the derivative {{math|1=''f''{{ind|''a''}}{{'}}(''y'') = ''a'' + 2''y''}}, and this procedure does not depend on which value {{mvar|a}} is chosen for {{mvar|x}}. Treating these derivatives for all {{mvar|a}} at once yields a function {{math|1={{sfrac|∂''f''|∂''y''}}(''x'',''y'') = ''x'' + 2''y''}} that describes the variation of {{mvar|f}} in the {{mvar|y}}-direction. This is called the partial derivative of {{mvar|f}} with respect to {{mvar|y}}, and the rounded letter [[∂]] is called the partial derivative symbol.
In general, the '''partial derivative''' of a function {{math|''f''(''x''{{ind|1}}, …, ''x''{{ind|''n''}})}} in the direction {{mvar|x{{ind|i}}}} at the point {{math|(''a''{{ind|1}}, …, ''a''{{ind|''n''}})}} is defined to be
:<math>\frac{\part f}{\part x_i}(a_1,\ldots,a_n) = \lim_{h \to 0}\frac{f(a_1,\ldots,a_i+h,\ldots,a_n) - f(a_1,\ldots,a_i,\ldots,a_n)}{h}.</math>
In the difference quotient above, all the variables except {{mvar|x{{ind|i}}}} are held fixed. That choice of fixed values determines a function of one variable,
: <math>f_{a_1,\ldots,\widehat{a}_i,\ldots,a_n}(x_i) = f(a_1,\ldots,a_{i-1},x_i,a_{i+1},\ldots,a_n),</math>
and, by definition,
: <math>\frac{df_{a_1,\ldots,\widehat{a}_i,\ldots,a_n}}{dx_i}(a_i) = \frac{\part f}{\part x_i}(a_1,\ldots,a_n).</math>
In other words, the different choices of {{mvar|a}} index a family of one-variable functions, just as in the example above, and computing a partial derivative reduces to computing a one-variable derivative.
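Because a partial derivative is an ordinary one-variable derivative with the remaining variables frozen, it can be approximated by the same kind of difference quotient. The sketch below is illustrative only (the helper name <code>partial</code> is arbitrary); it checks the example above, for which {{math|1=∂''f''/∂''x'' = 2''x'' + ''y''}} and {{math|1=∂''f''/∂''y'' = ''x'' + 2''y''}}.

<syntaxhighlight lang="python">
# Minimal sketch: a partial derivative as a one-variable difference quotient.
def partial(f, i, a, h=1e-6):
    """Approximate (∂f/∂x_i)(a), holding every variable except x_i fixed."""
    ap, am = list(a), list(a)
    ap[i] += h
    am[i] -= h
    return (f(ap) - f(am)) / (2 * h)

f = lambda x: x[0]**2 + x[0]*x[1] + x[1]**2   # f(x, y) = x^2 + xy + y^2
print(partial(f, 0, [3.0, 2.0]))   # ∂f/∂x = 2x + y = 8 at (3, 2)
print(partial(f, 1, [3.0, 2.0]))   # ∂f/∂y = x + 2y = 7 at (3, 2)
</syntaxhighlight>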
 
An important example of a function of several variables is a [[scalar-valued function]] {{math|''f''(''x''{{ind|1}}, …, ''x''{{ind|''n''}})}} defined on a domain in Euclidean space {{math|'''R'''{{msup|''n''}}}}. In this case {{mvar|f}} has a partial derivative {{math|∂''f''/∂''x''{{ind|''j''}}}} with respect to each variable {{mvar|x{{ind|j}}}}, and at a point {{mvar|a}} these partial derivatives define the vector {{math|1=∇''f''(''a'') = ({{sfrac|∂''f''|∂''x''{{ind|1}}}}(''a''), …, {{sfrac|∂''f''|∂''x''{{ind|''n''}}}}(''a''))}}, called the [[gradient]] of {{mvar|f}} at {{mvar|a}}. If {{mvar|f}} is differentiable at every point of some domain, then the gradient is a vector-valued function sending each point {{mvar|a}} to the vector {{math|∇''f''(''a'')}}; that is, it defines a [[vector field]].
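A gradient can be assembled the same way, as the vector of all partial derivatives. The following self-contained sketch (again with arbitrary helper names) computes {{math|∇''f''}} for {{math|1=''f''(''x'', ''y'') = ''x''{{exp|2}} + ''xy'' + ''y''{{exp|2}}}}, whose exact gradient is {{math|(2''x'' + ''y'', ''x'' + 2''y'')}}.

<syntaxhighlight lang="python">
# Minimal sketch: the gradient as the vector of partial derivatives.
def gradient(f, a, h=1e-6):
    g = []
    for i in range(len(a)):
        ap, am = list(a), list(a)
        ap[i] += h                  # step forward in coordinate i
        am[i] -= h                  # step backward in coordinate i
        g.append((f(ap) - f(am)) / (2 * h))
    return tuple(g)

f = lambda x: x[0]**2 + x[0]*x[1] + x[1]**2
print(gradient(f, [3.0, 2.0]))      # approx (8.0, 7.0) = (2x + y, x + 2y)
</syntaxhighlight>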
 
=== Directional derivatives ===
{{Main|Directional derivative}}

If ''f'' is a real-valued function on '''R'''<sup>''n''</sup>, then the partial derivatives of ''f'' measure its variation in the direction of the coordinate axes. For example, if ''f'' is a function of ''x'' and ''y'', then its partial derivatives measure the variation in ''f'' in the ''x'' direction and the ''y'' direction. They do not, however, directly measure the variation of ''f'' in any other direction, such as along the diagonal line {{nowrap|1=''y'' = ''x''}}. These are measured using directional derivatives. Choose a vector
:<math>\mathbf{v} = (v_1,\ldots,v_n).</math>
The '''directional derivative''' of ''f'' in the direction of '''v''' at the point '''x''' is the limit
:<math>D_{\mathbf{v}}{f}(\mathbf{x}) = \lim_{h \rightarrow 0}{\frac{f(\mathbf{x} + h\mathbf{v}) - f(\mathbf{x})}{h}}.</math>
In some cases it may be easier to compute or estimate the directional derivative after changing the length of the vector. Often this is done to turn the problem into the computation of a directional derivative in the direction of a unit vector. To see how this works, suppose that {{nowrap|1='''v''' = λ'''u'''}} and substitute {{nowrap|1=''h'' = ''k''/λ}} into the difference quotient. The difference quotient becomes
:<math>\frac{f(\mathbf{x} + (k/\lambda)(\lambda\mathbf{u})) - f(\mathbf{x})}{k/\lambda}
= \lambda\cdot\frac{f(\mathbf{x} + k\mathbf{u}) - f(\mathbf{x})}{k}.</math>
This is λ times the difference quotient for the directional derivative of ''f'' with respect to '''u'''. Furthermore, taking the limit as ''h'' tends to zero is the same as taking the limit as ''k'' tends to zero, because ''h'' and ''k'' are multiples of each other. Therefore {{nowrap|1=''D''<sub>'''v'''</sub>(''f'') = λ''D''<sub>'''u'''</sub>(''f'')}}. Because of this rescaling property, directional derivatives are frequently considered only for unit vectors.

If all the partial derivatives of ''f'' exist and are continuous at '''x''', then they determine the directional derivative of ''f'' in the direction '''v''' by the formula
:<math>D_{\mathbf{v}}{f}(\boldsymbol{x}) = \sum_{j=1}^n v_j \frac{\partial f}{\partial x_j}.</math>
This is a consequence of the definition of the [[total derivative]]. It follows that the directional derivative is [[linear map|linear]] in '''v''', meaning that {{nowrap|1=''D''<sub>'''v''' + '''w'''</sub>(''f'') = ''D''<sub>'''v'''</sub>(''f'') + ''D''<sub>'''w'''</sub>(''f'')}}.

The same definition also works when ''f'' is a function with values in '''R'''<sup>''m''</sup>. The above definition is applied to each component of the vectors. In this case, the directional derivative is a vector in '''R'''<sup>''m''</sup>.
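As a concrete illustration of the definitions above (with arbitrary helper names, not from any standard library), the sketch below evaluates a directional derivative from its limit definition and exhibits the rescaling property {{nowrap|1=''D''<sub>λ'''u'''</sub>(''f'') = λ''D''<sub>'''u'''</sub>(''f'')}}; for the chosen {{mvar|f}} the sum formula gives {{nowrap|1=''D''<sub>'''v'''</sub>(''f'') = ''v''{{ind|1}}(2''x'' + ''y'') + ''v''{{ind|2}}(''x'' + 2''y'')}}, i.e. {{math|15}} at the point {{math|(3, 2)}}.

<syntaxhighlight lang="python">
# Minimal sketch: a directional derivative from its limit definition.
def directional(f, v, x, h=1e-6):
    xp = [xi + h * vi for xi, vi in zip(x, v)]   # x + h v
    xm = [xi - h * vi for xi, vi in zip(x, v)]   # x - h v
    return (f(xp) - f(xm)) / (2 * h)

f = lambda x: x[0]**2 + x[0]*x[1] + x[1]**2      # grad f = (2x + y, x + 2y)
x, v = [3.0, 2.0], [1.0, 1.0]                    # the diagonal direction y = x
print(directional(f, v, x))                      # approx 15.0 = 1*8 + 1*7
print(directional(f, [2.0, 2.0], x))             # approx 30.0 = 2 * 15.0
</syntaxhighlight>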
=== Total derivative ===
{{Main|Total derivative}}

When ''f'' is a function from an open subset of '''R'''<sup>''n''</sup> to '''R'''<sup>''m''</sup>, then the directional derivative of ''f'' in a chosen direction is the best linear approximation to ''f'' at that point and in that direction. But when {{nowrap|''n'' &gt; 1}}, no single directional derivative can give a complete picture of the behavior of ''f''. The total derivative gives a complete picture by considering all directions at once. That is, for any vector '''v''' starting at '''a''', the linear approximation formula holds:
:<math>f(\mathbf{a} + \mathbf{v}) \approx f(\mathbf{a}) + f'(\mathbf{a})\mathbf{v}.</math>
Just like the single-variable derivative, {{nowrap|''f''&thinsp;&prime;('''a''')}} is chosen so that the error in this approximation is as small as possible.
 
If ''n'' and ''m'' are both one, then the derivative {{nowrap|''f''&thinsp;′(''a'')}} is a number and the expression {{nowrap|''f''&thinsp;′(''a'')''v''}} is the product of two numbers. But in higher dimensions, it is impossible for {{nowrap|''f''&thinsp;′('''a''')}} to be a number. If it were a number, then {{nowrap|''f''&thinsp;′('''a''')'''v'''}} would be a vector in '''R'''<sup>''n''</sup> while the other terms would be vectors in '''R'''<sup>''m''</sup>, and therefore the formula would not make sense. For the linear approximation formula to make sense, {{nowrap|''f''&thinsp;′('''a''')}} must be a function that sends vectors in '''R'''<sup>''n''</sup> to vectors in '''R'''<sup>''m''</sup>, and {{nowrap|''f''&thinsp;′('''a''')'''v'''}} must denote this function evaluated at '''v'''.
 
To determine what kind of function it is, notice that the linear approximation formula can be rewritten as
:<math>f(\mathbf{a} + \mathbf{v}) - f(\mathbf{a}) \approx f'(\mathbf{a})\mathbf{v}.</math>
Notice that if we choose another vector '''w''', then this approximate equation determines another approximate equation by substituting '''w''' for '''v'''. It determines a third approximate equation by substituting both '''w''' for '''v''' and {{nowrap|'''a''' + '''v'''}} for '''a'''. By subtracting these two new equations, we get
:<math>f(\mathbf{a} + \mathbf{v} + \mathbf{w}) - f(\mathbf{a} + \mathbf{v}) - f(\mathbf{a} + \mathbf{w}) + f(\mathbf{a})
\approx f'(\mathbf{a} + \mathbf{v})\mathbf{w} - f'(\mathbf{a})\mathbf{w}.</math>
If we assume that '''v''' is small and that the derivative varies continuously in '''a''', then {{nowrap|''f''&thinsp;′('''a''' + '''v''')}} is approximately equal to {{nowrap|''f''&thinsp;′('''a''')}}, and therefore the right-hand side is approximately zero. The left-hand side can be rewritten in a different way using the linear approximation formula with {{nowrap|'''v''' + '''w'''}} substituted for '''v'''. The linear approximation formula implies:
:<math>\begin{align}
0
&\approx f(\mathbf{a} + \mathbf{v} + \mathbf{w}) - f(\mathbf{a} + \mathbf{v}) - f(\mathbf{a} + \mathbf{w}) + f(\mathbf{a}) \\
&= (f(\mathbf{a} + \mathbf{v} + \mathbf{w}) - f(\mathbf{a})) - (f(\mathbf{a} + \mathbf{v}) - f(\mathbf{a})) - (f(\mathbf{a} + \mathbf{w}) - f(\mathbf{a})) \\
&\approx f'(\mathbf{a})(\mathbf{v} + \mathbf{w}) - f'(\mathbf{a})\mathbf{v} - f'(\mathbf{a})\mathbf{w}.
\end{align}</math>
This suggests that {{nowrap|''f''&thinsp;′('''a''')}} is a [[linear transformation]] from the vector space '''R'''<sup>''n''</sup> to the vector space '''R'''<sup>''m''</sup>. In fact, it is possible to make this a precise derivation by measuring the error in the approximations. Assume that the error in these linear approximation formulas is bounded by a constant times ||'''v'''||, where the constant is independent of '''v''' but depends continuously on '''a'''. Then, after adding an appropriate error term, all of the above approximate equalities can be rephrased as inequalities. In particular, {{nowrap|''f''&thinsp;′('''a''')}} is a linear transformation up to a small error term. In the limit as '''v''' and '''w''' tend to zero, it must therefore be a linear transformation. Since we define the total derivative by taking a limit as '''v''' goes to zero, {{nowrap|''f''&thinsp;′('''a''')}} must be a linear transformation.
 
In one variable, the fact that the derivative is the best linear approximation is expressed by the fact that it is the limit of difference quotients. However, the usual difference quotient does not make sense in higher dimensions because it is not usually possible to divide vectors. In particular, the numerator and denominator of the difference quotient are not even in the same vector space: The numerator lies in the codomain '''R'''<sup>''m''</sup> while the denominator lies in the domain '''R'''<sup>''n''</sup>. Furthermore, the derivative is a linear transformation, a different type of object from both the numerator and denominator. To make precise the idea that {{nowrap|''f''&thinsp;′('''a''')}} is the best linear approximation, it is necessary to adapt a different formula for the one-variable derivative in which these problems disappear. If {{nowrap|''f'' : '''R''' → '''R'''}}, then the usual definition of the derivative may be manipulated to show that the derivative of ''f'' at ''a'' is the unique number {{nowrap|''f''&thinsp;′(''a'')}} such that
:<math>\lim_{h \to 0} \frac{f(a + h) - f(a) - f'(a)h}{h} = 0.</math>
This is equivalent to
:<math>\lim_{h \to 0} \frac{|f(a + h) - f(a) - f'(a)h|}{|h|} = 0</math>
because the limit of a function tends to zero if and only if the limit of the absolute value of the function tends to zero. This last formula can be adapted to the many-variable situation by replacing the absolute values with [[norm (mathematics)|norm]]s.
 
The definition of the '''total derivative''' of ''f'' at '''a''', therefore, is that it is the unique linear transformation {{nowrap|''f''&thinsp;′('''a''') : '''R'''<sup>''n''</sup> → '''R'''<sup>''m''</sup>}} such that
:<math>\lim_{\mathbf{h}\to 0} \frac{\lVert f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - f'(\mathbf{a})\mathbf{h}\rVert}{\lVert\mathbf{h}\rVert} = 0.</math>
Here '''h''' is a vector in '''R'''<sup>''n''</sup>, so the norm in the denominator is the standard length on '''R'''<sup>''n''</sup>. However, ''f''&thinsp;′('''a''')'''h''' is a vector in '''R'''<sup>''m''</sup>, and the norm in the numerator is the standard length on '''R'''<sup>''m''</sup>. If '''v''' is a vector starting at '''a''', then {{nowrap|''f''&thinsp;′('''a''')'''v'''}} is called the [[pushforward (differential)|pushforward]] of '''v''' by ''f'' and is sometimes written {{nowrap|''f''<sub>∗</sub>'''v'''}}.
 
If the total derivative exists at '''a''', then all the partial derivatives and directional derivatives of ''f'' exist at '''a''', and for all '''v''', {{nowrap|''f''&thinsp;′('''a''')'''v'''}} is the directional derivative of ''f'' in the direction '''v'''. If we write ''f'' using coordinate functions, so that {{nowrap|1=''f'' = (''f''<sub>1</sub>, ''f''<sub>2</sub>, ..., ''f''<sub>''m''</sub>)}}, then the total derivative can be expressed using the partial derivatives as a [[matrix (mathematics)|matrix]]. This matrix is called the '''[[Jacobian matrix]]''' of ''f'' at '''a''':
 
:<math>f'(\mathbf{a}) = \operatorname{Jac}_{\mathbf{a}} = \left(\frac{\partial f_i}{\partial x_j}\right)_{ij}.</math>
 
The existence of the total derivative ''f''′('''a''') is strictly stronger than the existence of all the partial derivatives, but if the partial derivatives exist and are continuous, then the total derivative exists, is given by the Jacobian, and depends continuously on '''a'''.
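The Jacobian can be approximated column by column, one partial derivative at a time. The following sketch is only an illustration (the helper <code>jacobian</code> is an arbitrary name): it builds the matrix of a map {{nowrap|''g'' : '''R'''<sup>2</sup> → '''R'''<sup>2</sup>}} by central differences and checks that it acts as the best linear approximation near the base point.

<syntaxhighlight lang="python">
# Minimal sketch: a finite-difference Jacobian and the linear approximation.
import numpy as np

def jacobian(g, a, h=1e-6):
    a = np.asarray(a, dtype=float)
    cols = []
    for j in range(a.size):
        e = np.zeros_like(a)
        e[j] = h                                  # perturb coordinate j only
        cols.append((g(a + e) - g(a - e)) / (2 * h))
    return np.column_stack(cols)

g = lambda x: np.array([x[0]**2 * x[1], x[0] + np.sin(x[1])])
a = np.array([1.0, 2.0])
J = jacobian(g, a)
print(J)                 # approx [[2xy, x^2], [1, cos y]] = [[4, 1], [1, cos 2]]

v = np.array([0.01, -0.02])
print(g(a + v) - g(a))   # actual change ...
print(J @ v)             # ... approximately equals the Jacobian applied to v
</syntaxhighlight>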
 
The definition of the total derivative subsumes the definition of the derivative in one variable. That is, if ''f'' is a real-valued function of a real variable, then the total derivative exists if and only if the usual derivative exists. The Jacobian matrix reduces to a 1×1 matrix whose only entry is the derivative ''f''&prime;(''a''). This 1×1 matrix satisfies the property that {{nowrap|''f''(''a'' + ''h'') − ''f''(''a'') − ''f''&thinsp;′(''a'')''h''}} is approximately zero, in other words that
 
:<math>f(a+h) \approx f(a) + f'(a)h.</math>
 
Up to changing variables, this is the statement that the function <math>x \mapsto f(a) + f'(a)(x-a)</math> is the best linear approximation to ''f'' at ''a''.

The total derivative can also be phrased in classical notation. In [[Landau notation]], differentiability of a one-variable function at {{mvar|a}} means that there is a constant {{mvar|A}} with {{math|1=''f''(''a'' + ''h'') = ''f''(''a'') + ''Ah'' + ''o''({{!}}''h''{{!}})}} as {{math|''h'' → 0}}. Replacing {{mvar|x}} by a vector variable {{math|1='''x''' = (''x''{{ind|1}}, …, ''x''{{ind|''n''}})}}, a function {{math|1=''u'' = ''u''('''x''')}} is said to be differentiable, in particular '''totally differentiable''', at {{math|1='''x''' = '''a'''}} when there is a constant vector {{math|'''A'''}} such that, as {{math|{{!}}'''h'''{{!}} → 0}},
: <math>u(\mathbf{a}+\mathbf{h}) = u(\mathbf{a}) + \mathbf{A} \cdot \mathbf{h} + o(|\mathbf{h}|).</math>
Writing {{math|1='''A''' = (''A''{{ind|1}}, …, ''A''{{ind|''n''}})}}, the components satisfy
: <math>A_i = \left.\frac{\partial u}{\partial x_i}\right|_{x_i=a_i} \qquad (i = 1, 2, \ldots, n),</math>
so a totally differentiable function has a [[partial derivative]] with respect to each variable. Just as one writes {{math|1=''dy'' = ''f''{{'}}(''x'')&thinsp;''dx''}} for a function of one variable, one writes
: <math>du = \mathbf{A} \cdot d\mathbf{x}
= \sum_{i=1}^{n} \frac{\partial u}{\partial x_i}\,dx_i
= \frac{\partial u}{\partial x_1}\,dx_1 + \frac{\partial u}{\partial x_2}\,dx_2
+ \dotsb + \frac{\partial u}{\partial x_n}\,dx_n,
</math>
and this {{mvar|du}} is called the '''total differential''' of {{math|1=''u'' = ''u''('''x''')}}. The same reasoning applies to a vector-valued function {{math|1='''u''' = '''u'''('''x''')}}: a point {{math|1='''x''' = '''a'''}} at which the [[Jacobian matrix|functional matrix]] {{math|∂'''u'''/∂'''x'''}} exists and
: <math>\mathbf{u}(\mathbf{a} + \mathbf{h}) = \mathbf{u}(\mathbf{a})
+ \left.\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\right|_{\mathbf{x}=\mathbf{a}}\mathbf{h}
+ o(|\mathbf{h}|)</math>
holds is called a '''regular point''' of {{math|'''u'''}}.
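Numerically, the total differential estimates the change of a function to first order. The following minimal sketch for {{math|1=''u''(''x'', ''y'') = ''x''{{exp|2}} + ''xy'' + ''y''{{exp|2}}}}, added here as an illustration, uses its exact partial derivatives:

<syntaxhighlight lang="python">
# Minimal sketch: du = (∂u/∂x) dx + (∂u/∂y) dy versus the actual change in u.
u = lambda x, y: x**2 + x*y + y**2     # partials: u_x = 2x + y, u_y = x + 2y
a, b = 3.0, 2.0
dx, dy = 0.01, -0.005

du = (2*a + b) * dx + (a + 2*b) * dy   # total differential at (a, b)
print(du)                              # 0.045
print(u(a + dx, b + dy) - u(a, b))     # actual change, approx 0.045075
</syntaxhighlight>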
 
The total derivative of a function does not give another function in the same way as the one-variable case. This is because the total derivative of a multivariable function has to record much more information than the derivative of a single-variable function. Instead, the total derivative gives a function from the [[tangent bundle]] of the source to the tangent bundle of the target.
 
The natural analog of second, third, and higher-order total derivatives is not a linear transformation, is not a function on the tangent bundle, and is not built by repeatedly taking the total derivative. The analog of a higher-order derivative, called a [[jet (mathematics)|jet]], cannot be a linear transformation because higher-order derivatives reflect subtle geometric information, such as concavity, which cannot be described in terms of linear data such as vectors. It cannot be a function on the tangent bundle because the tangent bundle only has room for the base space and the directional derivatives. Because jets capture higher-order information, they take as arguments additional coordinates representing higher-order changes in direction. The space determined by these additional coordinates is called the [[jet bundle]]. The relation between the total derivative and the partial derivatives of a function is paralleled in the relation between the ''k''th order jet of a function and its partial derivatives of order less than or equal to ''k''.
 
By repeatedly taking the total derivative, one obtains higher versions of the [[Fréchet derivative]], specialized to '''R'''<sup>''p''</sup>. The ''k''th order total derivative may be interpreted as a map
:<math>D^k f: \mathbb{R}^n \to L^k(\mathbb{R}^n \times \cdots \times \mathbb{R}^n, \mathbb{R}^m)</math>
which takes a point '''x''' in '''R'''<sup>n</sup> and assigns to it an element of the space of ''k''-linear maps from '''R'''<sup>n</sup> to '''R'''<sup>m</sup> – the "best" (in a certain precise sense) ''k''-linear approximation to ''f'' at that point. By precomposing it with the [[Diagonal functor|diagonal map]] Δ, {{nowrap|'''x''' → ('''x''', '''x''')}}, a generalized Taylor series may be begun as
:<math>\begin{align}
f(\mathbf{x}) & \approx f(\mathbf{a}) + (D f)(\mathbf{x - a}) + (D^2 f)(\Delta(\mathbf{x-a})) + \cdots\\
& = f(\mathbf{a}) + (D f)(\mathbf{x - a}) + (D^2 f)(\mathbf{x - a}, \mathbf{x - a})+ \cdots\\
& = f(\mathbf{a}) + \sum_i (D f)_i (\mathbf{x-a})^i + \sum_{j, k} (D^2 f)_{j k} (\mathbf{x-a})^j (\mathbf{x-a})^k + \cdots
\end{align}</math>
where ''f''('''a''') is identified with a constant function, {{nowrap|('''x''' − '''a''')<sup>''i''</sup>}} are the components of the vector {{nowrap|'''x''' − '''a'''}}, and {{nowrap|(D ''f'')<sub>''i''</sub>}} and {{nowrap|(D<sup>2</sup> ''f'')<sub>''j k''</sub>}} are the components of {{nowrap|D ''f''}} and {{nowrap|D<sup>2</sup> ''f''}} as linear transformations.
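For a scalar-valued function the first terms of such a series can be made concrete by taking {{math|D''f''}} to be the gradient and {{math|D{{exp|2}}''f''}} the Hessian. The sketch below is illustrative only; it writes the conventional {{sfrac|1|2!}} factor explicitly, whereas the expansion above leaves such constant factors inside the components of {{math|D{{msup|''k''}}''f''}}.

<syntaxhighlight lang="python">
# Minimal sketch: a second-order Taylor approximation of f(x, y) = e^x sin y.
import numpy as np

f = lambda x: np.exp(x[0]) * np.sin(x[1])
a = np.array([0.0, 1.0])
grad = np.array([np.sin(1.0), np.cos(1.0)])       # ∇f(a) = (sin 1, cos 1)
hess = np.array([[np.sin(1.0),  np.cos(1.0)],     # f_xx, f_xy at a
                 [np.cos(1.0), -np.sin(1.0)]])    # f_yx, f_yy at a

x = np.array([0.1, 1.2])
d = x - a
approx = f(a) + grad @ d + 0.5 * d @ hess @ d
print(approx, f(x))    # approx 1.0319 vs. exact value approx 1.0301
</syntaxhighlight>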
 
== Generalizations ==