AdaBoost は、それぞれの標本に対し、弱い分類器（[weak classifier]）${\displaystyle t}$を、${\displaystyle t=1}$から${\displaystyle t=T}$まで順に適用し、それぞれの分類器が正解したか否かを判断する。間違って分類された標本に対応する重み${\displaystyle D_{t}}$は、より重くされる（あるいは、正しく分類された標本の場合は、重みを減らす）。これらの標本に対する重みから、次のtのループでは正しい分類器を早く探す事が出来る。

二分類のアルゴリズム

Given: ${\displaystyle (x_{1},y_{1}),\ldots ,(x_{m},y_{m})}$  where ${\displaystyle x_{i}\in X,\,y_{i}\in Y=\{-1,+1\}}$

Initialize ${\displaystyle D_{1}(i)={\frac {1}{m}},i=1,\ldots ,m.}$

For ${\displaystyle t=1,\ldots ,T}$ :

• Find the classifier ${\displaystyle h_{t}:X\to \{-1,+1\}}$  that minimizes the error with respect to the distribution ${\displaystyle D_{t}}$ :
${\displaystyle h_{t}={\underset {h_{j}\in {\mathcal {H}}}{\operatorname {argmin} }}\;\epsilon _{j}}$ , where ${\displaystyle \epsilon _{j}=\sum _{i=1}^{m}D_{t}(i)[y_{i}\neq h_{j}(x_{i})]}$
• if ${\displaystyle \epsilon _{t}>0.5}$  then stop.
• Choose ${\displaystyle \alpha _{t}\in \mathbf {R} }$ , typically ${\displaystyle \alpha _{t}={\frac {1}{2}}{\textrm {ln}}{\frac {1-\epsilon _{t}}{\epsilon _{t}}}}$  where ${\displaystyle \epsilon _{t}}$  is the weighted error rate of classifier ${\displaystyle h_{t}}$ .
• Update:
${\displaystyle D_{t+1}(i)={\frac {D_{t}(i)\exp(-\alpha _{t}y_{i}h_{t}(x_{i}))}{Z_{t}}}}$

where ${\displaystyle Z_{t}}$  is a normalization factor (chosen so that ${\displaystyle D_{t+1}}$  will be a probability distribution, i.e. sum one over all x).

Output the final classifier:

${\displaystyle H(x)={\textrm {sign}}\left(\sum _{t=1}^{T}\alpha _{t}h_{t}(x)\right)}$

The equation to update the distribution ${\displaystyle D_{t}}$  is constructed so that:

${\displaystyle -\alpha _{t}y_{i}h_{t}(x_{i}){\begin{cases}<0,&y(i)=h_{t}(x_{i})\\>0,&y(i)\neq h_{t}(x_{i})\end{cases}}}$

Thus, after selecting an optimal classifier ${\displaystyle h_{t}\,}$  for the distribution ${\displaystyle D_{t}\,}$ , the examples ${\displaystyle x_{i}\,}$  that the classifier ${\displaystyle h_{t}\,}$  identified correctly are weighted less and those that it identified incorrectly are weighted more. Therefore, when the algorithm is testing the classifiers on the distribution ${\displaystyle D_{t+1}\,}$ , it will select a classifier that better identifies those examples that the previous classifer missed.

ブースティングの統計的理解

Boosting can be seen as minimization of a convex loss function over a convex set of functions. [2] Specifically, the loss being minimized is the exponential loss

${\displaystyle \sum _{i}e^{-y_{i}f(x_{i})}}$

and we are seeking a function

${\displaystyle f=\sum _{t}\alpha _{t}h_{t}}$

脚注

1. ^ Yoav Freund, Robert E. Schapire. "A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting", 1995
2. ^ T. Zhang, "Convex Risk Minimization", Annals of Statistics, 2004.

外部リンク

• icsiboost, an open source implementation of Boostexter
• NPatternRecognizer , a fast machine learning algorithm library written in C#. It contains support vector machine, neural networks, bayes, boost, k-nearest neighbor, decision tree, ..., etc.