AlphaGo Zero

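AlphaGo Zero is a version of DeepMind's Go software AlphaGo. The AlphaGo team introduced AlphaGo Zero in a paper published in the journal Nature on 19 October 2017. This version was built without using any data from human games, and it is stronger than every previous version^[1]. By playing games against itself, AlphaGo Zero surpassed the strength of AlphaGo Lee in three days (winning 100 games to 0), reached the level of AlphaGo Master in 21 days, and exceeded all earlier versions in 40 days^[2].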

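Because data obtained from human experts is "often expensive, unreliable, or simply unavailable"^[3], training artificial intelligence (AI) without such datasets has important implications for the development of AI with superhuman abilities. Demis Hassabis, co-founder and CEO of DeepMind, said that AlphaGo Zero was so powerful because it was "no longer constrained by the limits of human knowledge"^[4]. David Silver, one of the lead authors of DeepMind's Nature papers on AlphaGo, said that removing the need to learn from humans makes it possible to obtain general-purpose AI algorithms^[5].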

In December 2017, a generalized version of AlphaGo Zero named AlphaZero defeated AlphaGo Zero, a top chess program (Stockfish), and a top shogi program (elmo)^[6]^[7].

Training

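AlphaGo Zero's neural network was trained using TensorFlow with 64 GPU workers and 19 CPU parameter servers. Only four TPUs were used for inference. At the outset the neural network was told nothing about Go beyond the rules. Unlike earlier versions of AlphaGo, Zero perceives only the stones on the board, with no rare human-programmed edge cases to help it recognize unusual board positions. The AI learned through self-play reinforcement learning (described in some coverage as "unsupervised learning"), playing against itself until it could anticipate its own moves and how those moves would affect the outcome of the game^[8]. In the first three days, AlphaGo Zero played 4.9 million games against itself in quick succession^[9]. Earlier versions of AlphaGo needed months of training to acquire the skill required to beat top human professionals, whereas AlphaGo Zero reached the same level in just a few days^[10].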
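The idea of learning purely from games against oneself can be illustrated on a much smaller scale. The following is a minimal sketch, not AlphaGo Zero's actual method (which combines a deep neural network with tree search): it swaps in a simple table of win estimates and a toy game, Nim with a single pile where each move removes one to three stones and taking the last stone wins. The game choice, the function names, and all parameter values are assumptions made for illustration.

```python
import random

MAX_TAKE = 3  # a move removes 1..3 stones; taking the last stone wins


def legal_moves(pile):
    """Numbers of stones that may be removed from a pile of this size."""
    return range(1, min(MAX_TAKE, pile) + 1)


def greedy_move(values, pile):
    """Leave the opponent in the position with the lowest estimated win chance."""
    return min(legal_moves(pile), key=lambda take: values[pile - take])


def self_play_train(max_pile=10, games=20000, epsilon=0.2, step=0.05, seed=0):
    """Learn win estimates from self-play only, knowing nothing but the rules."""
    rng = random.Random(seed)
    # values[n]: estimated chance that the player to move wins with n stones left.
    values = [0.5] * (max_pile + 1)
    values[0] = 0.0  # no stones left: the player to move has already lost
    for _ in range(games):
        pile = rng.randint(1, max_pile)
        visited = []  # positions in play order; the two sides alternate
        while pile > 0:
            visited.append(pile)
            if rng.random() < epsilon:
                take = rng.choice(list(legal_moves(pile)))  # explore
            else:
                take = greedy_move(values, pile)  # exploit current estimates
            pile -= take
        # The side that moved last took the final stone and won the game.
        outcome = 1.0
        for state in reversed(visited):
            values[state] += step * (outcome - values[state])
            outcome = 1.0 - outcome  # switch to the other side's perspective
    return values


values = self_play_train()
policy = {n: greedy_move(values, n) for n in range(1, 11)}
print(policy)
```

For this toy game the known winning strategy is to leave the opponent a multiple of four stones, so the learned policy can be checked against it for pile sizes that are not themselves multiples of four. The same loop structure (play, record the outcome, update the estimates, repeat) is what the self-play description above refers to, only with a table in place of a neural network.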

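For comparison, the authors also trained AlphaGo Zero using human game records, and found that it learned more quickly but actually performed worse in the long run^[11]. DeepMind submitted the paper to Nature in April 2017, and it was published in October 2017^[1].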

Applications

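According to Hassabis, AlphaGo's algorithms are likely to be most useful in domains that require an intelligent search through an enormous space of possibilities, such as protein folding or the accurate simulation of chemical reactions^[12]. AlphaGo's techniques are probably less useful in domains that are difficult to simulate, such as learning how to drive a car^[13].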

Reception

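AlphaGo Zero was widely regarded as a significant advance, even when compared with its groundbreaking predecessor, AlphaGo. Oren Etzioni of the Allen Institute for Artificial Intelligence called AlphaGo Zero "a very impressive technical result" in terms of both the capability it achieved and the ability to train the system in 40 days on four TPUs^[8]. The Guardian, citing Eleni Vasilaki of the University of Sheffield and Tom Mitchell of Carnegie Mellon University, called it a "major breakthrough for artificial intelligence"; Vasilaki described AlphaGo Zero as an impressive feat and Mitchell as an "outstanding engineering accomplishment"^[13]. Mark Pesce of the University of Sydney called AlphaGo Zero "a big technological advance" that takes us into "undiscovered territory"^[14].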

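Gary Marcus, a psychologist at New York University, cautioned that AlphaGo may well contain "implicit knowledge that the programmers have about how to construct machines to play problems like Go", and that its underlying architecture needs to be tested in other domains before it can be considered effective at much more than playing Go. In contrast, DeepMind is "confident that this approach is generalisable to a large number of domains"^[9].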

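In response to DeepMind's paper, South Korean Go professional Lee Sedol said, "The previous version of AlphaGo wasn't perfect, and I believe that's why AlphaGo Zero was made." On the potential for AlphaGo's further development, Lee said he would have to wait and see, but added that it would have an impact on young Go players. Mok Jin-seok, who directs the South Korean national Go team, said that the Go world has already been imitating the playing styles of previous versions of AlphaGo and creating new ideas from them, and that he hopes new ideas will come out of AlphaGo Zero. Mok added that general trends in the Go world are now being shaped by AlphaGo's style of play.

"At first, it was hard to understand and I almost felt like I was playing against an alien. However, having had a great amount of experience, I've become used to it," Mok said. "We are now past the point where we debate the gap between the capability of AlphaGo and humans. It's now between computers." Mok has reportedly already begun analyzing AlphaGo Zero's playing style together with players from the national team. "Though having watched only a few matches, we received the impression that AlphaGo Zero plays more like a human than its predecessors," Mok said^[15].

Chinese Go professional Ke Jie commented on his Weibo account: "A pure self-learning AlphaGo is the strongest. Humans seem redundant in front of its self-improvement."^[16]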

Comparison with predecessors

Configuration and strength^[17]
Version	Hardware	Elo rating	Matches
AlphaGo Fan	176 GPUs,^[2] distributed	3,144^[1]	5:0 against Fan Hui
AlphaGo Lee	48 TPUs,^[2] distributed	3,739^[1]	4:1 against Lee Sedol
AlphaGo Master	4 TPUs v2,^[2] single machine	4,858^[1]	60:0 against professional players; 3:0 against Ke Jie
AlphaGo Zero	4 TPUs v2,^[2] single machine	5,185^[1]	100:0 against AlphaGo Lee; 89:11 against AlphaGo Master
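
The rating gaps in the table can be related to the match results through the standard logistic Elo formula, under which a player rated R_A is expected to score 1 / (1 + 10^((R_B - R_A) / 400)) against a player rated R_B. The sketch below applies that formula to the ratings listed above. Whether the published ratings were computed on exactly this scale is an assumption here, and the function name is chosen for illustration.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings as listed in the table above.
ratings = {
    "AlphaGo Fan": 3144,
    "AlphaGo Lee": 3739,
    "AlphaGo Master": 4858,
    "AlphaGo Zero": 5185,
}

zero = ratings["AlphaGo Zero"]
expected = {
    name: elo_expected_score(zero, rating)
    for name, rating in ratings.items()
    if name != "AlphaGo Zero"
}
for name, score in expected.items():
    print(f"AlphaGo Zero vs {name}: expected score {score:.3f}")
```

Under these assumptions the 327-point gap to AlphaGo Master corresponds to an expected score of roughly 0.87, which is close to the 89:11 result in the table, while the 1,446-point gap to AlphaGo Lee corresponds to an expected score above 0.999, in line with the 100:0 result.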

AlphaZero

Main article: AlphaZero

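In December 2017, the DeepMind team released a paper on arXiv introducing AlphaZero, a program that uses a generalized form of AlphaGo Zero's approach. Within 24 hours, AlphaZero reached a superhuman level of play in chess, shogi, and Go, defeating the world-champion programs Stockfish and elmo and a version of AlphaGo Zero that had been trained for three days^[18].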

Notes

[1] “Mastering the game of Go without human knowledge”. Nature (19 October 2017). Retrieved 19 October 2017.
[2] “AlphaGo Zero: Learning from scratch”. DeepMind official website (18 October 2017). Retrieved 19 October 2017.
[3] “Google's New AlphaGo Breakthrough Could Take Algorithms Where No Humans Have Gone”. Yahoo! Finance (19 October 2017). Retrieved 19 October 2017.
[4] “AlphaGo Zero: Google DeepMind supercomputer learns 3,000 years of human knowledge in 40 days”. Telegraph.co.uk (18 October 2017). Retrieved 19 October 2017.
[5] “DeepMind AlphaGo Zero learns on its own without meatbag intervention”. ZDNet (19 October 2017). Retrieved 20 October 2017.
[6] “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”. Retrieved 23 December 2017.
[7] “Entire human chess knowledge learned and surpassed by DeepMind's AlphaZero in four hours”. Retrieved 23 December 2017.
[8] Greenemeier, Larry. “AI versus AI: Self-Taught AlphaGo Zero Vanquishes Its Predecessor”. Scientific American. Retrieved 20 October 2017.
[9] “Computer Learns To Play Go At Superhuman Levels 'Without Human Knowledge'”. NPR (18 October 2017). Retrieved 20 October 2017.
[10] “Google's New AlphaGo Breakthrough Could Take Algorithms Where No Humans Have Gone”. Fortune (19 October 2017). Retrieved 20 October 2017.
[11] “This computer program can beat humans at Go—with no human instruction”. Science | AAAS (18 October 2017). Retrieved 20 October 2017.
[12] “The latest AI can work things out without being taught”. The Economist. Retrieved 20 October 2017.
[13] Sample, Ian (18 October 2017). “'It's able to create knowledge itself': Google unveils AI that learns on its own”. The Guardian. Retrieved 20 October 2017.
[14] “How Google's new AI can teach itself to beat you at the most complex games”. Australian Broadcasting Corporation (19 October 2017). Retrieved 20 October 2017.
[15] “Go Players Excited About ‘More Humanlike’ AlphaGo Zero”. Korea Bizwire (19 October 2017). Retrieved 21 October 2017.
[16] “New version of AlphaGo can master Weiqi without human help”. China News Service (19 October 2017). Retrieved 21 October 2017.
[17] “【柯洁战败解密】AlphaGo Master最新架构和算法，谷歌云与TPU拆解” (in Chinese). Sohu (24 May 2017). Retrieved 1 June 2017.
[18] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (5 December 2017). “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”. arXiv:1712.01815 [cs.AI].
