Reinforcement Learning

In reinforcement learning, an agent learns how to choose actions based on the current situation so as to maximize a numerical reward signal.
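As a rough illustration of this idea, the sketch below learns which action to take in each situation (state) of a toy corridor so that the reward is maximized. It uses tabular Q-learning, which is one specific algorithm chosen here for brevity and not something this page prescribes. The environment, the function names (`step`, `train`, `greedy_policy`), and all parameter values are assumptions made up for the example.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (hypothetical toy task)
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Toy environment: reward 1.0 when the goal is reached, 0.0 otherwise."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)             # explore
            else:
                best = max(q[state])             # exploit, random tie-break
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # The target uses the best next action; no bootstrap at the goal.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action (as -1 or +1) for every non-terminal state."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The learned table maps each state to an action value, and the greedy policy simply picks the action with the larger value in each state.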

Reinforcement learning (朱鷺の森)

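Reinforcement learning is sometimes classified as "semi-supervised learning". Note, however, that the term "semi-supervised learning" is also used to mean machine learning that combines labeled and unlabeled data.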

Textbooks

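強化学習 (Japanese translation, Morikita Publishing)

Richard S. Sutton, Andrew G. Barto; translated by Sadayoshi Mikami (三上貞芳) and Masaaki Minagawa (皆川雅章). 強化学習 (Reinforcement Learning). Morikita Publishing (森北出版), December 2000.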

Reinforcement Learning: An Introduction

Richard S. Sutton and Andrew G. Barto The MIT Press, Cambridge, Massachusetts, London, England

http://www.cs.ualberta.ca/~sutton/book/ebook/the-book.html

links

http://www.scholarpedia.org/article/Reinforcement_learning Scholarpedia

http://en.wikipedia.org/wiki/Reinforcement_learning Wikipedia

http://biorobot.ise.ibaraki.ac.jp/~inoue/links.html

http://sysplan.nams.kyushu-u.ac.jp/gen/edu/RL_intro.html

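Hajime Kimura, Kazuteru Miyazaki, Shigenobu Kobayashi: 強化学習システムの設計指針 (Design guidelines for reinforcement learning systems), 計測と制御 (Journal of the Society of Instrument and Control Engineers), Vol.38, No.10, pp.618–623 (1999), Society of Instrument and Control Engineers (計測自動制御学会).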

POMDP

http://www.cs.brown.edu/research/ai/pomdp/tutorial/index.html POMDPs for Dummies

http://www.cs.duke.edu/~mlittman/topics/pomdp-page.html POMDP information page

MATLAB and Python software list

http://www.dia.fi.upm.es/~jamartin/download.htm

orange

Linux installation: http://www.ailab.si/orange/downloads-linux.asp

http://www.ailab.si/orange/doc/modules/orngReinforcement.htm

RLSarsa

Class RLSarsa is an implementation of linear, gradient-descent Sarsa(lambda) with tile coding. The implementation closely follows the boxed algorithm in Figure 8.8 on page 212 of Sutton and Barto (1998). It is derived from the mountain car example in the book.
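To make the description above more concrete, here is a minimal sketch of linear, gradient-descent Sarsa(lambda) with binary tile-coded features and replacing traces. It is not Orange's RLSarsa class and does not use its API. The 1-D toy task (walk from the middle of a line to its right end) stands in for mountain car, and every name and parameter value (`active_tiles`, `sarsa_lambda`, the tiling sizes, the reward scheme) is an assumption made for illustration.

```python
import random

N_TILINGS = 4
TILES_PER_TILING = 8      # tile width 1/8; one extra tile absorbs the offsets
N_ACTIONS = 2             # 0 = move left, 1 = move right
N_FEATURES = N_ACTIONS * N_TILINGS * (TILES_PER_TILING + 1)
GOAL = 10                 # integer positions 0..10 map to x = pos / 10


def active_tiles(x, action):
    """Indices of the active binary features: one tile per tiling."""
    indices = []
    for t in range(N_TILINGS):
        offset = t / (N_TILINGS * TILES_PER_TILING)   # shift each tiling
        tile = int((x + offset) * TILES_PER_TILING)
        indices.append((action * N_TILINGS + t) * (TILES_PER_TILING + 1) + tile)
    return indices


def q_value(theta, x, action):
    """Linear value estimate: sum of the weights of the active tiles."""
    return sum(theta[i] for i in active_tiles(x, action))


def choose_action(theta, x, epsilon, rng):
    """Epsilon-greedy selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = [q_value(theta, x, a) for a in range(N_ACTIONS)]
    best = max(values)
    return rng.choice([a for a in range(N_ACTIONS) if values[a] == best])


def env_step(pos, action):
    """Toy task: reward 1.0 on reaching the right end, 0.0 otherwise."""
    pos = min(max(pos + (1 if action == 1 else -1), 0), GOAL)
    done = pos == GOAL
    return pos, (1.0 if done else 0.0), done


def sarsa_lambda(episodes=300, alpha=0.1, gamma=0.9, lam=0.9,
                 epsilon=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * N_FEATURES
    step_size = alpha / N_TILINGS          # spread the step over the tilings
    for _ in range(episodes):
        traces = [0.0] * N_FEATURES
        pos, done = GOAL // 2, False
        action = choose_action(theta, pos / GOAL, epsilon, rng)
        while not done:
            features = active_tiles(pos / GOAL, action)
            for i in features:
                traces[i] = 1.0            # replacing traces
            next_pos, reward, done = env_step(pos, action)
            delta = reward - sum(theta[i] for i in features)
            if not done:
                next_action = choose_action(theta, next_pos / GOAL, epsilon, rng)
                delta += gamma * q_value(theta, next_pos / GOAL, next_action)
            for i in range(N_FEATURES):
                theta[i] += step_size * delta * traces[i]
                traces[i] *= gamma * lam   # decay all eligibility traces
            if not done:
                pos, action = next_pos, next_action
    return theta


if __name__ == "__main__":
    theta = sarsa_lambda()
    print(q_value(theta, 0.5, 0), q_value(theta, 0.5, 1))
```

Each state-action pair activates exactly one tile per tiling, so the value estimate is a sum of `N_TILINGS` weights and the gradient with respect to the weights is simply the set of active tiles.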

Applications to spoken dialogue

Partially observable Markov decision processes for spoken dialog systems

J. Williams et al., IJCAI, pp. 75-82, 2005

http://portal.acm.org/citation.cfm?id=1221967

Source: Computer Speech and Language, Volume 21, Issue 2 (April 2007), pages 393-422. ISSN: 0885-2308

Authors: Jason D. Williams (AT&T Labs - Research, 180 Park Avenue, Florham Park, NJ 07932, USA); Steve Young (Cambridge University, Engineering Department, UK)

Publisher: Academic Press Ltd., London, UK

Williams et al., 2005a.

Williams, J.D., Poupart, P., Young, S.J., 2005a. Factored partially observable Markov decision processes for dialogue management. In: Proc. Workshop on Knowledge and Reasoning in Practical Dialog Systems, Int. Joint Conf. on Artificial Intelligence (IJCAI), Edinburgh.

Nicholas Roy, Joelle Pineau, Sebastian Thrun

risky.cs.inf.shizuoka.ac.jp/~cs3073/eiron_hosogai.pdf

Nicholas Roy, Joelle Pineau, Sebastian Thrun. Spoken Dialogue Management Using Probabilistic Reasoning. Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL2000). Hong Kong.

確率ロボティクス (Probabilistic Robotics)

Title: 確率ロボティクス (Japanese edition of Probabilistic Robotics)
Authors: Sebastian Thrun, Wolfram Burgard, Dieter Fox
Translator: Ryuichi Ueda (上田隆一)
Publisher: Mainichi Communications (毎日コミュニケーションズ), 2007
ISBN: 4839924015, 9784839924010
Pages: 577
