Contents
Reinforcement Learning
In reinforcement learning, an agent learns how to choose actions based on the current situation so as to maximize a numerical reward signal.
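The simplest setting that shows this "learn which action earns the most reward" idea is the multi-armed bandit (Sutton & Barto, chapter 2). The sketch below is illustrative only: it assumes Gaussian rewards with unit variance and an epsilon-greedy selection rule, and the function name `run_bandit` and the arm means are hypothetical.

```python
import random


def run_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection with incremental sample-average estimates."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k          # estimated value of each action
    n = [0] * k            # number of times each action was chosen
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                    # explore: random action
        else:
            a = max(range(k), key=lambda i: q[i])   # exploit: best estimate (ties -> lowest index)
        r = rng.gauss(true_means[a], 1.0)           # assumed noisy numerical reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                   # incremental mean update
        total_reward += r
    return q, n, total_reward


if __name__ == "__main__":
    q, n, total = run_bandit([0.2, 1.0, 0.5], steps=2000)
    print("estimates:", [round(v, 2) for v in q])
    print("counts:", n)
```

Over time the agent chooses the arm with the highest mean most often; epsilon trades off exploring other arms against exploiting the current best estimate.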
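Reinforcement learning (朱鷺の森 / Toki no Mori)
Reinforcement learning is sometimes classified as "semi-supervised learning". However, the term "semi-supervised learning" is also used to mean machine learning that combines labeled and unlabeled data.
Textbooks
Reinforcement Learning (Japanese translation, Morikita Publishing)
Richard S. Sutton, Andrew G. Barto; translated by Sadayoshi Mikami and Masaaki Minagawa. 強化学習 (Reinforcement Learning). Morikita Publishing, December 2000.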
Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto The MIT Press, Cambridge, Massachusetts, London, England
http://www.cs.ualberta.ca/~sutton/book/ebook/the-book.html
Links
http://www.scholarpedia.org/article/Reinforcement_learning Scholarpedia
http://en.wikipedia.org/wiki/Reinforcement_learning Wikipedia
http://biorobot.ise.ibaraki.ac.jp/~inoue/links.html
http://sysplan.nams.kyushu-u.ac.jp/gen/edu/RL_intro.html
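- Hajime Kimura, Kazuteru Miyazaki, Shigenobu Kobayashi: "Design guidelines for reinforcement learning systems" (強化学習システムの設計指針), 計測と制御 (Journal of the Society of Instrument and Control Engineers), Vol. 38, No. 10, pp. 618–623 (1999), The Society of Instrument and Control Engineers (SICE).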
POMDP
http://www.cs.brown.edu/research/ai/pomdp/tutorial/index.html POMDPs for Dummies
http://www.cs.duke.edu/~mlittman/topics/pomdp-page.html POMDP information page
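In a POMDP the agent cannot observe the state directly, so it keeps a belief b(s) and updates it after each action a and observation o as b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). The sketch below applies this Bayes update to the classic "tiger" problem, assuming the commonly used listening accuracy of 0.85 and modelling only the "listen" action; the function name `belief_update` and the table layout are hypothetical.

```python
def belief_update(belief, action, obs, T, O):
    """Discrete POMDP belief update:
    b'(s2) is proportional to O[action][s2][obs] * sum_s T[action][s][s2] * b(s)."""
    states = range(len(belief))
    new_b = []
    for s2 in states:
        predicted = sum(T[action][s][s2] * belief[s] for s in states)  # prediction step
        new_b.append(O[action][s2][obs] * predicted)                   # correction step
    norm = sum(new_b)
    if norm == 0:
        raise ValueError("observation has zero probability under this belief")
    return [p / norm for p in new_b]


# Tiger problem: state 0 = tiger behind the left door, 1 = behind the right door.
LISTEN = "listen"
T = {LISTEN: [[1.0, 0.0], [0.0, 1.0]]}      # listening does not move the tiger
# Observation 0 = hear tiger on the left, 1 = on the right (assumed 85% accurate).
O = {LISTEN: [[0.85, 0.15], [0.15, 0.85]]}

b = [0.5, 0.5]
for o in [0, 0]:                            # hear the tiger on the left twice
    b = belief_update(b, LISTEN, o, T, O)
print([round(p, 3) for p in b])             # -> [0.97, 0.03]
```

Two consistent observations push the belief strongly toward "tiger on the left"; a policy over beliefs (rather than states) then decides when to stop listening and open a door.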
MATLAB and Python software list
Orange
Linux installation: http://www.ailab.si/orange/downloads-linux.asp
http://www.ailab.si/orange/doc/modules/orngReinforcement.htm
RLSarsa
Class RLSarsa is an implementation of linear, gradient-descent Sarsa(λ) with tile coding. The implementation closely follows the boxed algorithm in Figure 8.8 on page 212 of Sutton and Barto (1998), and is derived from the mountain car example in the book.
Applications to spoken dialogue
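The following is a minimal, independent sketch of linear gradient-descent Sarsa(λ) with binary tile-coding features and replacing traces on mountain car, written to mirror the structure of the boxed algorithm described above. It is not Orange's RLSarsa code. The grid tile coder, the 8 tilings of 8x8 tiles, alpha = 0.5/8, lambda = 0.9, the start state (position uniform in [-0.6, -0.4), zero velocity), and all function names are assumptions for illustration.

```python
import math
import random

# Mountain car: position x in [-1.2, 0.5], velocity v in [-0.07, 0.07], reward -1 per step.
X_MIN, X_MAX = -1.2, 0.5
V_MIN, V_MAX = -0.07, 0.07
ACTIONS = (-1, 0, 1)        # reverse, coast, forward throttle
N_TILINGS, TILES = 8, 8     # assumed: 8 offset tilings, each an 8x8 grid
N_FEATURES = len(ACTIONS) * N_TILINGS * (TILES + 1) ** 2


def step(x, v, a):
    """One step of the mountain car dynamics; returns (x, v, done)."""
    v = min(max(v + 0.001 * a - 0.0025 * math.cos(3 * x), V_MIN), V_MAX)
    x += v
    if x < X_MIN:
        x, v = X_MIN, 0.0   # hitting the left wall stops the car
    return x, v, x >= X_MAX


def active_tiles(x, v, a_idx):
    """Indices of the N_TILINGS binary features that are 1 for (x, v, action)."""
    sx = (x - X_MIN) / (X_MAX - X_MIN) * TILES
    sv = (v - V_MIN) / (V_MAX - V_MIN) * TILES
    feats = []
    for t in range(N_TILINGS):
        ix = int(sx + t / N_TILINGS)                    # shift tiling t along x
        iv = int(sv + (3 * t % N_TILINGS) / N_TILINGS)  # asymmetric shift along v
        feats.append(((a_idx * N_TILINGS + t) * (TILES + 1) + ix) * (TILES + 1) + iv)
    return feats


def choose(theta, x, v, epsilon, rng):
    """Epsilon-greedy action; returns (action index, its features, its Q value)."""
    feats = [active_tiles(x, v, i) for i in range(len(ACTIONS))]
    qs = [sum(theta[j] for j in f) for f in feats]
    if rng.random() < epsilon:
        a = rng.randrange(len(ACTIONS))
    else:
        best = max(qs)
        a = rng.choice([i for i, q in enumerate(qs) if q == best])  # random tie-break
    return a, feats[a], qs[a]


def sarsa_lambda(episodes=10, alpha=0.5, lam=0.9, gamma=1.0, epsilon=0.0,
                 max_steps=3000, seed=0):
    """Linear gradient-descent Sarsa(lambda), binary features, replacing traces."""
    rng = random.Random(seed)
    theta = [0.0] * N_FEATURES       # optimistic, since every true value is negative
    step_size = alpha / N_TILINGS    # split alpha across the active tiles
    lengths = []                     # steps per episode (max_steps if truncated)
    for _ in range(episodes):
        x, v = rng.uniform(-0.6, -0.4), 0.0
        a, feats, _ = choose(theta, x, v, epsilon, rng)
        e = {}                       # sparse eligibility traces: feature -> trace
        for t in range(1, max_steps + 1):
            for i in feats:
                e[i] = 1.0           # replacing traces
            delta = -1.0 - sum(theta[i] for i in feats)
            x, v, done = step(x, v, ACTIONS[a])
            if not done:
                a, feats, q_next = choose(theta, x, v, epsilon, rng)
                delta += gamma * q_next
            for i, tr in e.items():
                theta[i] += step_size * delta * tr
            if done:
                break
            for i in list(e):        # decay traces, dropping negligible ones
                e[i] *= gamma * lam
                if e[i] < 0.01:
                    del e[i]
        lengths.append(t)
    return theta, lengths


if __name__ == "__main__":
    _, lengths = sarsa_lambda()
    print("steps per episode:", lengths)
```

Storing traces in a dictionary and pruning tiny ones keeps each update proportional to the recently visited features instead of the whole weight vector; with epsilon = 0, exploration comes only from the optimistic zero initialization and random tie-breaking.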
Partially observable Markov decision processes for spoken dialog systems
J. Williams et al., IJCAI, pp. 75-82, 2005
http://portal.acm.org/citation.cfm?id=1221967
Source: Computer Speech and Language, Volume 21, Issue 2 (April 2007), pages 393-422. ISSN: 0885-2308
Authors: Jason D. Williams (AT&T Labs - Research, 180 Park Avenue, Florham Park, NJ 07932, USA); Steve Young (Cambridge University, Engineering Department, UK)
Publisher: Academic Press Ltd., London, UK
Williams et al., 2005a.
Williams, J.D., Poupart, P., Young, S.J., 2005a. Factored partially observable Markov decision processes for dialogue management. In: Proc. Workshop on Knowledge and Reasoning in Practical Dialog Systems, Int. Joint Conf. on Artificial Intelligence (IJCAI), Edinburgh.
Nicholas Roy, Joelle Pineau, Sebastian Thrun
risky.cs.inf.shizuoka.ac.jp/~cs3073/eiron_hosogai.pdf
Nicholas Roy, Joelle Pineau, Sebastian Thrun. Spoken Dialogue Management Using Probabilistic Reasoning. Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL2000). Hong Kong.
Probabilistic Robotics (Japanese edition)
- Title: 確率ロボティクス (Probabilistic Robotics)
- Authors: Sebastian Thrun, Wolfram Burgard, Dieter Fox
- Translator: 上田隆一 (Ryuichi Ueda)
- Publisher: Mainichi Communications, 2007
- ISBN: 4839924015, 9784839924010
- Pages: 577