Reinforcement Learning

In reinforcement learning, an agent learns how to select actions based on the current situation so as to maximize a numerical reward signal.
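
As a rough illustration of "learning action selection from a numerical reward", the following is a minimal sketch of tabular Q-learning, one common reinforcement learning method. The five-state corridor task, the function names, and all parameter values are assumptions made for this example and do not come from the sources listed on this page.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (hypothetical toy task)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values Q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q[state], epsilon, rng)
            nxt, reward, done = step(state, action)
            # Bootstrap from the best next action unless the episode ended.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q_table = q_learning()
# Greedy action in each non-terminal state (1 means "move right").
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
print(greedy_policy)
```

In this toy task the only reward is given on reaching the goal, so the learned greedy policy should move right in every non-terminal state.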

Reinforcement learning (朱鷺の森)

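Reinforcement learning is sometimes classified as "semi-supervised learning". Note, however, that the term "semi-supervised learning" is also used to mean machine learning that combines labeled and unlabeled data.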

Textbooks

強化学習 (Morikita Publishing)

Richard S. Sutton, Andrew G. Barto; translated by 三上 貞芳 and 皆川 雅章. 強化学習. Morikita Publishing (December 2000).

Reinforcement Learning: An Introduction

Richard S. Sutton and Andrew G. Barto. The MIT Press, Cambridge, Massachusetts; London, England.

http://www.cs.ualberta.ca/~sutton/book/ebook/the-book.html

http://www.scholarpedia.org/article/Reinforcement_learning Scholarpedia

http://en.wikipedia.org/wiki/Reinforcement_learning Wikipedia

http://biorobot.ise.ibaraki.ac.jp/~inoue/links.html

http://sysplan.nams.kyushu-u.ac.jp/gen/edu/RL_intro.html

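  • 木村 元, 宮崎 和光, 小林 重信: 強化学習システムの設計指針 (Design guidelines for reinforcement learning systems), 計測と制御, Vol.38, No.10, pp.618–623 (1999), 計測自動制御学会 (Society of Instrument and Control Engineers).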

POMDP

MATLAB and Python software list

Orange

Linux installation: http://www.ailab.si/orange/downloads-linux.asp

http://www.ailab.si/orange/doc/modules/orngReinforcement.htm

RLSarsa

Class RLSarsa is an implementation of linear, gradient-descent Sarsa(lambda) with tile coding. The implementation closely follows the boxed algorithm in Figure 8.8 on page 212 of Sutton and Barto (1998). It is derived from the mountain car example in the book.
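
To make the idea behind RLSarsa more concrete, here is a minimal, self-contained sketch of linear gradient-descent Sarsa(lambda) with tile coding and replacing traces. It is not the Orange RLSarsa class and does not use its API. It is a simplified variant of the book's algorithm (for instance, it does not clear the traces of non-selected actions), and it uses a hypothetical one-dimensional corridor instead of the mountain car task. All names and parameter values are assumptions made for illustration.

```python
import random

N_TILINGS = 4                 # number of overlapping tilings
TILE_WIDTH = 0.2              # width of one tile over positions in [0, 1)
TILES_PER_TILING = 6          # 5 tiles cover [0, 1); one extra absorbs the offset
N_ACTIONS = 2                 # 0 = step left, 1 = step right
N_FEATURES = N_ACTIONS * N_TILINGS * TILES_PER_TILING


def active_features(x, action):
    """Indices of the binary features that are on for (position x, action)."""
    feats = []
    for t in range(N_TILINGS):
        offset = t * TILE_WIDTH / N_TILINGS       # each tiling is shifted slightly
        tile = int((x + offset) / TILE_WIDTH)
        feats.append((action * N_TILINGS + t) * TILES_PER_TILING + tile)
    return feats


def q_value(theta, x, action):
    """Linear value estimate: sum of the weights of the active features."""
    return sum(theta[i] for i in active_features(x, action))


def choose_action(theta, x, epsilon, rng):
    """Epsilon-greedy over the two actions, breaking ties at random."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    qs = [q_value(theta, x, a) for a in range(N_ACTIONS)]
    return rng.choice([a for a in range(N_ACTIONS) if qs[a] == max(qs)])


def step(x, action):
    """Toy corridor: move 0.1 left or right; reward -1 until x >= 1."""
    x = max(0.0, x + (0.1 if action == 1 else -0.1))
    return x, -1.0, x >= 1.0


def sarsa_lambda(episodes=300, alpha=0.2, gamma=1.0, lam=0.9,
                 epsilon=0.1, max_steps=1000, seed=0):
    """Return the learned weights and the length of every training episode."""
    rng = random.Random(seed)
    theta = [0.0] * N_FEATURES
    lengths = []
    for _ in range(episodes):
        trace = [0.0] * N_FEATURES
        x, steps, done = 0.0, 0, False
        action = choose_action(theta, x, epsilon, rng)
        while not done and steps < max_steps:
            for i in active_features(x, action):
                trace[i] = 1.0                      # replacing traces
            nxt, reward, done = step(x, action)
            delta = reward - q_value(theta, x, action)
            if not done:
                nxt_action = choose_action(theta, nxt, epsilon, rng)
                delta += gamma * q_value(theta, nxt, nxt_action)
            for i in range(N_FEATURES):
                theta[i] += (alpha / N_TILINGS) * delta * trace[i]
                trace[i] *= gamma * lam             # decay all traces
            if not done:
                x, action = nxt, nxt_action
            steps += 1
        lengths.append(steps)
    return theta, lengths


theta, lengths = sarsa_lambda()
print("first episode:", lengths[0], "steps; last episode:", lengths[-1], "steps")
```

Dividing the step size by the number of tilings keeps the effective learning rate independent of how many features are active at once, which is a common convention with tile coding.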

Applications to spoken dialogue

Partially observable Markov decision processes for spoken dialog systems

J. Williams et al., IJCAI, pp. 75-82, 2005

http://portal.acm.org/citation.cfm?id=1221967

Source: Computer Speech and Language, Volume 21, Issue 2 (April 2007), pages 393-422. Year of publication: 2007. ISSN: 0885-2308

Authors: Jason D. Williams (AT&T Labs - Research, 180 Park Avenue, Florham Park, NJ 07932, USA); Steve Young (Cambridge University, Engineering Department, UK)

Publisher: Academic Press Ltd., London, UK

Williams et al., 2005a.

Williams, J.D., Poupart, P., Young, S.J., 2005a. Factored partially observable Markov decision processes for dialogue management. In: Proc. Workshop on Knowledge and Reasoning in Practical Dialog Systems, Int. Joint Conf. on Artificial Intelligence (IJCAI), Edinburgh.

Nicholas Roy, Joelle Pineau, Sebastian Thrun

risky.cs.inf.shizuoka.ac.jp/~cs3073/eiron_hosogai.pdf

Nicholas Roy, Joelle Pineau, Sebastian Thrun. Spoken Dialogue Management Using Probabilistic Reasoning. Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL2000). Hong Kong.

Probabilistic Robotics (確率ロボティクス)

  • Title: 確率ロボティクス
  • Authors: Sebastian Thrun, Wolfram Burgard, Dieter Fox
  • Translator: 上田隆一
  • Publisher: 毎日コミュニケーションズ, 2007
  • ISBN: 4839924015, 9784839924010
  • Pages: 577

reinforcement_learning.txt · Last modified: 2010/11/20 09:24 by Takuya Nishimoto