菜单 学习猿地 - LMONKEY

VIP

开通学习猿地VIP

尊享10项VIP特权 持续新增

知识通关挑战

打卡带练!告别无效练习

接私单赚外块

VIP优先接,累计金额超百万

学习猿地私房课免费学

大厂实战课仅对VIP开放

你的一对一导师

每月可免费咨询大牛30次

领取更多软件工程师实用特权

入驻
417
0

To discount or not to discount in reinforcement learning: A case study comparing R learning and Q learning

原创
05/13 14:22
阅读数 87422

 

 

https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node26.html

【平均-打折奖励】

Schwartz [106] examined the problem of adapting Q-learning to an average-reward framework. Although his R-learning algorithm seems to exhibit convergence problems for some MDPs, several researchers have found the average-reward criterion closer to the true problem they wish to solve than a discounted criterion and therefore prefer R-learning to Q-learning [69].

发表评论

0/200
417 点赞
0 评论
收藏