Sample-Based Learning and Search with Permanent and Transient Memories

Posted: 2023-03-30 12:11:12

Tags: Search, Based, Memories, transient, search, sample, permanent, learning, based

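Published: 2008 (ICML 2008)
Key points: This paper proposes the Dyna-2 algorithm, which combines sample-based learning with sample-based search, and evaluates it on Go. The authors view search as a transient process, a short-term memory that is used and then forgotten, whereas learning algorithms such as Sarsa build a long-term, permanent memory. They therefore maintain two memories at the same time, along with two value functions, $Q$ and $\bar Q$: $Q$ represents the permanent value, and $\bar Q$ represents the combination of the permanent and transient memories.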
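
For reference, with linear function approximation the two value functions can be written as below. The notation is as I recall it from the paper, so treat the symbols as an assumption: $\phi$ and $\bar\phi$ are the permanent and transient feature vectors, and $\theta$ and $\bar\theta$ are the corresponding weight vectors.

$$Q(s,a) = \phi(s,a)^\top \theta$$

$$\bar Q(s,a) = \phi(s,a)^\top \theta + \bar\phi(s,a)^\top \bar\theta$$

Clearing the transient memory means setting $\bar\theta = 0$, after which $\bar Q = Q$.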

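Learning updates $Q$, while search updates $\bar Q$. During learning the permanent memory is never cleared, whereas the transient memory is cleared every episode. Concretely, the authors use Sarsa to update both value functions, and actions are selected $\epsilon$-greedily.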
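
The loop described above can be sketched roughly as follows. This is a minimal illustration under several assumptions that do not come from the paper: a hypothetical 6-state chain task instead of Go, one-hot (state, action) features for both memories, one-step Sarsa without eligibility traces, and the true simulator `step` standing in for a learned sample model. The names `Dyna2`, `step`, `search`, and `run_episode` are made up for this sketch.

```python
import random

N_STATES = 6          # hypothetical chain: states 0..5, state 5 is terminal
ACTIONS = (-1, +1)    # step left / step right


def step(s, a):
    """Toy simulator, used both as the real environment and as the sample model."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else -0.01), done


class Dyna2:
    def __init__(self, alpha=0.1, alpha_bar=0.1, eps=0.1, gamma=1.0, seed=0):
        self.rng = random.Random(seed)
        self.alpha, self.alpha_bar = alpha, alpha_bar
        self.eps, self.gamma = eps, gamma
        # With one-hot features each weight vector is just a table.
        self.theta = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
        self.theta_bar = dict.fromkeys(self.theta, 0.0)

    def q(self, s, a):
        """Permanent value Q."""
        return self.theta[(s, a)]

    def q_bar(self, s, a):
        """Combined value Q-bar = permanent + transient."""
        return self.theta[(s, a)] + self.theta_bar[(s, a)]

    def clear_transient(self):
        for key in self.theta_bar:
            self.theta_bar[key] = 0.0

    def act(self, s):
        """Epsilon-greedy with respect to Q-bar, ties broken at random."""
        if self.rng.random() < self.eps:
            return self.rng.choice(ACTIONS)
        best = max(self.q_bar(s, a) for a in ACTIONS)
        return self.rng.choice([a for a in ACTIONS if self.q_bar(s, a) == best])

    def search(self, s0, n_sims=20, max_steps=50):
        """Simulate episodes from the current real state; Sarsa updates theta_bar only."""
        for _ in range(n_sims):
            s, a = s0, self.act(s0)
            for _ in range(max_steps):
                s2, r, done = step(s, a)
                if done:
                    self.theta_bar[(s, a)] += self.alpha_bar * (r - self.q_bar(s, a))
                    break
                a2 = self.act(s2)
                target = r + self.gamma * self.q_bar(s2, a2)
                self.theta_bar[(s, a)] += self.alpha_bar * (target - self.q_bar(s, a))
                s, a = s2, a2

    def run_episode(self, s0=0, max_steps=200):
        """One real episode: clear transient memory, then search before every move."""
        self.clear_transient()
        s = s0
        self.search(s)
        a = self.act(s)
        for t in range(1, max_steps + 1):
            s2, r, done = step(s, a)
            if done:
                self.theta[(s, a)] += self.alpha * (r - self.q(s, a))
                return t
            self.search(s2)
            a2 = self.act(s2)
            # Sarsa update of the permanent memory from real experience.
            target = r + self.gamma * self.q(s2, a2)
            self.theta[(s, a)] += self.alpha * (target - self.q(s, a))
            s, a = s2, a2
        return max_steps


if __name__ == "__main__":
    agent = Dyna2(seed=0)
    print([agent.run_episode() for _ in range(10)])  # episode lengths
```

Note that real experience only ever changes `theta`, simulated experience only ever changes `theta_bar`, and acting always uses the combined `q_bar`.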
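Summary: This is a very early model-based paper, and by today's standards it does not feel especially novel. Looking back at the idea behind AlphaGo, though, it can be read the same way: a permanent network plus a transient MCTS. This idea runs through Silver's entire research career.
I also picked up a distinction between planning and search here: planning is carried out inside the model, so its quality depends on the model's accuracy, whereas search is carried out from the real state (Sample-based planning applies sample-based reinforcement learning methods to simulated experience. This requires a sample model of the world. In sample-based search, experience is simulated from the real state s, so as to identify the best action from this state.).
Question: Because the paper still relied on linear function approximation, it spends a lot of space on feature construction, which I could not follow.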

From: https://www.cnblogs.com/initial-h/p/17272111.html
