
Q-learning and RL implementation

Date: 2023-09-01 19:22:05
Tags: play, implementation, actions, game, learning, RL, model, class

Aim: Train a model to properly play vintage video games...

Deep Q-learning algorithm

A very brief summary of the notation:

{A (the set of actions), pi (the policy), Q (the quality of an action at a state), R(s, a, s') (the reward: in state s, taking action a leads to state s' and yields a specific reward r)}
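
To make the notation concrete, here is a minimal sketch of the tabular Q-learning update, which is the rule that deep Q-learning approximates with a neural network. The learning rate `alpha`, the discount factor `gamma`, the state names, and the action names are all assumptions for illustration; none of them come from the original notes.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step.

    Moves Q(s, a) toward the target r + gamma * max over a' of Q(s', a').
    alpha and gamma are illustrative defaults, not values from the post.
    """
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


def greedy_policy(Q, s, actions):
    """pi(s): pick the action with the highest current Q value in state s."""
    return max(actions, key=lambda a: Q[(s, a)])


# Hypothetical two-action example: every Q value starts at 0.0.
Q = defaultdict(float)
actions = ["left", "right"]
q_update(Q, "s0", "right", 1.0, "s1", actions)
best = greedy_policy(Q, "s0", actions)
```

After this single update, `Q[("s0", "right")]` has moved one `alpha`-sized step toward the reward, so the greedy policy in `s0` prefers `right`.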

 

So, to train a model to play a video game like a master, at least the following modules need to be implemented:

  • a class that captures enough frames (typically consecutive) to analyze the game environment; preprocessing may be needed to lower the memory overhead
  • a class for the NN-based model used in training, covering weight init/update/storage/write/fork/reset; the actions taken in a single play are also recorded for optimization
  • a class that exposes the possible actions, abstracted to a simple level, so the model can do anything a player would do without generative issues at the beginning (this can become more general once the model has matured)
  • a game-to-model pre-processing module
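
The non-network modules in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class names `FrameStack`, `ActionSpace`, and `EpisodeLog`, the frame layout (rows of `(r, g, b)` tuples), the stride-2 downsampling, and the epsilon-greedy rule are all hypothetical choices, not part of the original notes. The NN model class is left out because it depends on the framework used.

```python
import random
from collections import deque


def preprocess(frame, stride=2):
    """Game-to-model preprocessing: grayscale plus downsampling.

    Assumes a frame is a list of rows, each row a list of (r, g, b) tuples.
    """
    return tuple(
        tuple((r + g + b) // 3 for (r, g, b) in row[::stride])
        for row in frame[::stride]
    )


class FrameStack:
    """Keeps the k most recent preprocessed frames as one state."""

    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # Fill the whole stack with the first frame of a new play.
        first = preprocess(frame)
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(first)
        return self.state()

    def push(self, frame):
        # The oldest frame is dropped automatically once the deque is full.
        self.frames.append(preprocess(frame))
        return self.state()

    def state(self):
        return tuple(self.frames)


class ActionSpace:
    """A small, fixed set of discrete actions with epsilon-greedy selection."""

    def __init__(self, actions, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self, q_values):
        # q_values: one estimated value per action, in the order of self.actions.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        best = max(range(len(self.actions)), key=lambda i: q_values[i])
        return self.actions[best]


class EpisodeLog:
    """Records the transitions of a single play for later optimization."""

    def __init__(self):
        self.transitions = []

    def record(self, state, action, reward, next_state):
        self.transitions.append((state, action, reward, next_state))


# Hypothetical usage with a tiny 4x4 single-color frame.
frame = [[(30, 60, 90)] * 4 for _ in range(4)]
stack = FrameStack(k=4)
state = stack.reset(frame)
space = ActionSpace(["noop", "left", "right", "fire"], epsilon=0.0)
action = space.select([0.1, 0.5, 0.2, 0.0])
log = EpisodeLog()
log.record(state, action, 0.0, stack.push(frame))
```

Stacking several consecutive frames lets the model see motion that a single frame cannot show, and shrinking each frame before storing it is one way to keep the memory overhead low.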

This is just the minimum...

 

From: https://www.cnblogs.com/selfmade-Henderson/p/17672711.html
