2.1 Winter Break Daily Summary 23

Time: 2024-02-01 21:46:51

The simplest possible Super Mario training run

from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from stable_baselines3 import PPO

# Create the environment and restrict the controller to the SIMPLE_MOVEMENT action set
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)

# Directory where TensorBoard logs are written
tensorboard_log = r'./tensorboard_log/'

# Train PPO with a CNN policy on the raw game frames, then save the model
model = PPO("CnnPolicy", env, verbose=1,
            tensorboard_log=tensorboard_log)
model.learn(total_timesteps=25000)
model.save("mario_model")

Training output

Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Logging to ./tensorboard_log/PPO_1


D:\software\e_anaconda\envs\pytorch\lib\site-packages\gym_super_mario_bros\smb_env.py:148: RuntimeWarning: overflow encountered in ubyte_scalars
  return (self.ram[0x86] - self.ram[0x071c]) % 256
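
The RuntimeWarning above is raised while subtracting two unsigned 8-bit RAM values. As a rough illustration only (plain Python integers stand in for NumPy's uint8 scalars, and the helper names `wrapped_sub_u8` and `exact_sub_mod_256` are hypothetical), the sketch below checks that a wrapped 8-bit difference reduced modulo 256 equals the exact difference modulo 256. If that holds, the warning is likely noise for this particular expression and does not indicate a wrong value.

```python
def wrapped_sub_u8(a: int, b: int) -> int:
    """Subtract two bytes the way 8-bit unsigned arithmetic does (wraps around)."""
    return (a - b) & 0xFF


def exact_sub_mod_256(a: int, b: int) -> int:
    """Subtract with unbounded integers, then reduce modulo 256."""
    return (a - b) % 256


# Compare both results for every pair of byte values.
mismatches = [
    (a, b)
    for a in range(256)
    for b in range(256)
    if wrapped_sub_u8(a, b) % 256 != exact_sub_mod_256(a, b)
]
print(len(mismatches))  # 0
```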


-----------------------------
| time/              |      |
|    fps             | 116  |
|    iterations      | 1    |
|    time_elapsed    | 17   |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 81          |
|    iterations           | 2           |
|    time_elapsed         | 50          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.025405666 |
|    clip_fraction        | 0.274       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.92       |
|    explained_variance   | 0.00504     |
|    learning_rate        | 0.0003      |
|    loss                 | 0.621       |
|    n_updates            | 10          |
|    policy_gradient_loss | 0.0109      |
|    value_loss           | 17.4        |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 73          |
|    iterations           | 3           |
|    time_elapsed         | 83          |
|    total_timesteps      | 6144        |
| train/                  |             |
|    approx_kl            | 0.010906073 |
|    clip_fraction        | 0.109       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.92       |
|    explained_variance   | 0.0211      |
|    learning_rate        | 0.0003      |
|    loss                 | 0.101       |
|    n_updates            | 20          |
|    policy_gradient_loss | -0.00392    |
|    value_loss           | 0.187       |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 69          |
|    iterations           | 4           |
|    time_elapsed         | 117         |
|    total_timesteps      | 8192        |
| train/                  |             |
|    approx_kl            | 0.009882288 |
|    clip_fraction        | 0.0681      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.9        |
|    explained_variance   | 0.101       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.0738      |
|    n_updates            | 30          |
|    policy_gradient_loss | -0.00502    |
|    value_loss           | 0.13        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.01e+04    |
|    ep_rew_mean          | 891         |
| time/                   |             |
|    fps                  | 65          |
|    iterations           | 5           |
|    time_elapsed         | 156         |
|    total_timesteps      | 10240       |
| train/                  |             |
|    approx_kl            | 0.008186281 |
|    clip_fraction        | 0.105       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.87       |
|    explained_variance   | 0.0161      |
|    learning_rate        | 0.0003      |
|    loss                 | 0.28        |
|    n_updates            | 40          |
|    policy_gradient_loss | -0.00649    |
|    value_loss           | 0.811       |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.01e+04    |
|    ep_rew_mean          | 891         |
| time/                   |             |
|    fps                  | 64          |
|    iterations           | 6           |
|    time_elapsed         | 190         |
|    total_timesteps      | 12288       |
| train/                  |             |
|    approx_kl            | 0.024062362 |
|    clip_fraction        | 0.246       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.9        |
|    explained_variance   | 0.269       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.54        |
|    n_updates            | 50          |
|    policy_gradient_loss | 0.0362      |
|    value_loss           | 10.8        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.01e+04    |
|    ep_rew_mean          | 891         |
| time/                   |             |
|    fps                  | 63          |
|    iterations           | 7           |
|    time_elapsed         | 225         |
|    total_timesteps      | 14336       |
| train/                  |             |
|    approx_kl            | 0.024466533 |
|    clip_fraction        | 0.211       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.89       |
|    explained_variance   | 0.839       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.435       |
|    n_updates            | 60          |
|    policy_gradient_loss | 0.023       |
|    value_loss           | 3.06        |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1.01e+04   |
|    ep_rew_mean          | 891        |
| time/                   |            |
|    fps                  | 63         |
|    iterations           | 8          |
|    time_elapsed         | 259        |
|    total_timesteps      | 16384      |
| train/                  |            |
|    approx_kl            | 0.01970315 |
|    clip_fraction        | 0.242      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.9       |
|    explained_variance   | 0.486      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.526      |
|    n_updates            | 70         |
|    policy_gradient_loss | 0.00486    |
|    value_loss           | 1.57       |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.01e+04    |
|    ep_rew_mean          | 891         |
| time/                   |             |
|    fps                  | 62          |
|    iterations           | 9           |
|    time_elapsed         | 293         |
|    total_timesteps      | 18432       |
| train/                  |             |
|    approx_kl            | 0.012460884 |
|    clip_fraction        | 0.217       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.87       |
|    explained_variance   | 0.74        |
|    learning_rate        | 0.0003      |
|    loss                 | 0.139       |
|    n_updates            | 80          |
|    policy_gradient_loss | -0.000311   |
|    value_loss           | 0.734       |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1.01e+04   |
|    ep_rew_mean          | 891        |
| time/                   |            |
|    fps                  | 62         |
|    iterations           | 10         |
|    time_elapsed         | 327        |
|    total_timesteps      | 20480      |
| train/                  |            |
|    approx_kl            | 0.02535792 |
|    clip_fraction        | 0.298      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.88      |
|    explained_variance   | 0.405      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.17       |
|    n_updates            | 90         |
|    policy_gradient_loss | 0.0205     |
|    value_loss           | 6.6        |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.01e+04    |
|    ep_rew_mean          | 891         |
| time/                   |             |
|    fps                  | 62          |
|    iterations           | 11          |
|    time_elapsed         | 361         |
|    total_timesteps      | 22528       |
| train/                  |             |
|    approx_kl            | 0.019694094 |
|    clip_fraction        | 0.243       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.91       |
|    explained_variance   | 0.952       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.39        |
|    n_updates            | 100         |
|    policy_gradient_loss | -0.00434    |
|    value_loss           | 1.31        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.19e+04    |
|    ep_rew_mean          | 884         |
| time/                   |             |
|    fps                  | 61          |
|    iterations           | 12          |
|    time_elapsed         | 398         |
|    total_timesteps      | 24576       |
| train/                  |             |
|    approx_kl            | 0.013096321 |
|    clip_fraction        | 0.227       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.91       |
|    explained_variance   | 0.0132      |
|    learning_rate        | 0.0003      |
|    loss                 | 0.669       |
|    n_updates            | 110         |
|    policy_gradient_loss | -0.000837   |
|    value_loss           | 1.42        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.19e+04    |
|    ep_rew_mean          | 884         |
| time/                   |             |
|    fps                  | 61          |
|    iterations           | 13          |
|    time_elapsed         | 432         |
|    total_timesteps      | 26624       |
| train/                  |             |
|    approx_kl            | 0.014833134 |
|    clip_fraction        | 0.239       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.9        |
|    explained_variance   | 0.452       |
|    learning_rate        | 0.0003      |
|    loss                 | 18.1        |
|    n_updates            | 120         |
|    policy_gradient_loss | -7.3e-05    |
|    value_loss           | 26.3        |
-----------------------------------------
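
The log ends at 26624 timesteps although the script asked for total_timesteps=25000. A small sketch of the arithmetic, assuming the rollout length is 2048 steps per iteration and that each rollout is followed by 10 optimisation epochs (the script does not set either value; both are read off the tables above): rollouts are collected in whole blocks, so training stops at the first multiple of 2048 that reaches the budget.

```python
import math

N_STEPS = 2048   # assumed steps collected per iteration (matches total_timesteps in the log)
N_EPOCHS = 10    # assumed epochs per update (matches the n_updates increments in the log)
BUDGET = 25000   # total_timesteps passed to model.learn()

# Number of whole rollouts needed to reach the budget
iterations = math.ceil(BUDGET / N_STEPS)
final_timesteps = iterations * N_STEPS

# In the log, n_updates at iteration k equals (k - 1) * N_EPOCHS
logged_n_updates = (iterations - 1) * N_EPOCHS

print(iterations, final_timesteps, logged_n_updates)  # 13 26624 120
```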

Test code

from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from stable_baselines3 import PPO

# Rebuild the same environment that was used for training
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)
model = PPO.load("mario_model")

obs = env.reset()
while True:
    # Copy each frame before prediction, as the original code does
    action, _states = model.predict(obs.copy())
    # JoypadSpace expects a plain integer action index
    obs, reward, done, info = env.step(int(action))
    env.render()
    if done:
        # Keep the observation returned by reset() so the next step uses the new episode
        obs = env.reset()
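
The reset-on-done pattern in the test loop can be tried without the emulator. The sketch below is only an illustration: `StubEnv` and `run_episodes` are hypothetical names, the stub replaces the NES environment, and a random policy replaces the trained model. It follows the older Gym step API used in this post, where step() returns four values.

```python
import random


class StubEnv:
    """Hypothetical stand-in for the Mario env: every episode ends after `length` steps."""

    def __init__(self, length=5):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}  # obs, reward, done, info


def run_episodes(env, policy, n_episodes):
    """Play n_episodes, resetting whenever done is True; return total reward per episode."""
    totals = []
    total = 0.0
    obs = env.reset()
    while len(totals) < n_episodes:
        obs, reward, done, info = env.step(policy(obs))
        total += reward
        if done:
            totals.append(total)
            total = 0.0
            obs = env.reset()  # continue from the first observation of the new episode
    return totals


rng = random.Random(0)
# 7 is the number of actions in SIMPLE_MOVEMENT
episode_rewards = run_episodes(StubEnv(length=5), lambda obs: rng.randrange(7), n_episodes=3)
print(episode_rewards)  # [5.0, 5.0, 5.0]
```

The important detail is that the value returned by reset() is assigned to the same variable that feeds the policy, so the next action is chosen from the new episode's first observation.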

From: https://www.cnblogs.com/2351920019xin/p/18002169
