首页 > 其他分享 >2024.2.2寒假每日总结24

2024.2.2寒假每日总结24

时间:2024-02-02 20:44:24浏览次数:42  
标签:24 loss 2024.2 clip 寒假 time ----------------------------------------- ep mean

算法题:1686. 石子游戏 VI - 力扣(LeetCode)

最最简单的超级马里奥训练过程
from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
import time
from matplotlib import pyplot as plt
from stable_baselines3 import PPO
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)
tensorboard_log = r'./tensorboard_log/'

model = PPO("CnnPolicy", env, verbose=1,
            tensorboard_log = tensorboard_log)
model.learn(total_timesteps=25000)
model.save("mario_model")
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Logging to ./tensorboard_log/PPO_1


D:\software\e_anaconda\envs\pytorch\lib\site-packages\gym_super_mario_bros\smb_env.py:148: RuntimeWarning: overflow encountered in ubyte_scalars
  return (self.ram[0x86] - self.ram[0x071c]) % 256


-----------------------------
| time/              |      |
|    fps             | 116  |
|    iterations      | 1    |
|    time_elapsed    | 17   |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 81          |
|    iterations           | 2           |
|    time_elapsed         | 50          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.025405666 |
|    clip_fraction        | 0.274       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.92       |
|    explained_variance   | 0.00504     |
|    learning_rate        | 0.0003      |
|    loss                 | 0.621       |
|    n_updates            | 10          |
|    policy_gradient_loss | 0.0109      |
|    value_loss           | 17.4        |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 73          |
|    iterations           | 3           |
|    time_elapsed         | 83          |
|    total_timesteps      | 6144        |
| train/                  |             |
|    approx_kl            | 0.010906073 |
|    clip_fraction        | 0.109       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.92       |
|    explained_variance   | 0.0211      |
|    learning_rate        | 0.0003      |
|    loss                 | 0.101       |
|    n_updates            | 20          |
|    policy_gradient_loss | -0.00392    |
|    value_loss           | 0.187       |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 69          |
|    iterations           | 4           |
|    time_elapsed         | 117         |
|    total_timesteps      | 8192        |
| train/                  |             |
|    approx_kl            | 0.009882288 |
|    clip_fraction        | 0.0681      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.9        |
|    explained_variance   | 0.101       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.0738      |
|    n_updates            | 30          |
|    policy_gradient_loss | -0.00502    |
|    value_loss           | 0.13        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.01e+04    |
|    ep_rew_mean          | 891         |
| time/                   |             |
|    fps                  | 65          |
|    iterations           | 5           |
|    time_elapsed         | 156         |
|    total_timesteps      | 10240       |
| train/                  |             |
|    approx_kl            | 0.008186281 |
|    clip_fraction        | 0.105       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.87       |
|    explained_variance   | 0.0161      |
|    learning_rate        | 0.0003      |
|    loss                 | 0.28        |
|    n_updates            | 40          |
|    policy_gradient_loss | -0.00649    |
|    value_loss           | 0.811       |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.01e+04    |
|    ep_rew_mean          | 891         |
| time/                   |             |
|    fps                  | 64          |
|    iterations           | 6           |
|    time_elapsed         | 190         |
|    total_timesteps      | 12288       |
| train/                  |             |
|    approx_kl            | 0.024062362 |
|    clip_fraction        | 0.246       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.9        |
|    explained_variance   | 0.269       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.54        |
|    n_updates            | 50          |
|    policy_gradient_loss | 0.0362      |
|    value_loss           | 10.8        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.01e+04    |
|    ep_rew_mean          | 891         |
| time/                   |             |
|    fps                  | 63          |
|    iterations           | 7           |
|    time_elapsed         | 225         |
|    total_timesteps      | 14336       |
| train/                  |             |
|    approx_kl            | 0.024466533 |
|    clip_fraction        | 0.211       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.89       |
|    explained_variance   | 0.839       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.435       |
|    n_updates            | 60          |
|    policy_gradient_loss | 0.023       |
|    value_loss           | 3.06        |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1.01e+04   |
|    ep_rew_mean          | 891        |
| time/                   |            |
|    fps                  | 63         |
|    iterations           | 8          |
|    time_elapsed         | 259        |
|    total_timesteps      | 16384      |
| train/                  |            |
|    approx_kl            | 0.01970315 |
|    clip_fraction        | 0.242      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.9       |
|    explained_variance   | 0.486      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.526      |
|    n_updates            | 70         |
|    policy_gradient_loss | 0.00486    |
|    value_loss           | 1.57       |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.01e+04    |
|    ep_rew_mean          | 891         |
| time/                   |             |
|    fps                  | 62          |
|    iterations           | 9           |
|    time_elapsed         | 293         |
|    total_timesteps      | 18432       |
| train/                  |             |
|    approx_kl            | 0.012460884 |
|    clip_fraction        | 0.217       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.87       |
|    explained_variance   | 0.74        |
|    learning_rate        | 0.0003      |
|    loss                 | 0.139       |
|    n_updates            | 80          |
|    policy_gradient_loss | -0.000311   |
|    value_loss           | 0.734       |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1.01e+04   |
|    ep_rew_mean          | 891        |
| time/                   |            |
|    fps                  | 62         |
|    iterations           | 10         |
|    time_elapsed         | 327        |
|    total_timesteps      | 20480      |
| train/                  |            |
|    approx_kl            | 0.02535792 |
|    clip_fraction        | 0.298      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.88      |
|    explained_variance   | 0.405      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.17       |
|    n_updates            | 90         |
|    policy_gradient_loss | 0.0205     |
|    value_loss           | 6.6        |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.01e+04    |
|    ep_rew_mean          | 891         |
| time/                   |             |
|    fps                  | 62          |
|    iterations           | 11          |
|    time_elapsed         | 361         |
|    total_timesteps      | 22528       |
| train/                  |             |
|    approx_kl            | 0.019694094 |
|    clip_fraction        | 0.243       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.91       |
|    explained_variance   | 0.952       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.39        |
|    n_updates            | 100         |
|    policy_gradient_loss | -0.00434    |
|    value_loss           | 1.31        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.19e+04    |
|    ep_rew_mean          | 884         |
| time/                   |             |
|    fps                  | 61          |
|    iterations           | 12          |
|    time_elapsed         | 398         |
|    total_timesteps      | 24576       |
| train/                  |             |
|    approx_kl            | 0.013096321 |
|    clip_fraction        | 0.227       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.91       |
|    explained_variance   | 0.0132      |
|    learning_rate        | 0.0003      |
|    loss                 | 0.669       |
|    n_updates            | 110         |
|    policy_gradient_loss | -0.000837   |
|    value_loss           | 1.42        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.19e+04    |
|    ep_rew_mean          | 884         |
| time/                   |             |
|    fps                  | 61          |
|    iterations           | 13          |
|    time_elapsed         | 432         |
|    total_timesteps      | 26624       |
| train/                  |             |
|    approx_kl            | 0.014833134 |
|    clip_fraction        | 0.239       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.9        |
|    explained_variance   | 0.452       |
|    learning_rate        | 0.0003      |
|    loss                 | 18.1        |
|    n_updates            | 120         |
|    policy_gradient_loss | -7.3e-05    |
|    value_loss           | 26.3        |
-----------------------------------------

测试代码

from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
import time
from matplotlib import pyplot as plt
from stable_baselines3 import PPO
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)
model = PPO.load("mario_model")

obs = env.reset()
obs=obs.copy()
done = True
while True:
    if done:
        state = env.reset()
    action, _states = model.predict(obs)
    obs, rewards, done, info = env.step(action)
    obs=obs.copy()
    env.render()

标签:24,loss,2024.2,clip,寒假,time,-----------------------------------------,ep,mean
From: https://www.cnblogs.com/ysk0904/p/18003961

相关文章

  • 2024.2.2日报
    6.1HashShuffle解析以下的讨论都假设每个Executor有1个cpucore。6.1.1HashShuffleManagershufflewrite阶段,主要就是在一个stage结束计算之后,为了下一个stage可以执行shuffle类的算子(比如reduceByKey),而将每个task处理的数据按key进行“划分”。所谓“划分......
  • JXOI2024 日记
    看到大家都在写游记,退役菜鸡也想写一写自己退役的悠闲生活。(关于为什么是日记不是游记:我太啰嗦了)2024/1/21及以前在家睡觉,背单词,打块,玩原神。2024/1/22下雪了,于是鸽了,不去学校。准备在家好好睡一觉。问ZH喵要不要一起出来玩雪,ZH拒绝了,但是TZ喵问大家要不要出来散步,就......
  • AWR1243+DCA100——数据读取(基于mmWave Studio LUA和MATLAB)
    参考文献:[1]扬帆起航:毫米波雷达开发手册之硬件配置[2]使用LUA脚本,通过Matlab控制mmWaveStudio,一键实现DCA1000参数配置和雷达数据采集文献[1]详细介绍了利用mmWaveStudio的lua语言,基于Matlab对雷达板AWR1243进行参数配置和回波数据读取的解决方案,文献[2]是对文献[1]的增补......
  • 7.【2024初三年前集训测试2】
    ���\(\Huge打了一场模拟赛,又垫底了。qwq\)2024初三年前集训测试2T1上海\(0pts\)死因\(__int128\)不支持\(pow\)。事实上我打了一个快速幂(在一千行代码里翻出来就行)。但是我打\(qpow\)时忘打\(q\)了,然后本地运行还没报错……就交上去了之后结果就是,没过编。。。改......
  • 新主机加入k8s 1.24.4集群
    配置静态IP[root@localhost~]#cat/etc/sysconfig/network-scripts/ifcfg-ens33TYPE=EthernetPROXY_METHOD=noneBROWSER_ONLY=noBOOTPROTO=staticDEFROUTE=yesIPV4_FAILURE_FATAL=noIPV6INIT=yesIPV6_AUTOCONF=yesIPV6_DEFROUTE=yesIPV6_FAILURE_FATAL=noIPV6_ADDR......
  • 2024年AI发展趋势的十大预测
    美国《福布斯》发布了《10AIPredictionsFor2024》对2024年AI发展趋势进行了预测。今年AI领域会有哪些变化和发展趋势呢?对企业、开发者、从业人员有哪些影响?1.英伟达将努力成为云服务提供商去年英伟达已经推出了DGXCloud的云服务,今年有可能建立自己的数据中心(DGX......
  • pkuwc & wc 2024 游记
    由于官方认证,\(41\)也要用th,所以下面的日期全部是th。Jan.25th早上早起赶飞机,很早到了机场。早上以为充电宝不能随身携带,只能托运,到机场发现只能随身携带,只好在机场大开特开行李箱取充电宝。登机又得坐摆渡车,太不牛了。飞机延误了大概10分钟,到重庆江北机场大概已经12......
  • 【2024】jmeter分布式压测记录
    一、分布式压测配置分布式压测分为一台master机器和多台slave机器,master机器主要用于控制多台slave机器运行并汇总运行结果。当然,压力机资源紧张时,master机器只做控制机有点浪费,也可以通过配置让master机器也作为施压机,既做控制机又做施压机。1.配置准备工作master配置:jmet......
  • 寒假生活指导25
    #coding:utf8#三种创建DataFramed的方法importpandasaspdfrompyspark.sqlimportSparkSessionfrompyspark.sql.typesimportStructType,StringType,IntegerTypeif__name__=='__main__':#spark=SparkSession.builder.appName("create_df").m......
  • 2024年2月长沙/济南/南京/深圳DAMA-CDGA/CDGP认证报名入口
    2024年度第一季CDGA和CDGP认证考试定于2024年3月17日举行。考试报名现已开启,相关事宜通知如下: —— 考试科目及时间 ——CDGA数据治理工程师:2024年3月17日(周日)14:00-15:40CDGP数据治理专家:2024年3月17日(周日)14:00-16:10——考试地点 —— 考试已确定开放的城市有:北京,上......