Question: Why unwrap an OpenAI Gym environment?
Background:
I'm trying to get some insight into reinforcement learning while using OpenAI Gym as a learning environment. I do this by reading the book Hands-On Reinforcement Learning with Python. The book provides some code, but it often doesn't work as written because I have to unwrap the environment first, as shown in the question "openai gym env.P, AttributeError 'TimeLimit' object has no attribute 'P'".
However, I personally am still interested in the WHY of this unwrapping. Why do you need to unwrap? What does this do exactly? And why isn't it coded like that in the book? Is it outdated software as Giuliov assumed?
Thanks in advance.
Answer:
OpenAI Gym offers many different environments, each with its own set of parameters and methods. Nevertheless, they are generally wrapped by a single class (like an interface in real OOP languages) called Env. This class exposes the most essential methods common to any environment, such as step, reset and seed. Having this "interface" class is great because it lets your code be environment-agnostic. It also makes things easier if you want to test a single agent on different environments.
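To illustrate the environment-agnostic point, here is a minimal sketch in plain Python. It does not use Gym itself: the Env base class, the two toy environments and run_episode are hypothetical names made up for this example, and step returns a simplified (observation, reward, done) tuple rather than Gym's actual return value.

```python
import random


class Env:
    """Hypothetical stand-in for a shared environment interface (not gym.Env)."""

    def reset(self):
        raise NotImplementedError

    def step(self, action):
        raise NotImplementedError


class CoinFlipEnv(Env):
    """Toy environment: guess a coin flip; every episode lasts one step."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        return 0  # single dummy observation

    def step(self, action):
        reward = 1.0 if action == self.rng.randint(0, 1) else 0.0
        return 0, reward, True  # observation, reward, done


class CountdownEnv(Env):
    """Toy environment: the episode ends after a fixed number of steps."""

    def __init__(self, length=3):
        self.length = length
        self.remaining = length

    def reset(self):
        self.remaining = self.length
        return self.remaining

    def step(self, action):
        self.remaining -= 1
        return self.remaining, 1.0, self.remaining == 0


def always_zero(obs):
    """Trivial policy that ignores the observation."""
    return 0


def run_episode(env, policy):
    """Run one episode using only the shared reset/step interface."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


# The same loop runs on both environments without knowing their internals.
for env in (CoinFlipEnv(), CountdownEnv()):
    print(type(env).__name__, run_episode(env, always_zero))
```

Because run_episode only calls reset and step, swapping one environment for another requires no change to the agent code.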
However, if you want to access the behind-the-scenes dynamics of a specific environment, use the unwrapped property. It returns the underlying environment without wrappers such as TimeLimit, which is why an environment-specific attribute like P becomes accessible.
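The mechanism behind the error in the question can be sketched without Gym. GridEnv and StepLimit below are hypothetical classes invented for this example, not Gym's implementation, and the sketch assumes a wrapper that does not forward unknown attributes, which matches the AttributeError reported for TimeLimit.

```python
class GridEnv:
    """Hypothetical concrete environment with an environment-specific attribute P."""

    def __init__(self):
        # Toy transition table: P[state][action] -> next state
        self.P = {0: {0: 0, 1: 1}, 1: {0: 0, 1: 1}}
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = self.P[self.state][action]
        return self.state, 0.0, False  # observation, reward, done

    @property
    def unwrapped(self):
        return self  # already the innermost environment


class StepLimit:
    """Hypothetical wrapper that ends an episode after max_steps steps."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.elapsed += 1
        return obs, reward, done or self.elapsed >= self.max_steps

    @property
    def unwrapped(self):
        # Delegate inward until the innermost environment is reached.
        return self.env.unwrapped


env = StepLimit(GridEnv(), max_steps=5)
print(hasattr(env, "P"))      # False: the wrapper itself has no P
print(env.unwrapped.P[0][1])  # 1: P lives on the inner environment
```

With Gym itself, the analogous fix for the error in the question is to read env.unwrapped.P instead of env.P.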