dm_control: DeepMind Infrastructure for Physics-Based Simulation
DeepMind's software stack for physics-based simulation and reinforcement learning environments, using MuJoCo physics.
1. Benchmark tasks
    import numpy as np
    from dm_control import suite

    # Load every benchmark task, take one random step, and print the result.
    for domain_name, task_name in suite.BENCHMARKING:
        print(domain_name, task_name)
        env = suite.load(domain_name, task_name)
        action_spec = env.action_spec()
        time_step = env.reset()
        # Sample a uniformly random action within the task's action bounds.
        action = np.random.uniform(action_spec.minimum,
                                   action_spec.maximum,
                                   size=action_spec.shape)
        time_step = env.step(action)
        print(time_step.last(), time_step.reward, time_step.discount,
              time_step.observation)
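The loop above takes only a single random step per task. As a minimal sketch of how a full episode could be run instead, the helper below steps an environment until `time_step.last()` is true and sums the rewards. The names `run_episode`, `make_uniform_policy`, and `demo` are hypothetical and not part of dm_control; the sketch assumes only the `reset()`, `step()`, `last()`, and `reward` interface already used above, and the `cartpole`/`swingup` pair in `demo` is just one entry taken from the task list.

```python
def run_episode(env, policy, max_steps=None):
    """Run one episode and return (total_reward, num_steps).

    `policy` is any callable mapping a time step to an action.
    """
    time_step = env.reset()
    total_reward = 0.0
    num_steps = 0
    while not time_step.last():
        if max_steps is not None and num_steps >= max_steps:
            break
        time_step = env.step(policy(time_step))
        # The reward can be None (for example on the first time step),
        # so treat a missing reward as zero.
        total_reward += time_step.reward or 0.0
        num_steps += 1
    return total_reward, num_steps


def make_uniform_policy(action_spec, seed=0):
    """Hypothetical helper: a policy that samples uniformly within the bounds."""
    import numpy as np  # imported lazily so run_episode itself has no dependencies

    rng = np.random.default_rng(seed)

    def policy(time_step):
        del time_step  # a random policy ignores the observation
        return rng.uniform(action_spec.minimum, action_spec.maximum,
                           size=action_spec.shape)

    return policy


def demo():
    """Run one random episode; requires dm_control and MuJoCo to be installed."""
    from dm_control import suite

    env = suite.load("cartpole", "swingup")
    policy = make_uniform_policy(env.action_spec())
    print(run_episode(env, policy))
```

Calling `demo()` prints the episode return and step count for a random policy. Passing `max_steps` is a convenient way to cap the run while experimenting.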
- acrobot swingup
  - orientations: array(4)
  - velocity: array(2)
- acrobot swingup_sparse
  - orientations: array(4)
  - velocity: array(2)
- ball_in_cup catch
  - position: array(4)
  - velocity: array(4)
- cartpole balance
  - position: array(3)
  - velocity: array(2)
- cartpole balance_sparse
  - position: array(3)
  - velocity: array(2)
- cartpole swingup
  - position: array(3)
  - velocity: array(2)
- cartpole swingup_sparse
  - position: array(3)
  - velocity: array(2)
- cheetah run
  - position: array(8)
  - velocity: array(9)
- finger spin
  - position: array(4)
  - velocity: array(3)
  - touch: array(2)
- finger turn_easy
  - position: array(4)
  - velocity: array(3)
  - touch: array(2)
  - target_position: array(2)
  - dist_to_target: float
- finger turn_hard
  - position: array(4)
  - velocity: array(3)
  - touch: array(2)
  - target_position: array(2)
  - dist_to_target: float
- fish upright
  - joint_angles: array(7)
  - upright: float
  - velocity: array(13)
- fish swim
  - joint_angles: array(7)
  - upright: float
  - target: array(3)
  - velocity: array(13)
- hopper stand
  - position: array(6)
  - velocity: array(7)
  - touch: array(2)
- hopper hop
  - position: array(6)
  - velocity: array(7)
  - touch: array(2)
- humanoid stand
- humanoid walk
- humanoid run
- manipulator bring_ball
- pendulum swingup
- point_mass easy
- reacher easy
- reacher hard
- swimmer swimmer6
- swimmer swimmer15
- walker stand
- walker walk
- walker run
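A listing like the one above can be produced without stepping the environment, by reading each task's observation spec. The following is a rough sketch: `describe_spec` and `print_benchmark_specs` are hypothetical helper names, and labelling every scalar entry as `float` is an assumption made to match the listing, not something dm_control reports in that form. The sketch relies on `env.observation_spec()` returning a mapping from observation names to specs that carry a `shape` attribute.

```python
def describe_spec(observation_spec):
    """Format a mapping of name -> spec as lines such as '- position: array(3)'."""
    lines = []
    for name, spec in observation_spec.items():
        # Specs and arrays expose `.shape`; plain Python scalars do not.
        shape = tuple(getattr(spec, "shape", ()))
        if shape:
            dims = ", ".join(str(dim) for dim in shape)
            lines.append("- {}: array({})".format(name, dims))
        else:
            # Assumption: scalar entries are labelled 'float', as in the listing.
            lines.append("- {}: float".format(name))
    return lines


def print_benchmark_specs():
    """Print the observation layout of every benchmark task.

    Requires dm_control and MuJoCo to be installed.
    """
    from dm_control import suite

    for domain_name, task_name in suite.BENCHMARKING:
        env = suite.load(domain_name, task_name)
        print("- {} {}".format(domain_name, task_name))
        for line in describe_spec(env.observation_spec()):
            print("  " + line)
```

Calling `print_benchmark_specs()` walks through `suite.BENCHMARKING` and prints one block per task, including the tasks whose observations are not listed above.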