Playing a maze game with Q-learning

Posted: 2024-05-13 15:57:22 | Views: 24

Tags: __, game, self, current, learning, position, action, maze

import pygame
import numpy as np
import random
import sys

# Define the maze environment
class Maze:
    def __init__(self):
        self.size = 10
        self.maze = np.zeros((self.size, self.size))
        self.start = (0, 0)
        self.goal = (9, 9)
        self.maze[4, 2:7] = 1  # add walls
        self.maze[2, 1] = 1
        self.current_position = self.start
    
    def reset(self):
        self.current_position = self.start
        return self.current_position
    
    def manhattan_distance(self):
        return abs(self.current_position[0] - self.goal[0]) + abs(self.current_position[1] - self.goal[1])

    def step(self, action):
        x, y = self.current_position
        if action == 0:  # up
            x -= 1
        elif action == 1:  # right
            y += 1
        elif action == 2:  # down
            x += 1
        elif action == 3:  # left
            y -= 1
        if 0 <= x < self.size and 0 <= y < self.size and self.maze[x, y] == 0:
            self.current_position = (x, y)
            if self.current_position == self.goal:
                reward = 100
                done = True
            else:
                reward = -1
                done = False
        else:
            reward = -100
            done = True
        
        # done = self.current_position == self.goal
        return self.current_position, reward, done

    def render(self, screen):
        for x in range(self.size):
            for y in range(self.size):
                color = (255, 255, 255) if self.maze[x, y] == 0 else (0, 0, 0)
                if (x, y) == self.current_position:
                    color = (0, 255, 0)
                if (x, y) == self.goal:
                    color = (255, 0, 0)
                pygame.draw.rect(screen, color, (y*40, x*40, 40, 40))
        pygame.display.flip()

# Q-learning
class QLearning:
    def __init__(self, env):
        self.env = env
        self.q_table = np.zeros((env.size, env.size, 4))
        self.gamma = 0.9
        self.epsilon = 0.1
        self.alpha = 0.1

    def select_action(self, state):
        if random.random() < self.epsilon:
            return random.randint(0, 3)
        else:
            x, y = state
            return np.argmax(self.q_table[x, y])

    def update(self, state, action, reward, next_state, done):
        x, y = state
        nx, ny = next_state
        # A terminal transition (goal reached or wall hit) has no future return to bootstrap from
        future_rewards = 0.0 if done else np.max(self.q_table[nx, ny])
        self.q_table[x, y, action] += self.alpha * (reward + self.gamma * future_rewards - self.q_table[x, y, action])

# Main program
def main():
    pygame.init()
    screen = pygame.display.set_mode((400, 400))
    clock = pygame.time.Clock()
    maze = Maze()
    agent = QLearning(maze)

    for episode in range(10000):
        state = maze.reset()
        done = False
        while not done:
            action = agent.select_action(state)
            next_state, reward, done = maze.step(action)
            agent.update(state, action, reward, next_state, done)
            state = next_state

            for event in pygame.event.get():
                if event.type == pygame.QUIT:
                    pygame.quit()
                    sys.exit()

            if episode >= 8000:
                screen.fill((0, 0, 0))
                maze.render(screen)
                clock.tick(10)

if __name__ == '__main__':
    main()
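
The `update` method above applies the standard tabular Q-learning rule. Written out with the hyperparameters set in `QLearning.__init__` (α = 0.1, γ = 0.9), and with a small worked case for the first time the agent steps into the goal from a neighbouring cell whose entry is still zero:

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \Bigl[\, r + \gamma \max_{a'} Q(s',a') - Q(s,a) \Bigr]

% First arrival at the goal: r = 100, Q(s,a) = 0, and the goal cell's row is all zeros
Q(s,a) \leftarrow 0 + 0.1 \,\bigl[\, 100 + 0.9 \cdot 0 - 0 \,\bigr] = 10
```

Repeated visits move this entry closer to 100, and the value then propagates backwards along the path, discounted by γ per step.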

Result when run: [screenshot not preserved]
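
To inspect what the agent learned without opening a pygame window, the same maze rules can be replayed headlessly and the greedy path read off the Q-table. The following is only a sketch: the helper names `build_maze`, `step`, `train`, and `greedy_path`, the `seed` parameter, and the episode count are assumptions made for illustration and are not part of the original listing. The terminal-state handling (no bootstrapping once the episode has ended) is also a choice made here.

```python
import random
import numpy as np

SIZE = 10
START, GOAL = (0, 0), (9, 9)
# Same action order as Maze.step: 0 = up, 1 = right, 2 = down, 3 = left
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def build_maze():
    """Wall layout copied from Maze.__init__."""
    maze = np.zeros((SIZE, SIZE))
    maze[4, 2:7] = 1
    maze[2, 1] = 1
    return maze


def step(maze, pos, action):
    """-1 per move, +100 at the goal, -100 and episode end on a wall or boundary hit."""
    x, y = pos[0] + MOVES[action][0], pos[1] + MOVES[action][1]
    if 0 <= x < SIZE and 0 <= y < SIZE and maze[x, y] == 0:
        if (x, y) == GOAL:
            return (x, y), 100, True
        return (x, y), -1, False
    return pos, -100, True


def train(episodes=3000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning; hypothetical headless training loop."""
    rng = random.Random(seed)
    maze = build_maze()
    q = np.zeros((SIZE, SIZE, 4))
    for _ in range(episodes):
        pos, done = START, False
        while not done:
            x, y = pos
            if rng.random() < epsilon:
                action = rng.randrange(4)
            else:
                action = int(np.argmax(q[x, y]))
            nxt, reward, done = step(maze, pos, action)
            # Do not bootstrap from the next state once the episode has ended
            future = 0.0 if done else np.max(q[nxt[0], nxt[1]])
            q[x, y, action] += alpha * (reward + gamma * future - q[x, y, action])
            pos = nxt
    return maze, q


def greedy_path(maze, q, max_steps=100):
    """Follow argmax actions from START until the episode ends or max_steps is hit."""
    pos, path = START, [START]
    for _ in range(max_steps):
        action = int(np.argmax(q[pos[0], pos[1]]))
        pos, _reward, done = step(maze, pos, action)
        path.append(pos)  # a wall hit leaves pos unchanged, so it appears twice
        if done:
            break
    return path


if __name__ == "__main__":
    maze, q = train()
    print(greedy_path(maze, q))
```

If training has converged, the printed path ends at `(9, 9)`; a path that ends early with a repeated cell means the greedy policy still walks into a wall from that cell, and more episodes may be needed.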

From: https://www.cnblogs.com/LiuXinyu12378/p/18189399
