A Voice Assistant on ESP32 Backed by a Locally Deployed LLM

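The ESP32-S3-BOX ships with a ChatGPT demo. Because the OpenAI service is not reachable from my network, I decided to adapt the demo into a dedicated voice assistant backed by locally deployed models.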

Contents

Preparation
Voice assistant

Prerequisites:

A GPU with at least 12 GB of VRAM
An ESP32-S3-BOX

Preparation

ESP-IDF is the hardest development framework I have ever used!

Installing ESP-IDF on Windows

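On Windows, downloading the offline installer for a release version is enough. After installing, open PowerShell, change into the hello_world example directory, and run: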

idf.py set-target esp32s3
idf.py menuconfig
idf.py build
idf.py -p COM3 flash

Installing ESP-IDF on WSL

Configuring IDF

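The ChatGPT demo requires either a specific ESP-IDF version or the latest one (building it on Windows hit a bug), so the required ESP-IDF version has to be installed manually in a Linux environment: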

sudo apt-get install git wget flex bison gperf python3 python3-pip python3-venv cmake ninja-build ccache libffi-dev libssl-dev dfu-util libusb-1.0-0

git clone git@github.com:espressif/esp-idf.git
cd esp-idf/

export IDF_GITHUB_ASSETS="dl.espressif.cn/github_assets"
./install.sh esp32s3

# After installation, set up the shell environment
. ./export.sh

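If the installer hangs on a particular archive, download that archive manually from the web. The installer then creates a Python virtual environment. To speed that step up, find the pip install invocation in idf_tools.py and append -i https://mirrors.aliyun.com/pypi/simple/ to it.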

Add a shell alias to ~/.bashrc:

vim ~/.bashrc
alias get_idf='. $HOME/esp/esp-idf/export.sh'

Run get_idf to enter the ESP-IDF Python virtual environment.

Building the ChatGPT demo

git clone git@github.com:espressif/esp-box.git
cd esp-box/examples/chatgpt_demo/factory_nvs
idf.py menuconfig

In menuconfig, set the Wi-Fi credentials, the OpenAI key, and related settings:

OpenAI key: https://platform.openai.com/api-keys

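At this point the tool fetches the ESP-IDF submodules. This takes a long time and tends to stall on the esp_wifi component. If it stalls, press Ctrl-C and run the command again: it skips the update of that component, continues with the remaining downloads, and then enters menuconfig. The final build then fails, however, because the library cannot be found: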

ninja: error: '/esp-idf/components/esp_wifi/lib/esp32s3/libcore.a', needed by 'factory_nvs.elf', missing and no known rule to make it
ninja failed with exit code 1, output of the command is in the work/esp-box/examples/chatgpt_demo/factory_nvs/build/log/idf_py_stderr_output_18950 and /work/esp-box/examples/chatgpt_demo/factory_nvs/build/log/idf_py_stdout_output_18950

Going into esp-idf/components/esp_wifi, deleting all the files there, and restoring them with git checkout resolves the error above.

It is also possible that the network simply recovered and the download went through. Either way, pray for a good connection.

Once that is done, start the build:

idf.py build

Then build the ChatGPT demo itself:

# from factory_nvs, go back up to examples/chatgpt_demo
cd ..
idf.py menuconfig
idf.py build

Serial port passthrough

Reference: accessing a Windows serial port from WSL2. On Windows, download and install usbipd-win.

In WSL:

sudo apt install linux-tools-5.4.0-77-generic hwdata
sudo update-alternatives --install /usr/local/bin/usbip usbip /usr/lib/linux-tools/5.4.0-77-generic/usbip 20

Open PowerShell as administrator:

$ usbipd list
Connected:
BUSID  VID:PID    DEVICE                                                        STATE
1-1    303a:1001  USB 串行设备 (COM7), USB JTAG/serial debug unit                Not shared

Persisted:
GUID                                  DEVICE
$ usbipd bind --busid 1-1
$ usbipd attach --wsl --busid=1-1
usbipd: info: Using WSL distribution 'Ubuntu-20.04' to attach; the device will be available in all WSL 2 distributions.
usbipd: info: Using IP address 172.18.128.1 to reach the host.

In WSL, find the TTY device:

$ ls /dev/tty*
/dev/ttyACM0

Flashing

idf.py -p /dev/ttyACM0 flash

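If flashing fails, hold the BOOT button, press RESET, and wait for the serial flashing to start. If the serial port cannot be found, repeat the usbipd steps above in PowerShell.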

Voice assistant

Roadmap:

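Change the demo's default UI language and remove the OpenAI-related code
Replace the demo's default voice
Update the wake-word prompt
Write a Python host-side server that accepts requests from the ESP32 and, after processing the data, sends the results back:
- Receive audio from the ESP
- Send back the speech recognition result (the first string)
- Send back the text generated by Ollama (the second string)
- Send back the synthesized speech (the third chunk of data)
- The ESP keeps waiting for the results, with a timeout, and plays the audio once it arrives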
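
The exchange listed above can be sketched with a simple length-prefixed framing. This is only an illustration under assumptions: the 4-byte big-endian length prefix and the helper names pack_frame, unpack_frames, and build_reply are hypothetical and are not taken from the actual server.py.

```python
import struct
from typing import List


def pack_frame(payload: bytes) -> bytes:
    """Prefix a payload with its length as a 4-byte big-endian integer (assumed framing)."""
    return struct.pack(">I", len(payload)) + payload


def unpack_frames(buffer: bytes) -> List[bytes]:
    """Return the payloads of all complete frames found at the start of the buffer."""
    frames = []
    offset = 0
    while offset + 4 <= len(buffer):
        (length,) = struct.unpack_from(">I", buffer, offset)
        if offset + 4 + length > len(buffer):
            break  # incomplete trailing frame: the receiver should wait for more data
        frames.append(buffer[offset + 4 : offset + 4 + length])
        offset += 4 + length
    return frames


def build_reply(asr_text: str, llm_text: str, tts_audio: bytes) -> bytes:
    """Concatenate the three downstream messages in the order given in the roadmap."""
    return (
        pack_frame(asr_text.encode("utf-8"))    # first string: speech recognition result
        + pack_frame(llm_text.encode("utf-8"))  # second string: text generated by Ollama
        + pack_frame(tts_audio)                 # third chunk: synthesized speech
    )
```

With an explicit length in front of each message, the ESP side can tell where the first string ends and the audio begins, and it can keep waiting (up to its timeout) while a frame is still incomplete.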

Debugging steps

Start Docker Desktop on Windows.

Start FunASR:

sudo bash work/funasr/funasr-runtime-deploy-online-cpu-zh.sh restart

Start the Ollama server:

[Environment]::SetEnvironmentVariable('OLLAMA_HOST', '0.0.0.0:11434', 'Process')
[Environment]::SetEnvironmentVariable('OLLAMA_ORIGINS', '*', 'Process')
ollama serve
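
Once the server is listening, a script such as server.py can query it over Ollama's HTTP API. The sketch below uses the /api/generate endpoint with streaming disabled. The host address, the model name "qwen2", and the function names are assumptions made for illustration; the article does not say which model or client code is actually used.

```python
import json
import urllib.request

# Port 11434 matches OLLAMA_HOST above; the loopback host is an assumption.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def build_generate_request(prompt: str, model: str = "qwen2") -> urllib.request.Request:
    """Build a non-streaming POST request for /api/generate (model name is a placeholder)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_generate_response(raw: bytes) -> str:
    """Extract the generated text from a non-streaming /api/generate reply."""
    return json.loads(raw.decode("utf-8"))["response"]


def ask_ollama(prompt: str, model: str = "qwen2", timeout: float = 60.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_generate_response(response.read())
```

Calling ask_ollama with the recognized text returns the string that would be sent to the ESP as the second message, provided the chosen model has already been pulled.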

Start the CosyVoice Docker container directly from Docker Desktop.

Open PowerShell as administrator and attach the ESP32 serial port to WSL:
```
usbipd attach --wsl --busid=1-1
```

Run the server.py script:

python3 \\wsl.localhost\Ubuntu-20.04\home\sjl\work\esp-box\examples\chatgpt_demo\main\server.py

Flash and monitor:

get_idf
idf.py -p /dev/ttyACM0 flash
idf.py -p /dev/ttyACM0 monitor

Problems encountered

The ESP cannot reach the server.py port when the script runs in WSL2

Port forwarding never worked reliably in practice. Fix: run the server on the Windows side, which in turn led to the problems below.

server.py cannot reach the Ollama server

Turn off the VPN.

The CosyVoice server deployed in WSL2 is unreachable

Create a CosyVoice FastAPI Docker container instead, pulling the base image in advance:

docker pull pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime

Then build the FastAPI container by following the official tutorial.

The CosyVoice FastAPI container is not reachable from outside

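The CosyVoice FastAPI server inside the container must listen on 0.0.0.0 so that the host can reach it. Making that change requires installing vim in the container first.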
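
Why the bind address matters can be shown with a plain socket: a listener bound to 127.0.0.1 accepts connections only from inside the same network namespace, while one bound to 0.0.0.0 accepts them on every interface. The sketch below is a generic illustration rather than CosyVoice code, and open_listener and can_connect are hypothetical helper names.

```python
import socket


def open_listener(host: str, port: int = 0) -> socket.socket:
    """Open a TCP listener on host; port 0 asks the OS for any free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(1)
    return sock


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A check like can_connect, run from the Windows host against the port the container publishes, is a quick way to tell a bind-address problem apart from an application error.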

The CosyVoice FastAPI container fails at runtime

The error shown is CUFFT_INTERNAL_ERROR.

Bug: Ubuntu on WSL2 - RTX4090 related cuFFT runtime error
CUFFT_INTERNAL_ERROR on RTX 4090
Fix: inside the Docker container provided by CosyVoice, uninstall the CUDA 11.7 build of PyTorch and install the CUDA 11.8 build:
```
pip uninstall torch torchaudio
pip install torch==2.0.1 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
```
To consider: open a pull request so that the container image build installs the right version in one step.

The server on Windows fails when saving audio

Saving audio from the client in WSL2 works fine, but the server on Windows raises:

raise RuntimeError(f"Couldn't find appropriate backend to handle uri {uri} and format {format}.")
RuntimeError: Couldn't find appropriate backend to handle uri tmp.wav and format None.

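The audio cannot be written when save is called. Fix: install an audio backend, and do not use conda on Windows, because it leaves the pip packages in a mess.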

pip3 install ffmpeg soundfile
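
If installing a backend is not an option, mono 16-bit PCM audio can also be written with Python's standard-library wave module, which avoids the backend lookup entirely. This is a swapped-in alternative, not what the original server does. The function name save_pcm16_wav, the default sample rate of 22050 Hz, and the assumption that samples are floats in the range [-1, 1] are all illustrative.

```python
import struct
import wave


def save_pcm16_wav(path: str, samples, sample_rate: int = 22050) -> None:
    """Write float samples in [-1, 1] to a mono 16-bit PCM WAV file (sample rate is an assumption)."""
    # Clip out-of-range values so the conversion to int16 cannot overflow.
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    frames = struct.pack("<%dh" % len(clipped), *(int(s * 32767) for s in clipped))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
```

A tensor coming out of a TTS model would first need to be flattened to a plain sequence of floats, for example with tensor.flatten().tolist(), before being passed in.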

The ESP32 records only part of the audio; the rest is lost

TODO

How to write audio to the ESP32

TODO

Source: https://blog.csdn.net/JackSparrow_sjl/article/details/142482023
