Building a speech recognition model with Kaldi: a hands-on test


Original article: https://blog.csdn.net/weixin_42264992/article/details/125395239 (CC 4.0 BY-SA; please include the original source link and this notice when reposting)

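Preface
Kaldi is one of the most widely used open-source speech recognition toolkits. It implements decoding with weighted finite-state transducers (WFSTs). Its core is written in C++, with bash and Python scripts layered on top as tooling. The quality of a real-time recognition system depends on the performance of the underlying speech recognizer, which consists of feature extraction, an acoustic model, a language model, and a decoder. The Kaldi toolbox bundles nearly every tool needed to build such a recognizer.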

step1. Download the source code
git clone https://github.com/kaldi-asr/kaldi.git
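step2. Preprocess the data and prepare the input files
Four files are needed:
1. wav.scp
   Format: id, storage path
   001 /tmp/dataset/wav_file1.wav
2. text (yes, the file really has no extension)
   Format: id, transcript
   001 我爱我的祖国
3. utt2spk
   Format: id, speaker
   001 speaker1
4. spk2utt
   Format: speaker, ids (separated by spaces, not commas)
   speaker1 001 002 003
Kaldi ships a tool (utils/utt2spk_to_spk2utt.pl) that generates file 4 automatically once you have file 3.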
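The step from utt2spk to spk2utt is just a grouping by speaker, so it can be sketched in a few lines of Python. This is only an illustration of the idea, not Kaldi's own tool; the function name utt2spk_to_spk2utt and the sample ids are made up for the example.

```python
from collections import defaultdict


def utt2spk_to_spk2utt(utt2spk_lines):
    """Group utterance ids by speaker.

    Each input line looks like '<utt-id> <speaker>'.
    Each output line looks like '<speaker> <utt-id1> <utt-id2> ...'.
    """
    groups = defaultdict(list)
    for line in utt2spk_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt_id, spk = line.split(maxsplit=1)
        groups[spk].append(utt_id)
    # Ids are joined with single spaces; speakers and ids are sorted
    return [f"{spk} {' '.join(sorted(utts))}" for spk, utts in sorted(groups.items())]


if __name__ == "__main__":
    # Hypothetical sample data
    sample = ["001 speaker1", "002 speaker1", "003 speaker2"]
    for out_line in utt2spk_to_spk2utt(sample):
        print(out_line)
```

Note that the ids on each spk2utt line are separated by spaces only.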

Reference preprocessing code:

import argparse
import os


def prepare_inf(txt_path, wav_path, output_path):
    '''
    Generate wav.scp (id, wav_path), text (id, transcript),
    utt2spk (id, speaker) and spk2utt (speaker, ids).
    The id is the file name without its extension.

    :param txt_path: directory containing the transcript .txt files
    :param wav_path: directory containing the .wav files
    :param output_path: directory for the output files (wav.scp, text, utt2spk, spk2utt)
    :return: None
    '''

    wav_scp = []
    text = []
    utt2spk = []
    spk2utt = {}

    for file in os.listdir(wav_path):
        # splitext also copes with names that contain several dots or none
        file_name, ext = os.path.splitext(file)
        if ext != '.wav':
            continue

        wav_file = os.path.join(wav_path, file)
        txt_file = os.path.join(txt_path, file_name + '.txt')

        # wav.scp
        wav_scp.append(file_name + ' ' + wav_file + '\n')

        # text (written only when a matching transcript exists)
        if os.path.exists(txt_file):
            with open(txt_file, 'r', encoding='utf-8') as reader:
                txt = reader.read().replace('\n', ' ').strip()
            text.append(file_name + ' ' + txt + '\n')

        # The first 8 characters of the file name serve as the speaker id
        spk = file_name[:8]
        # utt2spk
        utt2spk.append(file_name + ' ' + spk + '\n')
        # spk2utt
        spk2utt.setdefault(spk, []).append(file_name)

    os.makedirs(output_path, exist_ok=True)
    # Kaldi expects these files to be sorted, so sort before writing
    with open(os.path.join(output_path, 'wav.scp'), 'w', encoding='utf-8') as writer:
        writer.writelines(sorted(wav_scp))
    with open(os.path.join(output_path, 'text'), 'w', encoding='utf-8') as writer:
        writer.writelines(sorted(text))
    with open(os.path.join(output_path, 'utt2spk'), 'w', encoding='utf-8') as writer:
        writer.writelines(sorted(utt2spk))
    with open(os.path.join(output_path, 'spk2utt'), 'w', encoding='utf-8') as writer:
        for spk in sorted(spk2utt):
            writer.write(spk + ' ' + ' '.join(sorted(spk2utt[spk])) + '\n')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--txt_path', type=str, default='/home/nfs/datasets/mediaspeech-es')
    parser.add_argument('--wav_path', type=str, default='/home/nfs/datasets/mediaspeech-es')
    parser.add_argument('--output_path', type=str, default='/home/wangyuke_i/asr_test/mykaldi/inf')
    args = parser.parse_args()

    prepare_inf(args.txt_path, args.wav_path, args.output_path)
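Before handing the generated files to Kaldi, it can help to sanity-check them. The sketch below is a hypothetical helper, not part of Kaldi or of the script above: it checks that ids are sorted, unique, and identical across wav.scp, text, and utt2spk. It takes lists of lines rather than file paths so it is easy to try out; the names first_fields and check_data_files are assumptions.

```python
def first_fields(lines):
    """Return the first whitespace-separated field (the id) of each non-empty line."""
    return [line.split()[0] for line in lines if line.strip()]


def check_data_files(files):
    """Check id consistency across Kaldi-style data files.

    files maps a file name ('wav.scp', 'text', 'utt2spk') to its list of lines.
    Returns a list of problem descriptions; an empty list means all checks passed.
    """
    problems = []
    ids = {name: first_fields(lines) for name, lines in files.items()}
    for name, id_list in ids.items():
        if id_list != sorted(id_list):
            problems.append(f"{name}: ids are not sorted")
        if len(id_list) != len(set(id_list)):
            problems.append(f"{name}: duplicate ids")
    # text and utt2spk should cover exactly the ids listed in wav.scp
    reference = set(ids.get("wav.scp", []))
    for name in ("text", "utt2spk"):
        if name in ids and set(ids[name]) != reference:
            problems.append(f"{name}: ids differ from wav.scp")
    return problems


if __name__ == "__main__":
    demo = {
        "wav.scp": ["001 /tmp/dataset/wav_file1.wav"],
        "text": ["001 我爱我的祖国"],
        "utt2spk": ["001 speaker1"],
    }
    print(check_data_files(demo) or "all checks passed")
```

A wav file without a transcript, for example, shows up here as a mismatch between text and wav.scp.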

step3. Configure the environment
Copy kaldi/egs/aishell/s5/path.sh into your own project directory and change the export KALDI_ROOT line to point at your Kaldi checkout.
Then source it:

. path.sh
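After sourcing, prepare_lang.sh becomes available on the command line even though it is not visible in the current directory (type pre and press Tab twice to list it). Sourcing path.sh does not create this file; it only extends PATH.
At this point the path is configured.

PS: if prepare_lang.sh does not show up: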

. /kaldi/egs/wsj/s5/path.sh
If prepare_lang.sh now shows up under /kaldi/egs/wsj/s5/ (type pre and press Tab twice), the setup worked.
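The Tab-completion check can also be scripted. The sketch below uses Python's standard shutil.which to test whether a command is reachable through PATH; the helper name find_missing is made up for illustration. It assumes Python is started from the same shell in which path.sh was sourced, so that it inherits the modified PATH.

```python
import shutil


def find_missing(commands):
    """Return the commands that cannot be found on the current PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    # prepare_lang.sh is the script the article uses as its indicator
    missing = find_missing(["prepare_lang.sh"])
    if missing:
        print("not on PATH:", ", ".join(missing))
    else:
        print("path.sh appears to be set up correctly")
```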

step4. Build tools:
vim /kaldi/INSTALL
This opens the official installation guide. The installation takes three steps:

This is the official Kaldi INSTALL. Look also at INSTALL.md for the git mirror installation.
[Option 1 in the following does not apply to native Windows install, see windows/INSTALL or following Option 2]

Option 1 (bash + makefile):

Steps:
(1)
go to tools/ and follow INSTALL instructions there.

(2)
go to src/ and follow INSTALL instructions there.

Option 2 (cmake):

Go to cmake/ and follow INSTALL.md instructions there.
Note, it may not be well tested and some features are missing currently.

Run the following commands in order:

cd kaldi/tools/
extras/check_dependencies.sh
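Errors
If no errors are reported, skip this section.
If errors appear, install the missing dependencies as the messages suggest. For MKL, you can run the official install script directly: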

sh extras/install_mkl.sh
You can also install the reported dependencies yourself:

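yum install gcc-c++ make automake autoconf patch bzip2 unzip wget sox gcc-gfortran libtool subversion python3 zlib-devel
PS: MKL may still be reported as missing even after it has been installed. The script installs the latest version by default, which some users report as problematic, so download the 2017 or 2018 release from the official site instead.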

Once that is done, continue with the official instructions:
cd /kaldi/tools/
make -j 4
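The build checks whether the environment is complete. Any error means something is still not installed properly; install it according to the error message.

How do you check that the build succeeded?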

cd /kaldi/tools/
type fst, then press Tab repeatedly
If a long list of commands starting with fst appears, fstcompile in particular, the installation has most likely succeeded.
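The same check can be done without Tab completion by listing the executables whose names start with fst. This is a rough sketch; the directory /kaldi/tools/openfst/bin is an assumption about where the build places the OpenFst binaries, and list_fst_tools is a hypothetical helper name.

```python
import os


def list_fst_tools(bin_dir):
    """Return sorted names of executable files in bin_dir that start with 'fst'."""
    if not os.path.isdir(bin_dir):
        return []
    found = []
    for name in os.listdir(bin_dir):
        path = os.path.join(bin_dir, name)
        if name.startswith("fst") and os.path.isfile(path) and os.access(path, os.X_OK):
            found.append(name)
    return sorted(found)


if __name__ == "__main__":
    # Assumed location; adjust to your own Kaldi checkout
    tools = list_fst_tools("/kaldi/tools/openfst/bin")
    print(f"{len(tools)} fst tools found")
    print("fstcompile present:", "fstcompile" in tools)
```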

Build src
cd src
./configure --shared
make depend -j 8
make -j 8
Error encountered:

***configure failed: CUDA 10_1 does not support c++ (g++-9). Only versions strictly older than 9.0 are supported. ***
Fixes:
1. Set the CUDA path; see this blog post: https://blog.csdn.net/accumulating_mocai/article/details/110006111
2. Downgrade gcc: https://zhuanlan.zhihu.com/p/453542931
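To see in advance whether the installed compiler will trip this check, compare its major version against the limit quoted in the error message. This is a minimal sketch: the limit of 9 is taken from the CUDA 10.1 message above and differs for other CUDA versions, and the helper names gcc_major and cuda_accepts are made up.

```python
import subprocess


def gcc_major(version_text):
    """Extract the major version from text such as '9.3.0' or '7'."""
    return int(version_text.strip().split(".")[0])


def cuda_accepts(version_text, first_unsupported_major=9):
    """True if the compiler's major version is strictly older than the limit."""
    return gcc_major(version_text) < first_unsupported_major


if __name__ == "__main__":
    try:
        # 'g++ -dumpversion' prints only the compiler version
        result = subprocess.run(["g++", "-dumpversion"],
                                capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("g++ not found or failed to run")
    else:
        version = result.stdout.strip()
        verdict = "ok" if cuda_accepts(version) else "too new for CUDA 10.1"
        print(f"g++ {version}: {verdict}")
```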
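————————————————
Copyright notice: this is an original article by CSDN blogger 「飞扬々岁月」, licensed under CC 4.0 BY-SA. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/weixin_42264992/article/details/125395239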

From： https://www.cnblogs.com/wcxia1985/p/16753564.html