
Tacotron-2 Experiment Log

Time: 2024-02-16 23:13:11
Tags: github, Tacotron, notes, WaveNet, experiment

https://blog.csdn.net/u013625492/article/details/100155542

Try the Std Version
1. Get Tacotron-2-master.zip from https://github.com/Rayhane-mamah/Tacotron-2

2. Unzip Tacotron-2-master.zip on Ubuntu

3. Terminal: cp -r training_data ./Tacotron-2  # training_data is the folder prepared from the LJSpeech-1.1 dataset

4. Terminal: python train.py --model='Tacotron-2' fails with:

CancelledError (see above for traceback): Enqueue operation was cancelled
[[Node: datafeeder/eval_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](datafeeder/eval_queue, _arg_datafeeder/inputs_0_1, _arg_datafeeder/input_lengths_0_0, _arg_datafeeder/mel_targets_0_3, _arg_datafeeder/token_targets_0_6, _arg_datafeeder/linear_targets_0_2, _arg_datafeeder/targets_lengths_0_5, _arg_datafeeder/split_infos_0_4)]]


Traceback (most recent call last):
File "train.py", line 138, in <module>
main()
File "train.py", line 132, in main
train(args, log_dir, hparams)
File "train.py", line 57, in train
raise('Error occured while training Tacotron, Exiting!')
TypeError: exceptions must derive from BaseException
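The final TypeError is a secondary bug: train.py raises a plain string, which Python 3 rejects, so the real failure is the earlier CancelledError. A minimal sketch of the pattern and a fix, using a hypothetical train_step() that stands in for the failing training loop (this is not the repository's actual code):

```python
def train_step():
    # Hypothetical stand-in for the data feeder failing mid-training
    raise RuntimeError("Enqueue operation was cancelled")


def buggy_train():
    try:
        train_step()
    except Exception:
        # Pattern from the traceback: raising a str is illegal in Python 3,
        # so this line itself throws TypeError and hides the real error
        raise('Error occured while training Tacotron, Exiting!')


def fixed_train():
    try:
        train_step()
    except Exception as e:
        # Raise a real exception and chain the original cause
        raise RuntimeError('Error occurred while training Tacotron, exiting!') from e


if __name__ == "__main__":
    for fn in (buggy_train, fixed_train):
        try:
            fn()
        except Exception as err:
            print(fn.__name__, type(err).__name__, "| cause:", repr(err.__cause__ or err.__context__))
```

With the fix, the log shows the original error as the cause instead of an unrelated TypeError.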
The underlying failure may be caused by a GPU conflict.

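Fix: add the following near the top of train.py, before TensorFlow touches the GPUs:

import os
# Expose only GPUs 6 and 7 to this process (must be set before TensorFlow initializes CUDA)
os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"

After this, training runs (step 1, 2, 3, ...).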

(Before this, a conda environment satisfying the repository's requirements is needed. Setting it up is not easy; details to be added.)

Each training step with batch size 32 then takes about 4.5 s. Although two GPUs are visible, only the first one seems to use memory.

(To test: tacotron_num_gpus = 4 on four GPUs, plus different batch_size and training-step settings.)

### Test starts here

File: train.py

CUDA_VISIBLE_DEVICES = '5, 6'

File: hparams.py

tacotron_num_gpus = 2

tacotron_batch_size = 32 * 2

File: train.py (command-line defaults), original:

parser.add_argument('--summary_interval', type=int, default=250,
    help='Steps between running summary ops')
parser.add_argument('--embedding_interval', type=int, default=5000,
    help='Steps between updating embeddings projection visualization')
parser.add_argument('--checkpoint_interval', type=int, default=2500,
    help='Steps between writing checkpoints')
parser.add_argument('--eval_interval', type=int, default=5000,
    help='Steps between eval on test data')

changed to:

parser.add_argument('--summary_interval', type=int, default=1000,
    help='Steps between running summary ops')
parser.add_argument('--embedding_interval', type=int, default=5000,
    help='Steps between updating embeddings projection visualization')
parser.add_argument('--checkpoint_interval', type=int, default=1000,
    help='Steps between writing checkpoints')
parser.add_argument('--eval_interval', type=int, default=5000,
    help='Steps between eval on test data')
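Since these are argparse defaults, the same change can probably be made per run on the command line instead of editing train.py. A minimal sketch that rebuilds only these four flags (the rest of the real parser is omitted) to show that command-line values override the defaults:

```python
import argparse


def build_parser():
    # Mirrors only the four interval flags above, with the original defaults
    parser = argparse.ArgumentParser()
    parser.add_argument('--summary_interval', type=int, default=250,
                        help='Steps between running summary ops')
    parser.add_argument('--embedding_interval', type=int, default=5000,
                        help='Steps between updating embeddings projection visualization')
    parser.add_argument('--checkpoint_interval', type=int, default=2500,
                        help='Steps between writing checkpoints')
    parser.add_argument('--eval_interval', type=int, default=5000,
                        help='Steps between eval on test data')
    return parser


if __name__ == '__main__':
    # Same effect as the edited defaults, without touching the file
    args = build_parser().parse_args(['--summary_interval=1000', '--checkpoint_interval=1000'])
    print(vars(args))
```

With the real script this would be python train.py --model='Tacotron-2' --summary_interval=1000 --checkpoint_interval=1000.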



Training runs successfully, but with batch_size = 64 each step takes about 5 s. It is unclear whether this is better than batch_size = 32 on a single GPU when both are trained for 100k steps.
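A quick back-of-the-envelope comparison using the two timings in these notes. Assumptions: the timings are steady-state, and the 4.5 s figure comes from the earlier two-GPU run where only the first GPU seemed busy, so it only approximates single-GPU batch-32 speed. This compares throughput only, not final audio quality:

```python
def run_stats(batch_size, sec_per_step, total_steps=100_000):
    """Throughput (samples/s), wall-clock days, and samples seen for a fixed step count."""
    samples_per_sec = batch_size / sec_per_step
    days = total_steps * sec_per_step / 86_400
    samples_seen = total_steps * batch_size
    return samples_per_sec, days, samples_seen


# Timings from the notes: batch 64 at ~5 s/step, batch 32 at ~4.5 s/step
for bs, t in [(64, 5.0), (32, 4.5)]:
    sps, days, seen = run_stats(bs, t)
    print(f"batch={bs}: {sps:.1f} samples/s, {days:.1f} days for 100k steps, {seen:,} samples seen")
```

At equal step counts, batch 64 sees twice as many samples for roughly 11% more wall-clock time (about 5.8 vs 5.2 days).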

### Test ends here

Now let's just wait.

After 25,000 steps, the error suddenly occurred: data feeder...

Simply rerunning python train.py --model='Tacotron-2' in the terminal gets it working again.

The cause is unknown; to be tested later. For now, if it happens again, halve the number of steps between model saves.

It happened again, so I reduced the interval to 250 steps. Still unclear why; it may be a gpu_nums problem. I also found that the GPUs were being used by other jobs, which may be the cause. The error was:

 

Changed the GPUs from '5, 6' to '3, 4'.

But the error still occurred:

 

Reverted to the defaults: gpu_num = 1, batch_size = 32, GPU "6".

5. TODO: an error stops training roughly once a day. Need to write a shell script that restarts training automatically on free GPUs.
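For this TODO, a restart-watchdog sketch. It is written in Python rather than sh to stay in one language; the 500 MiB "free" threshold and the retry delay are arbitrary assumptions, and it relies on the observation above that simply rerunning train.py gets training going again:

```python
import os
import subprocess
import time

# nvidia-smi query: one "index, memory.used" line per GPU, memory in MiB
QUERY = ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"]


def free_gpus(csv_text, max_used_mib=500):
    """Return indices (as strings) of GPUs using at most max_used_mib of memory."""
    free = []
    for line in csv_text.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        if int(used) <= max_used_mib:
            free.append(index)
    return free


def train_forever(cmd, n_gpus=1, retry_wait=60):
    """Relaunch cmd on currently free GPUs until it exits with status 0."""
    while True:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        gpus = free_gpus(out)[:n_gpus]
        if len(gpus) < n_gpus:
            time.sleep(retry_wait)  # not enough free GPUs yet, poll again
            continue
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(gpus))
        if subprocess.run(cmd, env=env).returncode == 0:
            return  # training finished normally
        time.sleep(retry_wait)  # crashed: wait, then rerun on whatever GPUs are free
```

Usage: train_forever(["python", "train.py", "--model=Tacotron-2"]).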

6. The waves in Tacotron-2-log/wav sound better than those in eval/wav; read the code to see why. Maybe the eval text is unseen in the training data. None of this uses teacher forcing. Resolved: the wavs come from the training set and the eval wavs from the test set; if everything were teacher-forced, the eval set would only serve to check for overfitting.

7. Liang dada's figure/paper on the teacher-forcing ratio. Resolved: it is the teacher_forcing mode setting in hparams.py, though I don't know its details yet.
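To make the teacher-forcing ratio concrete, a generic sketch of how such a ratio is usually applied, assuming two modes: "constant" and a cosine-decayed "scheduled" one. All names and step counts here are illustrative and not copied from hparams.py:

```python
import math
import random


def teacher_forcing_ratio(step, mode="constant", init_ratio=1.0, final_ratio=0.0,
                          start_decay=20_000, decay_steps=200_000):
    """Probability of feeding the ground-truth frame at a given training step.

    "constant" keeps init_ratio; "scheduled" cosine-decays from init_ratio to
    final_ratio between start_decay and start_decay + decay_steps.
    """
    if mode == "constant":
        return init_ratio
    t = min(max((step - start_decay) / decay_steps, 0.0), 1.0)
    return final_ratio + (init_ratio - final_ratio) * 0.5 * (1 + math.cos(math.pi * t))


def next_decoder_input(ground_truth_frame, predicted_frame, ratio, rng=random):
    # With probability `ratio`, feed the true previous frame (teacher forcing);
    # otherwise feed the model's own prediction, as at inference time
    return ground_truth_frame if rng.random() < ratio else predicted_frame
```

A ratio of 1.0 is full teacher forcing; lowering it over training exposes the decoder to its own predictions, which is one reason the ratio affects convergence.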

8. Optimization

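(Song Changhe) The problem may be in the features. (This step mainly checks the effect of adding the linear spectrogram: with it, quality falls between mel and WaveNet, but is still poor. Also check whether the linear spectrogram helps or hurts the loss.) Reconstruct the extracted linear spectrogram directly with GL (Griffin-Lim) and listen for trembling, to rule out Tacotron-2 as the cause.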
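A minimal NumPy sketch of the Griffin-Lim check suggested above: estimate a waveform from a linear magnitude spectrogram by starting from random phase and repeatedly re-estimating it. Assumptions: the input is a plain linear magnitude STFT (Tacotron-2's normalized/log outputs would first need converting back, which is not shown), a Hann window, and placeholder FFT/hop sizes rather than the repository's hparams. This is a generic implementation, not the repository's:

```python
import numpy as np


def stft(x, n_fft=1024, hop=256):
    """Complex STFT with a Hann window; one frame per row."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft, hop, length):
    """Least-squares overlap-add inverse (the ISTFT used by Griffin-Lim)."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    x = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        x[start:start + n_fft] += frame * win
        norm[start:start + n_fft] += win ** 2
    nonzero = norm > 1e-8
    x[nonzero] /= norm[nonzero]
    return x


def griffin_lim(magnitude, n_fft, hop, length, n_iter=60, seed=0):
    """Estimate a waveform whose STFT magnitude matches `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random initial phase
    x = istft(magnitude * phase, n_fft, hop, length)
    for _ in range(n_iter):
        # Keep the phase of the current estimate, impose the target magnitude
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
        x = istft(magnitude * phase, n_fft, hop, length)
    return x


def spectral_convergence(magnitude, x, n_fft, hop):
    """Relative error between target and achieved magnitudes (lower is better)."""
    achieved = np.abs(stft(x, n_fft, hop))
    return np.linalg.norm(magnitude - achieved) / np.linalg.norm(magnitude)
```

For the check, compute np.abs(stft(wav, n_fft, hop)) on a real training clip, run griffin_lim on it, and listen: if the trembling is already present, the feature settings rather than Tacotron-2 are the likely culprit.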
Strictly following https://github.com/Rayhane-mamah/Tacotron-2: python train.py --model='Tacotron-2'
With tacotron_num_gpus = 1, wavenet_num_gpus = 1 and split_on_cpu = False, it runs without errors.
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
Maybe out of GPU memory? Try running with CUDA_VISIBLE_DEVICES='' (CPU only).
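Why are there speakers 0, 1, 2, 3? The dataset clearly has only one speaker.
Ground Truth Aligned synthesis (DEFAULT: the model is assisted by true labels in a teacher forcing manner). This synthesis method is used when predicting mel spectrograms used to train the wavenet vocoder. (yields better results as stated in the paper) I don't understand this yet, but I don't normally use it, so I'm skipping it for now.
CUDA_VISIBLE_DEVICES='' python synthesize.py --model='Tacotron-2' --mode='live', or add more GPUs. (This runs mel-to-wav synthesis; eval mode can also be run.)
Continue training and complete the full run.
For training speed: dropping the linear spectrogram and increasing outputs_per_step will speed things up. Also revisit parallelism; the teacher-forcing ratio also affects convergence speed.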
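Why outputs_per_step helps: it is Tacotron's reduction factor r, so each decoder step emits r frames and the number of sequential decoder iterations drops by about r. A small arithmetic sketch; the 22050 Hz sample rate and 275-sample hop are assumed values, and per-step cost and quality trade-offs are not modeled:

```python
import math


def decoder_steps(n_frames, outputs_per_step):
    """Autoregressive decoder iterations needed when each step emits r frames."""
    return math.ceil(n_frames / outputs_per_step)


def frames_for(seconds, sample_rate=22050, hop_size=275):
    """Number of spectrogram frames for an utterance (assumed sample_rate and hop_size)."""
    return math.ceil(seconds * sample_rate / hop_size)


if __name__ == "__main__":
    n = frames_for(10.0)
    for r in (1, 2, 3):
        print(f"r={r}: {decoder_steps(n, r)} decoder steps for {n} frames")
```

For a 10 s clip this gives 802, 401 and 268 decoder steps for r = 1, 2, 3.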
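(Song Changhe) The mel-to-linear network adds fairly large distortion. In training it is usually best not to use this network at all and to take the output at the mel stage directly.
Normally train without the linear spectrogram; for a quick listen use the mel output, and for careful listening use WaveNet.
Actually, the WaveRNN version is genuinely not bad.
(Song Changhe) The dataset should be fine; Tacotron-2 works on LJSpeech. (!!!! Needs to be tried.)
(Song Changhe) If that still doesn't work, consider switching vocoders. One drawback of GL in practice is very noticeable high-frequency noise. Try reconstructing speech directly from the mel spectrogram with Merlin, WaveNet, WaveRNN, LPCNet, ClariNet, VoiceLoop, or Transformer TTS.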
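WaveNet. Use the version at https://r9y9.github.io/wavenet_vocoder/, running it directly on Colaboratory: https://colab.research.google.com/github/r9y9/Colaboratory/blob/master/Tacotron2_and_WaveNet_text_to_speech_demo.ipynb. It pairs Rayhane-mamah's Tacotron2 (TensorFlow) version with the r9y9/wavenet_vocoder (PyTorch) version, with pretrained models at 189k steps and over 1000k steps. @Lab10: test_t2_wavenet + Tacotron-2 + wavenet_vocoder. However, on the server the conda tf1.1 environment has a pysptk conflict (the environment was set up with help from senior labmate Jingbei). It currently runs on Colab, where the environment has no conflicts (is Ubuntu the cause? conda?). It can synthesize speech, but WaveNet synthesis is very slow: "This is text-to-speech online demonstration by Tacotron 2 and WaveNet" takes about 15 min (on Colab's default GPU; it is unclear how well the code uses the GPU). (WaveNet is slow; unclear whether Fast WaveNet optimizations or a parallel version are used.)
WaveNet. A faster WaveNet with parallel training: https://github.com/andabi/parallel-wavenet-vocoder (audio quality is too poor, but the idea is a promising direction).
WaveNet. Implemented by a Japanese developer and open-sourced by a company. I can't tell which WaveNet variant it is and no speed is published, but there is Colab code; in my test, 33919 samples took over 60 s. https://github.com/kan-bayashi/PytorchWaveNetVocoder (WaveNet is slow; not examined closely.)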
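To put "33919 samples in over 60 s" on a common scale, the real-time factor (synthesis time divided by audio duration) can be computed. The notes don't state the sample rate, so two plausible rates are assumed; since the time was "over 60 s", these are lower bounds:

```python
def real_time_factor(n_samples, sample_rate, synthesis_seconds):
    """Synthesis time divided by audio duration; above 1 means slower than real time."""
    audio_seconds = n_samples / sample_rate
    return synthesis_seconds / audio_seconds


# 33919 samples took about 60 s in the test above; the sample rate is an assumption
for sr in (16_000, 22_050):
    print(f"{sr} Hz: {33919 / sr:.2f} s of audio, RTF ~ {real_time_factor(33919, sr, 60):.0f}")
```

That is roughly 28x (16 kHz) to 39x (22.05 kHz) slower than real time.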
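WaveNet. https://github.com/NVIDIA/nv-wavenet is NVIDIA's version. The README is poorly written, but it should be the fastest WaveNet implementation, given its engineering focus. I haven't tried it yet and am not familiar with the code or its deployment. (Not tested.)
Try LPCNet.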
https://github.com/mozilla/LPCNet
The code also supports very low bitrate compression at 1.6 kb/s.
The same functionality is available in the form of a library. See include/lpcnet.h for the API.
https://people.xiph.org/~jm/demo/lpcnet_codec/
https://zhuanlan.zhihu.com/p/54952637
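A conversion network is needed, or train directly on LPCNet features, because the pretrained model doesn't fit well as-is. (To revisit once I've learned how to assign values to a subset of graph variables.)
https://github.com/alokprasad/LPCTron: code to try; Changhe is looking into it.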
ClariNet
https://github.com/ksw0306/ClariNet
https://github.com/ksw0306/FloWaveNet
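The demo is good, and the description feels somewhat like Parallel WaveNet. I haven't read it closely or run the code; I'll revisit when I have time. There is no pretrained model either.
WaveRNN. Inherently faster than WaveNet, but not yet mature. Tried one: https://github.com/fatchord/WaveRNN
Merlin presumably refers to traditional statistical parametric speech synthesis (SPSS); unclear how well it works as a standalone mel vocoder. Ask Changhe.
CNN-based Tacotron: not mainstream, but reportedly fast to train and good for Mandarin synthesis. https://github.com/ruclion/dc_tts
VoiceLoop presumably uses WORLD features; haven't looked at it. https://github.com/facebookarchive/loop
The original Tacotron 1, with detailed data and evaluations. https://github.com/kyubyong/tacotron
Transformer TTS: waiting for Xiangyu's survey and exploratory experiments. https://github.com/soobinseo/Transformer-TTS
(Song Changhe) Obvious audio-quality problems can't be fixed by tuning hyperparameters; the default parameters weren't changed, so tuning is even less likely to be the issue.
(Song Changhe) There is also NVIDIA's PyTorch version, which uses WaveGlow. I've run it; it's unreliable and the results are poor. It simplifies Tacotron-2 in places and is not a full reproduction.
(Wu Xixin) WaveNet or WaveRNN is needed to improve audio quality further.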
https://github.com/NVIDIA/tacotron2
WaveGlow: https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s91022-text-to-speech-overview-of-the-latest-research-using-tacotron2-and-waveglow-with-tensor-core-performance.pdf
https://ngc.nvidia.com/catalog/model-scripts/nvidia:tacotron_2_and_waveglow_for_pytorch
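(Wu Xixin) Tacotron 1 can still produce fairly good audio, and training also takes about 4 days.
Not sure how to test this, since T2 is supposed to be an "improvement" over T1.
(Wu Xixin) Train to 100k steps and then listen; that will settle it, and it should sound much better by then.
(Wu Xixin) In the first epoch, training samples are ordered from short to long; from the second epoch on they are shuffled. (Not entirely clear on this...) The current code sorts first, then splits into batches, then shuffles.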
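A sketch of the batching strategy described above, under the assumption that only per-sample lengths are needed; this is an illustration, not the repository's feeder code, and whether the feeder does exactly this is unverified:

```python
import random


def make_batches(lengths, batch_size, epoch, seed=0):
    """Return lists of sample indices for one epoch.

    Epoch 0: batches run from shortest to longest (curriculum order).
    Later epochs: sort by length, cut into batches, then shuffle the batch
    order, so each batch still holds similar lengths (little padding).
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    if epoch > 0:
        random.Random(seed + epoch).shuffle(batches)
    return batches
```

Shuffling whole batches rather than individual samples keeps padding low while still varying the order across epochs.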
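(Liu Liangqi) The WaveNet in Rayhane's Tacotron-2 version has implementation problems and gives poor results.
Try the fully open-source Rayhane + WaveNet setup directly, along with the pretrained model trained for 1M+ steps.
(Liu Liangqi) With GL: is dropout applied in the prenet at synthesis time? If not, an overfitted model can produce noisy output.
hparams.py has a dropout setting, so this should be correct.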
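The point of the question is that prenet dropout should stay on at synthesis time; the Tacotron 2 paper applies dropout (p = 0.5) in the decoder prenet at inference to introduce output variation. A toy pure-Python sketch of that behaviour; the single ReLU layer with no weights is a hypothetical stand-in for the paper's two-layer prenet:

```python
import random


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with prob `rate`, scale survivors by 1/(1-rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def prenet(frame, rate=0.5, rng=None):
    # Hypothetical prenet stand-in: ReLU then dropout, with NO is_training switch,
    # so dropout is applied during synthesis exactly as during training
    rng = rng or random.Random()
    hidden = [max(0.0, v) for v in frame]
    return dropout(hidden, rate, rng)
```

When checking the real code, the thing to look for is whether the prenet's dropout is tied to an is-training flag (off at synthesis) or always on.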
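(Liu Liangqi) Try the version from last October.
1. Reprocess the data: python preprocess.py --base_dir '/home/data/LJSpeech-1.1/', or cp -r ../Tacotron-2-master/training_data ./, or specify the data directory when training. In the end I chose the safest option: download the dataset again, tar -jxvf ×××.tar.bz2, then re-extract the features with "python preprocess.py".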
2. Specify the training parameters

3. python train.py --model='Tacotron'
(Liu Liangqi) If you plan to synthesize with WaveNet later, leave out the linear spectrogram; if you check results with GL, always include it.
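Before re-running preprocess.py in step 1, a quick check that the downloaded dataset is complete can save a failed run. A minimal sketch, assuming the standard LJSpeech-1.1 layout (a pipe-separated metadata.csv whose first field is the clip ID, plus a wavs/ folder); the function name is hypothetical and not part of preprocess.py:

```python
import csv
from pathlib import Path


def missing_wavs(dataset_dir):
    """IDs listed in metadata.csv that have no matching wavs/<id>.wav file."""
    root = Path(dataset_dir)
    missing = []
    with open(root / "metadata.csv", encoding="utf-8", newline="") as f:
        # QUOTE_NONE: transcripts may contain quote characters
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if not row:
                continue  # skip blank lines
            if not (root / "wavs" / f"{row[0]}.wav").is_file():
                missing.append(row[0])
    return missing
```

Usage: missing_wavs('/home/data/LJSpeech-1.1/') should return an empty list.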
Try to Synthesize
ValueError: Defined synthesis batch size 1 is smaller than minimum required 2 (num_gpus)! Please verify your synthesis batch size choice.
Fix: set num_gpu = 1.

Or: CUDA_VISIBLE_DEVICES='' python synthesize.py --model='Tacotron-2' --mode='live', or add more GPUs.
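The error message suggests synthesis splits each batch across num_gpus GPU towers, so every GPU needs at least one sentence. A hypothetical illustration of such a split (not the repository's code, which may impose further constraints such as divisibility):

```python
def split_batch(batch, num_gpus):
    """Split one batch into num_gpus contiguous shards, one per GPU tower."""
    if len(batch) < num_gpus:
        raise ValueError(
            f"batch size {len(batch)} is smaller than minimum required {num_gpus} (num_gpus)")
    size, extra = divmod(len(batch), num_gpus)
    shards, start = [], 0
    for g in range(num_gpus):
        # The first `extra` shards take one extra item each
        end = start + size + (1 if g < extra else 0)
        shards.append(batch[start:end])
        start = end
    return shards
```

With a single sentence, only num_gpus = 1 yields a non-empty shard for every GPU.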

Try to Train the CN Version
Download the zip from https://github.com/awesome-archive/tacotron_cn

Conda environment: tf1.10-pt1.10

Then pip install pypinyin

Tried bash train.sh, but the path and training_data haven't been set correctly, so I stopped here for now.

Waiting for Xingchen's working version.

Try the PyTorch Version of Tacotron2
1. Run "nvcc -V" to check the CUDA version.

2. Install PyTorch 1.2:

conda install pytorch torchvision cudatoolkit=10.0 -c pytorch (this always timed out!)
3. git clone https://github.com/NVIDIA/tacotron2.git

4. mv tacotron2 Pt-Tacotron

5. cd Pt-Tacotron; git submodule init; git submodule update

6. sed -i -- 's,DUMMY,/home/data/LJSpeech-1.1/wavs,g' filelists/*.txt
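Step 6 rewrites the DUMMY placeholder in the filelists to point at the local wav directory. If sed is unavailable, or the result should be checked, an equivalent Python sketch (the function name is hypothetical):

```python
from pathlib import Path


def fill_wav_paths(filelist_dir, wav_dir, placeholder="DUMMY"):
    """Replace every occurrence of the placeholder in each *.txt filelist, in place."""
    changed = []
    for path in sorted(Path(filelist_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        if placeholder in text:
            path.write_text(text.replace(placeholder, wav_dir), encoding="utf-8")
            changed.append(path.name)
    return changed
```

Usage, from inside the repository: fill_wav_paths('filelists', '/home/data/LJSpeech-1.1/wavs') returns the names of the files it changed.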
————————————————

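Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Original link: https://blog.csdn.net/u013625492/article/details/100155542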

From: https://www.cnblogs.com/wcxia1985/p/18017612
