whisper-large-v3: three practical ways to call an incredibly fast transcription and translation model

Posted: 2024-03-19 20:04:48

Tags: whisper, insanely, pipx, fast, large, v3

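1. whisper-large-v3 is a model from OpenAI and can be called from Python code.

2. Building on whisper-large-v3, chenxwh published the open-source library insanely-fast-whisper, which can be run from a local command line or on a Google Colab T4 GPU.

3. If both of the options above feel too complicated to use, software engineers in China have made a simpler packaged version, fast-whisper3.

The three parts are covered below: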

Open-source library

insanely-fast-whisper

https://github.com/chenxwh/insanely-fast-whisper

Transcribe 300 minutes (5 hours) of audio in under 98 seconds using OpenAI's Whisper Large v3.
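For readers who would rather stay in Python than use a command-line tool, here is a minimal sketch of chunked, batched transcription with the Hugging Face Transformers speech-recognition pipeline, which is, as far as I understand, the kind of mechanism this speed-up relies on. The model id `openai/whisper-large-v3`, the 30-second chunk length, the batch size of 24, and both function names are assumptions chosen for illustration, not settings taken from this article. It also assumes `torch` and `transformers` are installed.

```python
def pipeline_call_kwargs(batch_size=24, word_timestamps=False):
    """Keyword arguments for one transcription call (illustrative defaults)."""
    return {
        "chunk_length_s": 30,        # split long audio into 30-second chunks
        "batch_size": batch_size,    # number of chunks decoded together
        "return_timestamps": "word" if word_timestamps else True,
    }


def transcribe_with_pipeline(audio_path, model_id="openai/whisper-large-v3"):
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from transformers import pipeline

    use_gpu = torch.cuda.is_available()
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model_id,  # assumption: the Hub id of the large-v3 checkpoint
        torch_dtype=torch.float16 if use_gpu else torch.float32,
        device="cuda:0" if use_gpu else "cpu",
    )
    # The result is a dict with the full "text" and timestamped "chunks".
    return pipe(audio_path, **pipeline_call_kwargs())


# Example call (downloads the model on first use):
# result = transcribe_with_pipeline("huang.mp3")
# print(result["text"])
```

Half precision is only selected when a CUDA device is present; on CPU the sketch falls back to float32, where it will be far slower than the figure quoted above.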

insanely-fast-whisper

Local installation

1. pip install pipx

2. pipx install insanely-fast-whisper

Default install path (Windows, Administrator account):

C:\Users\Administrator\AppData\Local\pipx\pipx\venvs\insanely-fast-whisper

3. Run (from cmd)

- Directory containing the executable:

C:\Users\Administrator\AppData\Local\pipx\pipx\venvs\insanely-fast-whisper\Scripts

- Command:

insanely-fast-whisper --file-name e:\huang.mp3

usage: insanely-fast-whisper.exe [-h]
                                 [--file-name FILE_NAME]
                                 [--device-id DEVICE_ID]
                                 [--transcript-path TRANSCRIPT_PATH]
                                 [--model-name MODEL_NAME]
                                 [--task {transcribe,translate}]
                                 [--language LANGUAGE]
                                 [--batch-size BATCH_SIZE]
                                 [--flash FLASH]
                                 [--timestamp {chunk,word}]
                                 [--hf_token HF_TOKEN]
                                 [--diarization_model DIARIZATION_MODEL]
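The command above can also be driven from a script. The following is a minimal sketch that assembles the argument list from flags shown in the usage text, runs the tool, and loads the transcript file afterwards. The function names are hypothetical, and two points are assumptions rather than facts from this article: that `output.json` is a reasonable transcript path, and that the tool writes a JSON object with a top-level "text" field.

```python
import json
import shutil
import subprocess


def build_command(audio_path, transcript_path="output.json",
                  task="transcribe", language=None, batch_size=None):
    """Assemble the argument list using only flags from the usage text."""
    cmd = ["insanely-fast-whisper", "--file-name", audio_path,
           "--transcript-path", transcript_path, "--task", task]
    if language is not None:
        cmd += ["--language", language]
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    return cmd


def run_transcription(audio_path, transcript_path="output.json", **options):
    # shutil.which checks whether the pipx-installed executable is on PATH;
    # if it is not, run from the Scripts directory or add it to PATH.
    exe = shutil.which("insanely-fast-whisper")
    if exe is None:
        raise FileNotFoundError("insanely-fast-whisper is not on PATH")
    cmd = build_command(audio_path, transcript_path, **options)
    cmd[0] = exe
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    # Assumption: the transcript is a JSON object with a top-level "text" key.
    with open(transcript_path, encoding="utf-8") as fh:
        return json.load(fh)


# Example call:
# data = run_transcription(r"e:\huang.mp3", task="translate")
# print(data.get("text"))
```

Keeping the command construction in its own function makes it easy to print and inspect the exact command line before anything is executed.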

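Part 2: another option to share

A packaged translation tool built on top of the open-source Whisper

Simpler and more convenient

large-v3

Quark Netdisk link: https://pan.quark.cn/s/82b36b6adfa7 (extraction code: JsyQ)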

Whisper

Command-line usage

(Examples)

1. Transcribe the speech in audio files using the medium model:

whisper audio.flac audio.mp3 audio.wav --model medium

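2. Specify the language with --language (other languages, such as Cantonese, are passed the same way):

whisper japanese.wav --language Japanese

3. Adding --task translate translates the speech into English: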

whisper japanese.wav --language Japanese --task translate

4. List all available options:

whisper --help

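5. Using it from Python:

import whisper

# Load the "base" checkpoint (downloaded on first use)
model = whisper.load_model("base")

# Transcribe the file; the result is a dict that includes the full text
result = model.transcribe("huang.mp3")

print(result["text"])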

6. Installing whisper for Python:

pip install -U openai-whisper

Or, to install the latest commit directly from the repository:

pip install git+https://github.com/openai/whisper.git

7. Upgrading whisper for Python:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
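The --language and --task options from the command-line examples have keyword-argument counterparts in Python. The sketch below maps them onto `model.transcribe()` and then formats the returned segments as SubRip (.srt) subtitles. All function names are hypothetical helpers introduced here, and the sketch assumes that the result dict contains a "segments" list whose entries carry "start", "end" and "text".

```python
def decode_options(language=None, translate=False):
    """Map the CLI's --language / --task flags to transcribe() keywords."""
    options = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        options["language"] = language
    return options


def format_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (the SubRip convention)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Turn segments (dicts with 'start', 'end', 'text') into SubRip text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        span = f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{span}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(audio_path, model_name="base", **kwargs):
    # Lazy import: requires `pip install -U openai-whisper` and ffmpeg.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **decode_options(**kwargs))
    return segments_to_srt(result["segments"])


# Example call, mirroring the third command-line example:
# print(transcribe_to_srt("japanese.wav", language="Japanese", translate=True))
```

The formatting helpers are plain Python, so they can be checked on hand-written segments without loading a model.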

References:

1. "[Local, open source] whisper-large-v3: an unbelievably fast translation model, with three practical ways to call it", by 万能君软件库

From： https://blog.csdn.net/TechTornado/article/details/136853142
