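I have recently been working on translation-related tasks, which require identifying the language of a text first. I found the following options: fasttext, fastlid (built on fasttext), langid, langdetect, googletrans, and google_trans_new (an improved googletrans). Below I try out each of them.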
import fasttext
from fastlid import fastlid
import langid
from langdetect import detect
from googletrans import Translator
from httpcore import SyncHTTPProxy
from google_trans_new import google_translator
## Calling Google Translate requires a proxy
http_proxy = SyncHTTPProxy((b'http', b'xxx.xxx.xxx.xxx', xxxx, b''))
proxies = {'http': http_proxy, 'https': http_proxy}
translator = Translator(service_urls=['translate.google.com', 'translate.google.hk'], proxies=proxies)
## The proxy setup for google_trans_new is simpler
detector2 = google_translator(url_suffix="com", proxies={'http': 'xxx.xxx.xxx.xxx:xxxx', 'https': 'xxx.xxx.xxx.xxx:xxxx'})
text = ["全键热插拔", "主水路无双酚A,出水水路无硅胶,保障饮水安全", "图片、表格", "聯交所"]
fasttext.FastText.eprint = lambda x: None
## The model has to be downloaded first, see https://fasttext.cc/docs/en/language-identification.html
path_to_pretrained_model = 'resources/lid.176.bin'
fmodel = fasttext.load_model(path_to_pretrained_model)
def retry(func, text, max_retries=3):
    '''
    Retry wrapper: calls to Google Translate fail easily.
    '''
    for i in range(max_retries):
        try:
            result = func(text)
            return result
        except Exception as e:
            print(f'Error: {e}')
            print(f'Retrying ({i+1}/{max_retries})...')
    print(f'Failed after {max_retries} retries.')
    return None
for i in text:
    print("fasttext: ", fmodel.predict(i))
    print("fastlid: ", fastlid(i))
    print("langid: ", langid.classify(i))
    print("langdetect: ", detect(i))
    print("googletrans: ", retry(translator.detect, i))
    print("google_trans_new: ", retry(detector2.detect, i))
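The detectors above print results in different shapes, which makes them awkward to compare side by side. As far as I know, `fmodel.predict` returns a pair of labels (prefixed with `__label__`) and probabilities, `langid.classify` returns a `(lang, score)` pair where the score is an unnormalized log-probability by default, and langdetect's `detect` returns a plain code such as `zh-cn`. The helper names below (`normalize_code`, `from_fasttext`, `from_langid`, `from_langdetect`) are hypothetical and not part of any of these libraries; this is only a minimal sketch under those assumptions.

```python
from typing import Optional, Sequence, Tuple

FASTTEXT_PREFIX = "__label__"  # prefix fastText puts in front of its labels


def normalize_code(code: str, keep_region: bool = False) -> str:
    """Lower-case a language code; drop the region part unless keep_region is set."""
    code = code.strip().lower().replace("_", "-")
    return code if keep_region else code.split("-")[0]


def from_fasttext(prediction: Tuple[Sequence[str], Sequence[float]]) -> Tuple[str, float]:
    """Convert the (labels, probabilities) pair from fmodel.predict(text)."""
    labels, probs = prediction
    label = labels[0]
    if label.startswith(FASTTEXT_PREFIX):
        label = label[len(FASTTEXT_PREFIX):]
    return normalize_code(label), float(probs[0])


def from_langid(result: Tuple[str, float]) -> Tuple[str, float]:
    """Convert the (lang, score) pair from langid.classify(text)."""
    lang, score = result
    return normalize_code(lang), float(score)


def from_langdetect(code: str) -> Tuple[str, Optional[float]]:
    """Convert the bare code from langdetect's detect(text); it carries no score."""
    return normalize_code(code), None


# Synthetic inputs shaped like the assumed library outputs
print(from_fasttext((("__label__zh",), [0.98])))
print(from_langid(("zh", -12.5)))
print(from_langdetect("zh-cn"))
```

Note that dropping the region merges `zh-cn` and `zh-tw` into `zh`. If the simplified/traditional distinction matters (as with the "聯交所" sample), `normalize_code(code, keep_region=True)` keeps it.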
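As for speed and detection quality, you can judge for yourself. I mainly tested Chinese text: googletrans and google_trans_new were the most accurate, while fasttext and fastlid were the fastest.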
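To check the speed side yourself, a small timing harness is enough. The sketch below is not what I used for the observation above; `time_detectors` is a hypothetical helper, and the two stand-in detectors exist only so the block runs without any model or network access.

```python
import time
from typing import Callable, Dict, Iterable


def time_detectors(
    detectors: Dict[str, Callable[[str], object]],
    texts: Iterable[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the average wall-clock seconds per call for each detector."""
    texts = list(texts)
    if not texts or repeats < 1:
        raise ValueError("need at least one text and one repeat")
    averages = {}
    for name, detect_fn in detectors.items():
        start = time.perf_counter()
        for _ in range(repeats):
            for t in texts:
                detect_fn(t)
        elapsed = time.perf_counter() - start
        averages[name] = elapsed / (repeats * len(texts))
    return averages


# Stand-in detectors for illustration only
def always_zh(text: str) -> str:
    return "zh"


def slow_zh(text: str) -> str:
    time.sleep(0.001)  # simulate a slower backend such as a network call
    return "zh"


timings = time_detectors({"fast": always_zh, "slow": slow_zh}, ["全键热插拔", "聯交所"])
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1000:.3f} ms per call")
```

With the objects from the script above, the call would look like `time_detectors({"fasttext": fmodel.predict, "langid": langid.classify, "langdetect": detect}, text)`. The Google-based detectors go over the network, so their timings mostly reflect the proxy and connection rather than the library itself.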
From: https://www.cnblogs.com/zhiyixue/p/17476830.html