Punctuation Prediction for Chinese Sentences

Date: 2022-10-20 19:34:32

Tags: Chinese, ids, token, result, input, sentence, sent, punctuation


https://github.com/jiangnanboy/punctuation_prediction

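Given a sentence with no punctuation, the model predicts where punctuation marks belong. It mainly predicts the comma, the full stop, and the question mark (，。？).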

Adding punctuation to a sentence

Download the model [pun_model.onnx] and place it in the model/ernie_onnx directory.

Link: https://pan.baidu.com/s/1l62YmuU3giNPkT2TonZRKA  Extraction code: sy12
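
Once the file is in place, the model has to be opened as an ONNX Runtime session before inference. The following is a minimal sketch, not the repository's own loading code: the helper name `load_session`, the constant `MODEL_PATH`, and the choice of the CPU execution provider are assumptions made for illustration. Only the directory and file name come from the instructions above.

```python
from pathlib import Path

# Location taken from the instructions above; the constant name is hypothetical.
MODEL_PATH = Path("model") / "ernie_onnx" / "pun_model.onnx"


def load_session(model_path=MODEL_PATH):
    """Open the downloaded model as an ONNX Runtime session (hypothetical helper)."""
    model_path = Path(model_path)
    if not model_path.is_file():
        # Fail early with a clear message if the download step was skipped.
        raise FileNotFoundError(f"model file not found: {model_path}")
    # Imported lazily so the path check above works without onnxruntime installed.
    import onnxruntime as ort
    # Assumption: CPU inference; other execution providers may also work.
    return ort.InferenceSession(str(model_path), providers=["CPUExecutionProvider"])
```

Calling `load_session()` with no arguments would then give a session object that can be passed as `sess` to an inference function.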

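import numpy as np


def onnx_infer(sess, tokenizer, sent):
    # Tokenize the unpunctuated sentence; the tokenizer returns a dict of id lists.
    tokenized_tokens = tokenizer(sent)
    # Wrap each list in an outer list to add a batch dimension of 1 (int64 arrays).
    input_ids = np.array([tokenized_tokens['input_ids']], dtype=np.int64)
    token_type_ids = np.array([tokenized_tokens['token_type_ids']], dtype=np.int64)
    # output_names=None asks for every model output; keep only the first one.
    result = sess.run(
        output_names=None,
        input_feed={"input_ids": input_ids,
                    "token_type_ids": token_type_ids}
    )[0]
    return result, input_ids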
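
The function returns the raw model output together with the input ids, so a post-processing step is still needed to turn the prediction back into a punctuated string. The sketch below shows one way that step could look. It is an illustration under stated assumptions, not the repository's decoding code: it assumes the output can be reduced to one row of class scores per character (with any special tokens already stripped), and the label order in `ID_TO_PUNCT` is hypothetical. The scores in the example are made up by hand rather than produced by the model.

```python
# Hypothetical label map: the real model's label order may differ.
ID_TO_PUNCT = {0: "", 1: "，", 2: "。", 3: "？"}


def argmax(row):
    """Return the index of the largest score in a row."""
    return max(range(len(row)), key=row.__getitem__)


def restore_punctuation(chars, scores):
    """Append the predicted punctuation mark (if any) after each character.

    chars  -- list of characters of the unpunctuated sentence
    scores -- one list of class scores per character (assumed layout)
    """
    if len(chars) != len(scores):
        raise ValueError("need exactly one score row per character")
    pieces = []
    for ch, row in zip(chars, scores):
        pieces.append(ch)
        pieces.append(ID_TO_PUNCT[argmax(row)])
    return "".join(pieces)


sent = "中国的首都是北京我爱我的祖国"
# Hand-made scores: "no punctuation" everywhere ...
scores = [[1.0, 0.0, 0.0, 0.0] for _ in sent]
# ... except a full stop after "京" (index 7) and after the last character.
scores[7] = [0.0, 0.0, 1.0, 0.0]
scores[-1] = [0.0, 0.0, 1.0, 0.0]

punctuated = restore_punctuation(list(sent), scores)
print(punctuated)
```

With real model output, the per-character scores would come from `result`, after dropping whatever special tokens the tokenizer adds.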

Output:

sent: 从小我有个梦想这个梦想是我想当一个科学家 -> result: 从小我有个梦想，这个梦想是我想当一个科学家。
------------------------------------------------
sent: 中国的首都是北京我爱我的祖国 -> result: 中国的首都是北京。我爱我的祖国。
------------------------------------------------
sent: 早上起来穿衣吃饭后我就上学了在路上碰见了许久不见的一个朋友 -> result: 早上起来，穿衣吃饭后，我就上学了，在路上碰见了许久不见的一个朋友。

From: https://www.cnblogs.com/little-horse/p/16810990.html
