
jieba Word Segmentation

Date: 2023-12-21 21:45:09
Tags: jieba, word, items, file, counts, word segmentation

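import jieba

# Read the text file
path = "红楼梦.txt"
# errors="ignore" silently drops any bytes that are not valid GB2312
with open(path, "r", encoding="GB2312", errors="ignore") as file:
    text = file.read()

# Segment the text with jieba (lcut uses precise mode by default and returns a list)
words = jieba.lcut(text)

# Count word frequencies
counts = {}
for word in words:
    # Skip single-character tokens
    if len(word) == 1:
        continue
    # Update the frequency for this word
    counts[word] = counts.get(word, 0) + 1

# Sort the (word, count) pairs by count, highest first
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)

# Print the 20 most frequent words (slicing avoids an IndexError
# when the text contains fewer than 20 distinct words)
for word, count in items[:20]:
    print(f"{word:<10}{count:>5}")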
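
The counting and sorting steps can also be written more compactly with `collections.Counter` from the standard library. This is a minimal sketch, not part of the original script: the helper name `top_words`, its parameters `n` and `min_len`, and the sample token list are all hypothetical and only there for illustration. In practice the tokens would come from `jieba.lcut(text)`.

```python
from collections import Counter


def top_words(words, n=20, min_len=2):
    """Return the n most frequent tokens that have at least min_len characters.

    Hypothetical helper: mirrors the dict-based loop in the script above,
    which skips single-character tokens and sorts by count, highest first.
    """
    counts = Counter(w for w in words if len(w) >= min_len)
    return counts.most_common(n)


# Hypothetical sample tokens; in the script above these would be jieba.lcut(text)
sample = ["宝玉", "贾母", "宝玉", "的", "凤姐", "宝玉", "贾母"]
for word, count in top_words(sample, n=2):
    print(f"{word:<10}{count:>5}")
```

`Counter.most_common(n)` returns at most `n` pairs, so it also handles a text with fewer than `n` distinct words without any extra bounds check.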







 

From: https://www.cnblogs.com/antea/p/17920181.html
