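Description
Students whose ID number ends in 1, 2, or 3: segment the text of Journey to the West and report the 20 most frequent words.
Students whose ID number ends in 4, 5, or 6: segment the text of Dream of the Red Chamber and report the 20 most frequent words.
Students whose ID number ends in 7, 8, 9, or 0: segment the text of Strange Tales from a Chinese Studio (聊斋) and report the 20 most frequent words.
Different names for the same character must be merged into a single entry. For example, 孙猴子 and 孙悟空 both refer to Sun Wukong and must be counted as one.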
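The merging requirement can be sketched as a lookup table that maps each alias to one canonical name, applied before counting. The table `ALIASES` and the helper `count_merged` are hypothetical names used only for illustration, and the aliases listed are a small assumed sample; a real table would have to be built by reading the novel.

```python
from collections import Counter

# Hypothetical alias table (an assumption, not part of the assignment):
# each variant name maps to one canonical name.
ALIASES = {
    "孙猴子": "孙悟空",
    "行者": "孙悟空",
    "大圣": "孙悟空",
    "八戒": "猪八戒",
}

def count_merged(tokens, aliases=ALIASES):
    """Count tokens after replacing every known alias with its canonical name."""
    return Counter(aliases.get(tok, tok) for tok in tokens)

# A hand-written token list stands in for real segmenter output.
tokens = ["孙悟空", "孙猴子", "行者", "八戒", "唐僧"]
merged = count_merged(tokens)
print(merged.most_common(2))
```

Normalising before counting keeps the totals in a single `Counter`, so `most_common` ranks the merged names directly.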
Input/output example
Put the blog URL here
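import collections

import jieba

# Read the whole novel into memory.
with open('journey_to_the_west.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# jieba.cut returns a generator of tokens, which Counter consumes directly.
words = jieba.cut(text)
word_counts = collections.Counter(words)

# Print the 20 most frequent tokens with their counts.
top_20_words = word_counts.most_common(20)
for word, count in top_20_words:
    print(word, count)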
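Segmenter output normally includes punctuation, whitespace, and single-character function words, which tend to crowd character names out of the top 20. One possible filter, sketched below, keeps only tokens of two or more characters that are not in a stop-word set. `STOP_WORDS`, `keep_token`, and `top_words` are hypothetical names, and the stop words shown are an assumed sample rather than a complete list.

```python
from collections import Counter

# Hypothetical stop-word set (an assumption); a real list would be much longer.
STOP_WORDS = {"一个", "那里", "怎么", "我们"}

def keep_token(tok, stop_words=STOP_WORDS):
    """Keep tokens of two or more characters that are not stop words."""
    return len(tok) >= 2 and tok not in stop_words

def top_words(tokens, n=20):
    """Return the n most frequent tokens that survive the filter."""
    cleaned = (t.strip() for t in tokens)
    return Counter(t for t in cleaned if keep_token(t)).most_common(n)

# A hand-written token list stands in for real segmenter output.
sample = ["行者", ",", "道", "一个", "行者", "\n", "八戒", "的"]
print(top_words(sample, n=2))
```

The length threshold is a blunt rule: it also drops legitimate single-character words, so it suits a name-frequency task better than general text analysis.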
From: https://www.cnblogs.com/jauker/p/17912135.html