Python Code for Chinese Word Frequency Counting (Python Text Word Frequency Statistics)

小君, 2023-06-14 12:32:52

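We often need to find the words that occur most frequently in an article in order to analyze its content, and that calls for word frequency counting. Word frequency counting is an accumulation problem: keep one counter for each word in the document, and add one to that word's counter every time the word appears.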
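A minimal sketch of that accumulation step on its own, assuming the words have already been split into a list; `count_words` is a hypothetical helper name, not part of the original code:

```python
def count_words(words):
    """Return a dict mapping each word to the number of times it occurs."""
    counts = {}
    for word in words:
        # dict.get returns 0 the first time a word is seen, so every
        # occurrence adds exactly one to that word's counter.
        counts[word] = counts.get(word, 0) + 1
    return counts

print(count_words("the cat and the hat".split()))
# {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}
```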

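def gettext():
    # Read the whole file; 'ceshi.txt' ("test" in pinyin) is the sample file.
    # encoding='utf-8' is added so Chinese text decodes the same on every OS.
    with open('ceshi.txt', 'r', encoding='utf-8') as f:
        text = f.read()
    text = text.lower()
    # Replace punctuation with spaces so that split() separates words cleanly
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_‘’{|}~':
        text = text.replace(ch, " ")
    return text

text = gettext()
words = text.split()
# Accumulate: one counter per word, incremented on every occurrence
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
# Sort the (word, count) pairs by count, largest first
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
# Print the ten most frequent words (fewer if the text has fewer distinct words)
for word, count in items[:10]:
    print("{0:10} {1:>5}".format(word, count))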
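The same counting and sorting can be written more compactly with the standard library's `collections.Counter`. This is a sketch of an alternative, not the author's code: `top_words` is a hypothetical name, and it assumes the text is already in memory and that replacing `string.punctuation` (ASCII punctuation only) with spaces is enough cleaning.

```python
import string
from collections import Counter

def top_words(text, n=10):
    """Return up to n (word, count) pairs for the most frequent words in text."""
    # Lower-case and map every ASCII punctuation character to a space,
    # mirroring the replace() loop in gettext().
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    # most_common(n) sorts by count, largest first, and returns at most n pairs,
    # so it also works when the text has fewer than n distinct words.
    return Counter(words).most_common(n)

for word, count in top_words("To be, or not to be: that is the question.", 3):
    print("{0:10} {1:>5}".format(word, count))
```

Because `most_common(n)` already returns the pairs in descending order of count, no separate `sort` call is needed.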

Running the program prints the ten most frequent words, one per line, each followed by its count, in descending order of frequency.
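The code above splits on whitespace, which suits English text. Chinese is written without spaces between words, so `split()` would return whole phrases rather than words; real Chinese word frequency counting normally runs a word segmenter first (the third-party jieba library is a common choice). As a dependency-free stand-in, and explicitly not true segmentation, the sketch below counts overlapping two-character sequences. `chinese_bigram_counts` is a hypothetical name, and the character range covers only the basic CJK Unified Ideographs block.

```python
import re
from collections import Counter

def chinese_bigram_counts(text):
    """Count overlapping two-character sequences in Chinese text.

    A crude stand-in for word segmentation, not real segmentation.
    """
    counts = Counter()
    # Take runs of basic CJK Unified Ideographs (U+4E00 to U+9FFF);
    # punctuation, digits, spaces and Latin letters end a run.
    for run in re.findall(r"[\u4e00-\u9fff]+", text):
        # Every adjacent pair of characters in the run is one candidate "word".
        for i in range(len(run) - 1):
            counts[run[i:i + 2]] += 1
    return counts

sample = "词频统计就是累加问题。统计词频需要计数器。"
for gram, n in chinese_bigram_counts(sample).most_common(3):
    print("{0:10} {1:>5}".format(gram, n))
```

Bigrams over-count fragments that are not real words (such as pairs straddling a word boundary), which is why a proper segmenter gives more meaningful results.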
