信息网络安全 ›› 2014, Vol. 14 ›› Issue (12): 27-31. doi: 10.3969/j.issn.1671-1122.2014.12.006

Research and Implementation of Micro-blog Keyword Extraction Method Based on Clustering

SUN Xing-dong, LI Ai-ping, LI Shu-dong   

  1. College of Computer Science, National University of Defense Technology, Changsha Hunan 410073, China
  • Received: 2014-10-08 Online: 2014-12-15

Abstract: This paper presents a clustering-based method for extracting keywords from microblogs. The method has three steps. First, the microblog posts are preprocessed and segmented into words, and word weights are computed with the TF-IDF and TextRank algorithms; because microblog posts are short texts, the two methods are combined to weight terms, and candidate keywords are then extracted with a clustering algorithm. Second, taking n = 2 in the n-gram language model, the maximum-probability left neighbor and the maximum-probability right neighbor of a word are defined, and the candidate keywords are extended into key phrases accordingly. Finally, the results are filtered according to the concept of accessor variety and the number of semantic units in the semantic extension model. The experimental results show that the method extracts microblog keywords effectively and that TextRank performs better than TF-IDF on short texts.
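The second step (extending a candidate keyword with its most probable bigram neighbors) can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so the following are assumptions: neighbor probability is estimated as a relative frequency over observed bigrams, and the function names `neighbor_tables`, `best_neighbor`, `extend_keyword` and the `min_prob` threshold are hypothetical, not taken from the paper. The sample posts are invented English tokens used only for illustration.

```python
from collections import Counter, defaultdict


def neighbor_tables(tokenized_posts):
    """Count the left and right neighbors of every word (bigram model, n = 2)."""
    left = defaultdict(Counter)   # left[w][l]: times l appears directly before w
    right = defaultdict(Counter)  # right[w][r]: times r appears directly after w
    for tokens in tokenized_posts:
        for prev, cur in zip(tokens, tokens[1:]):
            right[prev][cur] += 1
            left[cur][prev] += 1
    return left, right


def best_neighbor(counter, min_prob):
    """Return the most frequent neighbor if its relative frequency
    reaches min_prob (an assumed threshold), otherwise None."""
    total = sum(counter.values())
    if total == 0:
        return None
    word, count = counter.most_common(1)[0]
    return word if count / total >= min_prob else None


def extend_keyword(keyword, left, right, min_prob=0.5):
    """Extend a candidate keyword into a phrase using its
    maximum-probability left and right neighbors, when they exist."""
    phrase = [keyword]
    l = best_neighbor(left.get(keyword, Counter()), min_prob)
    r = best_neighbor(right.get(keyword, Counter()), min_prob)
    if l is not None:
        phrase.insert(0, l)
    if r is not None:
        phrase.append(r)
    return phrase


# Illustrative data only (not from the paper's corpus).
posts = [
    ["network", "security", "policy"],
    ["network", "security", "risk"],
    ["cyber", "security", "policy"],
]
left, right = neighbor_tables(posts)
print(extend_keyword("security", left, right))  # ['network', 'security', 'policy']
```

In this sketch a neighbor is attached only when it accounts for at least `min_prob` of the observed neighbors on that side, which keeps weakly associated words out of the phrase; the filtering by accessor variety described in the third step is not modeled here.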

Key words: micro-blog, clustering algorithm, TF-IDF, TextRank, n-gram language model
