Netinfo Security, 2015, Vol. 15, Issue (9): 170-174. doi: 10.3969/j.issn.1671-1122.2015.09.039

Original Article

A Study on Incremental Text Clustering in Sensitive Topic Detection

Yue-jin ZHANG1, Ding DING2

1. Internet Information Office of Beijing, Beijing 100062, China
2. College of Computer, Wuhan University, Wuhan Hubei 430072, China
Received: 2015-07-15; Online: 2015-09-01; Published: 2015-11-13

Abstract:

Faced with the huge volume of news data continually updated on the Internet, sensitive topic detection and tracking has become an important research area. In this paper, we study incremental text clustering algorithms for sensitive topic detection in an online public opinion analysis system. We review related work on text clustering. Building on the Single-pass algorithm, we improve its performance and propose a new simhash-based incremental text clustering algorithm. Using a real online news corpus collected by the online public opinion analysis system, we conduct experiments to verify the feasibility and effectiveness of the proposed algorithm. The results show that the new algorithm is much more efficient than the original Single-pass clustering algorithm. In practical deployment, the new incremental text clustering algorithm largely meets the real-time requirements of online topic detection and has some practical value.

Key words: sensitive topic detection, Simhash, incremental text clustering, Single-pass
