The Method of Classifying Network Public Opinion Text Based on Random Forest Algorithm

doi:10.3969/j.issn.1671-1122.2014.11.006

Abstract

Given the massive growth of Internet public opinion information, classifying public opinion text is of considerable practical value. This paper first establishes a model for representing text documents and selects a feature selection function. It then analyzes the characteristics of the random forest (RF) algorithm as a classification learning algorithm and proposes determining document categories by constructing a set of decision trees. For the experiments, a large corpus of network media text was collected and divided into training and test sets. Comparative tests against common algorithms (kNN, SMO, and SVM) produced quantitative performance data, which show that the proposed RF-based algorithm achieves a better overall classification rate and more stable classification.

Key words: network public opinion text, random forest algorithm, document detection tree, document classification
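
The approach outlined in the abstract (represent each document by its terms, grow many decision trees on random subsets, and let the trees vote on a category) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's configuration: whitespace tokenization (Chinese text would first need word segmentation), binary term-presence features, Gini impurity as the split criterion, a square-root-sized random term subset per split, and the toy corpus and labels.

```python
import random
from collections import Counter

def tokenize(text):
    # Assumption: whitespace tokenization; real corpora need proper segmentation.
    return text.lower().split()

def gini(labels):
    # Gini impurity of a list of class labels (0.0 means the node is pure).
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(docs, labels, vocab, rng, n_features, depth=0, max_depth=5):
    # docs are sets of terms; a leaf is a label, an internal node is
    # (term, subtree_if_term_present, subtree_if_term_absent).
    if len(set(labels)) == 1 or depth >= max_depth:
        return majority(labels)
    best = None
    # Only a random subset of terms is considered at each split.
    for term in rng.sample(vocab, min(n_features, len(vocab))):
        has = [y for d, y in zip(docs, labels) if term in d]
        lacks = [y for d, y in zip(docs, labels) if term not in d]
        if not has or not lacks:
            continue
        score = (len(has) * gini(has) + len(lacks) * gini(lacks)) / len(labels)
        if best is None or score < best[0]:
            best = (score, term)
    if best is None:
        return majority(labels)
    term = best[1]
    yes = [(d, y) for d, y in zip(docs, labels) if term in d]
    no = [(d, y) for d, y in zip(docs, labels) if term not in d]
    return (
        term,
        build_tree([d for d, _ in yes], [y for _, y in yes],
                   vocab, rng, n_features, depth + 1, max_depth),
        build_tree([d for d, _ in no], [y for _, y in no],
                   vocab, rng, n_features, depth + 1, max_depth),
    )

def predict_tree(tree, doc):
    # Walk from the root until a leaf (a plain label) is reached.
    while isinstance(tree, tuple):
        term, yes, no = tree
        tree = yes if term in doc else no
    return tree

def train_forest(texts, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    docs = [set(tokenize(t)) for t in texts]
    vocab = sorted(set().union(*docs))
    n_features = max(1, int(len(vocab) ** 0.5))
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw documents with replacement for each tree.
        idx = [rng.randrange(len(docs)) for _ in docs]
        forest.append(build_tree([docs[i] for i in idx],
                                 [labels[i] for i in idx],
                                 vocab, rng, n_features))
    return forest

def classify(forest, text):
    # Each tree votes; the most common vote is the predicted category.
    doc = set(tokenize(text))
    return majority([predict_tree(tree, doc) for tree in forest])

# Hypothetical toy corpus, for illustration only.
texts = ["flood warning issued", "flood damages homes",
         "team wins match", "match ends in draw"]
labels = ["disaster", "disaster", "sports", "sports"]
forest = train_forest(texts, labels)
print(classify(forest, "flood hits city"))
```

Because each tree sees a different bootstrap sample and different candidate terms, individual trees can disagree; the majority vote is what gives a forest its stability compared with a single decision tree.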

CLC Number:

TP309

WU Jian, SHA Jing. The Method of Classifying Network Public Opinion Text Based on Random Forest Algorithm[J]. Netinfo Security, 2014, 14(11): 36-40.
