Netinfo Security (信息网络安全), 2014, Vol. 14, Issue 11: 30-35. doi: 10.3969/j.issn.1671-1122.2014.11.005


Research on Chinese Text Appraisive Classification in the Present Era of Big Data

ZENG Fan-feng, ZHU Wan-shan, WANG Jing-zhong

  1. College of Information Engineering, North China University of Technology, Beijing 100144, China
  • Received: 2014-09-28  Published: 2014-11-01  Posted online: 2020-05-18
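  • About the authors:

    ZENG Fan-feng (1966-), male, from Jiangxi; associate research fellow, M.S.; research interests: object-oriented technology, information security, image processing, intelligent control, and system identification. ZHU Wan-shan (1988-), male, from Jilin; master's student; research interest: information security. WANG Jing-zhong (1962-), male, from Inner Mongolia; professor, M.S.; research interests: computer communication networks and information security technology.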

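  • Funding:
    Key Project (Class B) of the Beijing Natural Science Foundation [KZ2010009008]; Science and Technology Achievement Transformation Project [PXM2013]; Beijing Innovation Team Program [HT20130502]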

Abstract:

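In the current era of big data, Internet blogs and forums produce massive volumes of subjective comments that express people's emotions and their sentiment orientation. Because manually classifying and processing this flood of online comments is impractical, efficiently mining the large amount of information that carries commendatory or derogatory (appraisive) opinions has become a pressing problem, and research on appraisive classification of Chinese text is one way to address it. This article introduces commonly used text feature selection algorithms and analyzes the shortcomings of the document frequency (DF) and mutual information (MI) algorithms. Based on a comparative study of the two, and by combining the relevance between a text feature and a text category with the occurrence probability of appraisive features, it proposes an improved text feature selection algorithm, MIDF. Experimental results show that MIDF is effective for appraisive classification of text.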

Keywords: appraisive classification, text feature selection, appraisive feature extraction
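To make the contrast between the two baseline criteria concrete, the following Python sketch scores terms by document frequency (DF) and by mutual information (MI), using the count-based MI estimate common in the text categorization literature and taking the maximum over classes. The toy corpus `DOCS`, the function names `document_frequency` and `mutual_information`, and the max-over-classes aggregation are illustrative assumptions, not taken from the paper; the sketch does not reproduce MIDF, whose formula the abstract does not state.

```python
import math
from collections import Counter

# Hypothetical toy corpus of pre-segmented Chinese reviews: (tokens, label).
# Labels: "pos" = commendatory (褒), "neg" = derogatory (贬).
DOCS = [
    (["手机", "好"], "pos"),          # phone, good
    (["手机", "好", "惊艳"], "pos"),  # phone, good, stunning
    (["手机", "好"], "pos"),
    (["手机", "差"], "neg"),          # phone, bad
    (["手机", "差"], "neg"),
    (["手机", "好", "差"], "neg"),
]


def document_frequency(docs):
    """DF(t): number of documents in which term t occurs at least once."""
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))
    return df


def mutual_information(docs):
    """MI_max(t) = max over classes c of log(P(t, c) / (P(t) * P(c))),
    estimated from document counts as log(A * N / ((A + C) * (A + B))), where
    A = docs in c containing t, B = docs outside c containing t,
    C = docs in c without t, N = total number of docs.
    The max-over-classes aggregation is an assumption of this sketch."""
    n = len(docs)
    class_sizes = Counter(label for _tokens, label in docs)  # A + C per class
    joint = Counter()  # (term, class) -> A
    for tokens, label in docs:
        for t in set(tokens):
            joint[(t, label)] += 1
    df = document_frequency(docs)  # A + B per term
    scores = {}
    for t, df_t in df.items():
        # Skip classes where t never occurs (A = 0 would give log(0)).
        scores[t] = max(
            math.log(joint[(t, c)] * n / (size_c * df_t))
            for c, size_c in class_sizes.items()
            if joint[(t, c)] > 0
        )
    return scores


if __name__ == "__main__":
    df = document_frequency(DOCS)
    mi = mutual_information(DOCS)
    # List terms from most to least frequent, showing both scores.
    for term in sorted(df, key=df.get, reverse=True):
        print(f"{term}\tDF={df[term]}\tMI={mi[term]:.3f}")
```

On this toy data, DF ranks the sentiment-neutral 手机 ("phone") first although its MI is 0, while MI gives 惊艳 ("stunning"), seen in a single document, the same score as 差 ("bad"), seen in three. These are the commonly cited weaknesses of the two criteria (DF ignores class relevance, MI overrates rare terms), which is in line with the abstract's motivation for combining feature-category relevance with occurrence probability.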
