
垃圾邮件分类技术对比研究

赵晓丹, 徐燕

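  • Funding:
    National Science and Technology Program of the Ministry of Science and Technology during the 11th Five-Year Plan [2012BAH39B02], Beijing Natural Science Foundation [4122076], Beijing Language and Culture University Graduate Innovation Fund [13YCX176], Fundamental Research Funds for the Central Universities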

The Comparative Study of Methods on Spam Filtering

ZHAO Xiao-dan, XU Yan

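  • About author: School of Information Science, Beijing Language and Culture University, Beijing 100083; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190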

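Abstract: This paper presents a comparative study of receiver-side spam processing techniques, covering three main steps: preprocessing, feature selection, and classification. The feature selection techniques compared include document frequency (DF), information gain (IG), and odds ratio (ODD). The paper describes in detail knowledge gain, a feature selection method based on rough set theory, and verifies experimentally that it performs strongly on accuracy and other metrics. The mainstream classification algorithms considered include k-nearest neighbors, Bayes, and SVM; the experiments show in detail that linear classifiers perform strongly on spam classification.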

Abstract: This paper presents a comparative study of methods for handling spam at the receiving end, covering preprocessing, feature selection, and classification. Document frequency, information gain, and odds ratio are among the feature selection methods compared. The paper also introduces a newer feature selection method, knowledge gain, whose strong performance is verified experimentally. Common classifiers include KNN, Bayes, and SVM. The experiments also show that linear classifiers perform well on spam classification.
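
To make the feature selection step concrete, the following is a minimal sketch of how document frequency and information gain might be computed per term. It is not the paper's implementation: the function names `entropy` and `score_terms`, the toy corpus, and the labels (1 = spam, 0 = ham) are all assumptions made for illustration, and knowledge gain itself is not shown because the abstract does not define it.

```python
import math
from collections import Counter

def entropy(pos, neg):
    """Binary entropy in bits of a class split given pos/neg counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:  # an empty class contributes 0 to the entropy
            p = count / total
            h -= p * math.log2(p)
    return h

def score_terms(docs, labels):
    """Return {term: (document_frequency, information_gain)}.

    docs is a list of token lists; labels uses 1 for spam, 0 for ham
    (a hypothetical encoding chosen for this sketch).
    """
    n = len(docs)
    n_spam = sum(labels)
    df = Counter()       # number of documents containing the term
    df_spam = Counter()  # number of spam documents containing the term
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            df[term] += 1
            if y == 1:
                df_spam[term] += 1

    h_class = entropy(n_spam, n - n_spam)
    scores = {}
    for term, d in df.items():
        spam_with = df_spam[term]
        spam_without = n_spam - spam_with
        h_with = entropy(spam_with, d - spam_with)
        h_without = entropy(spam_without, (n - d) - spam_without)
        # IG = H(C) - P(t) H(C | t present) - P(not t) H(C | t absent)
        ig = h_class - (d / n) * h_with - ((n - d) / n) * h_without
        scores[term] = (d, ig)
    return scores

# Toy corpus, invented for illustration only.
docs = [
    ["win", "free", "prize"],
    ["free", "offer", "now"],
    ["meeting", "at", "noon"],
    ["lunch", "at", "noon"],
]
labels = [1, 1, 0, 0]
scores = score_terms(docs, labels)

# Rank terms by information gain, breaking ties by document frequency.
ranked = sorted(scores, key=lambda t: (scores[t][1], scores[t][0]), reverse=True)
```

In this toy corpus, terms that appear only in spam or only in ham and split the classes evenly (such as "free" or "noon") receive the highest information gain, while a term seen in a single document (such as "win") scores lower. A selector would then keep the top-ranked terms before passing the reduced feature set to a classifier.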