Netinfo Security, 2023, Vol. 23, Issue (2): 85-95. doi: 10.3969/j.issn.1671-1122.2023.02.010

• Technical Research •

Static Detection Method of Android Adware Based on Improved Random Forest Algorithm

HU Zhijie1, CHEN Xingshu2, YUAN Daohua1, ZHENG Tao2

  1. School of Computer Science, Sichuan University, Chengdu 610065, China
  2. School of Cyber Science and Engineering, Sichuan University, Chengdu 610207, China
  • Received: 2022-10-19 Online: 2023-02-10 Published: 2023-02-28
  • Corresponding author: CHEN Xingshu E-mail: chenxsh@scu.edu.cn
  • About the authors: HU Zhijie (born 1992), male, from Guizhou, master's student, research interest: mobile security; CHEN Xingshu (born 1968), female, from Guizhou, professor, Ph.D., research interests: trusted computing, cloud computing, and big data security; YUAN Daohua (born 1963), male, from Sichuan, professor, M.S., research interests: distributed parallel processing and network computing; ZHENG Tao (born 1994), male, from Sichuan, Ph.D. candidate, research interests: mobile security and software security analysis
  • Funding:
    National Natural Science Foundation of China (U19A2081, 61802270, 61802271); Ministry of Education-China Mobile Research Fund (CM20200409); Sichuan University Engineering Characteristic Team Project (2020SCUNG129)

Abstract:

Android adware displays advertisements in a disruptive way and may further turn into malware, posing a serious threat to the normal use of users' smartphones. Traditional adware detection methods are time-consuming and depend on dynamic features, which makes it difficult to meet the requirements of large-scale, high-precision detection. To solve this problem, this paper proposes a static detection method for Android adware based on an improved random forest algorithm. First, based on the characteristics of adware, third-party libraries are added to the scope of feature selection alongside the traditional application programming interface (API), permission, and intent features. The APKs of all adware samples in the dataset are statically decompiled, and the extracted static information is statistically analyzed to obtain high-frequency items; after filtering, these items form the baseline feature set, and the static information of each APK is converted into a feature vector. Then, following the idea of ensemble learning, multiple feature selection algorithms jointly select the features used for model training and assign each feature a weight. Finally, an improved random forest algorithm based on these feature weights is used to improve classifier performance. In the experiments, 5751 adware and 3465 non-adware applications were used for classification. The results show that the method detects faster while maintaining accuracy.

Key words: Android, adware, static detection, machine learning
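The abstract compresses three steps (a baseline feature set built from high-frequency static items, ensemble feature weighting, and a weight-aware random forest) into a few sentences. The dependency-free Python below is a minimal sketch of how such a pipeline could fit together; it is not the authors' implementation. Several parts are assumptions: each APK is represented as a pre-extracted set of strings (permissions, API calls, intents, third-party library package names), `min_freq` is a hypothetical frequency threshold, information gain and the class-frequency difference stand in for the unnamed feature selection algorithms, and the "improvement" is modeled as weighted feature sampling at each split, which is only one plausible way a random forest can use feature weights.

```python
import math
import random
from collections import Counter

# Sketch only: inputs, threshold, selectors and weighting scheme are assumptions.

def build_baseline(adware_apks, min_freq):
    """Keep static items (permissions, APIs, intents, third-party libraries)
    found in at least `min_freq` (hypothetical threshold) of adware APKs."""
    counts = Counter(item for apk in adware_apks for item in set(apk))
    return sorted(i for i, c in counts.items() if c / len(adware_apks) >= min_freq)

def vectorize(apk, baseline):
    """Binary vector: 1 if the APK contains the baseline item, else 0."""
    present = set(apk)
    return [1 if item in present else 0 for item in baseline]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0

def info_gain(X, y, j):
    """Stand-in selector 1: information gain of binary feature j."""
    left = [lab for row, lab in zip(X, y) if row[j] == 0]
    right = [lab for row, lab in zip(X, y) if row[j] == 1]
    n = len(y)
    return entropy(y) - len(left) / n * entropy(left) - len(right) / n * entropy(right)

def freq_gap(X, y, j):
    """Stand-in selector 2: |P(feature | adware) - P(feature | non-adware)|."""
    pos = [row[j] for row, lab in zip(X, y) if lab == 1]
    neg = [row[j] for row, lab in zip(X, y) if lab == 0]
    return abs(sum(pos) / max(len(pos), 1) - sum(neg) / max(len(neg), 1))

def ensemble_weights(X, y, scorers=(info_gain, freq_gap)):
    """Average the max-normalized scores of several selectors per feature."""
    total = [0.0] * len(X[0])
    for score in scorers:
        s = [score(X, y, j) for j in range(len(total))]
        top = max(s) or 1.0
        total = [t + v / top for t, v in zip(total, s)]
    return [t / len(scorers) for t in total]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def best_split(X, y, features):
    """Return (cost, j, left_idx, right_idx) with the lowest weighted Gini, or None."""
    best = None
    for j in features:
        left = [i for i, row in enumerate(X) if row[j] == 0]
        right = [i for i, row in enumerate(X) if row[j] == 1]
        if not left or not right:
            continue  # feature is constant in this node
        cost = (len(left) * gini([y[i] for i in left])
                + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or cost < best[0]:
            best = (cost, j, left, right)
    return best

def build_tree(X, y, weights, m, depth, rng):
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority  # leaf: class label
    # Assumed "improvement": draw m candidate features with probability
    # proportional to their ensemble weight (instead of uniformly).
    cand = set(rng.choices(range(len(weights)), weights=weights, k=m))
    # Fall back to all features if no sampled candidate can split the node.
    best = best_split(X, y, cand) or best_split(X, y, range(len(weights)))
    if best is None:
        return majority
    _, j, left, right = best
    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          weights, m, depth - 1, rng)
    return (j, child(left), child(right))  # internal node

def fit_forest(X, y, weights, n_trees=25, m=2, depth=4, seed=0):
    rng = random.Random(seed)
    w = [v + 1e-6 for v in weights]  # keep every feature drawable
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 w, m, depth, rng))
    return forest

def predict(forest, x):
    """Majority vote over trees; 1 = adware, 0 = non-adware."""
    votes = Counter()
    for node in forest:
        while isinstance(node, tuple):
            j, left, right = node
            node = right if x[j] == 1 else left
        votes[node] += 1
    return votes.most_common(1)[0][0]
```

For example, with a baseline built from adware samples at `min_freq=0.5`, `fit_forest(X, y, ensemble_weights(X, y))` trains the forest and `predict(forest, vectorize(apk, baseline))` returns 1 for adware and 0 otherwise. Drawing candidate features in proportion to their weights means strongly discriminative items (such as an ad library) are tried more often at each split, while low-weight features keep a small nonzero chance of being considered.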
