Netinfo Security, 2019, Vol. 19, Issue (12): 72-78. doi: 10.3969/j.issn.1671-1122.2019.12.009
Received:
2019-08-10
Online:
2019-12-10
Published:
2020-05-11
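About the authors:
FENG Xuruirui (born 1996), female, from Sichuan, master's student; main research interests: network data analysis and information security. LIU Jiayong (born 1962), male, from Sichuan, professor, Ph.D.; main research interests: network information security, network information processing, and big data analysis. CHENG Pengsen (born 1988), male, from Sichuan, Ph.D. candidate; main research interest: information content security.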
Xuruirui FENG, Jiayong LIU, Pengsen CHENG
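Abstract:
To counter the threat that malware poses to cyberspace security, security vendors have published a large number of malware reports. These reports contain a wealth of security-relevant information, such as the characteristic capabilities of malware and the specific behavior patterns it adopts. Analyzing the reports to obtain this information helps researchers understand malware functionality comprehensively and build effective defenses. Automatically extracting text related to malware capabilities and behavior from reports faces three difficulties: the sheer number of reports, their loosely structured text, and polysemous words. To address these, this paper proposes obtaining feature vectors from the pre-trained BERT model in order to disambiguate polysemous words, then extracting further features with a BiLSTM and an attention mechanism to train a classifier. In experiments on the MalwareTextDB dataset, recall and F1 reach 85.56% and 66.67%, respectively. Compared with other models, the proposed model extracts text related to malware behavioral characteristics and capabilities from malware reports more efficiently and automatically.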
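The abstract names an attention mechanism applied on top of BiLSTM features but does not give its exact form. The following is a minimal sketch of one common variant, attention pooling with a learned context vector, written in plain Python. The function names, the toy vectors, and the dot-product scoring are all assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(token_vectors, context_vector):
    """Collapse per-token vectors into a single sentence vector.

    token_vectors: equal-length float lists, hypothetical stand-ins for
        BiLSTM outputs computed over BERT token features.
    context_vector: hypothetical learned query vector of the same length.
    """
    # Score each token by its dot product with the context vector.
    scores = [
        sum(h * u for h, u in zip(vec, context_vector))
        for vec in token_vectors
    ]
    weights = softmax(scores)
    dim = len(context_vector)
    # Weighted sum of the token vectors, one dimension at a time.
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, token_vectors))
        for d in range(dim)
    ]
    return pooled, weights

# Toy input: three tokens with two-dimensional features (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(tokens, [2.0, 0.0])
```

In a full model the pooled vector would be passed to a classification layer; here the first and third tokens receive equal, larger weights because they align with the assumed context vector.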
Cite this article: Xuruirui FENG, Jiayong LIU, Pengsen CHENG. Analyzing Malware Behavior and Capability Related Text Based on Feature Extraction[J]. Netinfo Security, 2019, 19(12): 72-78.
Table 2: Data examples
Sentence | Dataset | Label |
---|---|---|
All three samples provided remote access to the attacker, via two Command and Control (C2) Servers . | Training set | Malware-related |
The samples were clearly malicious and varied in sophistication . | Training set | Not malware-related |
To provide access to the server of interest the attackers may appropriately modify rules for firewalls Microsoft TMG, CISCO, etc . | Validation set | Malware-related |
Here is a table with the minimal information about 46 different samples . | Validation set | Not malware-related |
The "Cohhoc" malware uses an obfuscation layer, to disguise the malware and to complicate the analysis . | Test set | Malware-related |
For example, this code can perform any of the following actions. | Test set | Not malware-related |
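The table above shows a binary sentence-level labeling, and the abstract evaluates the classifier by recall and F1 on such labels. As a minimal sketch of how those metrics are computed, the snippet below uses the standard definitions. The label strings `"relevant"` and `"irrelevant"` and the predicted labels are made up for illustration and are not results from the paper.

```python
def precision_recall_f1(gold, predicted, positive="relevant"):
    """Return (precision, recall, F1) for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Gold labels follow the alternating pattern of the six rows in the table;
# the predictions are hypothetical.
gold = ["relevant", "irrelevant", "relevant",
        "irrelevant", "relevant", "irrelevant"]
predicted = ["relevant", "relevant", "relevant",
             "irrelevant", "irrelevant", "irrelevant"]

precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Because F1 is the harmonic mean of precision and recall, a model can show high recall with a noticeably lower F1 when its precision is low.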