Image-based Phishing Email Detection Method and Implementation

doi:10.3969/j.issn.1671-1122.2021.09.008

Abstract

Email phishing is an advanced persistent threat (APT) attack method that exploits users' lack of cybersecurity awareness and software vulnerabilities. It can cause serious damage, and the number of attacks is steadily increasing. The class imbalance between phishing emails and normal emails has long been a difficult problem in cybersecurity, and extracting features from the email body for analysis risks infringing on users' personal privacy. This paper proposes an image-based phishing email detection method. It uses the Simhash algorithm to transform emails into images and then uses the local binary pattern (LBP) method to extract their features, which retains the original information of the emails while protecting user privacy. A DCGAN model is used to expand the phishing email data set, which addresses the class imbalance problem and improves the accuracy of the Inception V3 model for image classification. Experiments show that the method detects phishing emails effectively, with precision reaching 92.8%.
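The abstract does not say how Simhash fingerprints are laid out as pixels, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes one 64-bit Simhash per non-empty line of the email, rendered as a row of black or white pixels, followed by a basic 3x3 LBP histogram. The function names `simhash`, `email_to_image`, and `lbp_histogram`, the MD5 token hash, and the per-line layout are all hypothetical choices made for illustration.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Classic Simhash: each token hash votes +1/-1 on every bit position."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def email_to_image(lines, bits: int = 64):
    """Assumed layout: one fingerprint per non-empty line, one pixel per bit."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        fp = simhash(line, bits)
        rows.append([255 if (fp >> (bits - 1 - i)) & 1 else 0 for i in range(bits)])
    return rows


def lbp_histogram(image):
    """Basic 3x3 LBP over interior pixels, returned as a 256-bin histogram."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    height = len(image)
    width = len(image[0]) if height else 0
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            center = image[r][c]
            code = 0
            for k, (dr, dc) in enumerate(offsets):
                # A neighbor at least as bright as the center sets bit k.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << k
            hist[code] += 1
    return hist


# Hypothetical email body used only to exercise the functions.
sample_lines = [
    "Dear user",
    "",
    "Verify your account now",
    "Click the link below",
    "Regards",
]
sample_image = email_to_image(sample_lines)
sample_hist = lbp_histogram(sample_image)
```

Because only hashed fingerprints reach the image, the body text itself is not exposed to the classifier, which is the privacy property the abstract describes. The resulting histogram (or the image itself) could then be passed to an image classifier.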

Key words: phishing email, image, generative adversarial networks, convolutional neural network

CLC Number: TP309

YI Xiaoyang, ZHANG Jian. Image-based Phishing Email Detection Method and Implementation[J]. Netinfo Security, 2021, 21(9): 52-58.

Model	TPR	FPR	Precision
Decision tree	0.948	0.042	0.957
LR	0.960	0.046	0.954
Bayes	0.956	0.266	0.782
CNN	0.932	0.050	0.928
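The TPR, FPR, and Precision columns follow the standard confusion-matrix definitions: TPR = TP / (TP + FN), FPR = FP / (FP + TN), and Precision = TP / (TP + FP). The sketch below computes them from raw counts. The function name `detection_metrics` and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute TPR, FPR, and precision from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),        # share of phishing emails that were caught
        "FPR": fp / (fp + tn),        # share of normal emails wrongly flagged
        "Precision": tp / (tp + fp),  # share of flagged emails that are phishing
    }


# Hypothetical counts, chosen only to illustrate the formulas.
example = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```

Precision depends on the ratio of phishing to normal emails in the test set, so two models with similar TPR and FPR can still report different precision if they were evaluated on differently balanced data.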