Virtual Identity Identification Based on Semantic for Network Trading Platform

doi:10.3969/j.issn.1671-1122.2020.12.007

Netinfo Security, 2020, Vol. 20, Issue 12: 47-53.

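ZHANG Xuan¹,², YUAN Deyu¹, JIN Bo³

1. School of Information Network Security, People's Public Security University of China, Beijing 100038, China
2. Department of Investigation, Shandong Police College, Jinan 250014, China
3. The Third Research Institute of the Ministry of Public Security, Shanghai 201204, China

Received: 2020-09-19  Online: 2020-12-10  Published: 2021-01-12
Corresponding author: JIN Bo  E-mail: jinbo@stars.org.cn
About the authors: ZHANG Xuan (b. 1980), female, from Shandong, Ph.D. candidate; research interests: network security, electronic data forensics, and cybercrime investigation | YUAN Deyu (b. 1986), male, from Hebei, lecturer, Ph.D.; research interests: network security and cybercrime investigation | JIN Bo (b. 1972), male, from Zhejiang, research fellow, Ph.D.; research interests: network security and big data
Funding: National Natural Science Foundation of China (61771072); Cultivation Project of the Liaoning Collaborative Innovation Center for Cybersecurity Law Enforcement (WXZX-201912016); Science and Technology Program of Shandong Police College (YKJYB201706)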

Abstract

In recent years, advances in information technology have driven the rapid growth of e-commerce, and online trading has become deeply integrated into people's work and daily lives. Online trading forums are an important venue for these transactions, and their diversity and differentiation lead buyers and sellers to register accounts on different platforms and trade under multiple virtual identities. Because trading forums do not share information with one another, virtual identities lack effective links across platforms, data cannot be aggregated, and users are difficult to identify through traditional data association and comparison. New technical methods are therefore urgently needed to analyze the virtual identities of participants on online trading platforms in depth and to build accurate identity mappings. Using data from multiple online trading forums, this paper trains an unsupervised model for virtual identity identification based on Doc2Vec semantic similarity analysis. The model computes the similarity between the description texts of goods offered for sale in order to uncover virtual identities that belong to the same hidden seller, and thereby supports applications such as user profiling, recommendation, and risk control.

Key words: Doc2Vec, virtual identity identification, semantic similarity
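
The matching step described in the abstract can be illustrated with a minimal sketch. It assumes that each seller account has already been mapped to a fixed-length vector built from its product descriptions (in the paper this role is played by Doc2Vec; training is not shown here). The account names, the three-dimensional vectors, and the helper names `cosine_similarity` and `best_match` are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def best_match(query_vec, candidates):
    """Return (account_id, score) of the candidate most similar to query_vec."""
    scored = (
        (account, cosine_similarity(query_vec, vec))
        for account, vec in candidates.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Hypothetical vectors standing in for document embeddings of listing texts.
forum_a = {"seller_a1": [0.9, 0.1, 0.0]}
forum_b = {
    "seller_b1": [0.0, 1.0, 0.2],
    "seller_b2": [0.8, 0.2, 0.1],
}

# Find the account on forum B whose listings look most like seller_a1's.
account, score = best_match(forum_a["seller_a1"], forum_b)
print(account, round(score, 3))
```

In practice a similarity threshold would probably be needed as well, since the highest-scoring candidate is not necessarily the same person; the abstract does not state how the paper handles this.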

CLC number: TP309

ZHANG Xuan, YUAN Deyu, JIN Bo. Virtual Identity Identification Based on Semantic for Network Trading Platform[J]. Netinfo Security, 2020, 20(12): 47-53.

Figures and tables (8): Figure 1, Figure 2, Figure 3, Figure 4, Table 1, Figure 5, Figure 6, Table 2

Site name	Number of trading posts
Trading forum 1	22138
Trading forum 2	2297
Trading forum 3	263

Algorithm	Top-1 Accuracy (cs)	Top-1 Accuracy (scs)
TF-IDF	0.741035857	0.760956175
Word2Vec	0.844621514	0.87250996
Doc2Vec	0.900318725	0.922191235
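
As a rough illustration of the metric in the table above, the sketch below computes Top-1 accuracy under the assumption that it means the share of query accounts whose highest-scoring candidate is the true match; the page itself does not define the metric. The function name `top1_accuracy`, the account identifiers, and the similarity scores are hypothetical.

```python
def top1_accuracy(scores, truth):
    """Share of queries whose highest-scoring candidate is the true match.

    scores: {query_id: {candidate_id: similarity}}
    truth:  {query_id: candidate_id of the same real-world seller}
    """
    if not truth:
        raise ValueError("truth must contain at least one labelled query")
    hits = 0
    for query, expected in truth.items():
        candidates = scores[query]
        # Pick the candidate with the largest similarity score.
        predicted = max(candidates, key=candidates.get)
        hits += predicted == expected
    return hits / len(truth)


# Hypothetical similarity scores for three query accounts.
scores = {
    "q1": {"c1": 0.91, "c2": 0.40, "c3": 0.12},
    "q2": {"c1": 0.35, "c2": 0.88, "c3": 0.52},
    "q3": {"c1": 0.61, "c2": 0.30, "c3": 0.58},
}
truth = {"q1": "c1", "q2": "c2", "q3": "c3"}

accuracy = top1_accuracy(scores, truth)
print(f"Top-1 accuracy: {accuracy:.3f}")
```

Here two of the three queries rank their true match first, so the sketch reports two thirds. The same function could be applied to score dictionaries produced by any of the three algorithms in the table.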
