A Violent Video Detection Method Based on 3D Convolutional Networks

Netinfo Security, 2017, Vol. 17, Issue 12: 54-60. doi: 10.3969/j.issn.1671-1122.2017.12.010


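Wei SONG¹, Dongliang ZHANG¹, Zhenguo QI², Nan ZHENG¹

1. School of Information Engineering, Minzu University of China, Beijing 100081, China
2. School of Electronic Information Engineering, Beijing Jiaotong University, Beijing 100044, China

Received: 2017-09-01 Online: 2017-12-20 Published: 2020-05-12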
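About the authors: Wei SONG (born 1983), male, from Hubei, lecturer, Ph.D.; research interests: image processing and video content recognition. Dongliang ZHANG (born 1991), male, from Shandong, master's student; research interests: video content detection and video action recognition. Zhenguo QI (born 1989), male, from Shanxi, Ph.D. candidate; research interests: signal processing and machine learning. Nan ZHENG (born 1994), female, from Shanxi, master's student; research interest: image processing.

Funding: National Natural Science Foundation of China [61503424]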



Abstract:

With the development of content delivery networks and video transcoding technology, network traffic is increasingly dominated by video, and the Internet is flooded with various illegal special videos that endanger public security, so effective detection algorithms are urgently needed. To explore the application of deep learning to special video detection, this paper proposes using 3D convolutional networks for violent video detection. Compared with traditional handcrafted features and 2D convolutional networks, this method better preserves the integrity of the motion information carried along the temporal dimension of the video frame sequence, yielding an effective representation of spatio-temporal information. Experiments on the violent video dataset Hockey achieved an accuracy of 98.96%. The results show that the method can effectively detect violent video content.

Key words: violent video detection, 3D convolutional networks, special video
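The abstract's central claim is that 3D convolution keeps the temporal axis that 2D convolution discards. The pure-Python sketch below illustrates that point under stated assumptions: a single-channel grayscale clip, stride 1, no padding, and hand-picked toy kernels instead of learned weights. The function names, the moving-dot clip, and both kernels are hypothetical and are not the network described in the paper. A 2D convolution that treats the T frames as input channels uses a kernel spanning the whole clip, so it is modelled here as the special case of a 3D convolution whose temporal extent equals T.

```python
# Hypothetical illustration: 3D convolution vs. 2D convolution over stacked frames.
# Assumes one channel, stride 1, "valid" padding; kernels are toy values, not learned.

def conv3d_valid(clip, kernel):
    """Cross-correlate a clip [T][H][W] with a kernel [kt][kh][kw].

    Returns a volume [T-kt+1][H-kh+1][W-kw+1], i.e. the output still has a time axis.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                # Sum over the local space-time neighbourhood.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def conv2d_over_stacked_frames(clip, kernel):
    """2D convolution with the T frames used as input channels.

    The kernel must cover all T frames, so time is summed away in a single step
    and the result is one [H-kh+1][W-kw+1] map with no time axis left.
    """
    assert len(kernel) == len(clip), "kernel must span every frame (channel)"
    return conv3d_valid(clip, kernel)[0]


# Toy clip: T=4 frames of size 1x4, a bright dot moving one pixel right per frame.
clip = [[[1 if x == t else 0 for x in range(4)]] for t in range(4)]

# Temporal-difference kernel over 2 frames (kt=2, kh=kw=1): frame[t+1] - frame[t].
temporal_diff = [[[-1.0]], [[1.0]]]

# Whole-clip kernel (4 "channels"): last frame minus first frame.
whole_clip_kernel = [[[-1.0]], [[0.0]], [[0.0]], [[1.0]]]

out3d = conv3d_valid(clip, temporal_diff)
print(len(out3d), len(out3d[0]), len(out3d[0][0]))  # 3 1 4
print(out3d[1][0])  # [0.0, -1.0, 1.0, 0.0]

out2d = conv2d_over_stacked_frames(clip, whole_clip_kernel)
print(len(out2d), len(out2d[0]))  # 1 4
print(out2d[0])  # [-1.0, 0.0, 0.0, 1.0]
```

With the 2-frame temporal-difference kernel, the 3D output keeps a time axis of length 3 and traces the dot frame by frame; the whole-clip 2D kernel returns a single map that records only the net displacement, leaving no temporal axis for deeper layers to model.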

CLC number: TP309.1

Cite this article: Wei SONG, Dongliang ZHANG, Zhenguo QI, Nan ZHENG. A Violent Video Detection Method Based on 3D Convolutional Networks[J]. Netinfo Security, 2017, 17(12): 54-60.

Figures and tables (9): Figures 1-6, Tables 1-3

