A Violent Video Detection Method Based on 3D Convolutional Networks

doi:10.3969/j.issn.1671-1122.2017.12.010

Abstract

With the development of content distribution networks and video transcoding technology, network traffic is increasingly dominated by video, and a wide variety of illegal special videos have flooded the Internet, endangering public security. Effective detection algorithms are therefore essential. To explore the application of deep learning to special video detection, this paper proposes using 3D convolutional networks for violent video detection. Compared with traditional hand-crafted features and 2D convolutional networks, this method preserves the integrity of the motion information across video frames in the time dimension and represents spatio-temporal information efficiently. Experiments on the violent video dataset Hockey achieved 98.96% accuracy. The results show that the method can effectively detect violent content in video.

Key words: violent video detection, 3D convolutional networks, special video

CLC Number:

TP309.1

Wei SONG, Dongliang ZHANG, Zhenguo QI, Nan ZHENG. A Violent Video Detection Method Based on 3D Convolutional Networks[J]. Netinfo Security, 2017, 17(12): 54-60.
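
The contrast the abstract draws between 2D and 3D convolution can be illustrated with a minimal sketch. It is not the paper's network: the toy clip, the `conv3d_valid` helper, and the hand-set temporal-difference kernel are all assumptions chosen for illustration, and only NumPy is used. The point is that a kernel spanning several frames responds to motion, which a kernel confined to a single frame cannot see.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_valid(clip, kernel):
    """Cross-correlate a (T, H, W) clip with a (kt, kh, kw) kernel.

    No padding, stride 1, single channel (hypothetical helper for illustration).
    """
    # windows has shape (T-kt+1, H-kh+1, W-kw+1, kt, kh, kw)
    windows = sliding_window_view(clip, kernel.shape)
    return np.einsum("thwijk,ijk->thw", windows, kernel)


# Toy clip (assumed data): 16 grayscale 8x8 frames with a bright column moving right.
clip = np.zeros((16, 8, 8))
for t in range(16):
    clip[t, :, t % 8] = 1.0

# Same first frame repeated 16 times: identical appearance, no motion.
static = np.repeat(clip[:1], 16, axis=0)

# Hand-set 3x3x3 kernel that subtracts frame t from frame t+2.
# A trained 3D network would learn its kernels; this one is fixed for clarity.
kernel = np.zeros((3, 3, 3))
kernel[0] = -1.0
kernel[2] = 1.0

moving_response = conv3d_valid(clip, kernel)
static_response = conv3d_valid(static, kernel)

print(moving_response.shape)                 # temporal axis is kept, not collapsed
print(np.abs(moving_response).max() > 0)     # motion produces a response
print(np.allclose(static_response, 0.0))     # a static clip produces none
```

Because the output keeps a time axis, later layers can continue to reason about how the scene changes, which is the property the abstract attributes to 3D convolutional networks.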
