Netinfo Security, 2017, Vol. 17, Issue (12): 54-60. doi: 10.3969/j.issn.1671-1122.2017.12.010
Wei SONG1, Dongliang ZHANG1, Zhenguo QI2, Nan ZHENG1
Wei SONG, Dongliang ZHANG, Zhenguo QI, Nan ZHENG. A Violent Video Detection Method Based on 3D Convolutional Networks[J]. Netinfo Security, 2017, 17(12): 54-60.