Netinfo Security (信息网络安全), 2018, Vol. 18, Issue (5): 75-81. doi: 10.3969/j.issn.1671-1122.2018.05.009

基于递归神经网络的中文事件检测

马晨曦1,2, 陈兴蜀2,3, 王文贤2,3, 王海舟2,3

  1. 四川大学计算机学院网络与可信计算研究所,四川成都 610065
  2. 四川大学网络空间安全研究院,四川成都 610065
  3. 四川大学网络空间安全学院,四川成都 610065
  • 收稿日期:2017-11-26 出版日期:2018-05-15 发布日期:2020-05-11
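  • About the authors:

    Chenxi MA (born 1993), female, from Hebei, master's student; main research interests: public opinion analysis and network security. Xingshu CHEN (born 1968), female, from Sichuan, professor, Ph.D.; main research interests: cloud computing and big data security, and public opinion analysis. Wenxian WANG (born 1978), male, from Fujian, lecturer, Ph.D.; main research interests: cyberspace security, and public opinion analysis and mining. Haizhou WANG (born 1986), male, from Sichuan, lecturer, Ph.D.; main research interests: cyberspace security, and public opinion analysis and mining.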

  • Funding:
    National Natural Science Foundation of China [61272447]; Science and Technology Department of Sichuan Province Program [16ZHSF0483]

Chinese Event Detection Based on Recurrent Neural Network

Chenxi MA1,2, Xingshu CHEN2,3, Wenxian WANG2,3, Haizhou WANG2,3

  1. Network and Trusted Computing Institute, Computer College, Sichuan University, Chengdu, Sichuan 610065, China
  2. Cybersecurity Research Institute, Sichuan University, Chengdu, Sichuan 610065, China
  3. College of Cybersecurity, Sichuan University, Chengdu, Sichuan 610065, China
  • Received: 2017-11-26 Online: 2018-05-15 Published: 2020-05-11

摘要:

随着互联网的迅猛发展,我国网民的规模迅速增长,互联网对人们的生活和社会影响力也日益加深,面对日益增长的海量互联网信息,快速定位到公众讨论的事件变得至关重要。事件抽取是信息抽取领域的一个重要研究方向,事件检测是事件抽取任务的第一步,在事件抽取任务中起到至关重要的作用。文章采用了基于递归神经网络的事件检测联合模型,实现了对事件触发词的识别和事件类别的分类。与传统的触发词检测方法相比,本文提出的联合模型避免了误差的传播,不依赖于触发词表的构造和扩展,有很好的移植性,而且不需要设计复杂的语言特征,依赖神经网络自动学习特征。文章选用CEC语料库作为训练语料和测试语料,实验结果表明该方法识别触发词和事件类别的准确率和召回率较高,F值为70.2%,优于传统方法。

关键词: 事件检测, 触发词, 事件抽取, 递归神经网络, 词向量

Abstract:

With the rapid development of the Internet, the number of Internet users in China has grown quickly, and the Internet's influence on daily life and society keeps deepening. Faced with an ever-growing mass of online information, quickly locating the events under public discussion has become vital. Event extraction is an important research direction in information extraction, and event detection is its first step and plays a crucial role in the task. We adopt a joint model based on a recurrent neural network that both identifies event triggers and classifies event types. Compared with traditional trigger detection methods, the joint model avoids error propagation, does not depend on constructing and expanding a trigger word list, is highly portable, and requires no hand-designed linguistic features, relying instead on the neural network to learn features automatically. We use the CEC corpus for both training and testing. Experimental results show that the method achieves high precision and recall in identifying triggers and event types, with an F value of 70.2%, outperforming traditional methods.

Key words: event detection, trigger, event extraction, RNN, word embedding
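
The F value quoted in the abstract combines precision and recall. The following is a minimal sketch of how such a score can be computed for joint trigger and event-type prediction. It assumes each token carries a single tag that is either an event type or `O` (not a trigger), and that a predicted trigger counts as correct only when both its position and its type match the gold annotation. The abstract does not state the paper's exact scoring criterion, so this criterion is an assumption, and the tag names `TypeA` and `TypeB` and both function names are hypothetical.

```python
from typing import List, Set, Tuple


def trigger_set(tags: List[str]) -> Set[Tuple[int, str]]:
    """Collect (token_index, event_type) for every token not tagged 'O'."""
    return {(i, tag) for i, tag in enumerate(tags) if tag != "O"}


def precision_recall_f(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F over a list of sentences.

    Assumption: a prediction is correct only if position AND type match.
    """
    n_gold = n_pred = n_correct = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = trigger_set(gold_tags), trigger_set(pred_tags)
        n_gold += len(g)
        n_pred += len(p)
        n_correct += len(g & p)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Two toy sentences with hypothetical event-type tags.
gold = [["O", "TypeA", "O"], ["TypeB", "O"]]
pred = [["O", "TypeA", "O"], ["O", "TypeB"]]
print(precision_recall_f(gold, pred))
```

In the toy data, the second sentence's prediction has the right type at the wrong position, so it counts as both a missed gold trigger and a spurious prediction. Because one tag per token encodes both decisions, there is no separate trigger-identification stage whose mistakes could propagate to a later type classifier.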

CLC number: