Netinfo Security, 2017, Vol. 17, Issue (3): 46-52. doi: 10.3969/j.issn.1671-1122.2017.03.008

• Original Article

Research on the Algorithm of Short Text Representation Based on Graph Structure

Hao REN, Senlin LUO, Limin PAN, Junfeng GAO

  1. Information System and Security & Countermeasures Experimental Center, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2016-12-09  Online: 2017-03-20  Published: 2020-05-12

Abstract:

This paper proposes a graph-structure-based text representation method that fuses the LDA topic model with the denoising autoencoder from deep learning. Building on the vector space model, it addresses the problem that each word is represented in isolation. Starting from the bag-of-words information, the method uses word-to-word relations to construct, for each text, a two-dimensional matrix of uniform dimension. The topics produced by LDA and the probabilities relating topics to words are then used to extract the main information from this original matrix, and a denoising autoencoder is trained on the result to obtain the final text representation. The representations are evaluated through classification on the public 20Newsgroups dataset, which contains 20 newsgroup categories. The results show that, with both 1-NN and SVM classifiers, the method outperforms the other text representation methods compared. Introducing word-to-word information therefore enriches sentence meaning, deepens the understanding of the underlying meaning of the text, and effectively improves text classification.
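The abstract describes the pipeline only at a high level. The following is a minimal NumPy sketch, under stated assumptions, of how its three stages could fit together: a word-by-word matrix per text over a fixed vocabulary (assumed here to be window-based co-occurrence counts), an LDA-guided reduction that keeps the words with the highest topic probability (assumed; `phi` stands for the topic-word probability matrix of an already trained LDA model), and a single-layer denoising autoencoder with masking noise and tied weights. All names and parameters (`cooccurrence_matrix`, `select_topic_words`, `represent`, `window`, `top_n`, `noise`) are hypothetical and are not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    # Clip to avoid overflow in np.exp for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def cooccurrence_matrix(tokens, vocab, window=2):
    """Word-by-word matrix for one text over a fixed vocabulary.

    Assumption: entry (i, j) counts how often vocab words i and j occur
    within `window` positions of each other. Because the vocabulary is
    fixed, every text yields a matrix of the same |V| x |V| shape.
    """
    index = {w: k for k, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(vocab)))
    for p, w in enumerate(tokens):
        if w not in index:
            continue  # skip out-of-vocabulary tokens
        for q in range(p + 1, min(p + window + 1, len(tokens))):
            if tokens[q] in index:
                i, j = index[w], index[tokens[q]]
                m[i, j] += 1
                m[j, i] += 1  # keep the matrix symmetric
    return m


def select_topic_words(phi, top_n):
    """Hypothetical LDA-guided reduction.

    phi is a K x |V| topic-word probability matrix; keep the indices of
    the top_n words whose best topic probability is highest.
    """
    scores = phi.max(axis=0)
    return np.sort(np.argsort(-scores)[:top_n])


class DenoisingAutoencoder:
    """One hidden layer, tied weights, masking noise, cross-entropy loss."""

    def __init__(self, n_in, n_hidden, noise=0.3, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b = np.zeros(n_hidden)  # hidden bias
        self.c = np.zeros(n_in)      # reconstruction bias
        self.noise, self.lr = noise, lr

    def encode(self, X):
        return sigmoid(X @ self.W + self.b)

    def reconstruction_error(self, X):
        # Mean per-sample cross-entropy between clean X and its reconstruction.
        r = sigmoid(self.encode(X) @ self.W.T + self.c)
        eps = 1e-9
        ce = X * np.log(r + eps) + (1 - X) * np.log(1 - r + eps)
        return float(-np.mean(np.sum(ce, axis=1)))

    def fit(self, X, epochs=200):
        for _ in range(epochs):
            # Corrupt the input by zeroing a random fraction of entries.
            x_tilde = X * (self.rng.random(X.shape) > self.noise)
            h = self.encode(x_tilde)
            r = sigmoid(h @ self.W.T + self.c)
            d_out = (r - X) / len(X)  # gradient at the output pre-activation
            d_h = (d_out @ self.W) * h * (1 - h)
            # Tied weights: W receives gradient from encoder and decoder.
            self.W -= self.lr * (x_tilde.T @ d_h + d_out.T @ h)
            self.b -= self.lr * d_h.sum(axis=0)
            self.c -= self.lr * d_out.sum(axis=0)
        return self


def represent(docs, vocab, phi, top_n, n_hidden=3, seed=0):
    """Return (codes, trained model, autoencoder inputs) for token lists."""
    keep = select_topic_words(phi, top_n)
    rows = []
    for tokens in docs:
        x = cooccurrence_matrix(tokens, vocab)[np.ix_(keep, keep)].ravel()
        # Scale into [0, 1] so the cross-entropy loss is well defined.
        rows.append(x / x.max() if x.max() > 0 else x)
    X = np.array(rows)
    dae = DenoisingAutoencoder(X.shape[1], n_hidden, seed=seed).fit(X)
    return dae.encode(X), dae, X
```

The rows of the returned codes are the learned text vectors; in an evaluation like the one the abstract reports, they could be passed to a 1-NN or SVM classifier. Tied weights and masking noise are common choices for a basic denoising autoencoder, not details confirmed by the paper.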

Key words: text representation, deep learning, denoising autoencoder, topic model, text classification
