Netinfo Security ›› 2017, Vol. 17 ›› Issue (1): 57-62. doi: 10.3969/j.issn.1671-1122.2017.01.009

• Original Article •

Research of Weibo Short Text Classification Based on Word2vec

Qian ZHANG, Zhangmin GAO, Jiayong LIU

  1. College of Electronics and Information Engineering of Sichuan University, Chengdu Sichuan 610065, China
  • Received:2016-10-01 Online:2017-01-20 Published:2020-05-12

Abstract:

With the rapid growth of new information on microblogs and other social media, automatic text classification has become imperative for helping people locate the information they seek and for filtering spam. Traditional text classification models suffer from the curse of dimensionality and lack semantic features; to address these problems, we propose a short text classification method based on the Word2vec model. Since Word2vec cannot distinguish the importance of individual words, we weight the word vectors by tf-idf to obtain a weighted Word2vec representation. We then concatenate the tf-idf features with the tf-idf-weighted Word2vec features. Our results show that the combination of tf-idf-weighted Word2vec without stop words and tf-idf without stop words outperforms both tf-idf-weighted Word2vec without stop words alone and tf-idf with or without stop words.
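The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy embedding table `EMBEDDINGS`, the stop-word list, the smoothed idf formula, and the normalization of the weighted average by the sum of weights are all assumptions made for illustration, and every function name is hypothetical. In practice the embeddings would come from a trained Word2vec model.

```python
import math
from collections import Counter

# Hypothetical toy embeddings standing in for a trained Word2vec model.
EMBEDDINGS = {
    "cheap": [1.0, 0.0],
    "pills": [0.8, 0.2],
    "match": [0.0, 1.0],
    "goal": [0.1, 0.9],
}
STOP_WORDS = {"the", "a", "of"}  # assumed stop-word list
DIM = 2  # dimensionality of the toy embeddings


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def fit_idf(docs):
    """Smoothed inverse document frequency (one common variant, assumed here)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    """tf-idf weight of each term in one tokenized document."""
    counts = Counter(doc)
    return {t: (c / len(doc)) * idf.get(t, 0.0) for t, c in counts.items()}


def weighted_word2vec(doc, idf):
    """Average the word vectors, weighting each word by its tf-idf score."""
    vec = [0.0] * DIM
    total = 0.0
    for term, weight in tfidf(doc, idf).items():
        emb = EMBEDDINGS.get(term)
        if emb is None:  # skip words without an embedding
            continue
        for i in range(DIM):
            vec[i] += weight * emb[i]
        total += weight
    return [v / total for v in vec] if total > 0 else vec


def combined_features(doc, idf, vocab):
    """Concatenate the tf-idf vector with the tf-idf-weighted Word2vec vector."""
    weights = tfidf(doc, idf)
    sparse = [weights.get(t, 0.0) for t in vocab]
    return sparse + weighted_word2vec(doc, idf)


docs = [tokenize("cheap pills"), tokenize("the goal of the match")]
idf = fit_idf(docs)
vocab = sorted(idf)
features = [combined_features(d, idf, vocab) for d in docs]
```

Each row of `features` has one tf-idf entry per vocabulary term followed by the weighted embedding, and could then be passed to a classifier such as the SVM named in the key words.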

Key words: short text classification, Word2vec, TFIDF, SVM
