Abstract:
Identifying the subject of a question helps locate the question's domain, narrow the scope of the query, and return better answers to users. Question text is usually short, so its features are sparse and its structure is irregular. To address this, this paper proposes a method for identifying question subjects based on word embedding and LSTM (IQS-WE-L), and evaluates it on a question set from the MadSci website that covers three subjects. We first train Word2vec on a Wikipedia corpus to build a dictionary of word vectors. Based on these word vectors, we then propose four feature extraction methods, W2V, W2V-TFIDF, W2V-c-TFIDF and W2V-c, which represent the text as vectors by combining word embeddings with other features. Finally, we build an LSTM network, train it to classify the subject of each question, and quantitatively evaluate the four proposed feature extraction methods. The experimental results show that the proposed method can effectively identify the subject of a question, with the F1 score reaching a maximum of 0.9339. © Published under licence by IOP Publishing Ltd.
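The abstract does not define the four feature extraction methods in detail. As a rough illustration only, the sketch below shows one plausible reading of the W2V-TFIDF idea: each token's word vector is scaled by that token's TF-IDF weight, giving a sequence of vectors that a sequence model such as an LSTM could consume. The toy embeddings, the smoothed IDF formula, and the names `EMBEDDINGS`, `tokenize`, `idf_table` and `w2v_tfidf_sequence` are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional vectors standing in for Word2vec embeddings
# trained on Wikipedia; the values are made up for illustration.
EMBEDDINGS = {
    "why": [0.1, 0.3, 0.0],
    "what": [0.1, 0.2, 0.0],
    "is": [0.0, 0.1, 0.1],
    "the": [0.0, 0.0, 0.1],
    "a": [0.0, 0.0, 0.1],
    "sky": [0.7, 0.2, 0.1],
    "blue": [0.6, 0.1, 0.3],
    "cell": [0.1, 0.8, 0.2],
}


def tokenize(text):
    """Lowercase the text, drop question marks, and split on whitespace."""
    return text.lower().replace("?", " ").split()


def idf_table(corpus):
    """Compute a smoothed IDF value for every word in the corpus (assumed formula)."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def w2v_tfidf_sequence(question, idf, dim=3):
    """Return one TF-IDF-scaled word vector per token of the question."""
    tokens = tokenize(question)
    term_freq = Counter(tokens)
    sequence = []
    for tok in tokens:
        vector = EMBEDDINGS.get(tok, [0.0] * dim)  # unknown words map to zeros
        weight = (term_freq[tok] / len(tokens)) * idf.get(tok, 1.0)
        sequence.append([weight * x for x in vector])
    return sequence


corpus = ["Why is the sky blue?", "What is a cell?"]
idf = idf_table(corpus)
features = w2v_tfidf_sequence("Why is the sky blue?", idf)
print(len(features), "vectors of dimension", len(features[0]))
```

In this sketch, words that appear in fewer questions (such as "sky") receive a larger weight than words that appear in every question (such as "is"), so their vectors contribute more to the sequence passed to the classifier.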
Source:
ISSN: 1742-6588
Year: 2020
Issue: 1
Volume: 1631
Language: English