Indexed in:
Abstract:
Unknown word recognition is an important research topic in natural language processing. However, identifying unknown words in micro-blog short texts still faces problems such as sparse data, corpus noise, and varied forms of expression. This paper proposes POS-FP (Frequent Pattern growth with part-of-speech), an unknown word recognition method for micro-blog short text. First, candidate unknown words are obtained by combining the N-gram model with frequent item sets. Then the candidates are filtered and verified using improved mutual information, information entropy, and context dependence. Finally, an open verification method is used to obtain the final unknown words. Experiments show that the algorithm improves unknown word recognition for micro-blog short texts. © 2018 IEEE.
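The candidate-then-filter pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's POS-FP method: the abstract does not define the improved measures, so the code uses the textbook forms of pointwise mutual information and left/right branch entropy, replaces FP-growth with a plain frequency threshold, and omits part-of-speech and context-dependence checks. All function names and threshold values are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(token_lists, max_n=2):
    """Count every n-gram (as a tuple) up to length max_n."""
    counts = Counter()
    for tokens in token_lists:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def pmi(bigram, counts, total_unigrams):
    """Textbook pointwise mutual information of a two-token candidate.

    Assumption: bigram and unigram probabilities are both normalised by
    the unigram total, a common simplification.
    """
    a, b = bigram
    p_ab = counts[(a, b)] / total_unigrams
    p_a = counts[(a,)] / total_unigrams
    p_b = counts[(b,)] / total_unigrams
    return math.log2(p_ab / (p_a * p_b))


def branch_entropy(candidate, token_lists, side):
    """Entropy of the tokens adjacent to the candidate on one side."""
    neighbours = Counter()
    n = len(candidate)
    for tokens in token_lists:
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == candidate:
                j = i - 1 if side == "left" else i + n
                if 0 <= j < len(tokens):
                    neighbours[tokens[j]] += 1
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in neighbours.values())


def candidate_words(token_lists, min_count=2, min_pmi=1.0, min_entropy=0.5):
    """Keep frequent bigrams that are cohesive (PMI) and have varied context (entropy)."""
    counts = ngram_counts(token_lists, max_n=2)
    total = sum(c for gram, c in counts.items() if len(gram) == 1)
    kept = []
    for gram, c in counts.items():
        if len(gram) != 2 or c < min_count:
            continue  # frequency threshold stands in for frequent-pattern mining
        if pmi(gram, counts, total) < min_pmi:
            continue  # tokens do not co-occur more often than chance
        left = branch_entropy(gram, token_lists, "left")
        right = branch_entropy(gram, token_lists, "right")
        if min(left, right) < min_entropy:
            continue  # fixed context suggests a fragment of a longer unit
        kept.append(gram)
    return kept
```

Each input text is passed as a list of tokens (for Chinese micro-blog text, typically single characters or segmenter output). A bigram survives only if it is frequent, internally cohesive, and appears in varied left and right contexts, which is the general intuition behind the filtering step.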
Keywords:
Corresponding author:
Email address:
Source:
Year: 2018
Pages: 1-7
Language: English
Affiliated department: