Indexed in:
Abstract:
Document similarity computation is an active research topic in information retrieval (IR) and a key issue for automatic document categorization, clustering analysis, fuzzy query, and question answering. Topic modeling is an emerging field in natural language processing (NLP), IR, and machine learning (ML). In this paper, we apply a latent Dirichlet allocation (LDA) topic model-based method to compute the similarity between documents. By mapping a document from its term space representation into a topic space, a distribution over topics is derived and used to compute document similarity. An empirical study using a real data set demonstrates the efficiency of our method.
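As an illustration of comparing documents in topic space, the following minimal sketch scores two documents by the Jensen-Shannon divergence between their topic distributions. The abstract does not state which measure the paper uses, so the choice of Jensen-Shannon divergence is an assumption, and the function names and the three-topic distributions are hypothetical values standing in for the output of a trained LDA model.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_similarity(p, q):
    """Similarity in [0, 1] defined as 1 minus the Jensen-Shannon divergence (base 2).

    Assumption: p and q are topic distributions of equal length that each sum to 1.
    """
    # Mixture distribution; m_i > 0 whenever p_i > 0 or q_i > 0, so KL stays finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd


# Hypothetical per-document topic distributions (not data from the paper).
doc_a = [0.70, 0.20, 0.10]
doc_b = [0.65, 0.25, 0.10]
doc_c = [0.05, 0.15, 0.80]

sim_ab = js_similarity(doc_a, doc_b)  # documents dominated by the same topic
sim_ac = js_similarity(doc_a, doc_c)  # documents dominated by different topics
print(f"sim(a, b) = {sim_ab:.3f}, sim(a, c) = {sim_ac:.3f}")
```

With these inputs, the pair that shares a dominant topic scores higher than the pair that does not. Other measures on topic distributions, such as cosine similarity or Hellinger distance, could be substituted in the same place.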
Keywords:
Corresponding author information:
Email address: