Document Similarity Measure Based on Topic Model

Authors:

He, Ming | Wang, Zhen-zhen | Du, Yong-ping (Scholar: 杜永萍)

Indexed in:

CPCI-S EI Scopus

Abstract:

Document similarity computation is an exciting research topic in information retrieval (IR), and it is a key issue for automatic document categorization, clustering analysis, fuzzy query, and question answering. Topic modeling is an emerging field in natural language processing (NLP), IR, and machine learning (ML). In this paper, we apply a latent Dirichlet allocation (LDA) topic model-based method to compute similarity between documents. By mapping a document from its term-space representation into a topic space, a distribution over topics is derived for computing document similarity. An empirical study using a real data set demonstrates the efficiency of our method.
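The final step described in the abstract, comparing two documents through their distributions over topics, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which similarity measure the paper uses, so Jensen-Shannon divergence is assumed here as one common choice for comparing probability distributions. The function names and the three-topic vectors are hypothetical, and the vectors stand in for per-document topic distributions that an LDA model would infer.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p is zero contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_similarity(p, q):
    """Similarity in [0, 1], defined as 1 minus the Jensen-Shannon divergence.

    p and q are topic distributions of equal length that each sum to 1.
    Using base-2 logarithms bounds the divergence by 1.
    """
    # Midpoint distribution; it is non-zero wherever p or q is non-zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd


# Hypothetical topic distributions (assumed output of LDA inference).
doc_a = [0.70, 0.20, 0.10]
doc_b = [0.65, 0.25, 0.10]
doc_c = [0.05, 0.15, 0.80]

sim_ab = js_similarity(doc_a, doc_b)  # documents with similar topic mixes
sim_ac = js_similarity(doc_a, doc_c)  # documents dominated by different topics
```

Under this assumed measure, documents whose probability mass falls on the same topics score close to 1, and documents with no topics in common score 0. Cosine similarity or a symmetrized KL divergence over the same vectors would be alternative choices.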

Keywords:

document similarity computation; topic model; latent Dirichlet allocation

Author affiliations:

[1] [He, Ming] Beijing Univ Technol, Coll Comp Sci, Beijing, Peoples R China
[2] [Wang, Zhen-zhen] Beijing Univ Technol, Coll Comp Sci, Beijing, Peoples R China
[3] [Du, Yong-ping] Beijing Univ Technol, Coll Comp Sci, Beijing, Peoples R China

Corresponding author:

[He, Ming] Beijing Univ Technol, Coll Comp Sci, Beijing, Peoples R China

Email addresses:

heming@bjut.edu.cn | wangzhen820355@126.com | ypdu@bjut.edu.cn
