收录:
摘要:
In multi-media and social media communities, web topic detection poses two main difficulties that conventional approaches can barely handle: 1) there are large inter-topic variations among web topics; 2) supervised information is rare to identify the real topics. In this paper, we address these problems from the similarity diffusion perspective among objects on web, and present a clustering-like pattern across similarity cascades (SCs). SCs are a series of subgraphs generated by truncating a weighted graph with a set of thresholds, and then maximal cliques are used to describe the topic candidates. Poisson deconvolution is adopted to efficiently identify the real topics from these topic candidates. Experiments demonstrate that our approach outperforms the state-of-the-arts on two datasets. In addition, we report accuracy v.s. false positives per topic (FPPT) curves for performance evaluation. To our knowledge, this is the first complete evaluation of web topic detection at the topic-wise level, and it establishes a new benchmark for this problem. © 2014 IEEE.
关键词:
通讯作者信息:
电子邮件地址: