Abstract:
Organizing webpages into interesting topics is a key step in understanding trends in multimodal Web data. User-generated content is sparse, noisy, and loosely constrained, which results in inefficient feature representations. With such descriptors, a detected topic unavoidably still contains a number of falsely detected webpages, which makes the topic less coherent, less interpretable, and less useful. In this paper, we address this problem by interpreting a topic through its prototypes, and present a two-step approach to achieve this goal. First, following the detection-by-ranking approach, a sparse Poisson deconvolution is proposed to learn the intratopic similarities between webpages. Second, leveraging these intratopic similarities, the top-k diverse yet representative prototype webpages are identified using a submodular function. Experimental results on two public datasets show not only improved accuracy on the Web topic detection task, but also improved interpretability of a topic through its prototypes.
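To illustrate the second step, the following is a minimal sketch of greedy top-k selection under a submodular objective. The abstract does not state which submodular function the paper uses, so this sketch assumes a facility-location objective, f(S) = sum over pages i of the maximum similarity between i and any prototype in S. The function name `select_prototypes` and the toy similarity matrix `sim` are hypothetical. The intratopic similarities are taken as given here. The sparse Poisson deconvolution that learns them is not shown.

```python
def select_prototypes(sim, k):
    """Greedily pick k prototype indices from a nonnegative similarity matrix.

    Assumption: facility-location objective, not necessarily the paper's own.
    sim[i][j] is the similarity between webpage i and candidate prototype j.
    """
    n = len(sim)
    selected = []
    # coverage[i] = best similarity of page i to any prototype chosen so far
    coverage = [0.0] * n
    for _ in range(min(k, n)):
        best, best_gain = None, float("-inf")
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much candidate j improves each page's coverage
            gain = sum(max(sim[i][j] - coverage[i], 0.0) for i in range(n))
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
        coverage = [max(coverage[i], sim[i][best]) for i in range(n)]
    return selected


# Toy example: pages 0-2 form one group, pages 3-4 another.
sim = [
    [1.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 1.0, 0.9, 0.0, 0.0],
    [0.8, 0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.7],
    [0.0, 0.0, 0.0, 0.7, 1.0],
]
print(select_prototypes(sim, 2))  # [1, 3]
```

In this toy case the first pick is the most central page of the larger group, and the second pick comes from the other group because a near-duplicate of the first adds little marginal gain. This is how a submodular objective can favor prototypes that are both representative and diverse.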