Abstract:
Organizing webpages into interesting topics is a key step toward understanding trends in multimodal Web data. However, sparse, noisy, and loosely constrained user-generated content yields ineffective feature representations. With such descriptors, a detected topic inevitably still contains a number of falsely detected webpages, which makes the topic less coherent, less interpretable, and less useful. In this paper, we address this problem by interpreting a topic through its prototypes, and we present a two-step approach to achieve this goal. First, following the detection-by-ranking approach, we propose a sparse Poisson deconvolution to learn the intratopic similarities between webpages. Second, leveraging these intratopic similarities, we identify the top-k diverse yet representative prototype webpages using a submodular function. Experimental results on two public datasets show not only improved accuracy on the Web topic detection task but also improved interpretability of a topic through its prototypes.
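The abstract does not state which submodular function is used to pick prototypes. As a hedged illustration only, the sketch below uses a generic stand-in: the facility-location function f(S) = Σ_i max_{j∈S} W[i][j] over a hypothetical nonnegative intratopic similarity matrix W, maximized with the standard greedy algorithm. This is not the paper's objective. It only shows how diminishing returns reward prototypes that are representative (they cover many webpages well) yet diverse (a near-duplicate of an already chosen prototype adds little). For monotone submodular functions such as this one, greedy selection under a cardinality constraint k is known to achieve a (1 - 1/e) approximation. The names `facility_location` and `greedy_prototypes` are hypothetical.

```python
from typing import List, Sequence


def facility_location(sim: Sequence[Sequence[float]], selected: List[int]) -> float:
    """Hypothetical stand-in objective: f(S) = sum_i max_{j in S} sim[i][j].

    With nonnegative similarities this is monotone submodular, and f(empty) = 0.
    """
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_prototypes(sim: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick up to k prototype indices maximizing facility_location."""
    n = len(sim)
    selected: List[int] = []
    # coverage[i]: how well webpage i is represented by the current prototypes.
    coverage = [0.0] * n
    for _ in range(min(k, n)):
        best_j, best_gain = -1, -1.0
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain f(S + {j}) - f(S): total improvement in coverage.
            gain = sum(max(sim[i][j] - coverage[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        coverage = [max(coverage[i], sim[i][best_j]) for i in range(n)]
    return selected


if __name__ == "__main__":
    # Toy similarities (assumed): webpages 0-2 share one theme, 3-5 another.
    sim = [
        [1.0, 0.9, 0.8, 0.1, 0.0, 0.0],
        [0.9, 1.0, 0.9, 0.1, 0.0, 0.0],
        [0.8, 0.9, 1.0, 0.0, 0.0, 0.0],
        [0.1, 0.1, 0.0, 1.0, 0.6, 0.5],
        [0.0, 0.0, 0.0, 0.6, 1.0, 0.7],
        [0.0, 0.0, 0.0, 0.5, 0.7, 1.0],
    ]
    print(greedy_prototypes(sim, 2))  # -> [1, 4]
```

In the toy run, the second pick comes from the other theme rather than from webpage 0 or 2, even though those are highly similar to many pages, because webpage 1 already covers them.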
Source: IEEE Transactions on Cybernetics
ISSN: 2168-2267
Year: 2019
Issue: 3
Volume: 49
Page: 1072-1083
Impact Factor (JCR@2022): 11.800
ESI Discipline: COMPUTER SCIENCE;
ESI HC Threshold: 147
JCR Journal Grade:1
Cited Count:
WoS CC Cited Count: 2
SCOPUS Cited Count: 5
ESI Highly Cited Papers on the List: 0