Indexed in:
Abstract:
To quickly grasp what interesting topics are happening on the web, it is challenging to discover and describe topics from User-Generated Content (UGC) data. Describing topics with probable keywords and prototype images is an efficient form of human-machine interaction that helps a person quickly grasp a topic. However, beyond the challenges of web topic detection itself, mining such multi-media descriptions is a challenging task that conventional approaches can barely handle, owing to: (1) noise from non-informative short texts or images caused by less-constrained UGC; and (2) even for informative images, the gap between visual concepts and social ones. This paper addresses these challenges from the perspective of background similarity removal, and proposes a two-step approach to mining multi-media descriptions from noisy data. First, we utilize a deconvolution model to strip the similarities among non-informative words/images during web topic detection. Second, the background-removed similarities are reconstructed to identify the probable keywords and prototype images during topic description. By removing background similarities, we can generate a coherent and informative multi-media description for each topic. Experiments show that the proposed method produces high-quality descriptions on two public datasets. (C) 2017 Elsevier B.V. All rights reserved.
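To make the background-removal idea concrete, the sketch below is a minimal Python illustration, not the paper's actual deconvolution model: it subtracts a rank-one background estimate from a toy pairwise similarity matrix and then ranks items (words or images) by the remaining similarity mass to surface candidate topic descriptors. The matrix `S`, the scaling parameter `alpha`, and the helper names are all hypothetical, introduced here only for illustration.

```python
import numpy as np

def remove_background_similarity(S, alpha=0.5):
    """Strip an estimated background component from a pairwise similarity matrix S.

    Assumption: the background shared by non-informative items is approximated
    by the rank-one outer product of per-item mean similarities, scaled by
    `alpha`; the non-negative residual is treated as topic-specific similarity.
    """
    mean_sim = S.mean(axis=1, keepdims=True)                  # average similarity per item
    background = alpha * (mean_sim @ mean_sim.T) / S.mean()   # rank-one background estimate
    return np.clip(S - background, 0.0, None)                 # keep non-negative residual

def rank_prototypes(S_topic, top_k=5):
    """Rank items by their reconstructed similarity mass; high-mass items are
    candidate probable keywords or prototype images for the topic."""
    scores = S_topic.sum(axis=1)
    return np.argsort(scores)[::-1][:top_k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.random((20, 20))
    S = (S + S.T) / 2            # symmetrize toy similarities
    np.fill_diagonal(S, 1.0)
    S_topic = remove_background_similarity(S)
    print("candidate prototype indices:", rank_prototypes(S_topic))
```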
Keywords:
Corresponding author:
Email address: