Indexed in:
Abstract:
Microblog filtering helps users screen out irrelevant content and extract timely content from microblogs effectively. However, because microblog posts are typical short texts, microblog filtering suffers from an insufficient-samples problem that makes probabilistic models unreliable. Current research regards an explicit brief query as only an abstract of the user's information needs, from which the user's actual search intent is hard to infer. Instead, we treat relevant external documents as the user's implicit prior knowledge and build a corresponding filtering framework. To guard against the risk of expanding with external documents, we assume that an external document can be viewed as a complete statement of the explicit query, and we encode the filtering preferences through the degree of divergence between the external document and the original explicit query. The optimal filtering action is therefore the one that trades off the degree of divergence against generalization performance. Compared with established baselines, our algorithm yields compelling results in retrieving meaningful tweets. This work furthers the understanding of the inherent risk characteristics of external expansion in the design of microblog filtering systems.
Keywords:
Corresponding author information: