收录:
摘要:
At present, the network pyramid scheme has become a major tumor that hinders social development. In order to curb the propagation of the network pyramid scheme and effectively identify the pyramid scheme text in the network, this study proposes a joint topic model, Paragraph Vector Latent Dirichlet Allocation (PV_LDA), based on the characteristics of high-yield, high rebate, hierarchical salary and text topic diversity described in the text. The model uses the paragraph as the minimum processing unit to generate the topic distribution matrix of "high-interest rate" and "hierarchical salary" from the network pyramid scheme text. The Gibbs sampling is used to derive the "pyramid scheme" topic distribution matrix represented by the two features, which is used for classification processing by the classifier. the classification accuracy rate for the network pyramid scheme text can reach 86.25%. The conclusions show that the topic model proposed in this paper can capture the characteristics of the pyramid scheme more reasonably.
关键词:
通讯作者信息:
电子邮件地址: