Abstract:
Online pyramid schemes have become a serious obstacle to social development. To curb their spread and effectively identify pyramid scheme text on the network, this study proposes a joint topic model, Paragraph Vector Latent Dirichlet Allocation (PV_LDA), based on the characteristics such text exhibits: high yield, high rebate, hierarchical salary, and topic diversity. The model uses the paragraph as the minimum processing unit to generate the topic distribution matrices of "high-interest rate" and "hierarchical salary" from network pyramid scheme text. Gibbs sampling is then used to derive the "pyramid scheme" topic distribution matrix represented by these two features, which is passed to a classifier. The classification accuracy for network pyramid scheme text can reach 86.25%. The results show that the proposed topic model captures the characteristics of pyramid schemes more reasonably.
Source:
NLPIR 2019: 2019 3RD INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL
Year: 2019
Page: 15-19
Language: English
WoS CC Cited Count: 1
ESI Highly Cited Papers on the List: 0