Indexed in:
Abstract:
The detection and removal of malicious social bots in social networks has become an area of interest in industry and academia. Widely used bot detection methods based on machine learning face an imbalance in the number of samples across categories, and the resulting classifier bias leads to a low detection rate for minority samples. Therefore, we propose an improved conditional generative adversarial network (improved CGAN) that extends imbalanced data sets before classifiers are trained, in order to improve the detection accuracy of social bots. To generate the auxiliary condition, we propose a modified clustering algorithm, namely the Gaussian kernel density peak clustering algorithm (GKDPCA), which avoids generating noise during data augmentation and eliminates imbalances both between and within social bot class distributions. Furthermore, we improve the CGAN convergence criterion by introducing the Wasserstein distance with a gradient penalty, which addresses the mode collapse and vanishing gradients of the traditional CGAN. The effects of the imbalance degree and the expansion ratio of the original data on oversampling are also studied. Experimental comparisons with three common oversampling algorithms show that the improved CGAN achieves higher evaluation scores in terms of F1-score, G-mean and AUC.
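As an illustration of why the abstract reports F1-score and G-mean rather than plain accuracy, the following minimal sketch computes both metrics for the minority (bot) class. It is not taken from the paper: the function name `minority_metrics` and the toy labels are assumptions made for this example, and G-mean is taken in its usual binary form, the square root of sensitivity times specificity.

```python
import math


def minority_metrics(y_true, y_pred, positive=1):
    """Return (f1, g_mean), treating `positive` as the minority (bot) class.

    Hypothetical helper for illustration only; not the paper's code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0

    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    g_mean = math.sqrt(recall * specificity)
    return f1, g_mean


# Made-up toy data: 8 human accounts (0) and 2 bots (1).
y_true = [0] * 8 + [1] * 2

# A biased classifier that labels every account as human is 80% accurate,
# yet both minority-aware metrics are zero.
all_human = [0] * 10
print(minority_metrics(y_true, all_human))

# Detecting one of the two bots raises both metrics.
one_bot_found = [0] * 8 + [1, 0]
print(minority_metrics(y_true, one_bot_found))
```

Because both metrics depend on the recall of the minority class, a classifier that ignores the bots entirely scores zero on each, which is the bias that oversampling is meant to counteract.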
Keywords:
Corresponding author information:
Email address: