Abstract:
With the development of information and communication technology, telecommunication fraud has become increasingly serious, and identifying fraudulent calls accurately and effectively has become an urgent task for telecom operators. Because the data follow a power-law distribution, positive and negative samples are highly imbalanced, and existing machine learning methods achieve low recognition accuracy on such data sets. This paper proposes an ADASYN+RF model. First, to address the imbalance, the ADASYN (Adaptive Synthetic Sampling) algorithm is used to rebalance the original data set. Second, the random forest algorithm is used to train on the rebalanced data set to avoid overfitting. Finally, two groups of comparative experiments are carried out, and the results show that: (1) for processing imbalanced data, the ADASYN algorithm used in this paper has an advantage over the traditional SMOTE algorithm; (2) compared with non-ensemble learning models, the accuracy, recall, and F1 score of the ADASYN+RF model are significantly improved. © 2020 IEEE.
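
The rebalancing step named in the abstract can be illustrated with a minimal, standard-library sketch of the general ADASYN idea: minority samples surrounded by more majority neighbours receive more synthetic points. This is not the authors' implementation. The function name `adasyn`, the parameters `k`, `beta`, and `seed`, the uniform fallback when no minority point has majority neighbours, and the toy data are all assumptions made for illustration. In the pipeline the abstract describes, the rebalanced set would then be passed to a random forest classifier, which is not shown here.

```python
import math
import random


def adasyn(minority, majority, k=5, beta=1.0, seed=0):
    """Return a list of synthetic minority points (illustrative sketch only)."""
    rng = random.Random(seed)
    # Number of synthetic points needed to reach the requested balance level.
    total = int((len(majority) - len(minority)) * beta)
    if total <= 0 or len(minority) < 2:
        return []

    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]

    # r_i: share of majority points among the k nearest neighbours of x_i.
    ratios = []
    for x in minority:
        others = [q for q in labelled if q[0] is not x]
        nearest = sorted(others, key=lambda q: math.dist(x, q[0]))[:k]
        ratios.append(sum(1 for _, label in nearest if label == 0) / len(nearest))

    # Normalise r_i into a distribution; fall back to uniform weights
    # (an assumption of this sketch) when every r_i is zero.
    ratio_sum = sum(ratios)
    if ratio_sum == 0:
        weights = [1.0 / len(minority)] * len(minority)
    else:
        weights = [r / ratio_sum for r in ratios]

    synthetic = []
    for x, w in zip(minority, weights):
        # Harder-to-learn points (larger weight) get more synthetic samples.
        count = round(w * total)
        # Interpolation partners are the nearest *minority* neighbours.
        partners = sorted((z for z in minority if z is not x),
                          key=lambda z: math.dist(x, z))[:k]
        for _ in range(count):
            z = rng.choice(partners)
            lam = rng.random()
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic


# Toy data (hypothetical): 4 minority points versus 20 majority points.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority = [(float(i), float(j)) for i in range(2, 7) for j in range(2, 6)]
synthetic = adasyn(minority, majority, k=5)
balanced_minority = minority + synthetic
```

Because each synthetic point is interpolated between two existing minority points, the new samples stay inside the region the minority class already occupies. Per-point counts are rounded, so the number of generated samples can differ slightly from the exact class gap.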