
Authors:

Liu, Yanbin | Zhang, Wen | Qin, Guangjie | Zhao, Jiangpeng

Indexed in:

EI; Scopus

Abstract:

Software defect prediction currently suffers from the imbalanced data problem. Traditional methods are insensitive to defect-prone modules and tend to predict them as defect-free. To deal with this problem, sampling techniques are adopted to rebalance the defect-prone and defect-free data used to train the predictive model and thereby improve its performance. However, the combined effect of sampling techniques and machine learning classifiers on the performance of software defect prediction is not clear. The intent of the paper is to study the performance impact on defect prediction incurred by different combinations of sampling techniques and machine learning classifiers. Specifically, we investigate three types of sampling techniques, namely resampling, spread subsampling, and SMOTE (Synthetic Minority Over-sampling Technique), and five types of machine learning classifiers, namely C4.5, naive Bayes, logistic regression, support vector machine, and deep learning, to study their combined effect on defect prediction. Using the Friedman test and the Nemenyi test, we find that there is no optimal method among all the 12 combinations in defect prediction. However, support vector machine and deep learning produce the best performance stably across all the investigated projects. With ANOVA analysis, we find that the sampling techniques have a great impact on the outcomes of defect prediction because they produce different data distributions for model training. Nevertheless, the sampling proportion has significant impacts on TPR (True Positive Ratio) and FPR (False Positive Ratio), while it can merely influence the AUC (Area under Curve) and Balance of logistic regression. We explain the experimental results in the paper. © 2022 The Authors. Published by Elsevier B.V.
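As a rough illustration of the measures and the rebalancing idea named in the abstract, the sketch below computes TPR, FPR, and Balance from actual and predicted labels, and equalizes class sizes by random oversampling with replacement. The record does not define Balance; the formula used here (normalized distance from the ideal point FPR = 0, TPR = 1) is the one commonly seen in defect prediction studies and is an assumption. The function names and toy labels are hypothetical, and the oversampler is a plain stand-in, not the paper's resampling, spread subsampling, or SMOTE procedure.

```python
import math
import random


def tpr_fpr_balance(actual, predicted):
    """Return (TPR, FPR, Balance) for binary labels, where 1 = defect-prone.

    Balance is assumed to be 1 - sqrt((0 - FPR)^2 + (1 - TPR)^2) / sqrt(2).
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    balance = 1.0 - math.sqrt((0.0 - fpr) ** 2 + (1.0 - tpr) ** 2) / math.sqrt(2.0)
    return tpr, fpr, balance


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        # Nothing to duplicate when one class is absent; return copies unchanged.
        return list(rows), list(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    # Hypothetical toy labels: 2 defect-prone modules, 4 defect-free modules.
    actual = [1, 1, 0, 0, 0, 0]
    predicted = [1, 0, 1, 0, 0, 0]
    print(tpr_fpr_balance(actual, predicted))
```

Oversampling should be applied to the training split only, so that TPR, FPR, and Balance are still measured on data with the original class distribution.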

Keywords:

Logistic regression; Learning systems; Defects; Support vector regression; Deep learning; Forecasting

Author affiliations:

  • [ 1 ] [Liu, Yanbin] The No.13th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang, China
  • [ 2 ] [Zhang, Wen] College of Economics and Management, Beijing University of Technology, Beijing, China
  • [ 3 ] [Qin, Guangjie] College of Economics and Management, Beijing University of Technology, Beijing, China
  • [ 4 ] [Zhao, Jiangpeng] College of Economics and Management, Beijing University of Technology, Beijing, China


Year: 2022

Issue: C

Volume: 214

Pages: 1603-1616

Language: English

Citation counts:

Scopus citations: 15

ESI Highly Cited Papers listed: 0
