收录:
摘要:
Diagnosing infants who are small for gestational age (SGA) at early stages could help physicians to introduce interventions for SGA infants earlier. Machine learning (ML) is envisioned as a tool to identify SGA infants. However, ML has not been widely studied in this field. To develop effective SGA prediction models, we conducted four groups of experiments that considered basic ML methods, imbalanced data, feature selection and the time characteristics of variables, respectively. Infants with SGA data collected from 2010 to 2013 with gestational weeks between 24 and 42 were detected. Support vector machine (SVM), random forest (RF), logistic regression (LR) and Sparse LR models were trained on 10-fold cross validation. Precision and the area under the curve (AUC) of the receiver operator characteristic curve were evaluated. For each group, the performance of SVM and Sparse LR was similarly well. LR without any sparsity penalties performed worst, possibly caused by the overfitting problem. With the combination of handling imbalanced data and feature selection, the RF ensemble classifier performed best, which even obtained the highest AUC value (0.8547) with the help of expert knowledge. In other cases, RF performed worse than Sparse LR and SVM, possibly because of fully grown trees. © 2015 IEEE.
关键词:
通讯作者信息:
电子邮件地址: