Indexed in:
Abstract:
This work studies the problem of identifying risk factors for small for gestational age (SGA) and building classifiers to predict SGA. SGA infants have recently received increasing attention because the condition can cause them difficulties throughout their lives. Some researchers have studied the risk factors for SGA onset with traditional statistical methods, while others have used logistic regression (LR) to build SGA prediction models. Meanwhile, machine learning has emerged as a tool that could potentially identify infants with SGA. This work tests several feature selection methods. Using the risk factors these methods select, it trains support vector machine (SVM), random forest (RF), and LR models and evaluates them with 10-fold cross-validation in terms of precision and the area under the receiver operating characteristic curve (AUC). The results show that, among the wrapper algorithms, sparse LR gives the most effective feature selection. In addition, this work compares data-driven factors with knowledge-driven factors and shows that feature selection is both necessary and effective. Among the trained classifiers, the LR model achieves the best performance on the data-driven factors.
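The abstract gives no implementation details. The following is a rough, hypothetical sketch of the kind of pipeline it describes. Features are kept when their L1-penalized ("sparse") logistic regression coefficient is non-zero. An LR classifier is then refit on those features and scored with stratified 10-fold cross-validation on precision and AUC. Everything here is an assumption for illustration, not the authors' code or data: the synthetic data generator, the penalty `lam=0.08`, the 0.5 decision threshold, and all function names. The paper groups sparse LR with wrapper methods, and its exact selection procedure may differ. SVM and random forest are omitted to keep the sketch dependency-free.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sparse_lr(X, y, lam, lr=0.5, epochs=300):
    # Minimizes mean log-loss + lam * ||w||_1 by proximal gradient descent (ISTA);
    # the bias is not penalized, and lam=0 gives plain logistic regression.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        b -= lr * grad_b
        for j in range(p):
            # Gradient step, then soft-thresholding (the proximal operator of the L1 norm).
            v = w[j] - lr * grad_w[j]
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return w, b


def select_features(X, y, lam=0.08):
    # Keep the features whose L1-penalized coefficient is non-zero.
    w, _ = fit_sparse_lr(X, y, lam)
    return [j for j, wj in enumerate(w) if wj != 0.0]


def auc(scores, labels):
    # ROC AUC = P(a random positive scores above a random negative); ties count 1/2.
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def precision(preds, labels):
    tp = sum(1 for pr, lab in zip(preds, labels) if pr == 1 and lab == 1)
    fp = sum(1 for pr, lab in zip(preds, labels) if pr == 1 and lab == 0)
    return tp / (tp + fp) if tp + fp else 0.0


def stratified_folds(y, k=10, seed=0):
    # Deal each class round-robin into k folds so every fold keeps both classes.
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def make_data(n=200, p=8, seed=1):
    # Hypothetical synthetic cohort: only features 0 and 1 affect the outcome.
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(2.0 * x[0] - 1.5 * x[1] - 0.5) else 0 for x in X]
    return X, y


def cross_validate(X, y, k=10, lam=0.08):
    # Feature selection is redone inside each training fold to avoid leakage.
    aucs, precs = [], []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        y_tr = [y[i] for i in train_idx]
        feats = select_features([X[i] for i in train_idx], y_tr, lam)

        def rows(idx):
            return [[X[i][j] for j in feats] for i in idx]

        w, b = fit_sparse_lr(rows(train_idx), y_tr, lam=0.0)
        scores = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in rows(test_idx)]
        labels = [y[i] for i in test_idx]
        aucs.append(auc(scores, labels))
        precs.append(precision([1 if s >= 0.5 else 0 for s in scores], labels))
    return sum(aucs) / k, sum(precs) / k


X, y = make_data()
chosen = select_features(X, y)
mean_auc, mean_prec = cross_validate(X, y)
print("selected features:", chosen)
print(f"10-fold AUC = {mean_auc:.3f}, precision = {mean_prec:.3f}")
```

One design choice matters here. Running the selection step once on the full dataset before cross-validation would let the held-out folds influence which features are kept, and that tends to inflate the reported AUC. The sketch avoids this by repeating the selection inside every training fold.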
Keywords:
Corresponding author information:
Email address: