Abstract:
Although citations between scientific documents are regarded as a vehicle for the dissemination, inheritance and development of scientific knowledge, not all citations are equal. Many taxonomies and machine-learning models have been proposed for the qualitative classification of citation function and importance. Inspired by the success of kernel functions derived from generative models in improving the performance of the support vector machine (SVM), this work explores the potential of combining generative and discriminative models for citation importance classification. Specifically, generative features are produced by a topic model, the citation influence model (CIM), and then fed, together with 13 other traditional features, to two traditional discriminative machine-learning models, SVM and random forest (RF), and to a deep learning model, a convolutional neural network (CNN), to identify important citations. Extensive experiments are performed on two data sets with different characteristics. All three models perform better on the data set drawn from a single discipline. A likely explanation is that the patterns of important citations vary by field, which prevents machine-learning models from effectively learning discriminative patterns from publications spanning multiple domains. The RF classifier outperforms the SVM classifier, in line with many prior studies. The CNN model, however, does not achieve the desired performance because of the small scale of the data set. Furthermore, the CIM-based features further improve performance in identifying important citations.