Indexed in:
Abstract:
In recent times, numerous models have attempted to enhance the performance of the Transformer on Chinese NER tasks. The model can be enhanced in two ways: one is to combine it with lexicon augmentation techniques, the other is to optimize the Transformer model itself. According to prior research, fully connected self-attention may scatter the attention distribution, which explains the weaker performance of the original Transformer with self-attention. In this paper, we attempt to optimize the Transformer model, especially its attention layer. Therefore, a novel attention mechanism, Dilated Shift Window Attention (DSWA), is proposed to address this problem. By using Window Attention, this method improves the model’s capacity to handle local information; meanwhile, the model can still manage long texts and long-distance dependencies owing to the Window Dilation mechanism. Experiments on various datasets also show that replacing fully connected self-attention with DSWA improves the model’s performance on the Chinese NER task. Copyright © 2023 by KSI Research Inc. and Knowledge Systems Institute, USA.
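The abstract does not give the exact formulation of DSWA, so the following is only a minimal sketch of a dilated window self-attention step, assuming a hypothetical function `dilated_window_attention` with plain (unprojected) queries, keys, and values; the paper's actual layer, projections, and shift scheme may differ.

```python
# Hypothetical sketch of dilated window self-attention (not the authors' exact DSWA code).
# Each token attends only to tokens inside a local window sampled at a fixed dilation rate,
# so local context is captured while the receptive field still reaches distant tokens.
import torch


def dilated_window_attention(x, window_size=3, dilation=2):
    """x: (batch, seq_len, dim). Returns an attention output of the same shape."""
    batch, seq_len, dim = x.shape
    # For simplicity, queries/keys/values are the input itself (no learned projections here).
    q, k, v = x, x, x
    scale = dim ** -0.5

    outputs = []
    for i in range(seq_len):
        # Indices of the dilated window centred on position i, clipped to the sequence bounds.
        offsets = torch.arange(-window_size, window_size + 1) * dilation
        idx = (offsets + i).clamp(0, seq_len - 1)
        k_win = k[:, idx, :]                      # (batch, 2*window_size+1, dim)
        v_win = v[:, idx, :]
        attn = torch.softmax((q[:, i:i+1, :] @ k_win.transpose(1, 2)) * scale, dim=-1)
        outputs.append(attn @ v_win)              # (batch, 1, dim)
    return torch.cat(outputs, dim=1)


if __name__ == "__main__":
    x = torch.randn(2, 16, 32)
    out = dilated_window_attention(x, window_size=2, dilation=3)
    print(out.shape)  # torch.Size([2, 16, 32])
```

With dilation 1 this reduces to a plain local window; larger dilation values let the same window size cover longer spans, which is the intuition behind combining window attention with dilation for long texts.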
Keywords:
Corresponding author:
Email address:
Source:
Year: 2023
Pages: 51-57
Language: English
Affiliated department: