Abstract:
In recent years, numerous models have attempted to improve the performance of the Transformer on Chinese NER tasks. The model can be enhanced in two ways: combining it with lexicon-augmentation techniques, or optimizing the Transformer architecture itself. Research suggests that fully connected self-attention scatters the attention distribution, which accounts for the weaker performance of the original Transformer with self-attention. In this paper, we optimize the Transformer model, in particular its attention layer, and propose a novel attention mechanism, Dilated Shift Window Attention (DSWA), to address this problem. Window attention improves the model's ability to handle local information, while the window dilation mechanism allows the model to still handle long texts and long-distance dependencies. Experiments on several datasets show that replacing fully connected self-attention with DSWA improves the model's performance on Chinese NER tasks. Copyright © 2023 by KSI Research Inc. and Knowledge Systems Institute, USA.
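The abstract does not give the exact DSWA formulation (in particular, the shift component). The following is a minimal sketch of the general idea of dilated window attention under standard assumptions: each token attends only to nearby positions on a dilated grid, keeping local focus while widening the receptive field for long-distance dependencies. The function names and the `window_size` and `dilation` parameters are illustrative, not taken from the paper.

```python
# Minimal sketch of dilated window attention (illustrative, not the paper's exact DSWA).
import torch


def dilated_window_mask(seq_len: int, window_size: int, dilation: int) -> torch.Tensor:
    """Boolean mask (seq_len x seq_len): True where attention is allowed.

    Each token attends to positions whose relative offset is a multiple of
    `dilation` and lies within `window_size` dilated steps, so local context
    is preserved while the receptive field covers longer distances.
    """
    idx = torch.arange(seq_len)
    offset = idx[None, :] - idx[:, None]                 # relative positions
    within_window = offset.abs() <= window_size * dilation
    on_dilated_grid = offset % dilation == 0             # skip in-between positions
    return within_window & on_dilated_grid


def dilated_window_attention(q, k, v, window_size: int = 4, dilation: int = 2):
    """Scaled dot-product attention restricted by the dilated window mask.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    """
    seq_len, head_dim = q.size(-2), q.size(-1)
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    mask = dilated_window_mask(seq_len, window_size, dilation).to(scores.device)
    scores = scores.masked_fill(~mask, float("-inf"))    # block positions outside the window
    return torch.softmax(scores, dim=-1) @ v
```

With `dilation = 1` this reduces to plain sliding-window attention; larger dilation values trade some local density for longer reach, which is the intuition behind combining window attention with a dilation mechanism.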
Year: 2023
Page: 51-57
Language: English