Indexed in:
Abstract:
Task-specific training data in a high-resource source language is usually required for cross-lingual text classification. However, due to labeling costs, task characteristics, and privacy concerns, collecting such data is often infeasible. We focus on improving text classification in low-resource languages in the absence of annotated source-language training data. To transfer resources effectively, we propose a new neural network framework (ATHG) that uses only bilingual lexicons and task-independent word embeddings from high-resource languages. First, through adversarial training, we map the source-language vocabulary into the same space as the target-language vocabulary, optimizing the mapping matrix. Then, considering multiple languages, we integrate information from different languages through a multi-step aggregation strategy. Our method outperforms pretrained models even without access to large corpora. © 2024 IEEE.
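The adversarial alignment step in the abstract (learning a mapping matrix so that mapped source-language embeddings are indistinguishable from target-language embeddings) can be sketched as follows. This is a minimal toy version, not the paper's actual architecture: the discriminator is a single logistic unit, the mapping matrix is kept near-orthogonal by an iterative correction, and all names and hyperparameters are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_mapping(src, tgt, steps=500, lr=0.05, beta=0.01, seed=0):
    """Learn a linear map W so that src @ W is hard to tell apart from tgt.

    Hypothetical toy sketch of adversarial embedding alignment: a logistic
    discriminator tries to separate mapped source vectors from target
    vectors, while W is updated to fool it.  W is pulled back toward the
    orthogonal group after each step.
    """
    rng = np.random.default_rng(seed)
    n, d = src.shape
    W = np.eye(d)                       # mapping matrix, kept near-orthogonal
    u = rng.normal(scale=0.1, size=d)   # discriminator weights
    b = 0.0                             # discriminator bias
    for _ in range(steps):
        # Discriminator step: target -> label 1, mapped source -> label 0.
        mapped = src @ W
        p_t = sigmoid(tgt @ u + b)
        p_s = sigmoid(mapped @ u + b)
        grad_u = (tgt.T @ (p_t - 1.0) + mapped.T @ p_s) / n
        grad_b = ((p_t - 1.0).sum() + p_s.sum()) / n
        u -= lr * grad_u
        b -= lr * grad_b
        # Generator step: update W so mapped source scores like the target.
        p_s = sigmoid((src @ W) @ u + b)
        grad_W = src.T @ np.outer(p_s - 1.0, u) / n
        W -= lr * grad_W
        # Orthogonality correction: W <- (1+beta) W - beta (W W^T) W.
        W = (1.0 + beta) * W - beta * (W @ W.T) @ W
    return W

# Toy data: the "target" space is a random rotation of a Gaussian cloud.
rng = np.random.default_rng(1)
d, n = 8, 300
src = rng.normal(size=(n, d))
rot, _ = np.linalg.qr(rng.normal(size=(d, d)))
tgt = rng.normal(size=(n, d)) @ rot
W = adversarial_mapping(src, tgt)
```

After training, `src @ W` lives in the same space as `tgt`; in practice a bilingual lexicon would then be used to evaluate or refine the alignment.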
Keywords:
Corresponding author:
Email address:
Source:
Year: 2024
Pages: 735-740
Language: English
Department: