Indexed in:
Abstract:
Cross-view geo-localization between satellite and unmanned aerial vehicle (UAV) imagery has attracted extensive attention because of its potential for navigation in global navigation satellite system (GNSS)-denied environments. However, inadequate feature representation across views, coupled with positional shifts and distance-scale uncertainty, remains a key challenge. Most existing research has focused on extracting comprehensive and fine-grained information, yet effective feature representation and alignment deserve equal attention. In this article, we propose TransFG, a transformer-based pipeline for robust cross-view image matching that incorporates a feature aggregation (FA) module and a gradient guidance (GG) module. TransFG combines FA and GG synergistically to balance feature representation and alignment. Specifically, the FA module implicitly learns salient features and dynamically aggregates contextual features from the vision transformer (ViT). The GG module uses the gradient information of local features to further enhance the cross-view feature representation and to align specific instances across views. Extensive experiments demonstrate that our pipeline outperforms existing methods in cross-view geo-localization, improving on state-of-the-art (SOTA) methods in both R@1 and AP.
Keywords:
Corresponding author:
Email address:
Source:
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
ISSN: 0196-2892
Year: 2024
Volume: 62
8.200 (JCR@2022)
Affiliated department: