Query:
Scholar name: Zhang Jing
Abstract :
Road segmentation is a fundamental task for dynamic maps in unmanned aerial vehicle (UAV) path navigation. In unplanned, unknown, and even damaged areas, roads are usually unpaved, with blurred edges, deformations, and occlusions, which makes unpaved road segmentation a significant obstacle to constructing dynamic maps. Our major contributions are: (1) Inspired by dilated convolution, we propose dilated cross window self-attention (DCWin-Attention), composed of a dilated cross window mechanism and a pixel regional module, to model the long-range horizontal and vertical road dependencies of unpaved roads with deformation and blurred edges. (2) A shifted cross window mechanism is coupled with DCWin-Attention to reduce the influence of occluded roads in UAV imagery; specifically, the GVT backbone is built from DCWin-Attention blocks to extract multilevel deep features with global dependency. (3) The unpaved road is segmented with a confidence map generated by fusing the deep features of different levels in a unified perceptual parsing network. We verify our method on the self-established BJUT-URD dataset and the public DeepGlobe dataset, achieving the highest IoU of 67.72% and 52.67% at inference speeds of 2.7 and 2.8 FPS, respectively, demonstrating its effectiveness and superiority in unpaved road segmentation. Our code is available at https://github.com/BJUT-AIVBD/GVT-URS.
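For a concrete picture of the DCWin-Attention idea, the PyTorch sketch below shows one way dilated row-and-column window attention can be realized: each pixel attends to pixels sampled every d columns along its row, then every d rows along its column. The class name, single-head design, and reshaping scheme are illustrative assumptions, not the paper's exact module.

import torch
import torch.nn as nn

class DilatedCrossWindowAttention(nn.Module):
    def __init__(self, dim, dilation=2):
        super().__init__()
        self.dilation = dilation
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def _attend(self, x):
        # x: (B*, N, C) -- standard scaled dot-product self-attention.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        return self.proj(attn.softmax(dim=-1) @ v)

    def forward(self, x):
        # x: (B, H, W, C); H and W must be divisible by the dilation d.
        B, H, W, C = x.shape
        d = self.dilation
        # Horizontal: attend among pixels in the same row, sampled every d columns.
        xh = x.view(B, H, W // d, d, C).permute(0, 1, 3, 2, 4).reshape(-1, W // d, C)
        xh = self._attend(xh).view(B, H, d, W // d, C).permute(0, 1, 3, 2, 4).reshape(B, H, W, C)
        # Vertical: attend among pixels in the same column, sampled every d rows.
        xv = xh.view(B, H // d, d, W, C).permute(0, 2, 3, 1, 4).reshape(-1, H // d, C)
        xv = self._attend(xv).view(B, d, W, H // d, C).permute(0, 3, 1, 2, 4).reshape(B, H, W, C)
        return xv

# e.g. DilatedCrossWindowAttention(64, dilation=2)(torch.randn(1, 16, 16, 64)).shape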
Keyword :
Unpaved road segmentation; Dynamic map; UAV imagery; Global vision transformer; DCWin-attention
Cite:
GB/T 7714 | Li, Wensheng, Zhang, Jing, Li, Jiafeng, et al. Unpaved road segmentation of UAV imagery via a global vision transformer with dilated cross window self-attention for dynamic map [J]. VISUAL COMPUTER, 2024.
MLA | Li, Wensheng, et al. "Unpaved road segmentation of UAV imagery via a global vision transformer with dilated cross window self-attention for dynamic map." VISUAL COMPUTER (2024).
APA | Li, Wensheng, Zhang, Jing, Li, Jiafeng, Zhuo, Li. Unpaved road segmentation of UAV imagery via a global vision transformer with dilated cross window self-attention for dynamic map. VISUAL COMPUTER, 2024.
Abstract :
With the rapid expansion of the we-media industry, streamers have increasingly incorporated inappropriate content into live videos to attract traffic and pursue profit. Blacklisted streamers often forge their identities or switch platforms to continue streaming, causing significant harm to the online environment. Consequently, streamer re-identification (re-ID) has become of paramount importance. Streamer biometrics in live videos exhibit multimodal characteristics, including voiceprints, faces, and spatiotemporal information, which complement each other. We therefore propose a light cross-modal attention network (LCMA-Net) for streamer re-ID in live videos. First, the voiceprint, face, and spatiotemporal features of the streamer are extracted by RawNetSA, Pi-Net, and STDA-ResNeXt3D, respectively. We then design a light cross-modal pooling attention (LCMPA) module, which, combined with a multilayer perceptron (MLP), aligns and concatenates the different modality features into multimodal features within LCMA-Net. Finally, the streamer is re-identified by measuring the similarity between these multimodal features. Five experiments were conducted on the StreamerReID dataset, and the results demonstrate that the proposed method achieves competitive performance. The dataset and code are available at https://github.com/BJUT-AIVBD/LCMA-Net.
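A minimal sketch of the pooled cross-modal attention idea follows, assuming PyTorch. The module names (LightCrossModalPoolingAttention, FusionHead), dimensions, and the average-pooling choice are illustrative guesses at what makes the design "light", not the published LCMPA architecture.

import torch
import torch.nn as nn

class LightCrossModalPoolingAttention(nn.Module):
    def __init__(self, dim=256, pooled_len=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(pooled_len)   # shrink key/value sequence length
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, query_feat, context_feat):
        # query_feat: (B, Nq, C) from one modality; context_feat: (B, Nc, C) from another.
        # Pooling the context keeps the attention cost low ("light").
        ctx = self.pool(context_feat.transpose(1, 2)).transpose(1, 2)  # (B, pooled_len, C)
        out, _ = self.attn(query_feat, ctx, ctx)
        return out

class FusionHead(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.va = LightCrossModalPoolingAttention(dim)
        self.fa = LightCrossModalPoolingAttention(dim)
        self.mlp = nn.Sequential(nn.Linear(dim * 3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, voice, face, st):
        # Align voice and face features against the spatiotemporal stream, then
        # concatenate all three and fuse with an MLP into one multimodal embedding.
        v = self.va(voice, st).mean(dim=1)
        f = self.fa(face, st).mean(dim=1)
        s = st.mean(dim=1)
        return self.mlp(torch.cat([v, f, s], dim=-1))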
Keyword :
Live video; Light cross-modal attention network; Re-identification; Light cross-modal pooling attention; Streamer
Cite:
GB/T 7714 | Yao, Jiacheng, Zhang, Jing, Zhang, Hui, et al. LCMA-Net: A light cross-modal attention network for streamer re-identification in live video [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249.
MLA | Yao, Jiacheng, et al. "LCMA-Net: A light cross-modal attention network for streamer re-identification in live video." COMPUTER VISION AND IMAGE UNDERSTANDING 249 (2024).
APA | Yao, Jiacheng, Zhang, Jing, Zhang, Hui, Zhuo, Li. LCMA-Net: A light cross-modal attention network for streamer re-identification in live video. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249.
Abstract :
Haze reduces the imaging effectiveness of outdoor vision systems and significantly degrades image quality; reducing haze has therefore been the focus of many studies. In recent years, decoupled representation learning has been applied to image processing; however, existing decoupled networks lack designs tailored to information with different characteristics and thus struggle to achieve satisfactory dehazing results. This study proposes a heterogeneous decoupling unsupervised dehazing network (HDUD-Net), in which heterogeneous modules learn the content and haze information of images separately so as to decouple them effectively. To address the information loss that occurs when extracting content from hazy images with complex noise, we propose a bi-branch multi-hierarchical feature fusion module. In addition, we propose a style-feature contrastive learning method that builds positive and negative sample queues and constructs a contrastive loss to strengthen the decoupling. Extensive experiments confirm that, compared with state-of-the-art single-image dehazing algorithms, the proposed method achieves higher scores on objective metrics and more realistic visual results.
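The style-feature contrastive step can be pictured with the short sketch below: an InfoNCE-style loss pulls a hazy image's style embedding toward a queue of positives and pushes it away from a queue of negatives. The loss form, temperature, and queue handling are assumptions for illustration, not the paper's exact objective.

import torch
import torch.nn.functional as F

def style_contrastive_loss(style, pos_queue, neg_queue, tau=0.07):
    # style: (B, D); pos_queue: (Np, D); neg_queue: (Nn, D).
    style = F.normalize(style, dim=-1)
    l_pos = style @ F.normalize(pos_queue, dim=-1).t() / tau   # (B, Np)
    l_neg = style @ F.normalize(neg_queue, dim=-1).t() / tau   # (B, Nn)
    logits = torch.cat([l_pos, l_neg], dim=1)
    # Treat every queue positive as a target: maximize probability mass on positives.
    log_prob = F.log_softmax(logits, dim=1)
    return -log_prob[:, :pos_queue.size(0)].mean()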
Keyword :
Unsupervised learning; Single image dehazing; Image restoration; Image enhancement
Cite:
GB/T 7714 | Li, Jiafeng, Kuang, Lingyan, Jin, Jiaqi, et al. HDUD-Net: heterogeneous decoupling unsupervised dehaze network [J]. NEURAL COMPUTING & APPLICATIONS, 2024, 36(6): 2695-2711.
MLA | Li, Jiafeng, et al. "HDUD-Net: heterogeneous decoupling unsupervised dehaze network." NEURAL COMPUTING & APPLICATIONS 36.6 (2024): 2695-2711.
APA | Li, Jiafeng, Kuang, Lingyan, Jin, Jiaqi, Zhuo, Li, Zhang, Jing. HDUD-Net: heterogeneous decoupling unsupervised dehaze network. NEURAL COMPUTING & APPLICATIONS, 2024, 36(6), 2695-2711.
Abstract :
Background and Objective: Detecting cell clusters in whole-slide images (WSIs) of thyroid fine needle aspiration biopsy (FNAB) is significant for improving the efficiency and accuracy of thyroid cancer diagnosis. To address the ultrahigh resolution, small object size, and sparse, irregular distribution of cell clusters in thyroid FNAB-WSIs, we propose a cell cluster detection method via deformable convolution with frequency channel attention (FCA). Methods: First, an adaptive data augmentation (ADA) module classifies the patch images cropped with a sliding window and activates different augmentation operations, alleviating the loss of cell cluster objects. A ResNeXt101 backbone combined with a feature pyramid network (FPN) extracts multi-scale cell cluster features, to which we add deformable convolution (DCNv2) and FCA for feature refinement. Finally, an improved Sparse R-CNN model with sparse learnable proposals detects cell clusters in thyroid FNAB-WSIs. Results: The dataset contains approximately 6020 patch images: 3612 for training, 1204 for validation, and 1204 for testing. Experimental results demonstrate that our method achieves the highest average detection accuracy of 95.4% on the self-built thyroid FNAB-WSI dataset, 2.9% higher than the state of the art. Since feature extraction dominates the model's computational cost, yielding 12 FPS, model acceleration remains future work. Overall, our cell cluster detection method has a positive impact on the efficiency and accuracy of thyroid cancer diagnosis. Significance: The proposed method can serve as a fast and accurate computer-aided tool for thyroid cancer diagnosis in clinical practice.
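To make the feature-refinement step concrete, the sketch below pairs torchvision's modulated deformable convolution (DCNv2) with a simple frequency channel attention that reweights channels from a spectral descriptor. The FFT-based descriptor and the offset/mask head are assumptions standing in for the paper's exact FCA design.

import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FreqChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        # Per-channel frequency descriptor: mean spectral magnitude of each channel.
        freq = torch.fft.rfft2(x, norm="ortho").abs().mean(dim=(-2, -1))  # (B, C)
        return x * self.fc(freq)[:, :, None, None]

class DeformFCABlock(nn.Module):
    def __init__(self, channels, k=3):
        super().__init__()
        # 2*k*k offset channels and k*k modulation-mask channels per position.
        self.offset_mask = nn.Conv2d(channels, 3 * k * k, 3, padding=1)
        self.dcn = DeformConv2d(channels, channels, k, padding=k // 2)
        self.fca = FreqChannelAttention(channels)
        self.k = k

    def forward(self, x):
        om = self.offset_mask(x)
        offset = om[:, :2 * self.k * self.k]
        mask = om[:, 2 * self.k * self.k:].sigmoid()
        return self.fca(self.dcn(x, offset, mask))

# e.g. DeformFCABlock(64)(torch.randn(1, 64, 32, 32)).shape -> (1, 64, 32, 32)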
Keyword :
Deformable convolution; Frequency channel attention; Thyroid FNAB-WSI; Sparse R-CNN; Cell cluster detection
Cite:
GB/T 7714 | Sun, Meng, Zhang, Jing, Zhao, Shimei, et al. Cell cluster detection of thyroid FNAB-WSI via deformable convolution with frequency channel attention [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 100.
MLA | Sun, Meng, et al. "Cell cluster detection of thyroid FNAB-WSI via deformable convolution with frequency channel attention." BIOMEDICAL SIGNAL PROCESSING AND CONTROL 100 (2024).
APA | Sun, Meng, Zhang, Jing, Zhao, Shimei, Li, Xiaoguang, Zhuo, Li. Cell cluster detection of thyroid FNAB-WSI via deformable convolution with frequency channel attention. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 100.
Abstract :
Semantic segmentation of remote sensing images (RSIs) is of great significance for obtaining geospatial object information. Transformers achieve promising results, but multi-head self-attention (MSA) is computationally expensive. We propose an efficient semantic segmentation Transformer (ESST) for RSIs that combines zero-padding position encoding with linear space reduction attention (LSRA). First, to capture the coarse-to-fine features of RSIs, zero-padding position encoding is implemented by adding overlapping patch embedding (OPE) layers and convolutional feed-forward networks (CFFN), improving the local continuity of features. Then, we replace standard attention with LSRA when extracting multi-level features, reducing the computational cost of the encoder. Finally, we design a lightweight all multi-layer perceptron (all-MLP) head decoder that aggregates multi-level features into multi-scale features for semantic segmentation. Experimental results demonstrate that our method attains a favorable accuracy-speed trade-off for semantic segmentation of RSIs on the Potsdam and Vaihingen datasets.
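The LSRA step can be illustrated as below, in the spirit of linear spatial-reduction attention: keys and values are pooled to a fixed small token grid before attention, so the cost grows linearly with image size rather than quadratically. The pool size, head count, and layer choices are assumptions, not the ESST encoder itself.

import torch
import torch.nn as nn

class LinearSRAttention(nn.Module):
    def __init__(self, dim, num_heads=4, pool_size=7):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(pool_size)   # fixed-size K/V token grid
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, H, W):
        # x: (B, N, C) with N = H*W. Queries stay full-resolution; keys/values are
        # reduced to pool_size**2 tokens, so attention is O(N) rather than O(N^2).
        B, N, C = x.shape
        kv = x.transpose(1, 2).reshape(B, C, H, W)
        kv = self.pool(kv).flatten(2).transpose(1, 2)  # (B, pool_size^2, C)
        kv = self.norm(kv)
        out, _ = self.attn(x, kv, kv)
        return out

# e.g. LinearSRAttention(64)(torch.randn(2, 32 * 32, 64), 32, 32).shape -> (2, 1024, 64)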
Keyword :
Semantic segmentation; All-MLP; Remote sensing images; Transformer; Linear space reduction attention; Zero-padding position encoding
Cite:
GB/T 7714 | Yan, Yi, Zhang, Jing, Wu, Xinjia, et al. When zero-padding position encoding encounters linear space reduction attention: an efficient semantic segmentation Transformer of remote sensing images [J]. INTERNATIONAL JOURNAL OF REMOTE SENSING, 2024, 45(2): 609-633.
MLA | Yan, Yi, et al. "When zero-padding position encoding encounters linear space reduction attention: an efficient semantic segmentation Transformer of remote sensing images." INTERNATIONAL JOURNAL OF REMOTE SENSING 45.2 (2024): 609-633.
APA | Yan, Yi, Zhang, Jing, Wu, Xinjia, Li, Jiafeng, Zhuo, Li. When zero-padding position encoding encounters linear space reduction attention: an efficient semantic segmentation Transformer of remote sensing images. INTERNATIONAL JOURNAL OF REMOTE SENSING, 2024, 45(2), 609-633.
Abstract :
The present invention provides a low-light image enhancement method, apparatus, electronic device, and storage medium. The method comprises: acquiring a low-light image to be enhanced; feeding the low-light image into an image decomposition network to obtain a first reflectance component map and a first illumination component map; feeding the first reflectance component map and the first illumination component map into a reflectance adjustment network to obtain a second reflectance component map, and feeding the first illumination component map into an illumination adjustment network to obtain a second illumination component map; and obtaining the enhanced image from the second reflectance component map and the second illumination component map. The method uses the image decomposition network to accurately decompose the low-light image, and uses the reflectance and illumination adjustment networks to refine the decomposed components in a coarse-to-fine manner, thereby effectively improving the accuracy of the enhanced image.
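A minimal sketch of the Retinex-style pipeline the patent describes is shown below: a decomposition network splits the low-light image into reflectance and illumination maps, separate adjustment networks refine each component, and their product gives the enhanced image. All network bodies here are placeholder assumptions.

import torch
import torch.nn as nn

class DecomposeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 4, 3, padding=1), nn.Sigmoid())  # 3 reflectance + 1 illumination

    def forward(self, x):
        out = self.body(x)
        return out[:, :3], out[:, 3:]  # reflectance R, illumination L

# Placeholder coarse-to-fine adjustment networks (assumptions):
decompose = DecomposeNet()
adjust_r = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Sigmoid())
adjust_l = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1), nn.Sigmoid())

def enhance(low):
    # Decompose, adjust each component, then recombine as R * L.
    r, l = decompose(low)
    return adjust_r(r) * adjust_l(l)

# e.g. enhance(torch.rand(1, 3, 64, 64)).shape -> torch.Size([1, 3, 64, 64])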
Cite:
GB/T 7714 | Li, Jiafeng, Hao, Shuai, Kuang, Lingyan, et al. Low-light image enhancement method, apparatus, electronic device, and storage medium: CN202310028246.9 [P]. 2023-01-09.
MLA | Li, Jiafeng, et al. "Low-light image enhancement method, apparatus, electronic device, and storage medium": CN202310028246.9. 2023-01-09.
APA | Li, Jiafeng, Hao, Shuai, Kuang, Lingyan, Zhang, Jing, Zhuo, Li. Low-light image enhancement method, apparatus, electronic device, and storage medium: CN202310028246.9. 2023-01-09.
Abstract :
With the development of high-resolution remote sensing images (HR-RSIs) and the escalating demand for intelligent analysis, fine-grained recognition of geospatial objects has become a practical yet challenging task. Although deep learning-based object recognition has achieved superior performance, it cannot be directly applied to fine-grained object recognition (FGOR) in HR-RSIs because of the small size of geospatial objects. We propose an efficient fine-grained object recognition method for HR-RSIs that proceeds from knowledge distillation (KD) to filter grafting. Specifically, recognition consists of two stages: Stage 1 uses an oriented region convolutional neural network (oriented R-CNN) to accurately locate and preliminarily classify geospatial objects, while also serving as a teacher network that guides the student network in fine-grained object recognition; in Stage 2, we design a coarse-to-fine object recognition network (CF-ORNet) as the second teacher network, which performs fine-grained recognition through feature learning and category correction. We then derive a lightweight model from the two teacher networks via knowledge distillation and filter grafting to achieve efficient fine-grained object recognition. Experimental results on the Vehicle Detection in Aerial Imagery (VEDAI) and HR Ship Collection 2016 (HRSC2016) datasets show competitive performance.
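The distillation side of the method can be pictured with the loss sketch below, which combines soft targets from two teacher networks with the hard-label loss. The weighting, temperature, and averaging scheme are illustrative assumptions, and the filter-grafting step is not shown.

import torch
import torch.nn.functional as F

def two_teacher_kd_loss(student_logits, t1_logits, t2_logits, labels, T=4.0, alpha=0.5):
    # Soft targets from each teacher (KL divergence at temperature T) plus hard-label CE.
    kd1 = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                   F.softmax(t1_logits / T, dim=1), reduction="batchmean") * T * T
    kd2 = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                   F.softmax(t2_logits / T, dim=1), reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * ce + (1 - alpha) * 0.5 * (kd1 + kd2)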
Keyword :
Knowledge distillation; High-resolution remote sensing image (HR-RSI); Coarse-to-fine object recognition network (CF-ORNet); Filter grafting; Fine-grained object recognition (FGOR)
Cite:
GB/T 7714 | Wang, Liuqian, Zhang, Jing, Tian, Jimiao, et al. Efficient Fine-Grained Object Recognition in High-Resolution Remote Sensing Images From Knowledge Distillation to Filter Grafting [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61.
MLA | Wang, Liuqian, et al. "Efficient Fine-Grained Object Recognition in High-Resolution Remote Sensing Images From Knowledge Distillation to Filter Grafting." IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 61 (2023).
APA | Wang, Liuqian, Zhang, Jing, Tian, Jimiao, Li, Jiafeng, Zhuo, Li, Tian, Qi. Efficient Fine-Grained Object Recognition in High-Resolution Remote Sensing Images From Knowledge Distillation to Filter Grafting. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61.
Abstract :
Most industrial Internet of Things (IoT) devices reduce captured image size using high-ratio joint photographic experts group (JPEG) compression to save storage space and transmission bandwidth. However, the resulting compression artifacts considerably affect the accuracy of subsequent tasks, and most artifact reduction algorithms do not consider the limited storage and computing power of edge devices. In this study, a blind artifact reduction recurrent network (BARRN) is proposed that can reduce compression artifacts when the quality factor is unknown. First, a structure based on recurrent convolution is designed for the specific requirements of industrial IoT image acquisition devices; the network can be scaled according to system resource constraints. Second, a more efficient convolution group, capable of adaptively processing different degradation levels, is proposed to make optimal use of limited computational resources. Experimental results demonstrate that the proposed BARRN meets the needs of industrial systems with high computational efficiency.
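The scalable recurrent-convolution idea can be sketched as below: a single weight-shared residual body is unrolled for a configurable number of steps, so compute scales with the step count while the parameter count stays fixed. The block layout and residual prediction are assumptions, not BARRN's exact structure.

import torch
import torch.nn as nn

class RecurrentARBlock(nn.Module):
    def __init__(self, channels=64, steps=4):
        super().__init__()
        self.steps = steps                      # tune to the device's compute budget
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x):
        feat = self.head(x)
        for _ in range(self.steps):             # same weights reused each iteration
            feat = feat + self.body(feat)
        return x + self.tail(feat)              # predict an artifact residual

# e.g. RecurrentARBlock(steps=2)(torch.rand(1, 3, 64, 64)).shape -> (1, 3, 64, 64)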
Keyword :
Artifact reduction (AR); Image restoration; Blind JPEG compression; Recurrent convolution; Industrial Internet of Things (IoT)
Cite:
GB/T 7714 | Li, Jiafeng, Liu, Xiaoyu, Gao, Yuqi, et al. BARRN: A Blind Image Compression Artifact Reduction Network for Industrial IoT Systems [J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2023, 19(9): 9479-9490.
MLA | Li, Jiafeng, et al. "BARRN: A Blind Image Compression Artifact Reduction Network for Industrial IoT Systems." IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 19.9 (2023): 9479-9490.
APA | Li, Jiafeng, Liu, Xiaoyu, Gao, Yuqi, Zhuo, Li, Zhang, Jing. BARRN: A Blind Image Compression Artifact Reduction Network for Industrial IoT Systems. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2023, 19(9), 9479-9490.
Abstract :
RGB-thermal (RGB-T) dual-modal imaging significantly broadens the observation dimensions of a vision system. However, effectively harnessing the inherent advantages of different spectral bands and establishing fusion schemes tightly coupled with end tasks remain highly challenging. This article proposes a modality fusion approach that combines channel switching and cross-modal attention for RGB-T tracking, exploring a hierarchical fusion method adapted to deep features at different abstraction levels. For low-level features, cross-modal information is introduced to increase the diversity of unimodal data by swapping feature channels at low computational cost. To exploit the semantic representation of high-level deep features and the heterogeneous information in multimodal data, a fusion structure based on modal mutual attention is designed, which effectively enhances the RGB-T fusion feature representation by integrating modal self-attention and cross-modal attention. Experimental results on public datasets show that the proposed algorithm is effective and computationally efficient, achieving state-of-the-art tracking performance in real time.
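The low-level channel-swapping step is simple enough to sketch directly: a fixed fraction of feature channels is exchanged between the RGB and thermal streams at negligible cost. The swap ratio and the rule of swapping the leading channels are illustrative assumptions.

import torch

def channel_swap(feat_rgb, feat_t, ratio=0.25):
    # feat_rgb, feat_t: (B, C, H, W). Exchange the first int(C * ratio) channels
    # across modalities so each unimodal stream sees cross-modal information.
    k = max(1, int(feat_rgb.size(1) * ratio))
    rgb_out = torch.cat([feat_t[:, :k], feat_rgb[:, k:]], dim=1)
    t_out = torch.cat([feat_rgb[:, :k], feat_t[:, k:]], dim=1)
    return rgb_out, t_out

# e.g. channel_swap(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))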
Keyword :
Modal mutual attention; RGB-thermal (RGB-T) tracking; Multimodal fusion; Object fusion tracking; Channel swapping
Cite:
GB/T 7714 | Luan, Tian, Zhang, Hui, Li, Jiafeng, et al. Object Fusion Tracking for RGB-T Images via Channel Swapping and Modal Mutual Attention [J]. IEEE SENSORS JOURNAL, 2023, 23(19): 22930-22943.
MLA | Luan, Tian, et al. "Object Fusion Tracking for RGB-T Images via Channel Swapping and Modal Mutual Attention." IEEE SENSORS JOURNAL 23.19 (2023): 22930-22943.
APA | Luan, Tian, Zhang, Hui, Li, Jiafeng, Zhang, Jing, Zhuo, Li. Object Fusion Tracking for RGB-T Images via Channel Swapping and Modal Mutual Attention. IEEE SENSORS JOURNAL, 2023, 23(19), 22930-22943.
Abstract :
Occluded pedestrian detection is very challenging in computer vision because pedestrians are frequently occluded by obstacles or other persons, especially in crowded scenarios. In this article, an occluded pedestrian detection method is proposed within the basic DEtection TRansformer (DETR) framework. First, Dynamic Deformable Convolution (DyDC) and a Gaussian Projection Channel Attention (GPCA) mechanism are proposed and embedded into the low and high layers of ResNet50, respectively, to improve feature representation capability. Second, a Cascade Transformer Decoder (CTD) is proposed to generate high-score queries, avoiding the influence of low-score queries in the decoding stage and further improving detection accuracy. The proposed method is verified on three challenging datasets: CrowdHuman, WiderPerson, and TJU-DHD-pedestrian. Experimental results show that it achieves superior detection performance compared with state-of-the-art methods.
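As a rough illustration of a Gaussian-gated channel attention, the sketch below maps standardized per-channel statistics through a learnable Gaussian before reweighting the channels. This is a guess at the flavor of GPCA, not the paper's exact formulation.

import torch
import torch.nn as nn

class GaussianChannelAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(channels))
        self.sigma = nn.Parameter(torch.ones(channels))

    def forward(self, x):
        # x: (B, C, H, W); standardized channel means pass through a Gaussian gate.
        s = x.mean(dim=(-2, -1))                                       # (B, C)
        s = (s - s.mean(dim=1, keepdim=True)) / (s.std(dim=1, keepdim=True) + 1e-5)
        gate = torch.exp(-((s - self.mu) ** 2) / (2 * self.sigma ** 2 + 1e-5))
        return x * gate[:, :, None, None]

# e.g. GaussianChannelAttention(64)(torch.randn(1, 64, 32, 32)).shape -> (1, 64, 32, 32)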
Keyword :
Cascade transformer decoder; Occluded pedestrian detection; Dynamic deformable convolution; Gaussian projection channel attention mechanism; Object detection; Feature extraction; Convolution; Task analysis; Decoding; Transformers; Kernel
Cite:
GB/T 7714 | Ma, Chunjie, Zhuo, Li, Li, Jiafeng, et al. Cascade Transformer Decoder Based Occluded Pedestrian Detection With Dynamic Deformable Convolution and Gaussian Projection Channel Attention Mechanism [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25: 1529-1537.
MLA | Ma, Chunjie, et al. "Cascade Transformer Decoder Based Occluded Pedestrian Detection With Dynamic Deformable Convolution and Gaussian Projection Channel Attention Mechanism." IEEE TRANSACTIONS ON MULTIMEDIA 25 (2023): 1529-1537.
APA | Ma, Chunjie, Zhuo, Li, Li, Jiafeng, Zhang, Yutong, Zhang, Jing. Cascade Transformer Decoder Based Occluded Pedestrian Detection With Dynamic Deformable Convolution and Gaussian Projection Channel Attention Mechanism. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25, 1529-1537.