Your search:
Scholar name: 张菁 (Zhang, Jing)
Abstract:
Road segmentation is a fundamental task for dynamic maps in unmanned aerial vehicle (UAV) path navigation. In unplanned, unknown, and even damaged areas, there are usually unpaved roads with blurred edges, deformations, and occlusions, which pose significant challenges to the construction of dynamic maps. Our major contributions are: (1) Inspired by dilated convolution, we propose dilated cross window self-attention (DCWin-Attention), composed of a dilated cross window mechanism and a pixel regional module, to model the long-range horizontal and vertical dependencies of unpaved roads with deformation and blurred edges. (2) A shifted cross window mechanism is coupled with DCWin-Attention to reduce the influence of occluded roads in UAV imagery. In detail, the GVT backbone is constructed from DCWin-Attention blocks to extract multilevel deep features with global dependency. (3) The unpaved road is segmented with a confidence map generated by fusing deep features of different levels in a unified perceptual parsing network. We verify our method on the self-established BJUT-URD dataset and the public DeepGlobe dataset, achieving the highest IoU of 67.72% and 52.67% at inference speeds of 2.7 and 2.8 FPS, respectively, demonstrating its effectiveness and superiority in unpaved road segmentation. Our code is available at https://github.com/BJUT-AIVBD/GVT-URS.
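The authors' implementation is in the linked repository; purely as a hypothetical illustration of the dilated cross-window idea (long-range attention along dilated rows and columns), a minimal PyTorch sketch might look like the following. The class name, the use of nn.MultiheadAttention, and the dilation/reshaping scheme are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class DilatedCrossWindowAttention(nn.Module):
    """Toy sketch: axial (row/column) self-attention with dilation.
    Tokens along each row (and each column) are split into `dilation`
    interleaved groups, and attention is applied within each group,
    approximating long-range horizontal/vertical dependencies cheaply."""
    def __init__(self, dim=64, heads=4, dilation=2):
        super().__init__()
        self.dilation = dilation
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (B, C, H, W)
        b, c, h, w = x.shape
        d = self.dilation
        assert h % d == 0 and w % d == 0, "sketch assumes H and W divisible by dilation"
        # horizontal branch: each row is split into d dilated column groups
        xr = x.permute(0, 2, 3, 1).reshape(b, h, w // d, d, c)
        xr = xr.permute(0, 1, 3, 2, 4).reshape(b * h * d, w // d, c)
        xr, _ = self.row_attn(xr, xr, xr)
        xr = xr.reshape(b, h, d, w // d, c).permute(0, 1, 3, 2, 4).reshape(b, h, w, c)
        # vertical branch: each column is split into d dilated row groups
        xc = x.permute(0, 3, 2, 1).reshape(b, w, h // d, d, c)
        xc = xc.permute(0, 1, 3, 2, 4).reshape(b * w * d, h // d, c)
        xc, _ = self.col_attn(xc, xc, xc)
        xc = xc.reshape(b, w, d, h // d, c).permute(0, 1, 3, 2, 4).reshape(b, w, h, c)
        xc = xc.permute(0, 2, 1, 3)
        # fuse the two axial branches and restore (B, C, H, W)
        return (xr + xc).permute(0, 3, 1, 2)

feat = torch.randn(1, 64, 32, 32)
print(DilatedCrossWindowAttention()(feat).shape)   # torch.Size([1, 64, 32, 32])
```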
Keywords:
Unpaved road segmentation; Dynamic map; UAV imagery; Global vision transformer; DCWin-attention
Citation:
GB/T 7714: Li, Wensheng, Zhang, Jing, Li, Jiafeng, et al. Unpaved road segmentation of UAV imagery via a global vision transformer with dilated cross window self-attention for dynamic map [J]. VISUAL COMPUTER, 2024.
MLA: Li, Wensheng, et al. "Unpaved road segmentation of UAV imagery via a global vision transformer with dilated cross window self-attention for dynamic map." VISUAL COMPUTER (2024).
APA: Li, Wensheng, Zhang, Jing, Li, Jiafeng, Zhuo, Li. Unpaved road segmentation of UAV imagery via a global vision transformer with dilated cross window self-attention for dynamic map. VISUAL COMPUTER, 2024.
Abstract:
With the rapid expansion of the we-media industry, streamers have increasingly incorporated inappropriate content into live videos to attract traffic and pursue profit. Blacklisted streamers often forge their identities or switch platforms to continue streaming, causing significant harm to the online environment. Consequently, streamer re-identification (re-ID) has become of paramount importance. Streamer biometrics in live videos exhibit multimodal characteristics, including voiceprints, faces, and spatiotemporal information, which complement each other. We therefore propose a light cross-modal attention network (LCMA-Net) for streamer re-ID in live videos. First, the voiceprint, face, and spatiotemporal features of the streamer are extracted by RawNetSA, Pi-Net, and STDA-ResNeXt3D, respectively. We then design a light cross-modal pooling attention (LCMPA) module, which, combined with a multilayer perceptron (MLP), aligns and concatenates different modality features into multimodal features within LCMA-Net. Finally, the streamer is re-identified by measuring the similarity between these multimodal features. Five experiments were conducted on the StreamerReID dataset, and the results demonstrate that the proposed method achieves competitive performance. The dataset and code are available at https://github.com/BJUT-AIVBD/LCMA-Net.
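The released code is at the linked repository; as a rough sketch only of the described recipe (MLP alignment of voiceprint, face, and spatio-temporal features, attention-weighted pooling, concatenation, and similarity-based matching), one might write something like the following in PyTorch. All dimensions and module names here are invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMultimodalFusion(nn.Module):
    """Hypothetical sketch: align three modality embeddings with small MLPs,
    weight them with a pooled attention score, concatenate, and L2-normalize
    so streamers can be matched by cosine similarity. Not the LCMA-Net code."""
    def __init__(self, dims=(512, 512, 2048), d_model=256):
        super().__init__()
        self.align = nn.ModuleList(
            nn.Sequential(nn.Linear(d, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
            for d in dims
        )
        self.score = nn.Linear(d_model, 1)   # pooled attention over the three modalities

    def forward(self, voice, face, spatiotemporal):
        feats = [m(x) for m, x in zip(self.align, (voice, face, spatiotemporal))]
        feats = torch.stack(feats, dim=1)                 # (B, 3, d_model)
        w = torch.softmax(self.score(feats), dim=1)       # (B, 3, 1) modality weights
        fused = (w * feats).flatten(1)                    # weight, then concatenate
        return F.normalize(fused, dim=1)                  # unit-norm multimodal embedding

# Re-ID by similarity: same streamer if cosine similarity exceeds a threshold.
model = ToyMultimodalFusion()
a = model(torch.randn(1, 512), torch.randn(1, 512), torch.randn(1, 2048))
b = model(torch.randn(1, 512), torch.randn(1, 512), torch.randn(1, 2048))
print(F.cosine_similarity(a, b))
```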
Keywords:
Live video; Light cross-modal attention network; Re-identification; Light cross-modal pooling attention; Streamer
Citation:
GB/T 7714: Yao, Jiacheng, Zhang, Jing, Zhang, Hui, et al. LCMA-Net: A light cross-modal attention network for streamer re-identification in live video [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249.
MLA: Yao, Jiacheng, et al. "LCMA-Net: A light cross-modal attention network for streamer re-identification in live video." COMPUTER VISION AND IMAGE UNDERSTANDING 249 (2024).
APA: Yao, Jiacheng, Zhang, Jing, Zhang, Hui, Zhuo, Li. LCMA-Net: A light cross-modal attention network for streamer re-identification in live video. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249.
Abstract:
Background and Objective: Cell cluster detection in whole-slide images (WSIs) of thyroid fine needle aspiration biopsy (FNAB) is significant for improving the efficiency and accuracy of thyroid cancer diagnosis. To address the ultra-high resolution, small object size, and sparse irregularity of cell clusters in thyroid FNAB-WSI, we propose a cell cluster detection method via deformable convolution with frequency channel attention (FCA). Methods: First, an adaptive data augmentation (ADA) module classifies the patch images cropped with a sliding window and activates different augmentation operations, mitigating the easy loss of cell cluster objects. A ResNeXt101 backbone combined with a feature pyramid network (FPN) extracts multi-scale cell cluster features, to which we add deformable convolution (DCNv2) and FCA for feature refinement. Finally, an improved Sparse R-CNN model with sparse learnable proposals detects cell clusters in thyroid FNAB-WSI. Results: The dataset contains approximately 6020 patch images, with 3612 for training, 1204 for validation, and 1204 for testing. Experimental results demonstrate that our method achieves the highest average detection accuracy of 95.4% on the self-built thyroid FNAB-WSI dataset, 2.9% higher than the state-of-the-art. Since feature extraction dominates the computational cost, yielding a moderate 12 FPS, model acceleration is left as future work. Overall, our cell cluster detection method has a positive impact on the efficiency and accuracy of thyroid cancer diagnosis. Significance: The proposed method can serve as a fast and accurate computer-aided tool for thyroid cancer diagnosis in clinical practice.
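Only the deformable-convolution part of the described refinement lends itself to a generic sketch; the hypothetical PyTorch block below shows how a DCNv2-style modulated deformable convolution (via torchvision.ops.DeformConv2d) could refine one FPN level. The ADA module, FCA, and the Sparse R-CNN head are not reproduced, and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformRefineBlock(nn.Module):
    """Illustrative feature-refinement block: offsets and modulation masks are
    predicted from the input feature map and fed to a deformable convolution,
    as one might insert into an FPN level. Not the paper's configuration."""
    def __init__(self, channels=256, k=3):
        super().__init__()
        self.k2 = k * k
        # 2 offsets + 1 modulation mask per sampling point (DCNv2-style)
        self.offset_mask = nn.Conv2d(channels, 3 * self.k2, kernel_size=k, padding=k // 2)
        self.dcn = DeformConv2d(channels, channels, kernel_size=k, padding=k // 2)
        self.norm = nn.GroupNorm(32, channels)

    def forward(self, x):
        om = self.offset_mask(x)
        offset, mask = om[:, : 2 * self.k2], om[:, 2 * self.k2 :].sigmoid()
        return torch.relu(self.norm(self.dcn(x, offset, mask)))

feat = torch.randn(1, 256, 64, 64)          # e.g. one FPN level
print(DeformRefineBlock()(feat).shape)      # torch.Size([1, 256, 64, 64])
```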
Keywords:
Deformable convolution; Frequency channel attention; Thyroid FNAB-WSI; Sparse R-CNN; Cell cluster detection
Citation:
GB/T 7714: Sun, Meng, Zhang, Jing, Zhao, Shimei, et al. Cell cluster detection of thyroid FNAB-WSI via deformable convolution with frequency channel attention [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 100.
MLA: Sun, Meng, et al. "Cell cluster detection of thyroid FNAB-WSI via deformable convolution with frequency channel attention." BIOMEDICAL SIGNAL PROCESSING AND CONTROL 100 (2024).
APA: Sun, Meng, Zhang, Jing, Zhao, Shimei, Li, Xiaoguang, Zhuo, Li. Cell cluster detection of thyroid FNAB-WSI via deformable convolution with frequency channel attention. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 100.
Abstract:
Haze reduces the imaging effectiveness of outdoor vision systems, significantly degrading the quality of images; hence, reducing haze has been a focus of many studies. In recent years, decoupled representation learning has been applied in image processing; however, existing decoupled networks lack a specific design for information with different characteristics to achieve satisfactory results in dehazing tasks. This study proposes a heterogeneous decoupling unsupervised dehazing network (HDUD-Net). Heterogeneous modules are used to learn the content and haze information of images individually to separate them effectively. To address the problem of information loss when extracting the content from hazy images with complex noise, this study proposes a bi-branch multi-hierarchical feature fusion module. Additionally, it proposes a style feature contrast learning method to generate positive and negative sample queues and construct contrast loss for enhancing decoupling performance. Numerous experiments confirm that the proposed algorithm achieves higher performance according to objective metrics and a more realistic visual effect when compared with state-of-the-art single-image dehazing algorithms.
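As a hedged illustration of the style-feature contrast learning mentioned above, the sketch below implements a generic InfoNCE-style loss over positive and negative feature queues; HDUD-Net's actual loss may be formulated differently, and the shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def style_contrastive_loss(anchor, positives, negatives, temperature=0.07):
    """Generic InfoNCE-style contrast loss built from positive and negative
    style-feature queues (an assumption about the recipe, not HDUD-Net's code).

    anchor:    (B, D) style feature of the current estimate
    positives: (P, D) queue of features that should match the anchor's style
    negatives: (N, D) queue of features that should not
    """
    a = F.normalize(anchor, dim=1)
    pos = F.normalize(positives, dim=1)
    neg = F.normalize(negatives, dim=1)
    l_pos = a @ pos.t() / temperature          # (B, P) anchor-positive similarities
    l_neg = a @ neg.t() / temperature          # (B, N) anchor-negative similarities
    # each positive is contrasted against the whole negative queue
    logits = torch.cat([l_pos.unsqueeze(2),                               # (B, P, 1)
                        l_neg.unsqueeze(1).expand(-1, pos.size(0), -1)],  # (B, P, N)
                       dim=2)
    labels = torch.zeros(logits.shape[:2], dtype=torch.long)  # positive sits at index 0
    return F.cross_entropy(logits.flatten(0, 1), labels.flatten())

loss = style_contrastive_loss(torch.randn(4, 128), torch.randn(8, 128), torch.randn(64, 128))
print(loss)
```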
Keywords:
Unsupervised learning; Single image dehazing; Image restoration; Image enhancement
Citation:
GB/T 7714: Li, Jiafeng, Kuang, Lingyan, Jin, Jiaqi, et al. HDUD-Net: heterogeneous decoupling unsupervised dehaze network [J]. NEURAL COMPUTING & APPLICATIONS, 2024, 36 (6): 2695-2711.
MLA: Li, Jiafeng, et al. "HDUD-Net: heterogeneous decoupling unsupervised dehaze network." NEURAL COMPUTING & APPLICATIONS 36.6 (2024): 2695-2711.
APA: Li, Jiafeng, Kuang, Lingyan, Jin, Jiaqi, Zhuo, Li, Zhang, Jing. HDUD-Net: heterogeneous decoupling unsupervised dehaze network. NEURAL COMPUTING & APPLICATIONS, 2024, 36 (6), 2695-2711.
Abstract:
Semantic segmentation of remote sensing images (RSIs) is of great significance for obtaining geospatial object information. Transformers achieve promising results, but multi-head self-attention (MSA) is computationally expensive. We propose an efficient semantic segmentation Transformer (ESST) for RSIs that combines zero-padding position encoding with linear space reduction attention (LSRA). First, to capture coarse-to-fine features of RSIs, zero-padding position encoding is introduced by adding overlapping patch embedding (OPE) layers and convolutional feed-forward networks (CFFN) to improve the local continuity of features. Then, we replace the attention operation with LSRA to extract multi-level features at a reduced computational cost in the encoder. Finally, we design a lightweight all multi-layer perceptron (all-MLP) head decoder that aggregates multi-level features into multi-scale features for semantic segmentation. Experimental results demonstrate that our method achieves a favorable trade-off between accuracy and speed for semantic segmentation of RSIs on the Potsdam and Vaihingen datasets.
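To make the linear space reduction idea concrete, here is a hypothetical PyTorch sketch in which keys and values are spatially pooled to a fixed grid before attention, so the cost scales linearly with the number of query tokens; the pool size and use of nn.MultiheadAttention are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class LinearSRAttention(nn.Module):
    """Sketch in the spirit of LSRA: keys/values are average-pooled to a fixed
    7x7 grid, while queries stay at full resolution, so attention cost grows
    linearly with the number of query tokens."""
    def __init__(self, dim=64, heads=4, pool=7):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(pool)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, h, w):
        # x: (B, N, C) token sequence with N = h * w
        b, n, c = x.shape
        kv = x.transpose(1, 2).reshape(b, c, h, w)       # back to a feature map
        kv = self.pool(kv).flatten(2).transpose(1, 2)    # (B, 49, C) reduced tokens
        out, _ = self.attn(x, kv, kv)                    # full-resolution queries
        return out

tokens = torch.randn(2, 32 * 32, 64)
print(LinearSRAttention()(tokens, 32, 32).shape)   # torch.Size([2, 1024, 64])
```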
Keywords:
Semantic segmentation; All-MLP; Remote sensing images; Transformer; Linear space reduction attention; Zero-padding position encoding
Citation:
GB/T 7714: Yan, Yi, Zhang, Jing, Wu, Xinjia, et al. When zero-padding position encoding encounters linear space reduction attention: an efficient semantic segmentation Transformer of remote sensing images [J]. INTERNATIONAL JOURNAL OF REMOTE SENSING, 2024, 45 (2): 609-633.
MLA: Yan, Yi, et al. "When zero-padding position encoding encounters linear space reduction attention: an efficient semantic segmentation Transformer of remote sensing images." INTERNATIONAL JOURNAL OF REMOTE SENSING 45.2 (2024): 609-633.
APA: Yan, Yi, Zhang, Jing, Wu, Xinjia, Li, Jiafeng, Zhuo, Li. When zero-padding position encoding encounters linear space reduction attention: an efficient semantic segmentation Transformer of remote sensing images. INTERNATIONAL JOURNAL OF REMOTE SENSING, 2024, 45 (2), 609-633.
Abstract:
This invention provides a low-light image enhancement method, apparatus, electronic device, and storage medium. The method comprises: acquiring a low-light image to be enhanced; feeding the low-light image into an image decomposition network to obtain a first reflectance component map and a first illumination component map corresponding to the low-light image; feeding the first reflectance component map and the first illumination component map into a reflectance adjustment network to obtain a second reflectance component map, and feeding the first illumination component map into an illumination adjustment network to obtain a second illumination component map; and obtaining the enhanced image corresponding to the low-light image from the second reflectance component map and the second illumination component map. The image decomposition network accurately decomposes the low-light image, and the reflectance and illumination adjustment networks refine the decomposed components in a coarse-to-fine manner, which effectively improves the accuracy of the resulting enhanced image.
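A minimal sketch of the claimed pipeline (decomposition into reflectance and illumination, separate adjustment networks, Retinex-style recombination) could look like the toy PyTorch module below; the network architectures are placeholders, since the patent text above does not specify them.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

class ToyRetinexEnhancer(nn.Module):
    """Hypothetical end-to-end sketch of the claimed pipeline: decompose a
    low-light image into reflectance + illumination, adjust each component
    with its own network, and recombine. The tiny conv stacks stand in for
    the unspecified networks in the patent."""
    def __init__(self):
        super().__init__()
        self.decompose = nn.Sequential(conv_block(3, 32), nn.Conv2d(32, 4, 3, padding=1))
        self.adjust_reflectance = nn.Sequential(conv_block(4, 32), nn.Conv2d(32, 3, 3, padding=1))
        self.adjust_illumination = nn.Sequential(conv_block(1, 16), nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, low):
        d = torch.sigmoid(self.decompose(low))
        r1, i1 = d[:, :3], d[:, 3:]          # first reflectance / illumination maps
        # the reflectance adjustment network sees both first components, as claimed
        r2 = torch.sigmoid(self.adjust_reflectance(torch.cat([r1, i1], dim=1)))
        i2 = torch.sigmoid(self.adjust_illumination(i1))
        return r2 * i2                        # Retinex-style recombination

print(ToyRetinexEnhancer()(torch.randn(1, 3, 128, 128)).shape)   # torch.Size([1, 3, 128, 128])
```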
Citation:
GB/T 7714: 李嘉锋, 郝帅, 况玲艳, et al. 低光图像的增强方法、装置、电子设备及存储介质: CN202310028246.9 [P]. 2023-01-09.
MLA: 李嘉锋, et al. "低光图像的增强方法、装置、电子设备及存储介质": CN202310028246.9. 2023-01-09.
APA: 李嘉锋, 郝帅, 况玲艳, 张菁, 卓力. 低光图像的增强方法、装置、电子设备及存储介质: CN202310028246.9. 2023-01-09.
Abstract:
Most industrial Internet of Things (IoT) devices reduce captured image size using high-ratio JPEG (Joint Photographic Experts Group) compression to save storage space and transmission bandwidth. However, the resulting compression artifacts considerably affect the accuracy of subsequent tasks, and most artifact reduction algorithms do not consider the limited storage and computing power of edge devices. In this study, a blind artifact reduction recurrent network (BARRN) is proposed that can reduce compression artifacts when the quality factor is unknown. First, a structure based on recurrent convolution is designed for the specific requirements of industrial IoT image acquisition devices; the network can be scaled according to system resource constraints. Second, a more efficient convolution group, capable of adaptively processing different degradation levels, is proposed to make optimal use of the limited computational resources. Experimental results demonstrate that BARRN meets the needs of industrial systems with high computational efficiency.
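As an illustration of the recurrent-convolution scalability described above (not the published BARRN architecture), a weight-shared block unrolled for a device-dependent number of steps might be sketched as follows; the block layout and step count are assumptions.

```python
import torch
import torch.nn as nn

class RecurrentArtifactReducer(nn.Module):
    """Sketch of the recurrent-convolution idea: one weight-shared residual
    block is unrolled `steps` times, so the same model can run with fewer
    iterations on weaker edge devices without changing its parameter count."""
    def __init__(self, channels=48):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.shared = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x, steps=4):
        f = self.head(x)
        for _ in range(steps):            # fewer steps: lower latency; more steps: better quality
            f = f + self.shared(f)        # weight sharing keeps the model size fixed
        return x + self.tail(f)           # predict a residual over the compressed input

jpeg = torch.randn(1, 3, 96, 96)
restored_fast = RecurrentArtifactReducer()(jpeg, steps=2)
print(restored_fast.shape)                # torch.Size([1, 3, 96, 96])
```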
Keywords:
Artifact reduction (AR); Image restoration; Blind JPEG compression; Recurrent convolution; Industrial Internet of Things (IoT)
Citation:
GB/T 7714: Li, Jiafeng, Liu, Xiaoyu, Gao, Yuqi, et al. BARRN: A Blind Image Compression Artifact Reduction Network for Industrial IoT Systems [J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2023, 19 (9): 9479-9490.
MLA: Li, Jiafeng, et al. "BARRN: A Blind Image Compression Artifact Reduction Network for Industrial IoT Systems." IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 19.9 (2023): 9479-9490.
APA: Li, Jiafeng, Liu, Xiaoyu, Gao, Yuqi, Zhuo, Li, Zhang, Jing. BARRN: A Blind Image Compression Artifact Reduction Network for Industrial IoT Systems. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2023, 19 (9), 9479-9490.
Abstract:
Images are often affected by insufficient illumination and suffer from degradation problems such as low brightness, noise, and color distortion, which results in reduced image quality. Existing low-light image enhancement methods based on Retinex theory decompose images into reflectance and illumination components, which are adjusted separately; however, the intrinsic connection between reflectance and illumination during decomposition is not considered, and multi-scale information during subsequent adjustments is inadequately utilized. In this study, we propose a low-light image enhancement network based on Retinex decomposition and multi-scale adjustment (RDMA), which performs initial decomposition followed by subsequent adjustment. We utilized prior knowledge to design the feature interaction module (FIM) and the feature fusion module (FFM) for image decomposition. Furthermore, a coarse-to-fine multi-scale network with residual channel and spatial attention (RCSA) was designed to remove noise from reflectance, suppress color distortion, preserve image details, and adjust the brightness of illumination. An evaluation of various low-light image datasets and comparisons with state-of-the-art methods showed that the proposed network is superior in terms of enhancement results.
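The residual channel-and-spatial attention (RCSA) mentioned above is sketched below using a standard SE-plus-spatial-attention recipe; the exact arrangement inside RDMA may differ, and the channel counts are assumptions.

```python
import torch
import torch.nn as nn

class RCSABlock(nn.Module):
    """Sketch of a residual block with channel and spatial attention: a small
    conv body, squeeze-and-excitation channel weighting, a mean/max spatial
    attention map, and a residual connection back to the input."""
    def __init__(self, channels=64, reduction=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.channel = nn.Sequential(                  # squeeze-and-excitation style
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        f = self.body(x)
        f = f * self.channel(f)                                        # channel attention
        s = torch.cat([f.mean(1, keepdim=True), f.amax(1, keepdim=True)], dim=1)
        f = f * self.spatial(s)                                        # spatial attention
        return x + f                                                   # residual connection

print(RCSABlock()(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])
```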
Keywords:
Neural network; Low-light image enhancement; Deep learning; Image restoration
Citation:
GB/T 7714: Li, Jiafeng, Hao, Shuai, Li, Tianshuo, et al. RDMA: low-light image enhancement based on retinex decomposition and multi-scale adjustment [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 15 (5): 1693-1709.
MLA: Li, Jiafeng, et al. "RDMA: low-light image enhancement based on retinex decomposition and multi-scale adjustment." INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS 15.5 (2023): 1693-1709.
APA: Li, Jiafeng, Hao, Shuai, Li, Tianshuo, Zhuo, Li, Zhang, Jing. RDMA: low-light image enhancement based on retinex decomposition and multi-scale adjustment. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 15 (5), 1693-1709.
Abstract:
RGB-thermal (RGB-T) dual-modal imaging significantly broadens the observation dimensions of a vision system. However, effectively harnessing the inherent advantages of different spectral bands and establishing fusion solutions tightly coupled with end tasks remains highly challenging. This article proposes a modality fusion approach that combines channel swapping and cross-modal attention for RGB-T tracking. We explore a hierarchical fusion method adapted to deep features at different abstraction levels. For low-level features, cross-modal information is introduced to increase the diversity of unimodal data by swapping feature channels at low computational cost. To exploit the semantic representation of high-level deep features and the heterogeneous information in multimodal data, a fusion structure based on modal mutual attention is designed, which effectively enhances the RGB-T fusion feature representation by integrating modal self-attention and cross-modal attention. Experimental results on public datasets show that the proposed algorithm is effective and computationally efficient, achieving state-of-the-art tracking performance with real-time processing.
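A hypothetical sketch of the two fusion stages described above, channel swapping for low-level features and modal mutual (self plus cross) attention for high-level tokens, is given below; the swap ratio, token shapes, and sharing of attention weights between directions are illustrative choices, not the paper's.

```python
import torch
import torch.nn as nn

def swap_channels(rgb, thermal, ratio=0.25):
    """Exchange a fraction of feature channels between the two modalities,
    a cheap way to inject cross-modal information into low-level features."""
    k = int(rgb.size(1) * ratio)
    rgb_out = torch.cat([thermal[:, :k], rgb[:, k:]], dim=1)
    thermal_out = torch.cat([rgb[:, :k], thermal[:, k:]], dim=1)
    return rgb_out, thermal_out

class ModalMutualAttention(nn.Module):
    """High-level fusion sketch: each modality attends over itself and over
    the other modality, and the two results are summed per modality."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, rgb_tokens, t_tokens):
        fused_rgb = (self.self_attn(rgb_tokens, rgb_tokens, rgb_tokens)[0]
                     + self.cross_attn(rgb_tokens, t_tokens, t_tokens)[0])
        fused_t = (self.self_attn(t_tokens, t_tokens, t_tokens)[0]
                   + self.cross_attn(t_tokens, rgb_tokens, rgb_tokens)[0])
        return fused_rgb, fused_t

r, t = swap_channels(torch.randn(1, 64, 72, 72), torch.randn(1, 64, 72, 72))
fr, ft = ModalMutualAttention()(torch.randn(1, 81, 256), torch.randn(1, 81, 256))
```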
Keywords:
Modal mutual attention; RGB-thermal (RGB-T) tracking; Multimodal fusion; Object fusion tracking; Channel swapping
Citation:
GB/T 7714: Luan, Tian, Zhang, Hui, Li, Jiafeng, et al. Object Fusion Tracking for RGB-T Images via Channel Swapping and Modal Mutual Attention [J]. IEEE SENSORS JOURNAL, 2023, 23 (19): 22930-22943.
MLA: Luan, Tian, et al. "Object Fusion Tracking for RGB-T Images via Channel Swapping and Modal Mutual Attention." IEEE SENSORS JOURNAL 23.19 (2023): 22930-22943.
APA: Luan, Tian, Zhang, Hui, Li, Jiafeng, Zhang, Jing, Zhuo, Li. Object Fusion Tracking for RGB-T Images via Channel Swapping and Modal Mutual Attention. IEEE SENSORS JOURNAL, 2023, 23 (19), 22930-22943.
Abstract:
Occluded pedestrian detection is very challenging in computer vision because pedestrians are frequently occluded by various obstacles or other persons, especially in crowded scenarios. In this article, an occluded pedestrian detection method is proposed under a basic DEtection TRansformer (DETR) framework. First, Dynamic Deformable Convolution (DyDC) and a Gaussian Projection Channel Attention (GPCA) mechanism are proposed and embedded into the low and high layers of ResNet50, respectively, to improve the representation capability of features. Second, a Cascade Transformer Decoder (CTD) is proposed to generate high-score queries, avoiding the influence of low-score queries in the decoder stage and further improving detection accuracy. The proposed method is verified on three challenging datasets: CrowdHuman, WiderPerson, and TJU-DHD-pedestrian. The experimental results show that it obtains superior detection performance compared with state-of-the-art methods.
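As a rough sketch of the cascade idea (each decoder stage keeps only its highest-scoring queries), the following PyTorch snippet uses stock transformer decoder layers with a per-query confidence head; the stage count, keep sizes, and scoring head are assumptions, not the published CTD design.

```python
import torch
import torch.nn as nn

class CascadeQueryFilter(nn.Module):
    """Sketch: after each DETR-style decoder stage, only the top-scoring
    queries are passed on, so low-confidence queries stop influencing
    later refinement."""
    def __init__(self, dim=256, heads=8, stages=3, keep=(300, 150, 75)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.TransformerDecoderLayer(dim, heads, batch_first=True) for _ in range(stages)
        )
        self.score = nn.Linear(dim, 1)      # per-query confidence head
        self.keep = keep

    def forward(self, queries, memory):
        # queries: (B, Q, dim) object queries, memory: (B, HW, dim) encoder features
        for layer, k in zip(self.stages, self.keep):
            queries = layer(queries, memory)
            conf = self.score(queries).squeeze(-1)                 # (B, Q)
            idx = conf.topk(min(k, queries.size(1)), dim=1).indices
            queries = torch.gather(
                queries, 1, idx.unsqueeze(-1).expand(-1, -1, queries.size(-1)))
        return queries                                             # surviving high-score queries

out = CascadeQueryFilter()(torch.randn(1, 300, 256), torch.randn(1, 1024, 256))
print(out.shape)   # torch.Size([1, 75, 256])
```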
Keywords:
Cascade transformer decoder; Feature extraction; Occluded pedestrian detection; Object detection; Convolution; Dynamic deformable convolution; Task analysis; Gaussian projection channel attention mechanism; Decoding; Transformers; Kernel
Citation:
GB/T 7714: Ma, Chunjie, Zhuo, Li, Li, Jiafeng, et al. Cascade Transformer Decoder Based Occluded Pedestrian Detection With Dynamic Deformable Convolution and Gaussian Projection Channel Attention Mechanism [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25: 1529-1537.
MLA: Ma, Chunjie, et al. "Cascade Transformer Decoder Based Occluded Pedestrian Detection With Dynamic Deformable Convolution and Gaussian Projection Channel Attention Mechanism." IEEE TRANSACTIONS ON MULTIMEDIA 25 (2023): 1529-1537.
APA: Ma, Chunjie, Zhuo, Li, Li, Jiafeng, Zhang, Yutong, Zhang, Jing. Cascade Transformer Decoder Based Occluded Pedestrian Detection With Dynamic Deformable Convolution and Gaussian Projection Channel Attention Mechanism. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25, 1529-1537.