Query:
Scholar name: Yin Baocai (尹宝才)
Abstract :
With the rapid development of deep learning models, great improvements have been achieved in the Visual Question Answering (VQA) field. However, modern VQA models are easily affected by language priors, causing them to ignore image information and learn superficial relationships between questions and answers, even with strong pre-trained models. The main reason is that visual information is not fully extracted and utilized, which creates a domain gap between the vision and language modalities to a certain extent. To mitigate this issue, we propose to extract dense captions (auxiliary semantic information) from images to enhance the visual information for reasoning, and to use them to bridge the gap between vision and language, since the dense captions and the questions come from the same language modality (i.e., phrases or sentences). In this paper, we propose a novel dense caption-aware visual question answering model called DenseCapBert to enhance visual reasoning. Specifically, we generate dense captions for the images and propose a multimodal interaction mechanism to fuse dense captions, images, and questions in a unified framework, which makes VQA models more robust. The experimental results on the GQA, GQA-OOD, VQA v2, and VQA-CP v2 datasets show that dense captions are beneficial to improving model generalization and that our model effectively mitigates the language bias problem.
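The fusion mechanism itself is not spelled out in the abstract; as a minimal illustrative sketch only (module names, feature dimensions, and the answer-vocabulary size below are assumptions, not details from the paper), a question could attend separately to image-region features and to encoded dense-caption features before answer classification:

    import torch
    import torch.nn as nn

    class TriModalFusion(nn.Module):
        """Toy fusion of question, image-region, and dense-caption features."""
        def __init__(self, dim=768, heads=8):
            super().__init__()
            self.q_to_v = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.q_to_c = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.classifier = nn.Linear(2 * dim, 3129)  # assumed VQA v2 answer-vocabulary size

        def forward(self, q_feats, v_feats, c_feats):
            # q_feats: (B, Lq, D) question tokens; v_feats: (B, Nv, D) region features;
            # c_feats: (B, Nc, D) encoded dense captions (same language modality as the question).
            qv, _ = self.q_to_v(q_feats, v_feats, v_feats)  # question attends to regions
            qc, _ = self.q_to_c(q_feats, c_feats, c_feats)  # question attends to captions
            pooled = torch.cat([qv.mean(1), qc.mean(1)], dim=-1)
            return self.classifier(pooled)

    model = TriModalFusion()
    logits = model(torch.randn(2, 14, 768), torch.randn(2, 36, 768), torch.randn(2, 10, 768))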
Keyword :
dense caption; Cognition; Visual question answering; Question answering (information retrieval); Semantics; Feature extraction; Detectors; language prior; cross-modal fusion; Data mining; Visualization
Cite:
GB/T 7714 | Bi, Yandong, Jiang, Huajie, Hu, Yongli, et al. See and Learn More: Dense Caption-Aware Representation for Visual Question Answering [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(2): 1135-1146.
MLA | Bi, Yandong, et al. "See and Learn More: Dense Caption-Aware Representation for Visual Question Answering." IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 34.2 (2024): 1135-1146.
APA | Bi, Yandong, Jiang, Huajie, Hu, Yongli, Sun, Yanfeng, & Yin, Baocai. See and Learn More: Dense Caption-Aware Representation for Visual Question Answering. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(2), 1135-1146.
Abstract :
In recent years, Graph Neural Networks (GNNs) have achieved unprecedented success in handling graph-structured data, thereby driving the development of numerous GNN-oriented techniques for inductive knowledge graph completion (KGC). A key limitation of existing methods, however, is their dependence on pre-defined aggregation functions, which lack the adaptability to diverse data, resulting in suboptimal performance on established benchmarks. Another challenge arises from the exponential increase in irrelevant entities as the reasoning path lengthens, introducing unwarranted noise and consequently diminishing the model's generalization capabilities. To surmount these obstacles, we design an innovative framework that synergizes Multi-Level Sampling with an Adaptive Aggregation mechanism (MLSAA). Distinctively, our model couples GNNs with enhanced set transformers, enabling dynamic selection of the most appropriate aggregation function tailored to specific datasets and tasks. This adaptability significantly boosts both the model's flexibility and its expressive capacity. Additionally, we unveil a unique sampling strategy designed to selectively filter irrelevant entities, while retaining potentially beneficial targets throughout the reasoning process. We undertake an exhaustive evaluation of our novel inductive KGC method across three pivotal benchmark datasets and the experimental results corroborate the efficacy of MLSAA.
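As a hedged sketch of what an adaptive aggregation step could look like (not the authors' implementation; the candidate aggregators and the gating network below are assumptions), a learned gate can mix several standard aggregators of the neighbor messages:

    import torch
    import torch.nn as nn

    class AdaptiveAggregation(nn.Module):
        """Toy adaptive aggregator: learns to mix mean/max/sum of neighbor messages."""
        def __init__(self, dim=64):
            super().__init__()
            self.gate = nn.Linear(dim, 3)  # one weight per candidate aggregator

        def forward(self, messages):
            # messages: (B, N, D) messages from N sampled neighbors
            cands = torch.stack([messages.mean(1),
                                 messages.max(1).values,
                                 messages.sum(1)], dim=1)       # (B, 3, D)
            w = torch.softmax(self.gate(messages.mean(1)), -1)  # (B, 3) data-dependent mixture
            return (w.unsqueeze(-1) * cands).sum(1)             # (B, D) aggregated representation

    agg = AdaptiveAggregation()
    out = agg(torch.randn(4, 10, 64))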
Keyword :
multi-level sampling; Inductive knowledge graph completion; adaptive aggregation
Cite:
GB/T 7714 | Sun, Kai, Jiang, Huajie, Hu, Yongli, et al. Incorporating Multi-Level Sampling with Adaptive Aggregation for Inductive Knowledge Graph Completion [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18(5).
MLA | Sun, Kai, et al. "Incorporating Multi-Level Sampling with Adaptive Aggregation for Inductive Knowledge Graph Completion." ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA 18.5 (2024).
APA | Sun, Kai, Jiang, Huajie, Hu, Yongli, & Yin, Baocai. Incorporating Multi-Level Sampling with Adaptive Aggregation for Inductive Knowledge Graph Completion. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18(5).
Abstract :
Generalized zero-shot learning (GZSL) aims to recognize images from seen and unseen classes with side information, such as manually annotated attribute vectors. Traditional methods focus on mapping images and semantics into a common latent space, thus achieving visual-semantic alignment. Since the unseen classes are unavailable during training, there is a serious recognition bias problem: unseen classes tend to be recognized as seen classes. To solve this problem, we propose a Domain-aware Prototype Network (DPN), which splits the GZSL problem into seen-class recognition and unseen-class recognition. For the seen classes, we design a domain-aware prototype learning branch with a dual-attention feature encoder to capture the essential visual information, which aims to recognize the seen classes and discriminate novel categories. To further recognize the fine-grained unseen classes, a visual-semantic embedding branch is designed, which aligns the visual and semantic information for unseen-class recognition. Through multi-task learning of the prototype learning branch and the visual-semantic embedding branch, our model achieves excellent performance on three popular GZSL datasets.
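A minimal sketch of the two-branch decision described above (illustrative only; the cosine-similarity scoring and the threshold tau are assumptions, not the paper's exact domain-detection rule):

    import torch
    import torch.nn.functional as F

    def dpn_style_predict(feat, seen_protos, unseen_attr_emb, tau=0.5):
        """Toy two-branch GZSL decision: prototype branch for seen classes,
        visual-semantic branch for unseen classes."""
        # feat: (D,) image feature; seen_protos: (Ns, D); unseen_attr_emb: (Nu, D)
        seen_sim = F.cosine_similarity(feat.unsqueeze(0), seen_protos, dim=-1)        # (Ns,)
        if seen_sim.max() >= tau:                    # domain detection: looks like a seen class
            return "seen", int(seen_sim.argmax())
        unseen_sim = F.cosine_similarity(feat.unsqueeze(0), unseen_attr_emb, dim=-1)  # (Nu,)
        return "unseen", int(unseen_sim.argmax())

    domain, idx = dpn_style_predict(torch.randn(512), torch.randn(40, 512), torch.randn(10, 512))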
Keyword :
transformer-based dual attention; Semantics; domain detection; Generalized zero-shot learning; Visualization; Task analysis; Prototypes; Feature extraction; Image recognition; Transformers
Cite:
GB/T 7714 | Hu, Yongli, Feng, Lincong, Jiang, Huajie, et al. Domain-Aware Prototype Network for Generalized Zero-Shot Learning [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(5): 3180-3191.
MLA | Hu, Yongli, et al. "Domain-Aware Prototype Network for Generalized Zero-Shot Learning." IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 34.5 (2024): 3180-3191.
APA | Hu, Yongli, Feng, Lincong, Jiang, Huajie, Liu, Mengting, & Yin, Baocai. Domain-Aware Prototype Network for Generalized Zero-Shot Learning. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(5), 3180-3191.
Abstract :
Nowcasting the vehicular delay at intersections of road networks not only optimizes the signal timing at the intersections, but also alleviates traffic congestion effectively. Existing work on vehicular delay nowcasting suffers from two issues: low effectiveness on low-ping-frequency trajectory data, and low efficiency for the nowcasting task. Inspired by recent works on hypergraphs, which explore the high-order relationships among trajectory points, we propose an incremental hypergraph learning framework for nowcasting the control delay of vehicles from low-ping-frequency trajectories. The framework characterizes the relationship among trajectory points using multi-kernel learning over multiple attributes of the trajectory points. Then, it predicts the unknown trajectory points by incrementally constructing hypergraphs of both observed and unknown points and examining the total similarities of the hyperedges associated with all the points. Finally, it evaluates the control delay of each trajectory precisely and efficiently based on the timestamp difference of critical points. We conduct experiments on the Didi-Chengdu dataset with a 10-second ping frequency. Our framework outperforms state-of-the-art methods in both accuracy and efficiency (averaging 6 seconds per intersection) for the control delay nowcasting task, which makes it well suited to many real-world traffic scenarios.
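As an illustrative sketch of the multi-kernel affinity idea (the attribute choice, kernel weights, and bandwidths below are assumptions, not the paper's settings), pairwise similarities between trajectory points can be formed as a weighted sum of per-attribute RBF kernels, from which hyperedges could then be built:

    import numpy as np

    def multi_kernel_affinity(points, weights=(0.4, 0.3, 0.3), gammas=(0.5, 0.1, 1.0)):
        """Toy multi-kernel affinity between trajectory points.
        points: (N, 3) array of [position, speed, timestamp] per point (illustrative attributes)."""
        n = len(points)
        affinity = np.zeros((n, n))
        for k, (w, g) in enumerate(zip(weights, gammas)):
            d = points[:, k:k + 1] - points[:, k:k + 1].T   # pairwise differences of attribute k
            affinity += w * np.exp(-g * d ** 2)             # RBF kernel per attribute
        return affinity                                     # hyperedges can group high-affinity points

    A = multi_kernel_affinity(np.random.rand(20, 3))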
Keyword :
incremental hypergraph learning; trajectory prediction; Trajectory; Hidden Markov models; Frequency control; Predictive models; multi-kernel affinity learning; Task analysis; Delays; Markov processes; Control delay nowcasting
Cite:
GB/T 7714 | Wang, Shaofan, Wang, Weixing, Huang, Shiyu, et al. Nowcasting the Vehicular Control Delay From Low-Ping Frequency Trajectories via Incremental Hypergraph Learning [J]. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2024, 73(1): 185-199.
MLA | Wang, Shaofan, et al. "Nowcasting the Vehicular Control Delay From Low-Ping Frequency Trajectories via Incremental Hypergraph Learning." IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY 73.1 (2024): 185-199.
APA | Wang, Shaofan, Wang, Weixing, Huang, Shiyu, Han, Yuwei, Wei, Fuhao, & Yin, Baocai. Nowcasting the Vehicular Control Delay From Low-Ping Frequency Trajectories via Incremental Hypergraph Learning. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2024, 73(1), 185-199.
Abstract :
Inductive Knowledge Graph Completion (KGC) poses challenges due to the absence of emerging entities during training. Current methods utilize Graph Neural Networks (GNNs) to learn and propagate entity representations, achieving notable performance. However, these approaches primarily focus on chain-based logical rules, limiting their ability to capture the rich semantics of knowledge graphs. To address this challenge, we propose to generate Graph-based Rules for Enhancing Logical Reasoning (GRELR), a novel framework that leverages graph-based rules for enhanced reasoning. GRELR formulates graph-based rules by extracting relevant subgraphs and fuses them to construct comprehensive relation representations. This approach, combined with subgraph reasoning, significantly improves inference capabilities and showcases the potential of graph-based rules in inductive KGC. To demonstrate the effectiveness of the GRELR framework, we conduct experiments on three benchmark datasets, and our approach achieves state-of-the-art performance.
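A rough sketch of the kind of subgraph extraction that graph-based rules rely on (illustrative only, using networkx; the hop limit and the neighborhood-intersection heuristic are assumptions rather than the GRELR procedure):

    import networkx as nx

    def enclosing_subgraph(graph, head, tail, hops=2):
        """Toy extraction of the subgraph enclosing a (head, tail) pair,
        i.e. the context from which a graph-based rule could be formed."""
        head_nb = nx.single_source_shortest_path_length(graph, head, cutoff=hops)
        tail_nb = nx.single_source_shortest_path_length(graph, tail, cutoff=hops)
        nodes = set(head_nb) & set(tail_nb)   # nodes close to both endpoints
        nodes |= {head, tail}
        return graph.subgraph(nodes).copy()

    g = nx.karate_club_graph()
    sub = enclosing_subgraph(g, 0, 33)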
Keyword :
Graph-based Rules; Knowledge Graphs; Inductive Knowledge Graph Completion; Subgraph reasoning
Cite:
GB/T 7714 | Sun, Kai, Jiang, Huajie, Hu, Yongli, et al. Generating Graph-Based Rules for Enhancing Logical Reasoning [J]. ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873: 143-156.
MLA | Sun, Kai, et al. "Generating Graph-Based Rules for Enhancing Logical Reasoning." ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024 14873 (2024): 143-156.
APA | Sun, Kai, Jiang, Huajie, Hu, Yongli, & Yin, Baocai. Generating Graph-Based Rules for Enhancing Logical Reasoning. ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873, 143-156.
Abstract :
With the widespread adoption of deep learning, the performance of Visual Question Answering (VQA) tasks has seen significant improvements. Nonetheless, this progress has unveiled significant challenges concerning their credibility, primarily due to their susceptibility to linguistic biases. Such biases can result in considerable declines in performance when faced with out-of-distribution scenarios. Therefore, various debiasing methods have been developed to reduce the impact of linguistic biases, among which causal theory-based methods have attracted great attention due to their theoretical underpinnings and superior performance. However, traditional debiased causal strategies typically remove biases through simple subtraction, which neglects fine-grained bias information and results in incomplete debiasing. To tackle this issue, we propose a fine-grained debiasing method named VQA-PDF, which utilizes the features of the base model to guide the identification of biased features, purifying the debiased features and aiding the base learning process. This method shows significant improvements on the VQA-CP v2, VQA v2, and VQA-CE datasets.
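One way to go beyond plain subtraction, sketched here purely for illustration (the projection-based purification below is an assumption, not necessarily the VQA-PDF formulation), is to remove only the component of the fused feature that lies along the bias feature produced by the base model:

    import torch

    def purify(fused, bias_feat, eps=1e-8):
        """Toy fine-grained debiasing: remove the component of the fused feature
        that lies along the bias feature, instead of a plain subtraction."""
        # fused, bias_feat: (B, D)
        unit = bias_feat / (bias_feat.norm(dim=-1, keepdim=True) + eps)
        proj = (fused * unit).sum(-1, keepdim=True) * unit   # component explained by the bias
        return fused - proj                                  # debiased (purified) feature

    clean = purify(torch.randn(8, 512), torch.randn(8, 512))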
Keyword :
Visual Question Answering; Language Bias; Causal Strategy
Cite:
GB/T 7714 | Bi, Yandong, Jiang, Huajie, Liu, Jing, et al. VQA-PDF: Purifying Debiased Features for Robust Visual Question Answering Task [J]. ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873: 264-277.
MLA | Bi, Yandong, et al. "VQA-PDF: Purifying Debiased Features for Robust Visual Question Answering Task." ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024 14873 (2024): 264-277.
APA | Bi, Yandong, Jiang, Huajie, Liu, Jing, Liu, Mengting, Hu, Yongli, & Yin, Baocai. VQA-PDF: Purifying Debiased Features for Robust Visual Question Answering Task. ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873, 264-277.
Abstract :
Referring Image Segmentation (RIS) is an essential topic in visual language understanding that aims to segment the target instance in an image referred to by a language description. Conventional RIS methods rely on expensive manual annotations involving the triplet (image-text-mask), with the acquisition of text annotations posing the most formidable challenge. To eliminate the heavy dependence on human annotations, we propose a novel RIS method, Referring Image Segmentation without Text Annotations (WoTA), which substitutes for textual annotations by generating a pseudo-query from visual information. Specifically, we design a novel training-testing scheme that introduces a Pseudo-Query Generation Scheme (PQGS) in the training phase, which relies on the pre-trained cross-modal knowledge in CLIP to generate a pseudo-query related to global and local visual information. In the testing phase, the CLIP text encoder is directly applied to the test statements to generate real query language features. Extensive experiments on several benchmark datasets demonstrate the advantage of the proposed WoTA over several zero-shot baselines of the task and even over a weakly supervised referring image segmentation method.
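A hedged sketch of how a pseudo-query could be assembled from global and local CLIP image embeddings (assuming the openai/CLIP package is installed; the crop-and-average scheme below is an illustration, not the paper's PQGS):

    import torch
    import clip                      # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    def pseudo_query(image, box):
        """Toy pseudo-query: fuse global and local (cropped) CLIP image embeddings
        as a stand-in for a text query in CLIP's joint space."""
        with torch.no_grad():
            global_feat = model.encode_image(preprocess(image).unsqueeze(0).to(device))
            local_feat = model.encode_image(preprocess(image.crop(box)).unsqueeze(0).to(device))
        q = (global_feat + local_feat) / 2
        return q / q.norm(dim=-1, keepdim=True)

    q = pseudo_query(Image.new("RGB", (224, 224)), box=(16, 16, 128, 128))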
Keyword :
Without Text Annotation; Pseudo-Query; Referring Image Segmentation
Cite:
GB/T 7714 | Liu, Jing, Jiang, Huajie, Bi, Yandong, et al. Referring Image Segmentation Without Text Annotations [J]. ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873: 278-293.
MLA | Liu, Jing, et al. "Referring Image Segmentation Without Text Annotations." ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024 14873 (2024): 278-293.
APA | Liu, Jing, Jiang, Huajie, Bi, Yandong, Hu, Yongli, & Yin, Baocai. Referring Image Segmentation Without Text Annotations. ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873, 278-293.
Abstract :
Long Document Classification (LDC) has attracted great attention in Natural Language Processing and has achieved considerable progress owing to large-scale pre-trained language models. Nevertheless, as a problem distinct from traditional text classification, LDC is far from settled. Long documents, such as news and articles, generally contain thousands of words with complex structures. Moreover, compared with flat text, long documents usually contain multi-modal content such as images, which provide rich information that has not yet been utilized for classification. In this article, we propose a novel cross-modal method for long document classification, in which multiple-granularity feature shifting networks are proposed to adaptively integrate the multi-scale text and visual features of long documents. Additionally, a multi-modal collaborative pooling block is proposed to eliminate redundant fine-grained text features and simultaneously reduce the computational complexity. To verify the effectiveness of the proposed model, we conduct experiments on the Food101 dataset and two constructed multi-modal long document datasets. The experimental results show that the proposed cross-modal method outperforms single-modal text methods and state-of-the-art multi-modal baselines.
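As a toy sketch of cross-modal fusion with pooling over fine-grained text features (module names, the gating scheme, and the pooled length below are assumptions, not the paper's architecture):

    import torch
    import torch.nn as nn

    class GatedCrossModalFusion(nn.Module):
        """Toy gated fusion of multi-granularity text features with an image feature."""
        def __init__(self, dim=256):
            super().__init__()
            self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
            self.pool = nn.AdaptiveAvgPool1d(16)  # collaborative-pooling stand-in: shrink token count

        def forward(self, text_tokens, img_feat):
            # text_tokens: (B, L, D) fine-grained text features; img_feat: (B, D) document image feature
            pooled = self.pool(text_tokens.transpose(1, 2)).transpose(1, 2)   # (B, 16, D)
            img = img_feat.unsqueeze(1).expand(-1, pooled.size(1), -1)
            g = self.gate(torch.cat([pooled, img], dim=-1))
            return g * pooled + (1 - g) * img     # per-position shift between modalities

    fusion = GatedCrossModalFusion()
    doc_repr = fusion(torch.randn(2, 1024, 256), torch.randn(2, 256)).mean(1)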
Keyword :
Long document classification; multi-modal collaborative pooling; cross-modal multi-granularity interactive fusion
Cite:
GB/T 7714 | Liu, Tengfei, Hu, Yongli, Gao, Junbin, et al. Cross-modal Multiple Granularity Interactive Fusion Network for Long Document Classification [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18(4).
MLA | Liu, Tengfei, et al. "Cross-modal Multiple Granularity Interactive Fusion Network for Long Document Classification." ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA 18.4 (2024).
APA | Liu, Tengfei, Hu, Yongli, Gao, Junbin, Sun, Yanfeng, & Yin, Baocai. Cross-modal Multiple Granularity Interactive Fusion Network for Long Document Classification. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18(4).
Abstract :
Vehicle behavior analysis has gradually developed by utilizing trajectories and motion features to characterize on-road behavior. However, existing methods analyze the behavior of each vehicle individually, ignoring the interaction between vehicles. According to the theory of interactive cognition, vehicle-to-vehicle interaction is an indispensable feature for future autonomous driving, just as interaction is universally required in traditional driving. Therefore, we place vehicle behavior analysis in the context of the vehicle interaction scene, where the self-vehicle should observe the behavior category and degree of the other-vehicle that is about to interact with it, in order to predict whether the other-vehicle will pass through the intersection first or later, and then decide to pass through or wait. Inspired by interactive cognition, we develop a general framework of Structured Vehicle Behavior Analysis (StruVBA) and derive a new model of Structured Fully Convolutional Networks (StruFCN). Moreover, both Intersection over Union (IoU) and False Negative Rate (FNR) are adopted to measure the similarity between the predicted behavior degree and the ground truth. Experimental results illustrate that the proposed method achieves higher prediction accuracy than most existing methods, while predicting vehicle behavior with richer visual meaning. In addition, it provides an example of modeling the interaction between vehicles and a verification of interactive cognition theory.
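The two evaluation measures mentioned above are standard; as a minimal sketch, IoU and FNR can be computed between binary masks encoding the predicted and ground-truth behavior degree (the mask encoding itself is an assumption for illustration):

    import numpy as np

    def iou_and_fnr(pred, gt):
        """IoU and false-negative rate between binary masks."""
        pred, gt = pred.astype(bool), gt.astype(bool)
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        fn = np.logical_and(~pred, gt).sum()
        iou = inter / union if union else 1.0
        fnr = fn / gt.sum() if gt.sum() else 0.0
        return iou, fnr

    iou, fnr = iou_and_fnr(np.random.rand(64, 64) > 0.5, np.random.rand(64, 64) > 0.5)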
Keyword :
Cognition; vehicle-to-vehicle interaction; Structured vehicle behavior analysis; Analytical models; Roads; Junctions; Vehicular ad hoc networks; structured fully convolutional networks; structured label; Trajectory; interactive cognition; Turning
Cite:
GB/T 7714 | Mou, Luntian, Xie, Haitao, Mao, Shasha, et al. Image-Based Structured Vehicle Behavior Analysis Inspired by Interactive Cognition [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 9121-9134.
MLA | Mou, Luntian, et al. "Image-Based Structured Vehicle Behavior Analysis Inspired by Interactive Cognition." IEEE TRANSACTIONS ON MULTIMEDIA 26 (2024): 9121-9134.
APA | Mou, Luntian, Xie, Haitao, Mao, Shasha, Yan, Dandan, Ma, Nan, Yin, Baocai, et al. Image-Based Structured Vehicle Behavior Analysis Inspired by Interactive Cognition. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26, 9121-9134.
Abstract :
Although object detection algorithms based on deep learning have been widely used in many scenarios, they face challenges under degraded conditions such as low light. A conventional solution is to use image enhancement approaches as a separate pre-processing module to improve the quality of the degraded image. However, this two-step approach makes it difficult to unify the goals of enhancement and detection; that is, low-light enhancement operations are not always helpful for subsequent object detection. Recently, some works have tried to integrate enhancement and detection in an end-to-end network, but they still suffer from complex network structures, training convergence problems, and demanding reference images. To address the above problems, a plug-and-play image enhancement model, namely the low-light image enhancement (LLIE) model, is proposed in this paper, which can be easily embedded into off-the-shelf object detection methods in an end-to-end manner. LLIE is composed of a parameter estimation module and an image processing module. The former learns to regress lighting enhancement parameters according to the feedback of the detection network, and the latter enhances the degraded image adaptively to promote the subsequent detection model under low-light conditions. Extensive object detection experiments on several low-light image datasets show that the performance of the detector is significantly improved when LLIE is integrated.
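A minimal sketch of the plug-and-play idea under stated assumptions (the single-gamma parameterization and the small CNN below are illustrative, not the paper's exact parameter set):

    import torch
    import torch.nn as nn

    class LowLightEnhancer(nn.Module):
        """Toy plug-and-play enhancer: a small CNN regresses a per-image gamma, and the
        image processing step applies it before a downstream detector."""
        def __init__(self):
            super().__init__()
            self.param_net = nn.Sequential(
                nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(8, 1), nn.Sigmoid())

        def forward(self, img):                               # img: (B, 3, H, W) in [0, 1]
            gamma = 0.3 + 0.7 * self.param_net(img)           # predicted gamma in (0.3, 1.0)
            return img.clamp(min=1e-6) ** gamma.view(-1, 1, 1, 1)  # gamma < 1 brightens dark images

    enhancer = LowLightEnhancer()
    bright = enhancer(torch.rand(2, 3, 256, 256) * 0.2)   # a detector would then run on `bright`
    # In end-to-end training, gradients from the detection loss would update param_net.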
Keyword :
Plug-and-play; End-to-End; Low-light image enhancement; Object detection
Cite:
GB/T 7714 | Yuan, Jiaojiao, Hu, Yongli, Sun, Yanfeng, et al. A plug-and-play image enhancement model for end-to-end object detection in low-light condition [J]. MULTIMEDIA SYSTEMS, 2024, 30(1).
MLA | Yuan, Jiaojiao, et al. "A plug-and-play image enhancement model for end-to-end object detection in low-light condition." MULTIMEDIA SYSTEMS 30.1 (2024).
APA | Yuan, Jiaojiao, Hu, Yongli, Sun, Yanfeng, Wang, Boyue, & Yin, Baocai. A plug-and-play image enhancement model for end-to-end object detection in low-light condition. MULTIMEDIA SYSTEMS, 2024, 30(1).