Query:
Scholar name: Yin Baocai (尹宝才)
Abstract :
Nowcasting the vehicular delay at intersections of road networks not only optimizes the signal timing at the intersections, but also alleviates traffic congestion effectively. Existing work on vehicular delay nowcasting faces two issues: low effectiveness on low-ping-frequency trajectory data, and low efficiency in the nowcasting task. Inspired by recent works on hypergraphs, which explore the high-order relationships among trajectory points, we propose an incremental hypergraph learning framework for nowcasting the control delay of vehicles from low-ping-frequency trajectories. The framework characterizes the relationships among trajectory points using multi-kernel learning over multiple attributes of the trajectory points. Then, it predicts the unknown trajectory points by incrementally constructing hypergraphs of both observed and unknown points and examining the total similarities of the hyperedges associated with all points. Finally, it evaluates the control delay of each trajectory precisely and efficiently based on the timestamp difference of critical points. We conduct experiments on the Didi-Chengdu dataset with a 10-second ping frequency. Our framework outperforms state-of-the-art methods in both accuracy and efficiency (averaging 6 seconds per intersection) on the control delay nowcasting task, which makes it well suited to many real-world traffic scenarios.
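To illustrate the multi-kernel affinity idea the abstract describes (combining several trajectory-point attributes into one pairwise similarity used to build hyperedges), here is a minimal sketch. The attribute choices, Gaussian kernels, bandwidths, and weights are assumptions for illustration, not the paper's learned configuration.

```python
import numpy as np

def gaussian_kernel(attr, bandwidth):
    """Pairwise Gaussian (RBF) kernel over one trajectory-point attribute."""
    diff = attr[:, None, :] - attr[None, :, :]           # (n, n, d)
    sq_dist = (diff ** 2).sum(axis=-1)                   # (n, n)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def multi_kernel_affinity(attributes, bandwidths, weights):
    """Weighted sum of per-attribute kernels, standing in for the
    multi-kernel affinity learning described in the abstract."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                    # convex combination
    return sum(w * gaussian_kernel(a, b)
               for a, b, w in zip(attributes, bandwidths, weights))

# Toy trajectory with 5 points sampled every 10 s (low-ping frequency).
timestamps = np.arange(5, dtype=float).reshape(-1, 1) * 10.0
positions  = np.array([[0, 0], [60, 2], [110, 3], [140, 3], [150, 3]], dtype=float)
speeds     = np.array([[14.0], [12.0], [7.0], [2.0], [0.5]])

A = multi_kernel_affinity(
    attributes=[timestamps, positions, speeds],
    bandwidths=[20.0, 50.0, 5.0],
    weights=[0.3, 0.4, 0.3],
)
print(A.shape)  # (5, 5) pairwise affinity used to build hyperedges
```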
Keyword :
incremental hypergraph learning; trajectory prediction; Trajectory; Hidden Markov models; Frequency control; Predictive models; multi-kernel affinity learning; Task analysis; Delays; Markov processes; Control delay nowcasting
Cite:
GB/T 7714 | Wang, Shaofan, Wang, Weixing, Huang, Shiyu, et al. Nowcasting the Vehicular Control Delay From Low-Ping Frequency Trajectories via Incremental Hypergraph Learning [J]. | IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2024, 73(1): 185-199. |
MLA | Wang, Shaofan, et al. "Nowcasting the Vehicular Control Delay From Low-Ping Frequency Trajectories via Incremental Hypergraph Learning." | IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY 73.1 (2024): 185-199. |
APA | Wang, Shaofan, Wang, Weixing, Huang, Shiyu, Han, Yuwei, Wei, Fuhao, Yin, Baocai. Nowcasting the Vehicular Control Delay From Low-Ping Frequency Trajectories via Incremental Hypergraph Learning. | IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2024, 73(1), 185-199. |
Abstract :
With the rapid development of deep learning models, great improvements have been achieved in the Visual Question Answering (VQA) field. However, modern VQA models are easily affected by language priors: they ignore image information and learn superficial relationships between questions and answers, even in the optimal pre-training model. The main reason is that visual information is not fully extracted and utilized, which results in a domain gap between the vision and language modalities to a certain extent. To mitigate this, we propose to extract dense captions (auxiliary semantic information) from images to enhance the visual information for reasoning and to use them to close the gap between vision and language, since the dense captions and the questions come from the same language modality (i.e., phrases or sentences). In this paper, we propose a novel dense caption-aware visual question answering model called DenseCapBert to enhance visual reasoning. Specifically, we generate dense captions for the images and propose a multimodal interaction mechanism that fuses dense captions, images, and questions in a unified framework, which makes the VQA model more robust. Experimental results on the GQA, GQA-OOD, VQA v2, and VQA-CP v2 datasets show that dense captions are beneficial for improving model generalization and that our model effectively mitigates the language bias problem.
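As a rough sketch of the multimodal interaction the abstract outlines, the block below lets question tokens attend to image regions and to dense-caption tokens, then fuses the two views. The dimensions, module layout, and names are assumptions, not DenseCapBert's published architecture.

```python
import torch
import torch.nn as nn

class CaptionAwareFusion(nn.Module):
    """Illustrative cross-modal interaction: question tokens attend to image
    regions and to dense-caption tokens; the two views are then fused."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.q_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.q_to_cap = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU())

    def forward(self, question, regions, captions):
        vis, _ = self.q_to_img(question, regions, regions)    # vision-grounded view
        txt, _ = self.q_to_cap(question, captions, captions)  # caption-grounded view
        return self.fuse(torch.cat([vis, txt], dim=-1))       # (B, Lq, dim)

# Toy shapes: 2 questions of 12 tokens, 36 regions, 20 caption tokens.
fusion = CaptionAwareFusion()
out = fusion(torch.randn(2, 12, 768), torch.randn(2, 36, 768), torch.randn(2, 20, 768))
print(out.shape)  # torch.Size([2, 12, 768])
```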
Keyword :
dense caption; Cognition; Visual question answering; Question answering (information retrieval); Semantics; Feature extraction; Detectors; language prior; cross-modal fusion; Data mining; Visualization
Cite:
GB/T 7714 | Bi, Yandong, Jiang, Huajie, Hu, Yongli, et al. See and Learn More: Dense Caption-Aware Representation for Visual Question Answering [J]. | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(2): 1135-1146. |
MLA | Bi, Yandong, et al. "See and Learn More: Dense Caption-Aware Representation for Visual Question Answering." | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 34.2 (2024): 1135-1146. |
APA | Bi, Yandong, Jiang, Huajie, Hu, Yongli, Sun, Yanfeng, Yin, Baocai. See and Learn More: Dense Caption-Aware Representation for Visual Question Answering. | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(2), 1135-1146. |
Abstract :
Inductive Knowledge Graph Completion (KGC) poses challenges due to the absence of emerging entities during training. Current methods utilize Graph Neural Networks (GNNs) to learn and propagate entity representations, achieving notable performance. However, these approaches primarily focus on chain-based logical rules, limiting their ability to capture the rich semantics of knowledge graphs. To address this challenge, we propose to generate Graph-based Rules for Enhancing Logical Reasoning (GRELR), a novel framework that leverages graph-based rules for enhanced reasoning. GRELR formulates graph-based rules by extracting relevant subgraphs and fuses them to construct comprehensive relation representations. This approach, combined with subgraph reasoning, significantly improves inference capabilities and showcases the potential of graph-based rules in inductive KGC. To demonstrate the effectiveness of the GRELR framework, we conduct experiments on three benchmark datasets, and our approach achieves state-of-the-art performance.
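To make the subgraph-extraction step more concrete, here is a toy version that keeps the nodes lying within k hops of both the head and the tail entity and induces the subgraph between them. It uses an undirected toy graph rather than a multi-relational knowledge graph, and the rule fusion built on top of such subgraphs in GRELR is not shown.

```python
import networkx as nx

def enclosing_subgraph(graph, head, tail, k=2):
    """Keep nodes within k hops of both endpoints, then induce the subgraph."""
    near_head = nx.single_source_shortest_path_length(graph, head, cutoff=k)
    near_tail = nx.single_source_shortest_path_length(graph, tail, cutoff=k)
    nodes = set(near_head) & set(near_tail)
    return graph.subgraph(nodes | {head, tail}).copy()

G = nx.Graph([("h", "a"), ("a", "t"), ("h", "b"), ("b", "c"), ("c", "t"), ("x", "y")])
sub = enclosing_subgraph(G, "h", "t", k=2)
print(sorted(sub.nodes()))  # ['a', 'b', 'c', 'h', 't']
```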
Keyword :
Graph-based Rules; Knowledge Graphs; Inductive Knowledge Graph Completion; Subgraph reasoning
Cite:
GB/T 7714 | Sun, Kai, Jiang, Huajie, Hu, Yongli, et al. Generating Graph-Based Rules for Enhancing Logical Reasoning [J]. | ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873: 143-156. |
MLA | Sun, Kai, et al. "Generating Graph-Based Rules for Enhancing Logical Reasoning." | ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024 14873 (2024): 143-156. |
APA | Sun, Kai, Jiang, Huajie, Hu, Yongli, Yin, Baocai. Generating Graph-Based Rules for Enhancing Logical Reasoning. | ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873, 143-156. |
Abstract :
With the widespread adoption of deep learning, the performance of Visual Question Answering (VQA) tasks has seen significant improvements. Nonetheless, this progress has unveiled significant challenges concerning credibility, primarily due to susceptibility to linguistic biases. Such biases can cause considerable declines in performance in out-of-distribution scenarios. Therefore, various debiasing methods have been developed to reduce the impact of linguistic biases, among which causal theory-based methods have attracted great attention due to their theoretical underpinnings and superior performance. However, traditional causal debiasing strategies typically remove biases through simple subtraction, which neglects fine-grained bias information and results in incomplete debiasing. To tackle this issue, we propose a fine-grained debiasing method named VQA-PDF, which utilizes the features of the base model to guide the identification of biased features, purifying the debiased features and aiding the base learning process. The method shows significant improvements on the VQA-CP v2, VQA v2, and VQA-CE datasets.
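The sketch below contrasts plain subtraction-based debiasing with a finer-grained variant in the spirit of the abstract: a gate predicted from the base model's features decides, per dimension, how much of the bias branch to remove. All module names, shapes, and the gating design are illustrative assumptions, not the exact VQA-PDF formulation.

```python
import torch
import torch.nn as nn

class PurifiedDebias(nn.Module):
    """Gate the bias branch with the base model's features before removing it,
    instead of subtracting the whole bias representation."""
    def __init__(self, dim=512, num_answers=3129):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.classifier = nn.Linear(dim, num_answers)

    def forward(self, fused_feat, bias_feat, base_feat):
        g = self.gate(base_feat)                   # fine-grained bias mask in [0, 1]
        purified = fused_feat - g * bias_feat      # remove only the gated bias part
        return self.classifier(purified)

model = PurifiedDebias()
logits = model(torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 3129])
```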
Keyword :
Visual Question Answering; Language Bias; Causal Strategy
Cite:
GB/T 7714 | Bi, Yandong, Jiang, Huajie, Liu, Jing, et al. VQA-PDF: Purifying Debiased Features for Robust Visual Question Answering Task [J]. | ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873: 264-277. |
MLA | Bi, Yandong, et al. "VQA-PDF: Purifying Debiased Features for Robust Visual Question Answering Task." | ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024 14873 (2024): 264-277. |
APA | Bi, Yandong, Jiang, Huajie, Liu, Jing, Liu, Mengting, Hu, Yongli, Yin, Baocai. VQA-PDF: Purifying Debiased Features for Robust Visual Question Answering Task. | ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873, 264-277. |
Abstract :
Referring Image Segmentation (RIS) is an essential topic in visual language understanding that aims to segment the target instance in an image referred to by a language description. Conventional RIS methods rely on expensive manual annotations involving the (image, text, mask) triplet, with the acquisition of text annotations posing the most formidable challenge. To eliminate the heavy dependence on human annotations, we propose a novel RIS method, Referring Image Segmentation without Text Annotations (WoTA), which substitutes textual annotations by generating pseudo-queries from visual information. Specifically, we design a novel training-testing scheme that introduces a Pseudo-Query Generation Scheme (PQGS) in the training phase, which relies on the pre-trained cross-modal knowledge in CLIP to generate pseudo-queries related to global and local visual information. In the testing phase, the CLIP text encoder is applied directly to the test statements to generate real query language features. Extensive experiments on several benchmark datasets demonstrate the advantage of the proposed WoTA over several zero-shot baselines for the task and even over a weakly supervised referring image segmentation method.
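As a rough sketch of the pseudo-query idea (training with a query-like embedding built from CLIP visual features instead of a text annotation), the module below pools a global image embedding with averaged local crop embeddings and projects them into the query space. The projection head, the two-level pooling, and the frozen-CLIP assumption are illustrative choices, not the paper's exact PQGS.

```python
import torch
import torch.nn as nn

class PseudoQueryGenerator(nn.Module):
    """Build a query-like embedding from CLIP visual features (global image
    plus local crops) to stand in for text features during training."""
    def __init__(self, clip_dim=512, query_dim=512):
        super().__init__()
        self.proj = nn.Linear(2 * clip_dim, query_dim)

    def forward(self, clip_model, image, crops):
        with torch.no_grad():                              # frozen CLIP encoder
            g = clip_model.encode_image(image)             # (1, clip_dim) global view
            l = clip_model.encode_image(crops).mean(0, keepdim=True)  # pooled local view
        return self.proj(torch.cat([g, l], dim=-1).float())

# At test time the real query is encoded instead, e.g.:
#   text_feat = clip_model.encode_text(clip.tokenize(["the dog on the left"]))
```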
Keyword :
Without Text Annotation; Pseudo-Query; Referring Image Segmentation
Cite:
GB/T 7714 | Liu, Jing, Jiang, Huajie, Bi, Yandong, et al. Referring Image Segmentation Without Text Annotations [J]. | ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873: 278-293. |
MLA | Liu, Jing, et al. "Referring Image Segmentation Without Text Annotations." | ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024 14873 (2024): 278-293. |
APA | Liu, Jing, Jiang, Huajie, Bi, Yandong, Hu, Yongli, Yin, Baocai. Referring Image Segmentation Without Text Annotations. | ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873, 278-293. |
Abstract :
Long Document Classification (LDC) has attracted great attention in Natural Language Processing and achieved considerable progress owing to large-scale pre-trained language models. Nevertheless, as a problem distinct from traditional text classification, LDC is far from settled. Long documents, such as news stories and articles, generally contain thousands of words with complex structures. Moreover, compared with flat text, long documents usually contain multi-modal content such as images, which provide rich information but have not yet been utilized for classification. In this article, we propose a novel cross-modal method for long document classification, in which multiple-granularity feature shifting networks integrate the multi-scale text and visual features of long documents adaptively. Additionally, a multi-modal collaborative pooling block is proposed to eliminate redundant fine-grained text features and simultaneously reduce the computational complexity. To verify the effectiveness of the proposed model, we conduct experiments on the Food101 dataset and two constructed multi-modal long document datasets. The experimental results show that the proposed cross-modal method outperforms single-modal text methods and surpasses state-of-the-art multi-modal baselines.
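To give a concrete sense of collaborative pooling (pruning redundant fine-grained text tokens with help from the visual modality), the sketch below scores each text token against the pooled visual feature and keeps only the top-k, shortening the sequence before later fusion. The scoring head and keep ratio are assumptions, not the paper's exact block.

```python
import torch
import torch.nn as nn

class CollaborativePooling(nn.Module):
    """Keep only the text tokens most relevant to the pooled visual feature."""
    def __init__(self, dim=768, keep=64):
        super().__init__()
        self.keep = keep
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, text_tokens, visual_pooled):
        B, L, D = text_tokens.shape
        v = visual_pooled.unsqueeze(1).expand(B, L, D)
        s = self.score(torch.cat([text_tokens, v], dim=-1)).squeeze(-1)   # (B, L)
        idx = s.topk(min(self.keep, L), dim=1).indices                    # tokens to keep
        return torch.gather(text_tokens, 1, idx.unsqueeze(-1).expand(-1, -1, D))

pool = CollaborativePooling(keep=64)
kept = pool(torch.randn(2, 512, 768), torch.randn(2, 768))
print(kept.shape)  # torch.Size([2, 64, 768])
```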
Keyword :
Long document classification; multi-modal collaborative pooling; cross-modal multi-granularity interactive fusion
Cite:
GB/T 7714 | Liu, Tengfei, Hu, Yongli, Gao, Junbin, et al. Cross-modal Multiple Granularity Interactive Fusion Network for Long Document Classification [J]. | ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18(4). |
MLA | Liu, Tengfei, et al. "Cross-modal Multiple Granularity Interactive Fusion Network for Long Document Classification." | ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA 18.4 (2024). |
APA | Liu, Tengfei, Hu, Yongli, Gao, Junbin, Sun, Yanfeng, Yin, Baocai. Cross-modal Multiple Granularity Interactive Fusion Network for Long Document Classification. | ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18(4). |
Abstract :
Vehicle behavior analysis has gradually developed by utilizing trajectories and motion features to characterize on-road behavior. However, the existing methods analyze the behavior of each vehicle individually, ignoring the interaction between vehicles. According to the theory of interactive cognition, vehicle-to-vehicle interaction is an indispensable feature for future autonomous driving, just as interaction is universally required for traditional driving. Therefore, we place the vehicle behavior analysis in the context of the vehicle interaction scene, where the self-vehicle should observe the behavior category and degree of the other-vehicle that is about to interact with itself, in order to predict whether the other-vehicle will pass through the intersection first or later, and then decide to pass through or wait. Inspired by the interactive cognition, we develop a general framework of Structured Vehicle Behavior Analysis (StruVBA) and derive a new model of Structured Fully Convolutional Networks (StruFCN). Moreover, both Intersection over Union (IoU) and False Negative Rate (FNR) are adopted to measure the similarity between the predicted behavior degree and the ground truth. Experimental results illustrate that the proposed method achieves higher prediction accuracy than most existing methods, while predicting vehicle behavior with richer visual meaning. In addition, it also provides an example of modeling the interaction between vehicles and a verification for interaction cognition theory as well.
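Since the abstract names IoU and FNR as its evaluation measures, here is a minimal, self-contained computation of both on binary masks (predicted behavior degree versus ground truth); the toy masks are purely illustrative.

```python
import numpy as np

def iou_and_fnr(pred, gt):
    """Intersection over Union and False Negative Rate between two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    fn = np.logical_and(~pred, gt).sum()          # ground-truth pixels missed
    iou = inter / union if union else 1.0
    fnr = fn / gt.sum() if gt.sum() else 0.0
    return iou, fnr

pred = np.zeros((8, 8)); pred[2:6, 2:6] = 1
gt   = np.zeros((8, 8)); gt[3:7, 3:7] = 1
print(iou_and_fnr(pred, gt))  # (~0.39, ~0.44)
```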
Keyword :
Cognition; vehicle-to-vehicle interaction; Structured vehicle behavior analysis; Analytical models; Roads; Junctions; Vehicular ad hoc networks; structured fully convolutional networks; structured label; Trajectory; interactive cognition; Turning
Cite:
GB/T 7714 | Mou, Luntian, Xie, Haitao, Mao, Shasha, et al. Image-Based Structured Vehicle Behavior Analysis Inspired by Interactive Cognition [J]. | IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 9121-9134. |
MLA | Mou, Luntian, et al. "Image-Based Structured Vehicle Behavior Analysis Inspired by Interactive Cognition." | IEEE TRANSACTIONS ON MULTIMEDIA 26 (2024): 9121-9134. |
APA | Mou, Luntian, Xie, Haitao, Mao, Shasha, Yan, Dandan, Ma, Nan, Yin, Baocai, et al. Image-Based Structured Vehicle Behavior Analysis Inspired by Interactive Cognition. | IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26, 9121-9134. |
Abstract :
Although object detection algorithms based on deep learning have been widely used in many scenarios, they face challenges under degraded conditions such as low light. A conventional solution uses image enhancement as a separate pre-processing module to improve the quality of degraded images. However, this two-step approach makes it difficult to unify the goals of enhancement and detection; that is, low-light enhancement operations are not always helpful for subsequent object detection. Recently, some works have tried to integrate enhancement and detection in an end-to-end network, but they still suffer from complex network structures, training convergence problems, and a demand for reference images. To address the above problems, a plug-and-play image enhancement model is proposed in this paper, namely the low-light image enhancement (LLIE) model, which can be easily embedded into off-the-shelf object detection methods in an end-to-end manner. LLIE is composed of a parameter estimation module and an image processing module. The former learns to regress lighting enhancement parameters according to the feedback of the detection network, and the latter enhances degraded images adaptively to support the subsequent detection model under low-light conditions. Extensive object detection experiments on several low-light image datasets show that the performance of the detector is significantly improved when LLIE is integrated.
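The module below is a minimal plug-and-play sketch of the two-part design the abstract describes: a tiny CNN regresses per-image curve parameters (here just a gamma and a gain, an assumption for illustration), which are applied to the input before it reaches any off-the-shelf detector, so the detector's loss can supervise the estimator end to end without reference images.

```python
import torch
import torch.nn as nn

class LLIE(nn.Module):
    """Parameter estimation (tiny CNN) + image processing (gamma/gain curve)."""
    def __init__(self):
        super().__init__()
        self.estimator = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 2),                      # -> (gamma_raw, gain_raw)
        )

    def forward(self, x):                          # x in [0, 1], shape (B, 3, H, W)
        p = self.estimator(x)
        gamma = 0.3 + torch.sigmoid(p[:, 0]).view(-1, 1, 1, 1) * 1.7  # gamma in (0.3, 2.0)
        gain  = 1.0 + torch.sigmoid(p[:, 1]).view(-1, 1, 1, 1)        # gain in (1.0, 2.0)
        return (x.clamp(min=1e-6) ** gamma * gain).clamp(0.0, 1.0)

# enhanced = LLIE()(low_light_batch); detections = detector(enhanced)
# The detection loss backpropagates through the enhancement parameters.
```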
Keyword :
Plug-and-play; End-to-End; Low-light image enhancement; Object detection
Cite:
GB/T 7714 | Yuan, Jiaojiao, Hu, Yongli, Sun, Yanfeng, et al. A plug-and-play image enhancement model for end-to-end object detection in low-light condition [J]. | MULTIMEDIA SYSTEMS, 2024, 30(1). |
MLA | Yuan, Jiaojiao, et al. "A plug-and-play image enhancement model for end-to-end object detection in low-light condition." | MULTIMEDIA SYSTEMS 30.1 (2024). |
APA | Yuan, Jiaojiao, Hu, Yongli, Sun, Yanfeng, Wang, Boyue, Yin, Baocai. A plug-and-play image enhancement model for end-to-end object detection in low-light condition. | MULTIMEDIA SYSTEMS, 2024, 30(1). |
Abstract :
The goal of mixed-modality clustering, which differs from typical multi-modality/view clustering, is to divide samples derived from various modalities into several clusters. This task has to solve two critical semantic gap problems: i) how to generate the missing modalities without pairwise-modality data; and ii) how to align the representations of heterogeneous modalities. To tackle these problems, this paper proposes a novel mixed-modality clustering model that integrates missing-modality generation and heterogeneous modality alignment into a unified framework. During the missing-modality generation process, a bidirectional mapping is established between different modalities, enabling the generation of preliminary representations for the missing modality using information from another modality. Intra-modality bipartite graphs are then constructed to help generate better missing-modality representations through weighted aggregation over existing intra-modality neighbors. In this way, a pairwise-modality representation can be obtained for each sample. In the heterogeneous modality alignment process, each modality is modeled as a graph to capture the global structure among intra-modality samples and is aligned against the heterogeneous modality representations through an adaptive heterogeneous graph matching module. Experimental results on three public datasets show the effectiveness of the proposed model compared to multiple state-of-the-art multi-modality/view clustering methods.
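To make the neighbor-aggregation step concrete, here is an illustrative version: for each sample missing modality B, average the modality-B features of its most similar intra-modality-A neighbors that do have modality B. The similarity matrix, top-k scheme, and feature sizes are assumptions, not the paper's learned bipartite-graph construction.

```python
import numpy as np

def generate_missing_modality(sim_intra, other_modality_feats, k=5):
    """Weighted aggregation over the k most similar intra-modality neighbors.
    `sim_intra` is an (n_missing, n_paired) similarity matrix within modality A."""
    idx = np.argsort(-sim_intra, axis=1)[:, :k]               # top-k neighbors
    w = np.take_along_axis(sim_intra, idx, axis=1)
    w = w / np.clip(w.sum(axis=1, keepdims=True), 1e-8, None) # normalized weights
    return np.einsum('nk,nkd->nd', w, other_modality_feats[idx])

sim = np.random.rand(4, 10)            # 4 image-only samples vs. 10 paired samples
text_feats = np.random.rand(10, 64)    # text features of the paired samples
pseudo_text = generate_missing_modality(sim, text_feats)
print(pseudo_text.shape)               # (4, 64)
```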
Keyword :
multi-view clustering; Web sites; adaptive graph structure learning; Data models; Bipartite graph; Semantics; Correlation; heterogeneous graph matching; Task analysis; Feature extraction; Mixed-modality clustering
Cite:
GB/T 7714 | He, Xiaxia, Wang, Boyue, Gao, Junbin, et al. Mixed-Modality Clustering via Generative Graph Structure Matching [J]. | IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36(12): 8773-8786. |
MLA | He, Xiaxia, et al. "Mixed-Modality Clustering via Generative Graph Structure Matching." | IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 36.12 (2024): 8773-8786. |
APA | He, Xiaxia, Wang, Boyue, Gao, Junbin, Wang, Qianqian, Hu, Yongli, Yin, Baocai. Mixed-Modality Clustering via Generative Graph Structure Matching. | IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36(12), 8773-8786. |
Abstract :
A strong knowledge-based visual question answering (KBVQA) model needs to rely on visual features, question features, and related external knowledge to solve an open visual question answering task. Although existing knowledge-based visual question answering works have achieved some accomplishments, the following challenges remain: 1) Visual feature information is severely underused. An image is worth a thousand words, and relying only on converted salient text makes it difficult to express the original rich information of the image. 2) The external knowledge acquired is not comprehensive enough, and relevant knowledge retrieved directly from visual features is lacking. To address these challenges, we propose a Visual Information-Guided knowledge-based visual question answering (VIG) model, which fully exploits visual feature information. Specifically: 1) we introduce multi-granularity visual information that comprehensively characterizes visual features; 2) we consider not only the knowledge retrieved through text information but also the knowledge retrieved directly from visual features. Finally, we feed the visual features and the multiple retrieved knowledge texts into an encoder-decoder module to generate an answer. We perform extensive experiments on the OKVQA dataset and achieve state-of-the-art performance of 60.27% accuracy.
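As a minimal sketch of the "knowledge retrieved directly from visual features" step, the function below ranks pre-embedded external-knowledge entries by cosine similarity to an image feature and returns the top-k indices; the embedding sources and sizes are placeholders, not VIG's actual retriever.

```python
import numpy as np

def retrieve_by_visual_feature(visual_feat, knowledge_embs, k=3):
    """Rank external-knowledge entries by cosine similarity to the image feature."""
    v = visual_feat / np.linalg.norm(visual_feat)
    K = knowledge_embs / np.linalg.norm(knowledge_embs, axis=1, keepdims=True)
    scores = K @ v
    return np.argsort(-scores)[:k]     # indices of the top-k knowledge entries

knowledge_embs = np.random.rand(1000, 512)   # pre-embedded knowledge base
image_feat = np.random.rand(512)
top_ids = retrieve_by_visual_feature(image_feat, knowledge_embs)
print(top_ids)   # these entries join the text-retrieved knowledge for the decoder
```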
Keyword :
Visual Information-Guided; External Knowledge; Knowledge-Based VQA
Cite:
GB/T 7714 | Liu, Heng, Wang, Boyue, Sun, Yanfeng, et al. VIG: Visual Information-Guided Knowledge-Based Visual Question Answering [J]. | PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024: 1086-1091. |
MLA | Liu, Heng, et al. "VIG: Visual Information-Guided Knowledge-Based Visual Question Answering." | PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024 (2024): 1086-1091. |
APA | Liu, Heng, Wang, Boyue, Sun, Yanfeng, Li, Xiaoyan, Hu, Yongli, Yin, Baocai. VIG: Visual Information-Guided Knowledge-Based Visual Question Answering. | PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, 1086-1091. |