Abstract:
Recently emerged RGB-D sensors hold great promise for indoor scene understanding, a fundamental and challenging problem in computer vision. In this paper, we present a discriminative model for semantically labeling indoor scenes from RGB-D images. Unlike previous work, which labels only pre-determined superpixels, we characterize each scene with a set of planes and compose them into objects. The optimal composition and the corresponding labels are inferred simultaneously using a greedy algorithm. Our model considers unary features, pairwise and co-occurrence context, and latent variables that account for the multimodal distribution of each object category. We train the model with the latent structural SVM learning framework. Our approach achieves state-of-the-art performance on the Cornell RGB-D indoor scene dataset [1].