Indexed in:
Abstract:
As a research area of computer vision and deep learning, scene understanding has attracted considerable attention in recent years. One major challenge is obtaining high segmentation accuracy while managing the computational cost and time associated with training and inference. Most current algorithms sacrifice one metric for the other depending on the target device. To address this problem, this paper proposes a novel deep neural network architecture called Segmentation Efficient Blocks Network (SEB-Net) that seeks the best possible balance between accuracy, computational cost, and real-time inference speed. The model is composed of an encoder path and a decoder path in a symmetric structure. The encoder path consists of 16 convolution layers identical to those of a VGG-19 model, and the decoder path includes what we call E-blocks (Efficient Blocks), inspired by the bottleneck module of the widely popular ENet architecture with slight modifications. One advantage of this model is that max-unpooling in the decoder path is employed for the expansion and projection convolutions in the E-blocks, allowing for fewer learnable parameters and efficient computation (10.1 frames per second (fps) for a 480x320 input, 11x fewer parameters than DeconvNet, and 52.4 GFLOPs for a 640x360 input on a TESLA K40 GPU). Experimental results on two outdoor scene datasets, the Cambridge-driving Labeled Video Database (CamVid) and Cityscapes, indicate that SEB-Net achieves higher performance than Fully Convolutional Networks (FCN), SegNet, DeepLabV, and Dilation8 in most cases. Moreover, SEB-Net outperforms efficient architectures such as ENet and LinkNet by 16.1 and 11.6 points, respectively, in terms of instance-level Intersection over Union (iIoU). SEB-Net also shows better performance when further evaluated on SUNRGB-D, an indoor scene dataset. © 2020 ACM
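The abstract's key efficiency idea is max-unpooling: the encoder records where each pooled maximum came from, and the decoder reuses those indices to upsample without learning deconvolution weights. A minimal 1-D sketch in plain Python of this mechanism (illustrative only; the actual SEB-Net layers are 2-D and all function names here are assumptions, not the paper's code):

```python
def max_pool_with_indices(x, size=2):
    """1-D max pooling that also records each argmax position,
    as the encoder does for later max-unpooling."""
    pooled, indices = [], []
    for start in range(0, len(x) - size + 1, size):
        window = x[start:start + size]
        offset = max(range(size), key=lambda i: window[i])
        pooled.append(window[offset])
        indices.append(start + offset)
    return pooled, indices

def max_unpool(pooled, indices, out_len):
    """Sparse upsampling: place each pooled value back at its
    recorded position; all other positions stay zero. The decoder's
    expansion/projection convolutions then densify this sparse map."""
    out = [0] * out_len
    for value, idx in zip(pooled, indices):
        out[idx] = value
    return out

x = [1, 5, 2, 4, 7, 3, 0, 6]
pooled, idx = max_pool_with_indices(x)      # pooled=[5, 4, 7, 6], idx=[1, 3, 4, 7]
restored = max_unpool(pooled, idx, len(x))  # [0, 5, 0, 4, 7, 0, 0, 6]
```

Because the upsampling step itself carries no weights, the decoder's parameter count stays low, which is the source of the parameter savings the abstract cites.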
Keywords:
Corresponding author:
Email address: