Scalable Image Coding for Human and Machines: Based on partial channel context model

Author：

Shi, Yunhui | Ren, Jiawei | Wang, Lilong | Wang, Jin | Liu, Jiale

Indexed by：

EI Scopus

Abstract：

In recent years, there has been a substantial increase in the amount of visual data generated by edge devices. Machines typically process this data to accomplish tasks such as object detection without human visual judgment. However, human viewing is sometimes required during human-robot interaction. Here, there exists a significant difference in the focus of information between humans and machines. To tackle this issue, we propose an end-to-end learning-based image coding framework, aiming to strike a balance between human and machine vision tasks. Also, a portion of the latent space is used for both machine vision and human vision. This is different from a compression framework that only targets human vision. Because of this difference, correlations still exist between tasks. So we propose a partial-channel context model to improve coding performance. Our scalable coding framework achieves simultaneous support for both human and machine vision by partitioning the latent space. Machine vision tasks are handled by a subset of the latent space, referred to as the base layer. More complex human visual reconstruction tasks are accomplished by an additional subset of the latent space, comprising both base and enhancement layers. In the experimental section, we present the performance of human visual reconstruction and machine vision tasks, comparing them with other benchmarks. The experiments demonstrate that our framework achieves a 28.27%-38.16% reduction in bitrate for machine vision tasks and matches the performance of state-of-the-art image codecs in terms of input reconstruction. © 2024 IEEE.
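
The latent-space partition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`split_latent`, `machine_input`, `human_input`), the parameter `base_channels`, and the toy latent sizes are all hypothetical, and the abstract does not state how many channels form the base layer. The sketch only shows the idea that machine tasks read a channel subset while human reconstruction reads the base and enhancement channels together.

```python
from typing import List, Sequence, Tuple

Channel = List[List[float]]  # one H x W feature map of the latent


def split_latent(
    latent: Sequence[Channel], base_channels: int
) -> Tuple[List[Channel], List[Channel]]:
    """Split a C x H x W latent along the channel axis.

    The first `base_channels` channels form the base layer; the
    remaining channels form the enhancement layer. The split point
    is an assumption made for illustration only.
    """
    if not 0 < base_channels <= len(latent):
        raise ValueError("base_channels must be between 1 and C")
    base = list(latent[:base_channels])
    enhancement = list(latent[base_channels:])
    return base, enhancement


def machine_input(base: List[Channel]) -> List[Channel]:
    """Machine vision tasks consume the base layer only."""
    return base


def human_input(
    base: List[Channel], enhancement: List[Channel]
) -> List[Channel]:
    """Human reconstruction consumes base and enhancement layers together."""
    return base + enhancement


# Toy latent: 4 channels, each a 2 x 2 map filled with the channel index.
latent = [[[float(c)] * 2 for _ in range(2)] for c in range(4)]
base, enhancement = split_latent(latent, base_channels=1)
```

In this toy setting a decoder for machine tasks would need only the bits of `base`, which is the mechanism by which a scalable codec can lower the bitrate for machine vision while still allowing full reconstruction from all channels.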

Keyword：

Benchmarking; Human robot interaction; Image coding; Deep neural networks; Video signal processing; Computer vision; Object detection; Image compression

Author Community：

[ 1 ] [Shi, Yunhui] Faculty of Information Technology, Beijing University of Technology, Beijing, China
[ 2 ] [Ren, Jiawei] Faculty of Information Technology, Beijing University of Technology, Beijing, China
[ 3 ] [Wang, Lilong] Faculty of Information Technology, Beijing University of Technology, Beijing, China
[ 4 ] [Wang, Jin] Faculty of Information Technology, Beijing University of Technology, Beijing, China
[ 5 ] [Liu, Jiale] Faculty of Information Technology, Beijing University of Technology, Beijing, China

Source ：

Year： 2024

Page： 1852-1857

Language： English
