Indexed in:
Abstract:
In recent years, edge devices have generated a rapidly growing volume of visual data. Machines typically process this data to perform tasks such as object detection without any human visual judgment. During human-robot interaction, however, humans sometimes also need to view the data, and humans and machines focus on substantially different information in an image. To address this issue, we propose an end-to-end learned image coding framework that balances human and machine vision tasks. Our scalable coding framework supports both simultaneously by partitioning the latent space. Machine vision tasks are handled by a subset of the latent space, referred to as the base layer. The more complex task of reconstructing the image for human viewing uses a larger subset of the latent space, comprising both the base layer and an enhancement layer. Because the base layer serves both machine vision and human vision, unlike in a compression framework that targets human vision alone, correlations remain between the two tasks. We therefore propose a partial-channel context model to exploit these correlations and improve coding performance. In the experimental section, we evaluate human-vision reconstruction and machine vision performance and compare them with other benchmarks. The experiments show that our framework reduces the bitrate for machine vision tasks by 28.27% to 38.16% and matches the performance of state-of-the-art image codecs in terms of input reconstruction. © 2024 IEEE.
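The layered latent space described in the abstract can be illustrated with a minimal sketch. It assumes the latent is a list of channels, that the first `num_base` channels form the base layer, and that the remaining channels form the enhancement layer; the names `ScalableLatent`, `split_latent`, `machine_view`, and `human_view` are hypothetical and not taken from the paper. It shows only the data flow (which channels each task consumes), not the learned transforms or the partial-channel context model.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical representation: one inner list per latent channel
# (spatial positions flattened).
Latent = List[List[float]]


@dataclass
class ScalableLatent:
    base: Latent         # channels decoded for machine vision tasks
    enhancement: Latent  # extra channels needed only for human viewing


def split_latent(y: Latent, num_base: int) -> ScalableLatent:
    """Partition the latent channels into base and enhancement layers.

    Assumption: the base layer is simply the first `num_base` channels.
    """
    if not 0 < num_base <= len(y):
        raise ValueError("num_base must be in (0, number of channels]")
    return ScalableLatent(base=y[:num_base], enhancement=y[num_base:])


def machine_view(s: ScalableLatent) -> Latent:
    # A machine vision task (e.g. object detection) reads only the base layer,
    # so only these channels need to be transmitted for it.
    return s.base


def human_view(s: ScalableLatent) -> Latent:
    # Human-vision reconstruction uses the base and enhancement layers together.
    return s.base + s.enhancement


if __name__ == "__main__":
    y = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    layers = split_latent(y, num_base=2)
    print(len(machine_view(layers)), len(human_view(layers)))
```

Under this split, a decoder serving only a machine task never needs the enhancement channels, which is where the bitrate saving for machine vision comes from in a scalable design of this kind.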
Keywords:
Corresponding author:
Email:
Source:
Year: 2024
Pages: 1852-1857
Language: English
Affiliated department: