Progressive Learned Image Compression for Machine Perception

Dec 23, 2025 · 8:38
Computer Vision and Pattern Recognition, eess.IV

Abstract

Recent advances in learned image codecs have been extended from human perception toward machine perception. However, progressive image compression with fine granular scalability (FGS), which enables decoding a single bitstream at multiple quality levels, remains unexplored for machine-oriented codecs. In this work, we propose a novel progressive learned image compression codec for machine perception, PICM-Net, based on trit-plane coding. By analyzing the difference between human- and machine-oriented rate-distortion priorities, we systematically examine the latent prioritization strategies in terms of machine-oriented codecs. To further enhance real-world adaptability, we design an adaptive decoding controller, which dynamically determines the necessary decoding level during inference time to maintain the desired confidence of downstream machine prediction. Extensive experiments demonstrate that our approach enables efficient and adaptive progressive transmission while maintaining high performance in the downstream classification task, establishing a new paradigm for machine-aware progressive image compression.

Summary

The paper introduces PICM-Net, which the authors present as the first progressive learned image compression codec designed specifically for machine perception. It addresses the lack of progressive compression techniques optimized for machine vision, where task performance matters more than human perceptual quality. PICM-Net uses trit-plane coding to decompose latent representations into ternary digits, enabling coarse-to-fine transmission. An adaptive decoding controller then determines, at inference time, the decoding level needed to reach the desired confidence in the downstream machine prediction.

The methodology analyzes rate-distortion priorities from a machine-oriented perspective, examining existing prioritization strategies (variance-based and sigma-based sorting) and comparing them with machine-oriented variants (optimal-channel and optimal-patch). For the controller, the authors train a logistic regression-based filter that predicts the confidence of the downstream prediction from classifier output statistics.

The key findings are that existing prioritization methods are already near the practical limit for machine vision tasks, and that the adaptive controller successfully balances compression efficiency against the desired task performance. PICM-Net shows task performance and transmission efficiency comparable to state-of-the-art human-oriented progressive codecs and machine-oriented non-progressive codecs. The work matters because it opens up adaptive image transmission for machine-centric applications constrained by bandwidth and computational resources.
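
To make the coarse-to-fine idea behind trit-plane coding concrete, here is a minimal sketch in plain Python. It is not PICM-Net's implementation: it assumes the latents have already been quantized to non-negative integers below `3 ** num_planes`, and the function names `to_trit_planes` and `from_trit_planes` are hypothetical. A real codec would also handle signed values and entropy-code each plane.

```python
def to_trit_planes(values, num_planes):
    """Split non-negative integers into ternary digits, most significant plane first.

    Assumption for this sketch: every value lies in [0, 3 ** num_planes).
    """
    planes = []
    for p in range(num_planes - 1, -1, -1):
        weight = 3 ** p
        planes.append([(v // weight) % 3 for v in values])
    return planes


def from_trit_planes(planes, num_planes):
    """Reconstruct values from the first len(planes) planes (at least one).

    Trits that have not arrived yet are replaced by the centre of the
    interval they could still span, so more planes give a finer estimate.
    """
    recon = [0] * len(planes[0])
    for i, plane in enumerate(planes):
        weight = 3 ** (num_planes - 1 - i)
        recon = [r + t * weight for r, t in zip(recon, plane)]
    remaining = 3 ** (num_planes - len(planes))  # width of the unresolved interval
    return [r + (remaining - 1) / 2 for r in recon]


latents = [0, 5, 13, 26]          # toy quantized latents
planes = to_trit_planes(latents, num_planes=3)
for k in range(1, 4):             # decode with 1, 2, then all 3 planes
    print(k, from_trit_planes(planes[:k], num_planes=3))
```

Each additional plane shrinks the unresolved interval by a factor of three, which is what lets a single bitstream be cut off at many different quality levels.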

Key Insights

  • PICM-Net achieves fine-grained scalability in image compression while maintaining competitive rate-accuracy performance for machine vision tasks.
  • The adaptive decoding controller leverages classifier output logits to dynamically assess prediction confidence and request additional bits only when necessary, improving real-world adaptability.
  • Surprisingly, the study found that no single prioritization strategy (variance-based, sigma-based, optimal-channel, optimal-patch) consistently outperforms the others across the entire bitrate range, suggesting that existing methods already capture much of the practical benefit for machine-oriented codecs.
  • A larger `lambda_MSE` weight in the loss function yields higher task performance, but the codec then spends more rate on raising PSNR, shifting its focus toward human perception.
  • The adaptive decoding controller degrades BD-rate and BD-accuracy compared to PICM-Net without the controller, but it enables well-calibrated predictions that reach the desired task performance.
  • The bit allocation map of PICM-Net aligns with the regions of interest (ROIs) of the downstream machine prediction.
  • PICM-Net demonstrates finer granular scalability compared to machine-oriented non-progressive codecs, enabling more flexible bit allocations.
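
The adaptive decoding controller described in the insights above can be sketched as a loop that asks for more bits only while a logistic model's predicted confidence stays below a target. This is an illustrative sketch, not the authors' controller: the two features (top softmax probability and top-two margin), the weights, the bias, and the names `classify_at_level` and `choose_decoding_level` are all assumptions made for the example.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_features(logits):
    """Hypothetical statistics of the classifier output: top probability and top-2 margin."""
    probs = sorted(softmax(logits), reverse=True)
    return [probs[0], probs[0] - probs[1]]


def predicted_confidence(features, weights, bias):
    """Logistic regression: estimated probability that the current prediction is reliable."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def choose_decoding_level(classify_at_level, num_levels, weights, bias, target):
    """Decode level by level and stop once the predicted confidence reaches target.

    classify_at_level(level) stands in for "decode up to this level, then run
    the downstream classifier and return its logits".
    """
    conf = 0.0
    for level in range(num_levels):
        conf = predicted_confidence(logit_features(classify_at_level(level)), weights, bias)
        if conf >= target:
            return level, conf
    return num_levels - 1, conf  # target never met: fall back to the finest level


# Toy logits that grow more decisive as more of the bitstream is decoded.
toy_logits = [[0.1, 0.0, 0.0], [2.0, 0.0, 0.0], [6.0, 0.0, 0.0]]
level, conf = choose_decoding_level(lambda k: toy_logits[k], 3, [4.0, 4.0], -4.0, target=0.9)
print(level, round(conf, 3))
```

In a real system the weights and bias would be fitted on held-out data, and the target would be chosen to trade extra rate against the required prediction reliability.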

Practical Implications

  • The PICM-Net codec can be used in applications where images are primarily consumed by machines, such as autonomous driving, surveillance systems, and remote sensing.
  • Practitioners and engineers can use the adaptive decoding controller to dynamically adjust the compression level based on the available bandwidth and the required confidence level for the downstream machine vision task.
  • The research highlights the importance of task-aware optimization in image compression for machine perception, suggesting that future codecs should be designed with specific machine vision tasks in mind.
  • Future research can explore more sophisticated adaptive decoding controllers that incorporate other factors such as computational cost and energy consumption.
  • The trit-plane coding approach can be extended to other types of data, such as video and 3D models, for efficient and adaptive transmission.
