Perceptual Image Codec: What Matters in Practical Learned Image Compression

Hacker News Top Papers

Summary

PICO (Perceptual Image Codec) is a new learned image codec from Apple, optimized for the human visual system. In large-scale subjective user studies it achieves 2.3–3× bitrate savings over AV1, AV2, VVC, ECM and JPEG-AI, and on an iPhone 17 Pro Max it encodes 12MP images in as little as 230 ms and decodes them in 150 ms.


Cached at: 05/24/26, 03:40 PM

# What Matters in Practical Learned Image Compression

Source: [https://apple.github.io/ml-pico/](https://apple.github.io/ml-pico/)

## [About](https://apple.github.io/ml-pico/index.html#about)

We introduce PICO (Perceptual Image Codec), the first learned codec that is both practical and optimized directly for the human visual system. To derive it, we perform a comprehensive study of modeling choices for practical learned codecs, and search over millions of model configurations to jointly optimize perceptual quality and on-device runtime.

Based on large-scale subjective user studies, PICO provides **2.3-3× bitrate savings** against AV1, AV2, VVC, ECM and JPEG-AI, and **20-40% bitrate savings** against the best learned codec alternatives. At the same time, on an iPhone 17 Pro Max, it encodes 12MP images in as little as **230ms** and decodes them in **150ms**, faster than most top ML-based codecs run on a V100 GPU. Unlike most learned codecs, PICO also comes with cross-platform robustness guarantees.

Comparisons of state-of-the-art traditional and learned codecs across different considerations of practicality.

![Performance comparison of PICO against traditional and learned codecs](https://apple.github.io/ml-pico/assets/spotlight_figure.png)

Comparisons of state-of-the-art traditional and learned codecs. Perceptual BD-rates are based on human ratings from a large-scale subjective study. Speed benchmarks on iPhone 17 Pro Max use identical compiler optimizations.
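The savings figures above are quoted in two different units: a multiplicative factor (2.3-3×) and a percentage (20-40%). The sketch below puts them on one scale. It assumes that "N× bitrate savings" means the reference codec needs N times as many bits as PICO at equal perceptual quality; the page does not define the term, so treat that reading as an assumption. The function names are hypothetical and are not part of any PICO release.

```python
def factor_to_percent(factor: float) -> float:
    """Percent bitrate reduction implied by an N-times savings factor.

    Assumption: the reference codec uses `factor` times the bits of the
    new codec at equal quality, so the new codec uses 1/factor of them.
    """
    if factor <= 0:
        raise ValueError("savings factor must be positive")
    return (1.0 - 1.0 / factor) * 100.0


def percent_to_factor(percent: float) -> float:
    """Savings factor implied by a percent bitrate reduction."""
    if not 0.0 <= percent < 100.0:
        raise ValueError("percent reduction must be in [0, 100)")
    return 1.0 / (1.0 - percent / 100.0)


if __name__ == "__main__":
    # Factors quoted against traditional codecs and JPEG-AI.
    for factor in (2.3, 3.0):
        print(f"{factor}x savings ~ {factor_to_percent(factor):.1f}% fewer bits")
    # Percentages quoted against other learned codecs.
    for percent in (20.0, 40.0):
        print(f"{percent:.0f}% savings ~ {percent_to_factor(percent):.2f}x")
```

Under that assumption, 2.3-3× corresponds to roughly 57-67% fewer bits, while 20-40% corresponds to a factor of about 1.25-1.67×.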
## [Citation](https://apple.github.io/ml-pico/index.html#citation)

If you find our work useful, please cite:

```
@article{tatwawadi2026pico,
  title={What Matters in Practical Learned Image Compression},
  author={Tatwawadi, Kedar and Rahimzadeh, Parisa and Sun, Zhanghao and Chen, Zhiqi and Yang, Ziyun and Nair, Sanjay and Hasteer, Divija and Rippel, Oren},
  journal={arXiv preprint arXiv:2605.05148},
  year={2026}
}
```

Similar Articles

PivCo-Huffman

Lobsters Hottest

This paper presents PivCo-Huffman, a new approach to Huffman coding using pivot coding from wavelet trees, enabling high-performance SIMD-friendly encoding and decoding. It consistently outperforms state-of-the-art Huffman codecs and shows how ANS coding can be selectively applied to skewed nodes to approach ANS compression ratios while preserving high decompression speeds.

AdaCodec: A Predictive Visual Code for Video MLLMs

Hugging Face Daily Papers

AdaCodec reduces video encoding redundancy in multimodal LLMs by transmitting full visual tokens only when scene prediction fails, otherwise using compact inter-frame change descriptions. It outperforms per-frame RGB baselines at matched token budgets and achieves better or comparable results with significantly fewer tokens, reducing time-to-first-token from 9.26s to 1.62s.

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

Hugging Face Daily Papers

PiD introduces a pixel diffusion decoder that reformulates latent decoding as conditional pixel diffusion, enabling fast and high-quality image synthesis at high resolutions with reduced computational requirements. It decodes latents into 4x or 8x upscaled images in under a second on consumer hardware.

FRAPPE: Full Input, Residual Output Autoencoding with Projection Pursuit Encoder

Hugging Face Daily Papers

FRAPPE is a novel autoencoding framework that uses a projection pursuit encoder to predict residuals from full input, enabling efficient variable-rate image compression with fast CPU-based encoding. At high compression ratios, FRAPPE-Image achieves higher perceptual quality than AVIF with 47x faster encoding, making real-time 1080p 30fps CPU-only encoding possible.