Perceptual Image Codec: What Matters in Practical Learned Image Compression
Summary
PICO (Perceptual Image Codec) is a new learned codec from Apple that is optimized for the human visual system. It achieves 2.3–3× bitrate savings over traditional codecs such as AV1 and VVC, and it encodes in 230 ms and decodes in 150 ms on an iPhone 17 Pro Max.
Similar Articles
PivCo-Huffman
This paper presents PivCo-Huffman, a new approach to Huffman coding using pivot coding from wavelet trees, enabling high-performance SIMD-friendly encoding and decoding. It consistently outperforms state-of-the-art Huffman codecs and shows how ANS coding can be selectively applied to skewed nodes to approach ANS compression ratios while preserving high decompression speeds.
AdaCodec: A Predictive Visual Code for Video MLLMs
AdaCodec reduces video encoding redundancy in multimodal LLMs by transmitting full visual tokens only when scene prediction fails, otherwise using compact inter-frame change descriptions. It outperforms per-frame RGB baselines at matched token budgets and achieves better or comparable results with significantly fewer tokens, reducing time-to-first-token from 9.26s to 1.62s.
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion
PiD introduces a pixel diffusion decoder that reformulates latent decoding as conditional pixel diffusion, enabling fast and high-quality image synthesis at high resolutions with reduced computational requirements. It decodes latents into 4x or 8x upscaled images in under a second on consumer hardware.
Local iPhone AI image generation is getting practical - only 3 seconds per image
A benchmark shows that local Stable Diffusion 1.5 on an iPhone can generate 512x512 images in as little as 3.1 seconds using optimized models such as Realistic Vision V5.1 Hyper, making on-device AI image generation practical.
FRAPPE: Full Input, Residual Output Autoencoding with Projection Pursuit Encoder
FRAPPE is a novel autoencoding framework that uses a projection pursuit encoder to predict residuals from full input, enabling efficient variable-rate image compression with fast CPU-based encoding. At high compression ratios, FRAPPE-Image achieves higher perceptual quality than AVIF with 47x faster encoding, making real-time 1080p 30fps CPU-only encoding possible.