Training a vision model from scratch on iPod touch 4 images
Summary
A DCGAN was trained from scratch on 350 photos of a red Solo cup taken with an iPod touch 4, producing results reminiscent of early DALL-E.
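The summary does not give the network's architecture or output resolution. As a rough illustration of how a DCGAN-style generator grows an image, the sketch below traces spatial sizes through a stack of transposed convolutions. It assumes the commonly used DCGAN layer settings (kernel 4, stride 2, padding 1, and a 64x64 target); these are assumptions, not details from the article, and the helper names `conv_transpose_out` and `generator_sizes` are hypothetical.

```python
# Sketch only: the layer settings below are common DCGAN defaults,
# assumed for illustration; the article's actual architecture is not stated.

def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Output side length of a 2D transposed convolution
    (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def generator_sizes(target=64):
    """Spatial sizes through a DCGAN-style generator, from the 1x1 latent
    vector up to at least `target` pixels per side."""
    # First layer projects the 1x1 latent vector to a 4x4 feature map
    # (kernel 4, stride 1, padding 0).
    sizes = [1, conv_transpose_out(1, kernel=4, stride=1, padding=0)]
    # Each later layer doubles the resolution (kernel 4, stride 2, padding 1).
    while sizes[-1] < target:
        sizes.append(conv_transpose_out(sizes[-1]))
    return sizes


if __name__ == "__main__":
    print(generator_sizes(64))
```

Under these assumed settings, each stride-2 layer doubles the side length, so the number of layers depends only on the target resolution.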
Similar Articles
Local iPhone AI image generation is getting practical - only 3 seconds per image
Benchmark shows local Stable Diffusion 1.5 on iPhone can generate 512x512 images in as little as 3.1 seconds using optimized models like Realistic Vision V5.1 Hyper, making on-device AI image generation practical.
DCGAN inference on a microcontroller: 12.6M parameters, 512KB SRAM, 26-second generation, pure C [P]
Demonstrates running a DCGAN with 12.6M int8 quantized parameters on a low-cost RISC-V microcontroller (CH32H417), generating 64x64 cat faces in 26 seconds using pure C inference and quantum entropy sampling.
Improved Techniques for Training Consistency Models
OpenAI presents improved techniques for training consistency models that enable high-quality single-step image generation without distillation, achieving significant FID improvements on CIFAR-10 and ImageNet 64×64 through novel loss functions and training strategies.
Qwen-Image-Flash (26 minute read)
This paper from Alibaba revisits few-step distillation for visual generative models, focusing on training recipe factors such as data composition, teacher guidance, and task mixture, using Qwen-Image-2.0 as a case study to develop Qwen-Image-Flash.
Thinking with images
OpenAI releases o3 and o4-mini models that can reason with images in their chain-of-thought process, enabling visual understanding through native image manipulation tools like cropping and zooming without separate specialized models. These models achieve state-of-the-art performance on multimodal benchmarks including STEM questions, chart reading, and visual search tasks.