real-time-inference


Flash-WAM: Modality-Aware Distillation for World Action Models

Hugging Face Daily Papers · 4d ago

Flash-WAM introduces a modality-aware distillation method for world-action models. It achieves real-time inference by compressing diffusion to a single step per modality, resulting in a 23x speedup.


@HotAisle: This is awesome. I wonder whose MI300x they used... ;-)

X AI KOLs Following · 2026-05-29

Kog announces real-time LLM inference achieving 3000+ output tokens per second per request on standard datacenter GPUs, bringing high-speed inference previously limited to custom silicon to production hardware.


Systematic Optimization of Real-Time Diffusion Model Inference on Apple M3 Ultra

arXiv cs.LG · 2026-05-19

This paper presents a systematic optimization study of real-time diffusion model inference on the Apple M3 Ultra, achieving 22.7 FPS at 512x512 resolution using CoreML conversion and a distillation model. It finds that CUDA-optimized techniques do not transfer directly to Apple's unified memory architecture.
