Tag
The vllm-swift 0.6.3 and longctx 0.3.2 releases bring triattentionv3 with 256k context on Apple Silicon, Gemma 4 MTP drafter support, Hermes tool calling with auto-recovery, and a longctx-svc daemon for scaling to 12M-token corpora.
Redis creator antirez released ds4, an open-source local inference engine for DeepSeek V4 Flash that is optimized for Metal on Mac and features disk KV caching, ultra-long context, and excellent performance.
antirez announces receiving an M5 Max 128GB MacBook Pro from audreyt to develop DwarfStar4 and experiment with distributed inference across M3 Max and M5 Max hardware.
antirez pushed a major refactoring of the DS4 backends, adding CUDA support and single-direction activation steering while preserving the Metal path. Only M3 and DGX Spark hardware are supported for now.
ds4 is a native local inference engine for DeepSeek V4 Flash optimized for Apple Silicon, featuring disk-based KV cache persistence and Metal acceleration.