Tag
GenBench is a free iOS app that lets users download, run, and benchmark GGUF models on iPhone/iPad using llama.cpp and Metal, with features like offline chat, standardized benchmarks, and a global leaderboard.
A user shares their experience running the GGUF Q4_K_M quantization of Gemma-4-12b with llama.cpp on a Mac, reporting local inference at about 36 tok/s with about 10 GB of memory use.
An interactive map visualizing the subgenres of heavy metal music.
A kernel released on Hugging Face accelerates MaxSim late-interaction retrieval by using tiled scoring with SIMD-group matrix operations (Metal and WMMA), achieving a 3–5× speedup over the naive implementation.
vllm-swift 0.6.3 and longctx 0.3.2 releases bring triattentionv3 with 256k context on Apple Silicon, Gemma 4 MTP drafter support, Hermes tool calling with auto-recovery, and a longctx-svc daemon for scaling to 12M-token corpora.
Redis creator antirez released ds4, a local inference engine optimized for DeepSeek V4 Flash with 2-bit quantization and support for 1M context KV cache on Metal and CUDA.
Redis creator antirez released ds4, an open-source local inference engine for DeepSeek V4 Flash optimized for Metal on Mac, featuring disk KV caching, ultra-long context, and excellent performance.
antirez announces receiving an M5 Max 128GB MacBook Pro from audreyt to develop DwarfStar4 and experiment with distributed inference across M3 Max and M5 Max hardware.
antirez pushed a major refactoring of the DS4 backends, adding CUDA support and single-direction activation steering while preserving the Metal path. Only M3 and DGX Spark hardware are supported for now.
ds4 is a native local inference engine for DeepSeek V4 Flash optimized for Apple Silicon, featuring disk-based KV cache persistence and Metal acceleration.