#metal

I built an iOS app to benchmark GGUF models on your iPhone/iPad

Reddit r/LocalLLaMA ↗ · 4d ago

GenBench is a free iOS app that lets users download, run, and benchmark GGUF models on iPhone/iPad using llama.cpp and Metal, with features like offline chat, standardized benchmarks, and a global leaderboard.

@mylifcc: I'm already running Gemma-4-12b on my Mac. Tech stack: llama.cpp + GGUF Q4_K_M + Metal 32K context, local OpenAI-compatible API. Measured about 36 tok/s, resident RSS about…

X AI KOLs Timeline ↗ · 6d ago

@mylifcc reports running the GGUF Q4_K_M quantization of Gemma-4-12b on a Mac with llama.cpp and Metal at a 32K context, measuring about 36 tok/s of local inference and roughly 10GB of resident memory.

Map of Metal

Hacker News Top ↗ · 2026-05-20

An interactive map visualizing the subgenres of heavy metal music.

@ErikKaum: Releasing my first kernel on @huggingface: MaxSim Late-interaction retrieval (ColBERT / PyLate) bottlenecks on material…

X AI KOLs Following ↗ · 2026-05-18

@ErikKaum released a kernel on Hugging Face that accelerates MaxSim late-interaction retrieval by using tiled scoring with SIMD group matrix operations (Metal and WMMA), achieving a 3–5× speedup over the naive implementation.

@no_stp_on_snek: vllm-swift 0.6.3 + longctx 0.3.2 are out. highlights: - triattentionv3 + longctx rescue path hits 256k niah on apple si…

X AI KOLs Following ↗ · 2026-05-14

The vllm-swift 0.6.3 and longctx 0.3.2 releases bring triattentionv3 with 256k context on Apple Silicon, Gemma 4 MTP drafter support, Hermes tool calling with auto-recovery, and a longctx-svc daemon for scaling to 12M-token corpora.

@axiaisacat: Redis creator antirez drops another hardcore project: ds4. Not just another GGUF runner, but a local inference engine specifically written for DeepSeek V4 Flash: Metal / CUDA 2-bit quantization 1M context KV ...

X AI KOLs Timeline ↗ · 2026-05-14

Redis creator antirez released ds4, a local inference engine optimized for DeepSeek V4 Flash with 2-bit quantization and support for 1M context KV cache on Metal and CUDA.

@VincentLogic: Discovered an amazing open-source project! Redis creator antirez made a splash! ds4 — DeepSeek V4 Flash local inference engine, optimized for Mac Metal, topping GitHub charts for days! And here's the killer part: 128GB…

X AI KOLs Timeline ↗ · 2026-05-13

Redis creator antirez released an open-source project called ds4, a DeepSeek V4 Flash local inference engine optimized for Mac Metal, featuring disk KV caching, ultra-long context, and excellent performance.

@antirez: Announcing with gratitude that @audreyt just gifted me an M5 Max 128GB MacBook Pro! It will let me develop DwarfStar4 (…

X AI KOLs Timeline ↗ · 2026-05-12

antirez announces receiving an M5 Max 128GB MacBook Pro from audreyt to develop DwarfStar4 and experiment with distributed inference across M3 Max and M5 Max hardware.

@antirez: I just pushed a big refactoring of DS4 backends with CUDA support and single direction activation steering. The Metal p…

X AI KOLs Timeline ↗ · 2026-05-11

antirez pushed a major refactoring of DS4 backends, adding CUDA support and single direction activation steering while preserving the Metal path. Only M3 and DGX Spark hardware are supported for now.

DeepSeek 4 Flash local inference engine for Metal

Hacker News Top ↗ · 2026-05-07

ds4 is a native local inference engine for DeepSeek V4 Flash optimized for Apple Silicon, featuring disk-based KV cache persistence and Metal acceleration.
