vulkan

#vulkan

I turned an Android phone into a Vulkan-accelerated local LLM node (GGUF + LiteLLM + Tailscale)

Reddit r/LocalLLaMA · 2d ago

An Android phone is repurposed as a portable GGUF inference server with Vulkan acceleration, exposing an OpenAI-compatible endpoint via LiteLLM and a Tailscale mesh so it can be integrated into a self-hosted AI cluster.


@no_stp_on_snek: got it here if ya want to try it out:

X AI KOLs Following · 2026-05-23

A llama.cpp fork that integrates TurboQuant+ for advanced KV-cache and weight quantization, with kernels for multiple backends (Apple Silicon, NVIDIA CUDA, AMD ROCm, Vulkan). It is used in production by LocalAI, Chronara, and AtomicChat.


Gemma4 26b a4b Apex quant is quite good

Reddit r/LocalLLaMA · 2026-05-23

A user benchmarks the APEX-quantized Gemma4 26B A4B model on an AMD RX 9060 XT, reporting 38 tokens/s at 90k context with no observed quality degradation and finding it better than previous quantizations.


Can't believe I got it working! Dual GPU - 48gb VRAM llama-cpp server - R7900 + 7800XT

Reddit r/LocalLLaMA · 2026-05-22

A user set up a dual-GPU llama.cpp server with 48 GB of combined VRAM, running an AMD Radeon PRO card and a 7800 XT via Vulkan in Docker on Kubuntu 24.04.


Strix Halo ROCm + MTP Notes (May 2026)

Reddit r/LocalLLaMA · 2026-05-17

A technical benchmark comparing the ROCm and Vulkan backends for LLM inference on Strix Halo hardware after MTP was merged into llama.cpp. It finds that ROCm suffers severe performance drops at full context while Vulkan remains stable.


Linux - Why does llama.cpp ROCm consume SO much VRAM for KV cache compared to Vulkan?

Reddit r/LocalLLaMA · 2026-05-14

A user reports that llama.cpp with ROCm consumes significantly more VRAM for the KV cache than the Vulkan backend with an identical model and settings, and asks what could cause the difference.


@binsquares: omg, GPU acceleration on smolvm works way better than I thought. can run llama.cpp inside the smol machine with close t…

X AI KOLs Following · 2026-05-11

User @binsquares reports that GPU acceleration on smolvm achieves nearly 90% of host performance when running llama.cpp via the Vulkan backend.


I was tired of "babysitting" my AI. So I spent 6 months building a C++20 Autonomous Software House that ships while I sleep

Reddit r/AI_Agents · 2026-05-09

Neon Sovereign is a native C++20/Vulkan autonomous software development workstation that uses a multi-agent swarm to execute software briefs end-to-end, running local LLM weights via Ollama/GGUF with no cloud dependency. The creator is seeking systems engineers and early testers as it enters Active Alpha.


Nvidia RTX 3090 vs Intel Arc Pro B70 llama.cpp Benchmarks

Reddit r/LocalLLaMA · 2026-04-23

A community benchmark shows the Intel Arc Pro B70 averaging ~71% slower prompt processing and ~54% slower token generation than the RTX 3090 under llama.cpp, with the SYCL backend sometimes beating Vulkan on the same card.
