Tag
nbd-vram is a Linux tool that uses NVIDIA GPU VRAM as swap space via the NBD protocol and CUDA, providing extra memory for systems with soldered RAM and no upgrade path.
A user shares a tip to use Ollama's local llama3.1:8b model for compressing conversation context in agent workflows, reducing latency and token usage compared to sending context to providers.
Build 9254 of llama.cpp fixes a token generation regression and adds Programmatic Dependent Launch (PDL) support for NVIDIA GPUs, yielding up to 10% speedup in token generation on newer hardware.