Llama.cpp server running ~2 weeks straight. Loses its mind?

Reddit r/LocalLLaMA News

Summary

User reports that Qwen3.6 models running on llama.cpp server become significantly less capable after ~2 weeks of continuous operation, and restarting sessions does not resolve the issue.

I’ve got Qwen3.6 27b and Qwen3.6 35b running in two separate instances for over two weeks and they are considerably dumber now than when I launched them. is this a thing? am I going crazy? edit: sorry I’ve been using opencode and have started new sessions, which didn’t fix the situation.
Original Article

Similar Articles

qwen3.6 just stops

Reddit r/LocalLLaMA

A user reports an issue where the Qwen 3.6 model stops mid-task when served via vLLM with specific Docker and speculative decoding configurations.

How do i prevent llama.cpp from offloading on Swap?

Reddit r/LocalLLaMA

User seeks advice on preventing llama.cpp from offloading KV cache to swap before RAM is fully exhausted, sharing their configuration on an M2 Max with 96GB RAM and a large Qwen model.