A developer shares progress on training a 7B-parameter open-source LLM from scratch, using a DeepSeek-style architecture optimized for low VRAM, with the goal of democratizing AI development and eventually surpassing large proprietary models.
The author shares a working llama-server config for running the 35B-MoE Qwen3.6 model on an 8GB RTX 4060, highlighting a max_tokens trap caused by unconstrained internal reasoning and a fix using a per-request thinking_budget_tokens cap.
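A minimal sketch of the per-request fix described above, assuming llama-server's OpenAI-compatible /v1/chat/completions endpoint on its default port. The thinking_budget_tokens field is taken from the summary; its exact name and whether the server honors it depend on the build, so treat it as an assumption rather than a documented flag, and the model name and prompt are placeholders.

```python
# Sketch: capping hidden reasoning so max_tokens is spent on the visible answer.
# Assumptions: llama-server running locally on :8080 with an OpenAI-compatible
# API, and support for a per-request thinking_budget_tokens field as the post
# describes. Neither is verified here.
import requests

response = requests.post(
    "http://localhost:8080/v1/chat/completions",  # default llama-server address (assumed)
    json={
        "model": "qwen",  # placeholder; llama-server typically serves whatever model it loaded
        "messages": [{"role": "user", "content": "Explain MoE routing in one paragraph."}],
        # The trap: without a reasoning cap, a thinking model can spend the
        # entire completion budget on internal reasoning and return a
        # truncated or empty visible answer.
        "max_tokens": 512,
        # The fix from the post: cap hidden reasoning per request
        # (field name as given in the summary; assumed, not documented here).
        "thinking_budget_tokens": 256,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Setting the reasoning cap well below max_tokens leaves headroom for the visible completion, which is the point of the per-request approach: different prompts can get different reasoning budgets without restarting the server.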