Tag
A user discusses building a small autocomplete model (25M parameters) as a learning project, mentions hardware constraints (32GB VRAM), data requirements (~100M tokens), and seeks advice on datasets and data formatting for autocomplete-style training.
The article argues that AI defensibility comes from owning the full feedback loop (custom models post-trained on proprietary data, tuned to specific workflows, and evaluated by user-defined standards) rather than renting frontier APIs from suppliers who can change terms. It emphasizes model customization as key to differentiation and margin control.
Demonstrates running a custom Qwen model (Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated-MTP-GGUF) on dual Nvidia RTX PRO 6000 Blackwell GPUs at 195 tokens per second using Hugging Face Inference.
DavidAU releases a custom 40B-parameter model based on Qwen 3.6, expanded and fine-tuned with Claude 4.6 Opus distill and Deckard datasets. The release features GGUF quantizations optimized for improved precision, along with uncensored capabilities.