Tag
A developer shares surprising lessons from fine-tuning a small open model, including that base models often already max out on intended improvements, the real weakness is behavior (caving), and fine-tuning requires careful measurement and balancing.
The article presents 'knowledge agents', a methodology that injects relevant knowledge into AI agents via a hybrid retrieval system, allowing smaller models to outperform large frontier models across specialized domains like financial markets, policy, and healthcare.
An analysis exploring why Gemma 4, despite advantages like QAT and vision support, lacks community finetunes compared to Mistral, and whether community inertia will eventually shift.
Harvey partnered with Applied Compute to train a legal agent, optimizing the agent stack and post-training the GLM-5.1 model using reward signals from their Legal Agent Benchmark.
This article delves into the principles of LoRA and its variants (QLoRA, VeRA, DoRA), explaining how low-rank decomposition reduces trainable parameters to enable efficient fine-tuning of large models.
A fine-tuned version of Qwen3.5-MoE called NEX-N2-mini reportedly fixes overthinking issues seen in Qwen 3.5 and 3.6 models.
A comprehensive free guide explaining LLMs from first principles, covering tokens, transformers, attention, fine-tuning, and local deployment.
A developer fine-tunes a small Qwen 3 0.6B model using the Unsloth framework to categorize household questions, achieving good results with only 850 training examples.
Empero released Qwythos-9B-Claude-Mythos-5, a full-parameter reasoning model fine-tuned with 1M context, based on synthetic chain-of-thought data from Fable-5 and Mythos-5 session logs.
A new fine-tuned version of Gemma 4 12B, trained on Fable 5's reasoning, achieves a significant jump in agentic coding benchmarks (from 15% to 55%) and can run locally on an 8GB VRAM GPU using a custom fork of llama.cpp.
This paper presents a systematic empirical study of fine-tuning pretrained Transformer models (Wav2Vec2.0, HuBERT, XLS-R) for Quranic Automatic Speech Recognition (ASR), achieving a WER of 0.08 on the EveryAyah subset and reducing training time from 140 to 40 hours, with Wav2Vec2-XLSR-53 providing the best representation.
SupraLabs released supra-title-FFT-preview, a full fine-tuned 0.4B parameter model for chat title generation, trained on 115K samples — nearly 10x larger than their previous dataset.
Unsloth enables free fine-tuning of a 31B parameter multimodal model on Kaggle using 4-bit quantization, requiring only 22-24GB VRAM for local runs.
This paper demonstrates that training models on chain-of-thought demonstrations fails for tasks requiring backtracking search, showing that search procedures cannot be faithfully imitated. The authors find that even when models perform well on sub-components, they cannot carry forward a left-to-right derivation for cryptarithmetic tasks.
This paper introduces Tiered Language Models (TLMs), which allow a single set of open-weight model parameters to support multiple capability levels controlled by secret keys. The method enables selective exposure of private capabilities while preserving public model behavior and resisting extraction.
OpenAI reports that their model shows increased resistance to harmful behavior through adversarial prompting and fine-tuning, indicating improved alignment persistence under pressure.
The thread shares systematic experimental findings on fine-tuning best practices, varying one SFT lever at a time across dense and MoE models up to 235B on four real-world customer datasets with custom evals to eliminate confounders.
Mia-AiLab released Gemmable 4 12B, a fine-tuned version of Google's Gemma 4 12B model using Fable-5 style reasoning and assistant traces, available in GGUF and MLX formats for local inference.
A community member proposes creating a crowdsourced coding dataset for local LLMs to enable collaborative model training and fine-tuning, addressing concerns about future availability of open-weight models.
PragReST is a self-supervised framework that improves LLM pragmatic reasoning by generating counterfactual reasoning traces and training models via supervised fine-tuning and reinforcement learning, achieving significant gains on pragmatic benchmarks without human-labeled data.