Tag
This paper proposes H-Res, a method to adapt large transformer models by shaping the energy landscape of associative memories without modifying weights or adding prompts, preserving memory capacity and outperforming LoRA.
This paper investigates the effectiveness of top-1 collapse rate as a stability monitor for short-horizon LoRA fine-tuning of discrete diffusion language models, finding it has zero precision, and proposes max gradient norm as a more reliable alternative with higher precision and F1 score on LLaDA-family models.
This paper introduces the Continual IVON (CoVON) optimizer, which integrates fast and slow adaptation into variational continual learning to balance stability and plasticity, outperforming existing methods in domain-incremental learning, continual pre-training, and fine-tuning of large language models.
This paper investigates whether different offline reinforcement learning losses (RFT, RIFT, DFT, Offline GRPO, DPO) for reasoning distillation produce mechanistically distinct weight updates in a small language model. Using identical math rollouts and a controlled setup with Qwen3-4B and attention-only LoRA, it finds that SFT, RFT, and RIFT yield nearly collinear weight deltas, while DPO sits in a near-orthogonal subspace and achieves the highest accuracy.
This paper proposes a reinforcement learning framework for computer-use agents that uses autonomous vision-language evaluation as a scalable reward signal, modeling evaluator noise to improve task success rates across desktop environments.
This paper introduces BehaviorBench, a comprehensive benchmark for evaluating foundation models on behavioral science tasks including behavior prediction, strategic decision-making, subject-trait inference, and behavioral knowledge application. It also presents Be.FM-1.5, a fine-tuned model that achieves strong distributional alignment, highlighting the gap between general-purpose and behaviorally adapted models.
Introduces Neuro-Symbolic Drive, a framework that uses rule-grounded reasoning traces from classical planners to fine-tune a driving VLA (Qwen3.5-4B), achieving significant reductions in average displacement error and miss rate compared to standard CoT reasoning.
A developer shares surprising lessons from fine-tuning a small open model: base models often already max out on the intended improvements, the real weakness is behavior (caving), and fine-tuning requires careful measurement and balancing.
This paper introduces OpenThoughts-Agent, an open-source data curation pipeline for training agentic language models, achieving a 44.8% average accuracy across seven benchmarks and outperforming prior open datasets through systematic experiments.
The article presents 'knowledge agents', a methodology that injects relevant knowledge into AI agents via a hybrid retrieval system, allowing smaller models to outperform large frontier models across specialized domains like financial markets, policy, and healthcare.
An analysis exploring why Gemma 4, despite advantages like QAT and vision support, lacks community finetunes compared to Mistral, and whether community inertia will eventually shift.
Harvey partnered with Applied Compute to train a legal agent, optimizing the agent stack and post-training the GLM-5.1 model using reward signals from their Legal Agent Benchmark.
This article delves into the principles of LoRA and its variants (QLoRA, VeRA, DoRA), explaining how low-rank decomposition reduces trainable parameters to enable efficient fine-tuning of large models.
A fine-tuned version of Qwen3.5-MoE called NEX-N2-mini reportedly fixes overthinking issues seen in Qwen 3.5 and 3.6 models.
Daniel Han is hosting a 3-hour advanced workshop at the AI Engineer World's Fair, sharing insights on the history of open-source large models, the classification of training stages (pre-training, intermediate training, supervised fine-tuning, post-training, reinforcement fine-tuning), and the leap in reasoning models. He also introduces his team's open-source contributions to fine-tuning optimization.
A comprehensive free guide explaining LLMs from first principles, covering tokens, transformers, attention, fine-tuning, and local deployment.
A developer fine-tunes a small Qwen 3 0.6B model using the Unsloth framework to categorize household questions, achieving good results with only 850 training examples.
Empero released Qwythos-9B-Claude-Mythos-5, a full-parameter reasoning model fine-tuned with 1M context, based on synthetic chain-of-thought data from Fable-5 and Mythos-5 session logs.
A new fine-tuned version of Gemma 4 12B, trained on Fable 5's reasoning, achieves a significant jump in agentic coding benchmarks (from 15% to 55%) and can run locally on an 8GB VRAM GPU using a custom fork of llama.cpp.
This paper presents a systematic empirical study of fine-tuning pretrained Transformer models (Wav2Vec2.0, HuBERT, XLS-R) for Quranic Automatic Speech Recognition (ASR), achieving a WER of 0.08 on the EveryAyah subset and reducing training time from 140 to 40 hours, with Wav2Vec2-XLSR-53 providing the best representation.