Tag
Weave of Formal Thought (WoFT) introduces a sound and complete constrained decoder for code generation that guarantees syntactic validity relative to the full Tree-sitter specification, and a fine-tuning method that trains models to interleave grammar symbols using reweighted wake-sleep, improving perplexity on Python code generation.
Riazi-8B is an Urdu large language model fine-tuned for mathematical reasoning, achieving improved performance on MGSM-Urdu through continued pre-training and supervised fine-tuning on Urdu Chain-of-Thought data.
This paper fine-tunes PEGASUS on the XL-Sum English corpus and reports state-of-the-art results, with significant ROUGE improvements over the mT5 baseline.
This paper introduces Lie-bracket prediction of transfer order for sequential learning, using commutators of gradient fields to determine pairwise order and scaling to many domains. Experiments show high accuracy in predicting optimal curriculum orders for fine-tuning and instruction tuning.
This paper proposes a Supervised Reinforcement Learning (SRL) framework for coordinating distributed energy resources, pre-training on demonstration data and fine-tuning with RL to improve sample efficiency and performance.
NVIDIA NeMo AutoModel leverages HuggingFace Transformers v5 to deliver 3.4-3.7x higher training throughput and 29-32% less GPU memory for fine-tuning Mixture-of-Experts models, with no code changes beyond a single import.
This paper proposes H-Res, a method to adapt large transformer models by shaping the energy landscape of associative memories without modifying weights or adding prompts, preserving memory capacity and outperforming LoRA.
This paper investigates the effectiveness of top-1 collapse rate as a stability monitor for short-horizon LoRA fine-tuning of discrete diffusion language models, finding it has zero precision, and proposes max gradient norm as a more reliable alternative with higher precision and F1 score on LLaDA-family models.
This paper introduces the Continual IVON (CoVON) optimizer, which integrates fast and slow adaptation into variational continual learning to balance stability and plasticity, outperforming existing methods in domain-incremental learning, continual pre-training, and fine-tuning of large language models.
This paper investigates whether different offline reinforcement learning losses (RFT, RIFT, DFT, Offline GRPO, DPO) for reasoning distillation produce mechanistically distinct weight updates in a small language model. Using identical math rollouts and a controlled setup with Qwen3-4B and attention-only LoRA, it finds that SFT, RFT, and RIFT yield nearly collinear weight deltas, while DPO sits in a near-orthogonal subspace and achieves the highest accuracy.
This paper proposes a reinforcement learning framework for computer-use agents that uses autonomous vision-language evaluation as a scalable reward signal, modeling evaluator noise to improve task success rates across desktop environments.
This paper introduces BehaviorBench, a comprehensive benchmark for evaluating foundation models on behavioral science tasks including behavior prediction, strategic decision-making, subject-trait inference, and behavioral knowledge application. It also presents Be.FM-1.5, a fine-tuned model that achieves strong distributional alignment, highlighting the gap between general-purpose and behaviorally adapted models.
Introduces Neuro-Symbolic Drive, a framework that uses rule-grounded reasoning traces from classical planners to fine-tune a driving VLA (Qwen3.5-4B), achieving significant reductions in average displacement error and miss rate compared to standard CoT reasoning.
A developer shares surprising lessons from fine-tuning a small open model: base models often already max out on the intended improvements, the real weakness is behavioral (caving), and fine-tuning requires careful measurement and balancing.
This paper introduces OpenThoughts-Agent, an open-source data curation pipeline for training agentic language models, achieving a 44.8% average accuracy across seven benchmarks and outperforming prior open datasets through systematic experiments.
The article presents 'knowledge agents', a methodology that injects relevant knowledge into AI agents via a hybrid retrieval system, allowing smaller models to outperform large frontier models across specialized domains like financial markets, policy, and healthcare.
An analysis exploring why Gemma 4, despite advantages like QAT and vision support, has fewer community fine-tunes than Mistral, and whether community inertia will eventually shift.
Harvey partnered with Applied Compute to train a legal agent, optimizing the agent stack and post-training the GLM-5.1 model using reward signals from their Legal Agent Benchmark.
This article delves into the principles of LoRA and its variants (QLoRA, VeRA, DoRA), explaining how low-rank decomposition reduces trainable parameters to enable efficient fine-tuning of large models.
A fine-tuned version of Qwen3.5-MoE called NEX-N2-mini reportedly fixes overthinking issues seen in Qwen 3.5 and 3.6 models.
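
As a rough illustration of the parameter savings mentioned in the LoRA entry above, the sketch below compares the trainable parameters of a full weight update with those of a rank-r update of the form B @ A. The function name and the layer dimensions (4096 x 4096, rank 8) are assumptions chosen for illustration; they are not taken from the article.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one linear layer.

    full: parameters in a dense d_out x d_in weight update.
    lora: parameters in the low-rank factors B (d_out x rank) and A (rank x d_in).
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Hypothetical layer size and rank, for illustration only.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
ratio = lora / full
print(f"full update: {full} params, rank-8 LoRA: {lora} params ({ratio:.4%} of full)")
```

With these assumed sizes, the low-rank factors hold well under 1% of the parameters of the dense update, which is the effect the LoRA entry describes.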