fine-tuning

#fine-tuning

Weave of Formal Thought

arXiv cs.CL · 8h ago

Weave of Formal Thought (WoFT) introduces a sound and complete constrained decoder for code generation that guarantees syntactic validity relative to the full Tree-sitter specification, and a fine-tuning method that trains models to interleave grammar symbols using reweighted wake-sleep, improving perplexity on Python code generation.

Riazi-8B: An Urdu Large Language Model for Mathematical Reasoning

arXiv cs.CL · 8h ago

Riazi-8B is an Urdu large language model fine-tuned for mathematical reasoning, achieving improved performance on MGSM-Urdu through continued pre-training and supervised fine-tuning on Urdu Chain-of-Thought data.

Optimizing Abstractive Summarization With Fine-Tuned PEGASUS

arXiv cs.CL · 8h ago

This paper presents fine-tuning of PEGASUS on the XL-Sum English corpus, achieving state-of-the-art results with significant improvements over the baseline mT5 model across ROUGE scores.

The Geometry of Sequential Learning: Lie-Bracket Prediction of Transfer Order

arXiv cs.LG · 8h ago

This paper introduces Lie-bracket prediction of transfer order for sequential learning, using commutators of gradient fields to determine pairwise order and scaling to many domains. Experiments show high accuracy in predicting optimal curriculum orders for fine-tuning and instruction tuning.

Supervised Reinforcement Learning for the Coordination of Distributed Energy Resources

arXiv cs.LG · 8h ago

This paper proposes a Supervised Reinforcement Learning (SRL) framework for coordinating distributed energy resources, pre-training on demonstration data and fine-tuning with RL to improve sample efficiency and performance.

Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel

Hugging Face Blog · 20h ago

NVIDIA NeMo AutoModel builds on Hugging Face Transformers v5 to deliver 3.4-3.7x higher training throughput and 29-32% lower GPU memory use when fine-tuning Mixture-of-Experts models, with no code changes beyond a single import.

Parallel Manifold Steering: Efficient Adaptation of Large Associative Memories via Residual Energy Shaping

arXiv cs.LG · yesterday

This paper proposes H-Res, a method to adapt large transformer models by shaping the energy landscape of associative memories without modifying weights or adding prompts, preserving memory capacity and outperforming LoRA.

When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs

arXiv cs.LG · yesterday

This paper investigates the effectiveness of top-1 collapse rate as a stability monitor for short-horizon LoRA fine-tuning of discrete diffusion language models, finding it has zero precision, and proposes max gradient norm as a more reliable alternative with higher precision and F1 score on LLaDA-family models.

Fast and Slow Variational Continual Learning

arXiv cs.LG · yesterday

This paper introduces the Continual IVON (CoVON) optimizer, which integrates fast and slow adaptation into variational continual learning to balance stability and plasticity, outperforming existing methods in domain-incremental learning, continual pre-training, and fine-tuning of large language models.

Weight-Space Geometry of Offline Reasoning Training

arXiv cs.LG · yesterday

This paper investigates whether different offline reinforcement learning losses (RFT, RIFT, DFT, Offline GRPO, DPO) for reasoning distillation produce mechanistically distinct weight updates in a small language model. Using identical math rollouts and a controlled setup with Qwen3-4B and attention-only LoRA, the authors find that SFT, RFT, and RIFT yield nearly collinear weight deltas, while DPO sits in a near-orthogonal subspace and achieves the highest accuracy.
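
The contrast between nearly collinear and near-orthogonal weight deltas can be made concrete with cosine similarity between flattened update vectors. The following is a minimal sketch using made-up toy vectors; the function name `cosine_similarity` and all values are illustrative assumptions, not the paper's code or data.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two flattened weight-delta vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical flattened deltas (fine-tuned weights minus base weights).
delta_a = [1.0, 2.0, 0.0]
delta_b = [2.0, 4.0, 0.0]  # same direction as delta_a, different scale
delta_c = [0.0, 0.0, 3.0]  # no component shared with delta_a

print(cosine_similarity(delta_a, delta_b))  # close to 1: collinear
print(cosine_similarity(delta_a, delta_c))  # close to 0: orthogonal
```

A value near 1 means two training losses moved the weights in essentially the same direction; a value near 0 means the updates occupy unrelated directions.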

Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation

arXiv cs.AI · yesterday

This paper proposes a reinforcement learning framework for computer-use agents that uses autonomous vision-language evaluation as a scalable reward signal, modeling evaluator noise to improve task success rates across desktop environments.

BehaviorBench: Benchmarking Foundation Models for Behavioral Science Tasks

arXiv cs.CL · yesterday

This paper introduces BehaviorBench, a comprehensive benchmark for evaluating foundation models on behavioral science tasks including behavior prediction, strategic decision-making, subject-trait inference, and behavioral knowledge application. It also presents Be.FM-1.5, a fine-tuned model that achieves strong distributional alignment, highlighting the gap between general-purpose and behaviorally adapted models.

Neuro-Symbolic Drive: Rule-Grounded Faithful Reasoning for Driving VLAs

arXiv cs.AI · yesterday

This paper introduces Neuro-Symbolic Drive, a framework that uses rule-grounded reasoning traces from classical planners to fine-tune a driving VLA (Qwen3.5-4B), achieving significant reductions in average displacement error and miss rate compared to standard CoT reasoning.

@no_stp_on_snek: what actually surprised me fine-tuning a small open model. note im failry new in this area so some of this may seem obv…

X AI KOLs Timeline · yesterday

A developer shares surprising lessons from fine-tuning a small open model, including that base models often already max out on intended improvements, the real weakness is behavior (caving), and fine-tuning requires careful measurement and balancing.

OpenThoughts-Agent: Data Recipes for Agentic Models

Hugging Face Daily Papers · 2d ago

This paper introduces OpenThoughts-Agent, an open-source data curation pipeline for training agentic language models, achieving a 44.8% average accuracy across seven benchmarks and outperforming prior open datasets through systematic experiments.

Knowledge Agents: Beat Frontier Models with Better Structure (18 minute read)

TLDR AI · 2d ago

The article presents 'knowledge agents', a methodology that injects relevant knowledge into AI agents via a hybrid retrieval system, allowing smaller models to outperform large frontier models across specialized domains like financial markets, policy, and healthcare.

Is Gemma 4 going to be the next Mistral (or Qwen3.6) one day? Concerning the lack of finetunes

Reddit r/LocalLLaMA · 2d ago

An analysis exploring why Gemma 4, despite advantages like QAT and vision support, lacks community finetunes compared to Mistral, and whether community inertia will eventually shift.

@gabepereyra: Harvey partnered with @appliedcompute to train a legal agent. We optimized each part of the agent stack, including the …

X AI KOLs Following · 2d ago

Harvey partnered with Applied Compute to train a legal agent, optimizing the agent stack and post-training the GLM-5.1 model using reward signals from their Legal Agent Benchmark.

@0xSero: Highly recommended educational content. LoRA is one of the coolest things to dabble in, lets anyone fine tune models re…

X AI KOLs Timeline · 2d ago

This article delves into the principles of LoRA and its variants (QLoRA, VeRA, DoRA), explaining how low-rank decomposition reduces trainable parameters to enable efficient fine-tuning of large models.
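
The parameter saving from low-rank decomposition is easy to check by arithmetic. In LoRA, a dense update to a d_out x d_in weight matrix is replaced by the product of a d_out x r matrix and an r x d_in matrix, so the trainable count drops from d_out * d_in to r * (d_out + d_in). The sketch below assumes a single 4096 x 4096 projection and rank 8; those sizes and the helper name `lora_param_counts` are illustrative choices, not figures from the article.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, low_rank) trainable-parameter counts for one weight matrix.

    full:     parameters in a dense d_out x d_in update
    low_rank: parameters in the two factors B (d_out x rank) and A (rank x d_in)
    """
    full = d_out * d_in
    low_rank = rank * (d_out + d_in)
    return full, low_rank


# Hypothetical layer size and rank, chosen only for illustration.
full, low_rank = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(full)             # 16777216
print(low_rank)         # 65536
print(full // low_rank) # 256
```

Under these assumptions the adapter trains 256 times fewer parameters for that matrix than a full update would.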

NEX-N2-mini: "There is no Pareto frontier. I am Pareto". This Qwen3.5-MoE fine tune fixed 3.5 and 3.6 overthinking apparently on my tests.

Reddit r/LocalLLaMA · 2d ago

A fine-tuned version of Qwen3.5-MoE called NEX-N2-mini reportedly fixes overthinking issues seen in Qwen 3.5 and 3.6 models.
