fine-tuning

Tag

Cards List
#fine-tuning

Parallel Manifold Steering: Efficient Adaptation of Large Associative Memories via Residual Energy Shaping

arXiv cs.LG · 9h ago Cached

This paper proposes H-Res, a method to adapt large transformer models by shaping the energy landscape of associative memories without modifying weights or adding prompts, preserving memory capacity and outperforming LoRA.

0 favorites 0 likes
#fine-tuning

When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs

arXiv cs.LG · 9h ago Cached

This paper investigates the effectiveness of top-1 collapse rate as a stability monitor for short-horizon LoRA fine-tuning of discrete diffusion language models, finding it has zero precision, and proposes max gradient norm as a more reliable alternative with higher precision and F1 score on LLaDA-family models.

0 favorites 0 likes
#fine-tuning

Fast and Slow Variational Continual Learning

arXiv cs.LG · 9h ago Cached

This paper introduces the Continual IVON (CoVON) optimizer, which integrates fast and slow adaptation into variational continual learning to balance stability and plasticity, outperforming existing methods in domain-incremental learning, continual pre-training, and fine-tuning of large language models.

0 favorites 0 likes
#fine-tuning

Weight-Space Geometry of Offline Reasoning Training

arXiv cs.LG · 9h ago Cached

This paper investigates whether different offline reinforcement learning losses (RFT, RIFT, DFT, Offline GRPO, DPO) for reasoning distillation produce mechanistically distinct weight updates in a small language model. Using identical math rollouts and a controlled setup with Qwen3-4B and attention-only LoRA, they find that SFT, RFT, and RIFT yield nearly colinear weight deltas, while DPO sits in a near-orthogonal subspace and achieves the highest accuracy.

0 favorites 0 likes
#fine-tuning

Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation

arXiv cs.AI · 9h ago Cached

This paper proposes a reinforcement learning framework for computer-use agents that uses autonomous vision-language evaluation as a scalable reward signal, modeling evaluator noise to improve task success rates across desktop environments.

0 favorites 0 likes
#fine-tuning

BehaviorBench: Benchmarking Foundation Models for Behavioral Science Tasks

arXiv cs.CL · 9h ago Cached

This paper introduces BehaviorBench, a comprehensive benchmark for evaluating foundation models on behavioral science tasks including behavior prediction, strategic decision-making, subject-trait inference, and behavioral knowledge application. It also presents Be.FM-1.5, a fine-tuned model that achieves strong distributional alignment, highlighting the gap between general-purpose and behaviorally adapted models.

0 favorites 0 likes
#fine-tuning

Neuro-Symbolic Drive: Rule-Grounded Faithful Reasoning for Driving VLAs

arXiv cs.AI · 9h ago Cached

Introduces Neuro-Symbolic Drive, a framework that uses rule-grounded reasoning traces from classical planners to fine-tune a driving VLA (Qwen3.5-4B), achieving significant reductions in average displacement error and miss rate compared to standard CoT reasoning.

0 favorites 0 likes
#fine-tuning

@no_stp_on_snek: what actually surprised me fine-tuning a small open model. note im failry new in this area so some of this may seem obv…

X AI KOLs Timeline · 23h ago Cached

A developer shares surprising lessons from fine-tuning a small open model, including that base models often already max out on intended improvements, the real weakness is behavior (caving), and fine-tuning requires careful measurement and balancing.

0 favorites 0 likes
#fine-tuning

OpenThoughts-Agent: Data Recipes for Agentic Models

Hugging Face Daily Papers · yesterday Cached

This paper introduces OpenThoughts-Agent, an open-source data curation pipeline for training agentic language models, achieving a 44.8% average accuracy across seven benchmarks and outperforming prior open datasets through systematic experiments.

0 favorites 0 likes
#fine-tuning

Knowledge Agents: Beat Frontier Models with Better Structure (18 minute read)

TLDR AI · yesterday Cached

The article presents 'knowledge agents', a methodology that injects relevant knowledge into AI agents via a hybrid retrieval system, allowing smaller models to outperform large frontier models across specialized domains like financial markets, policy, and healthcare.

0 favorites 0 likes
#fine-tuning

Is Gemma 4 going to be the next Mistral (or Qwen3.6) one day? Concerning the lack of finetunes

Reddit r/LocalLLaMA · yesterday

An analysis exploring why Gemma 4, despite advantages like QAT and vision support, lacks community finetunes compared to Mistral, and whether community inertia will eventually shift.

0 favorites 0 likes
#fine-tuning

@gabepereyra: Harvey partnered with @appliedcompute to train a legal agent. We optimized each part of the agent stack, including the …

X AI KOLs Following · yesterday Cached

Harvey partnered with Applied Compute to train a legal agent, optimizing the agent stack and post-training the GLM-5.1 model using reward signals from their Legal Agent Benchmark.

0 favorites 0 likes
#fine-tuning

@0xSero: Highly recommended educational content. LoRA is one of the coolest things to dabble in, lets anyone fine tune models re…

X AI KOLs Timeline · yesterday Cached

This article delves into the principles of LoRA and its variants (QLoRA, VeRA, DoRA), explaining how low-rank decomposition reduces trainable parameters to enable efficient fine-tuning of large models.

0 favorites 0 likes
#fine-tuning

NEX-N2-mini: "There is no Pareto frontier. I am Pareto". This Qwen3.5-MoE fine tune fixed 3.5 and 3.6 overthinking apparently on my tests.

Reddit r/LocalLLaMA · yesterday

A fine-tuned version of Qwen3.5-MoE called NEX-N2-mini reportedly fixes overthinking issues seen in Qwen 3.5 and 3.6 models.

0 favorites 0 likes
#fine-tuning

@danielhanchen: I’m running a 3 hour advanced workshop at AI Engineer World’s Fair! 2026 has greatly changed how one should learn lower…

X AI KOLs Following · yesterday Cached

Daniel Han is hosting a 3-hour advanced workshop at the AI Engineer World's Fair, sharing insights on the history of open-source large models, classification of training stages (pre-training, intermediate training, supervised fine-tuning, post-training, reinforcement fine-tuning), and the leap in reasoning models. He also introduced his team's open-source contributions to fine-tuning optimization.

0 favorites 0 likes
#fine-tuning

@TheAhmadOsman: INCREDIBLE RESOURCE The MOST COMPLETE GUIDE for understanding LLMs from first principles is now available online to rea…

X AI KOLs Timeline · 2d ago Cached

A comprehensive free guide explaining LLMs from first principles, covering tokens, transformers, attention, fine-tuning, and local deployment.

0 favorites 0 likes
#fine-tuning

Good results fine tuning a local LLM like Qwen 3:0.6B to categorize questions

Hacker News Top · 2d ago Cached

A developer fine-tunes a small Qwen 3 0.6B model using the Unsloth framework to categorize household questions, achieving good results with only 850 training examples.

0 favorites 0 likes
#fine-tuning

@uzairansar: Qwythos-9B-Claude-Mythos-5 Fine Tune with 1M Context released! Empero just released their Claude Mythos Fine Tune based…

X AI KOLs Timeline · 2d ago Cached

Empero released Qwythos-9B-Claude-Mythos-5, a full-parameter reasoning model fine-tuned with 1M context, based on synthetic chain-of-thought data from Fable-5 and Mythos-5 session logs.

0 favorites 0 likes
#fine-tuning

@analogalok: gemma-4-12B-agentic-fable5-composer2.5 V2 is out. the agentic upgrade to the model trained on Fable 5's reasoning. Runn…

X AI KOLs Timeline · 3d ago Cached

A new fine-tuned version of Gemma 4 12B, trained on Fable 5's reasoning, achieves a significant jump in agentic coding benchmarks (from 15% to 55%) and can run locally on an 8GB VRAM GPU using a custom fork of llama.cpp.

0 favorites 0 likes
#fine-tuning

A Comparative Study of Pretrained Transformer Models for Quranic ASR: Speech Representations, Label Formats, and Dataset Composition

arXiv cs.AI · 4d ago Cached

This paper presents a systematic empirical study of fine-tuning pretrained Transformer models (Wav2Vec2.0, HuBERT, XLS-R) for Quranic Automatic Speech Recognition (ASR), achieving a WER of 0.08 on the EveryAyah subset and reducing training time from 140 to 40 hours, with Wav2Vec2-XLSR-53 providing the best representation.

0 favorites 0 likes
Next →
← Back to home

Submit Feedback