fine-tuning

#fine-tuning

Retrieve, Don't Retrain: Extending Vision Language Action Models to New Tasks at Test Time

Hugging Face Daily Papers ↗ · 2026-06-14 Cached

This paper introduces a retrieval-augmented vision-language-action policy that eliminates per-task fine-tuning by using pre-trained models with indexed demonstrations, enabling efficient cross-embodiment generalization and task adaptation at test time.

#fine-tuning

Finetuned an Early 2023-Era Model on 2 Instruction Following Datasets and it Became Good

Reddit r/LocalLLaMA ↗ · 2026-06-12

A Pythia-6.9B model fine-tuned on two instruction-following datasets for 550 steps becomes capable in 13 languages, a significant improvement over the base model.

#fine-tuning

@FinanceYF5: Claude Fable 5 completed his 4-month fine-tuning work in 3 hours. Complete 7-stage pipeline, TUI interface, HTML dashboard, 39 specialized skills, 8700 lines of code, 235 tests. 98% completion, one-shot. 4…

X AI KOLs Timeline ↗ · 2026-06-12 Cached

Claude Fable 5 completed a project that typically takes 4 months in just 3 hours, including a complete 7-stage pipeline, TUI interface, HTML dashboard, 39 specialized skills, 8700 lines of code, and 235 tests, achieving 98% completion in one shot.

#fine-tuning

AAbAAC: An Annotated Corpus for Autoimmunity Information Extraction

arXiv cs.AI ↗ · 2026-06-12 Cached

AAbAAC is a manually annotated corpus of 115 PubMed abstracts for autoimmunity information extraction, focusing on entities like autoimmune diseases and autoantibodies. The study demonstrates improved NER performance after fine-tuning on this corpus.

#fine-tuning

The Hidden Power of Scaling Factor in LoRA Optimization

arXiv cs.AI ↗ · 2026-06-12 Cached

This paper reveals that the scaling factor α in LoRA optimization is more influential than the learning rate, and proposes LoRA-α, a framework that improves performance and simplifies hyperparameter search by restoring α to its principled regime.

#fine-tuning

PolyAlign: Conditional Human-Distribution Alignment

arXiv cs.CL ↗ · 2026-06-12 Cached

PolyAlign is a distribution-aware alignment framework that aligns language models to context-specific human response distributions rather than a single global style, improving naturalness and faithfulness across bilingual settings.

#fine-tuning

Direct Preference Optimization for Chatbot Fine-Tuning: An Empirical Study

arXiv cs.CL ↗ · 2026-06-12 Cached

This paper presents an empirical study of Direct Preference Optimization (DPO) for fine-tuning a large language model, showing that DPO simplifies the training pipeline and achieves competitive performance while addressing training instability.

#fine-tuning

Small LLMs for Biomedical Claim Verification: Cost-Effective Fine-Tuning, Structural Dataset Shortcuts, and Cross-Domain Generalization

arXiv cs.CL ↗ · 2026-06-12 Cached

Fine-tuning small LLMs (3B-7B) with QLoRA on biomedical claim verification achieves higher F1 than GPT-4o and GPT-5 at 44.5x lower cost, and reveals a structural artifact in SciFact. The study demonstrates robust cross-domain transfer when training on structurally sound data.

#fine-tuning

MentalMARBERT: Domain-Adaptive Pre-training and Two-Stage Fine-Tuning for Arabic Mental Health Disorders Detection

arXiv cs.CL ↗ · 2026-06-12 Cached

This paper presents MentalMARBERT, a domain-adapted Arabic language model for detecting mental health disorders from social media text. The framework uses domain-adaptive pre-training and a two-stage fine-tuning approach, achieving 0.877 accuracy and 0.861 macro-F1 on a newly constructed Arabic mental health dataset of 50,670 tweets.

#fine-tuning

FastContext: Training Efficient Repository Explorer for Coding Agents

Papers with Code Trending ↗ · 2026-06-12 Cached

FastContext introduces specialized exploration models that separate repository exploration from code solving in LLM agents, reducing token consumption by up to 60% while improving resolution rates on software engineering benchmarks.

#fine-tuning

ClinHallu: A Benchmark for Diagnosing Stage-Wise Hallucinations in Medical MLLM Reasoning

Hugging Face Daily Papers ↗ · 2026-06-12 Cached

ClinHallu is a benchmark for diagnosing and mitigating hallucinations in medical multimodal large language models by decomposing reasoning into visual recognition, knowledge recall, and reasoning integration stages, using trace-supervised fine-tuning to reduce errors.

#fine-tuning

Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

Hugging Face Daily Papers ↗ · 2026-06-12 Cached

HyVLA-0.5 is an end-to-end robotic learning system that integrates data collection, model design, pre-training, fine-tuning, and reinforcement learning for real-world deployment.

#fine-tuning

Predictive Data Debugging: Reveal and Shape What Your Model Learns, Before You Train (11 minute read)

TLDR AI ↗ · 2026-06-12 Cached

This research introduces a method using interpretability to predict which behaviors DPO will amplify or suppress from a preference dataset before training, enabling data debugging to prevent undesired effects. The technique achieves R²=0.9 prediction accuracy and is integrated into Goodfire's Silico platform.

#fine-tuning

Gemma 4 Quadruple Release, 12B, 12B QAT, 26B-A4B QAT and 31B QAT Uncensored Heretics!

Reddit r/LocalLLaMA ↗ · 2026-06-11 Cached

llmfan46 released a quadruple set of uncensored, fine-tuned and quantized Gemma-4 models on Hugging Face, including 12B, 26B-A4B, and 31B variants with QAT and GGUF formats.

#fine-tuning

New models released: Nex-N2 Pro 397B and Nex-N2 Mini 35B

Reddit r/LocalLLaMA ↗ · 2026-06-11

Nex-N2 Pro 397B and Nex-N2 Mini 35B are released as fine-tuned versions of Qwen3.5, with strong reported benchmark results.

#fine-tuning

@_rohit_tiwari_: This 230-page book unlocks the secrets of LLMs. https://drive.google.com/file/d/1ZqV0wByb65_wvzWUbaLw6pCbtXgyXDHG/view……

X AI KOLs Timeline ↗ · 2026-06-11 Cached

A 230-page book that comprehensively covers LLM concepts including pre-training, fine-tuning, alignment, and prompting techniques.

#fine-tuning

Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

arXiv cs.AI ↗ · 2026-06-11 Cached

This paper introduces SWARR, a two-stage recipe using supervised fine-tuning and reinforcement learning to adapt sliding-window attention models for mathematical reasoning, showing that RL can narrow the performance gap with self-attention while maintaining efficiency.

#fine-tuning

ISE: An Execution-Grounded Recipe for Multi-Turn OS-Agent Trajectories

arXiv cs.CL ↗ · 2026-06-11 Cached

This paper introduces ISE, a three-stage synthesis paradigm for generating multi-turn OS-agent trajectories with grounded execution, demonstrating that fine-tuning on the resulting ISE-Trace dataset significantly improves agent performance on ClawEval.

#fine-tuning

Compatibility-Aware Dynamic Fine-Tuning for Large Language Models

arXiv cs.CL ↗ · 2026-06-11 Cached

This paper introduces Compatibility-Aware Dynamic Fine-Tuning (CADFT), an extension of Dynamic Fine-Tuning that controls sample-level optimization variance in LLM supervised fine-tuning, improving stability and generalization.

#fine-tuning

A Benchmark and Framework for Evaluating Next Action Predictions in Spreadsheets

Hugging Face Daily Papers ↗ · 2026-06-11 Cached

This paper introduces a benchmark for predicting spreadsheet user actions, addressing challenges in edit history availability and complex action spaces through manual curation and online evaluation methodology.
