Mia-AiLab releases Qwable-3.6-27b, a fully fine-tuned checkpoint of Qwen3.6-27B trained on a cleaned reasoning and instruction dataset and optimized for coding, technical assistance, and structured responses.
This post demonstrates how to fine-tune a model for free using a single prompt, leveraging the new Google Colab CLI along with Hugging Face's TRL and trackio tools, all orchestrated by an AI agent.
This paper investigates how fine-tuning vision-language models to produce dense coordinate lists creates a controllable interference surface, finding that duplicate pressure can be removed without sacrificing localization accuracy.
This paper proposes sparsity-induced adaptations to LoRA, including Cheap LoRA (cLA) and a chained circulant variant (c³LA), and provides theoretical generalization bounds along with empirical evaluations showing up to 10% training time reduction and 15% peak GPU memory savings while maintaining competitive performance.
BayLing-Duplex is a native full-duplex speech language model that enables a single autoregressive LLM to manage turn-taking and interruptions without external VAD modules, achieving high success rates and improved response quality over prior models.
This paper presents a synthetic data generation method for fine-tuning small LLMs to convert natural language to Cypher queries for property graphs, achieving competitive performance with large proprietary models while enabling local deployment and data sovereignty.
ProCUA-SFT is a large-scale synthetic dataset of 3.1M step-level SFT samples for training computer-use agents, produced via an automated pipeline using a single VLM (Kimi-K2.5). Fine-tuning UI-TARS 7B on it achieves 45.0% on OSWorld, an 18.7-point improvement over the base model.
This paper proposes Hierarchical Advantage-Weighted Behavior Cloning (HABC) for fine-tuning Vision-Language-Action (VLA) policies using online reinforcement learning with sparse binary episode outcomes. HABC separates viability and efficiency objectives via adaptive critic heads and intervention-aware credit assignment, significantly improving success rates on contact-rich bimanual manipulation tasks.
MLX-LoRA-Studio is a native macOS app for fine-tuning LLMs on Apple Silicon, offering a user-friendly interface and support for various training algorithms including SFT, DPO, and QAT. It is fully open-source and allows local, private fine-tuning without cloud dependency.
A municipal employee in Brazil claims to have discovered a method that makes LLM fine-tuning 1000x faster, though analysis suggests the resulting model, Rio 3.5, is essentially a mixture of existing open-source models Nex N2 Pro and Qwen 3.5.
Release of Qwopus3.6-27B-v2-MTP, a fine-tuned multi-token prediction reasoning model based on Qwen3.6-27B, optimized for coding, DevOps, and math tasks with improved generation speed.
A tweet from @TheAhmadOsman emphasizes that local AI is the future and recommends learning skills like running open-source models, conducting evals, and customizing models through fine-tuning.
This paper introduces a retrieval-augmented vision-language-action policy that eliminates per-task fine-tuning by using pre-trained models with indexed demonstrations, enabling efficient cross-embodiment generalization and task adaptation at test time.
A Pythia-6.9B model fine-tuned for 550 steps on two instruction-following datasets becomes capable in 13 languages, showing significant improvement over the base model.
Claude Fable 5 completed a project that typically takes 4 months in just 3 hours, including a complete 7-stage pipeline, TUI interface, HTML dashboard, 39 specialized skills, 8700 lines of code, and 235 tests, achieving 98% completion in one shot.
AAbAAC is a manually annotated corpus of 115 PubMed abstracts for autoimmunity information extraction, focusing on entities like autoimmune diseases and autoantibodies. The study demonstrates improved NER performance after fine-tuning on this corpus.
This paper reveals that the scaling factor α in LoRA optimization is more influential than the learning rate, and proposes LoRA-α, a framework that improves performance and simplifies hyperparameter search by restoring α to its principled regime.
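For readers unfamiliar with where α enters, here is a minimal sketch of the standard LoRA forward pass, in which the low-rank update is multiplied by α/r. It is not the LoRA-α framework summarized above; the function names `matvec` and `lora_forward` and the plain-list matrix layout are assumptions made for illustration.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def lora_forward(x, W0, A, B, alpha):
    """Standard LoRA forward pass: h = W0 x + (alpha / r) * B (A x).

    W0: frozen base weight, shape (d_out, d_in)
    A:  trainable down-projection, shape (r, d_in)
    B:  trainable up-projection, shape (d_out, r)
    alpha: scaling factor (the hyperparameter discussed in the summary)
    """
    r = len(A)                      # rank is the number of rows of A
    scale = alpha / r               # alpha only appears through this ratio
    base = matvec(W0, x)            # output of the frozen model
    update = matvec(B, matvec(A, x))  # low-rank correction
    return [b + scale * u for b, u in zip(base, update)]
```

Because α multiplies the whole low-rank update, changing it rescales the adapter's contribution to the output directly, which is one intuition for why it interacts with the learning rate.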
PolyAlign is a distribution-aware alignment framework that aligns language models to context-specific human response distributions rather than a single global style, improving naturalness and faithfulness across bilingual settings.
This paper presents an empirical study of Direct Preference Optimization (DPO) for fine-tuning a large language model, showing that DPO simplifies the training pipeline and achieves competitive performance while addressing training instability.
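As a rough illustration of why DPO simplifies the pipeline, the sketch below computes the standard DPO loss for one preference pair directly from sequence log-probabilities, with no separate reward model. The function names, the argument names, and the default `beta=0.1` are assumptions for illustration, not details taken from the study.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response
    under either the trained policy or the frozen reference model.
    """
    # How much more the policy likes each response than the reference does
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) is the same as softplus(-logits)
    return softplus(-logits)
```

When the policy matches the reference exactly, the loss equals log 2; it falls as the policy raises the chosen response's likelihood relative to the rejected one.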
Fine-tuning small LLMs (3B-7B) with QLoRA on biomedical claim verification achieves higher F1 than GPT-4o and GPT-5 at 44.5x lower cost, and reveals a structural artifact in SciFact. The study demonstrates robust cross-domain transfer when training on structurally sound data.