Tag
This paper introduces a retrieval-augmented vision-language-action policy that eliminates per-task fine-tuning by using pre-trained models with indexed demonstrations, enabling efficient cross-embodiment generalization and task adaptation at test time.
A Pythia-6.9B model fine-tuned for 550 steps on two instruction-following datasets becomes capable in 13 languages, showing significant improvement over the base model.
Claude Fable 5 completed in just 3 hours a project that typically takes 4 months, delivering a complete 7-stage pipeline, a TUI, an HTML dashboard, 39 specialized skills, 8700 lines of code, and 235 tests, and reaching 98% completion in one shot.
AAbAAC is a manually annotated corpus of 115 PubMed abstracts for autoimmunity information extraction, focusing on entities like autoimmune diseases and autoantibodies. The study demonstrates improved NER performance after fine-tuning on this corpus.
This paper reveals that the scaling factor α in LoRA optimization is more influential than the learning rate, and proposes LoRA-α, a framework that improves performance and simplifies hyperparameter search by restoring α to its principled regime.
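For context on where α enters, the following is a minimal sketch of the standard LoRA update, in which the low-rank product is scaled by α/r before being added to the frozen weight. It illustrates only the conventional formulation, not the LoRA-α framework itself, whose details are not given in the summary above. The function name `lora_effective_weight` and the toy matrices are hypothetical.

```python
def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / r) * (B @ A) using plain nested lists.

    W: d_out x d_in frozen weight, B: d_out x r, A: r x d_in.
    This is the conventional LoRA scaling, shown for illustration only.
    """
    r = len(A)            # LoRA rank = number of rows of A
    scale = alpha / r     # standard LoRA scaling factor
    d_out, d_in = len(W), len(W[0])
    merged = []
    for i in range(d_out):
        row = []
        for j in range(d_in):
            # (B @ A)[i][j], computed without external libraries
            delta = sum(B[i][k] * A[k][j] for k in range(r))
            row.append(W[i][j] + scale * delta)
        merged.append(row)
    return merged


# Toy example (hypothetical values): rank-1 adapter on a 2x2 identity weight.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]          # r = 1, d_in = 2
B = [[1.0], [0.0]]        # d_out = 2, r = 1
merged = lora_effective_weight(W, A, B, alpha=2.0)
```

Because α multiplies the whole low-rank update, changing it rescales the adapter's contribution directly, independently of the optimizer's learning rate.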
PolyAlign is a distribution-aware alignment framework that aligns language models to context-specific human response distributions rather than a single global style, improving naturalness and faithfulness across bilingual settings.
This paper presents an empirical study of Direct Preference Optimization (DPO) for fine-tuning a large language model, showing that DPO simplifies the training pipeline and achieves competitive performance while addressing training instability.
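As a reminder of why DPO simplifies the pipeline, here is a minimal sketch of the standard per-example DPO loss, which needs only sequence log-probabilities from the policy and a frozen reference model and no separate reward model. It is the textbook objective, not necessarily the exact configuration used in the study. The function name `dpo_loss`, the default `beta`, and the example log-probabilities are assumptions for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    responses under the trained policy and the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch for stability.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# With no preference margin the loss equals log(2).
neutral = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# When the policy favors the chosen response more than the reference does,
# the loss drops below log(2).
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss falls as the policy raises the chosen response's likelihood relative to the reference and lowers the rejected one's, with `beta` controlling how strongly deviations from the reference are weighted.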
Fine-tuning small LLMs (3B-7B) with QLoRA on biomedical claim verification achieves higher F1 than GPT-4o and GPT-5 at 44.5x lower cost, and reveals a structural artifact in SciFact. The study demonstrates robust cross-domain transfer when training on structurally sound data.
This paper presents MentalMARBERT, a domain-adapted Arabic language model for detecting mental health disorders from social media text. The framework uses domain-adaptive pre-training and a two-stage fine-tuning approach, achieving 0.877 accuracy and 0.861 macro-F1 on a newly constructed Arabic mental health dataset of 50,670 tweets.
FastContext introduces specialized exploration models that separate repository exploration from code solving in LLM agents, reducing token consumption by up to 60% while improving resolution rates on software engineering benchmarks.
ClinHallu is a benchmark for diagnosing and mitigating hallucinations in medical multimodal large language models by decomposing reasoning into visual recognition, knowledge recall, and reasoning integration stages, using trace-supervised fine-tuning to reduce errors.
HyVLA-0.5 is an end-to-end robotic learning system that integrates data collection, model design, pre-training, fine-tuning, and reinforcement learning for real-world deployment.
This research introduces a method using interpretability to predict which behaviors DPO will amplify or suppress from a preference dataset before training, enabling data debugging to prevent undesired effects. The technique achieves R²=0.9 prediction accuracy and is integrated into Goodfire's Silico platform.
llmfan46 released a set of four uncensored, fine-tuned, and quantized Gemma-4 models on Hugging Face, including 12B, 26B-A4B, and 31B variants with QAT and GGUF formats.
Release of fine-tuned versions of Qwen3.5: the Nex-N2 Pro 397B and Nex-N2 Mini 35B, with strong benchmark results.
A 230-page book that comprehensively covers LLM concepts including pre-training, fine-tuning, alignment, and prompting techniques.
This paper introduces SWARR, a two-stage recipe using supervised fine-tuning and reinforcement learning to adapt sliding-window attention models for mathematical reasoning, showing that RL can narrow the performance gap with self-attention while maintaining efficiency.
This paper introduces ISE, a three-stage synthesis paradigm for generating multi-turn OS-agent trajectories with grounded execution, demonstrating that fine-tuning on the resulting ISE-Trace dataset significantly improves agent performance on ClawEval.
Introduces Compatibility-Aware Dynamic Fine-Tuning (CADFT), an extension of Dynamic Fine-Tuning that controls sample-level optimization variance in LLM supervised fine-tuning, improving stability and generalization.
This paper introduces a benchmark for predicting spreadsheet user actions, addressing challenges in edit history availability and complex action spaces through manual curation and online evaluation methodology.