Tag
This technical guide explains why organizations should build their own learning loops on open-source AI models rather than renting intelligence from frontier labs, drawing on case studies from finance, robotics, and biotech.
A critical analysis warning that many Qwen/Claude distillation models use too few training samples (e.g., 4K) to transfer actual capabilities and often degrade quality instead of improving it. Official distills such as DeepSeek-R1, by contrast, used ~700K samples.
New Q3 quantizations have been added to the gemma-4-12B-coder-fable5-composer2.5 GGUF model. The importance-matrix quantized versions let the coding-focused fine-tune run on GPUs with around 6GB of VRAM.
EveryonesLLM is an open-source tutorial made up of 29 chapters of Colab notebooks. It walks users step by step through building a complete large language model from scratch on Google Colab, including pre-training and instruction fine-tuning, and it supports Chinese.
This paper presents the first systematic study of multilingual instruction following in Vision-Language-Action (VLA) models, revealing significant performance degradation when models trained on English instructions are evaluated in other languages. The authors propose Multilingual Principal Component Alignment (MPCA) to reduce the multilingual performance gap.
This paper introduces SHARD, a self-reframing distillation method that rewrites sensitive prompts to surface benign intent and fine-tunes models on safe, helpful responses, improving helpfulness while preserving safety.
This paper explores transfer learning for mapping FHIR questionnaire items to LOINC codes using retrieval methods, comparing six approaches on a small evaluation set.
This paper proposes CoTE-SQL, a self-enhanced fine-tuning framework for text-to-SQL that integrates self-reasoning traces, structured chain-of-thought prompting, and execution feedback to achieve state-of-the-art performance on the Spider and BIRD benchmarks.
ChatPlanner is a novel framework that uses fine-tuned LLMs with Retrieval-Augmented Generation (RAG) to interpret user preferences from natural language queries and integrate them into public transit routing algorithms, outperforming existing route planners.
CogGuard is a proactive-warning framework for edge intelligent services that decouples offline LLM-based profile construction from online SLM-based score prediction, reducing construction time by 48% and fine-tuning time by 19% while achieving lower prediction errors on education and operation datasets.
Proposes ac-gpt, a simple modification to causal Transformers that enables evaluating and sampling from arbitrary conditionals (past, future, mixed) in a single forward pass while preserving left-to-right ordering and next-token prediction, allowing existing LLMs to be fine-tuned for arbitrary conditioning.
This paper introduces AdaNAGED, a method that combines zeroth-order optimization, parameter-free adaptation, and non-Euclidean update geometry for memory-efficient fine-tuning of large language models, with theoretical convergence guarantees and validation on the OPT-1.3B model.
LangChain and Fireworks fine-tuned a Qwen model to detect 'Perceived Error' in agent traces, achieving a 100x cost reduction while maintaining frontier performance. The judge model is designed to enrich traces with error signals for monitoring agentic systems.
Emphasizes the importance of verifiers for LLM-based agents, notes that out-of-distribution tasks cause failures, and suggests tuning custom verifiers.
An open-source LLM called OpenMythos was trained for cybersecurity tasks using SFT and RLVR, with datasets available on HuggingFace. The model aims to reduce hallucinations and improve precision in security-related queries.
A call for open training frameworks in AI research, introducing FeynRL, a modular and explicit framework for RL post-training of LLMs, VLMs, and agents, designed to make training processes visible and modifiable.
The post outlines a recipe for future agents that builds scalable intelligence in two steps: fine-tuning efficient, specialized open models to surpass frontier performance on LLM-as-a-judge tasks, then using those models to extract signals from trace data for continual learning. LangChain Labs and Fireworks AI have released new work demonstrating this approach.
A joint study by LangChain Labs and Fireworks AI demonstrates fine-tuning an open Qwen model to create a trace judge that detects 'perceived error' in production traces, achieving frontier performance at up to 100x lower cost. The model is evaluated on two internal datasets and shows generality across applications.
Developer @cjzafir announces Finetuner.dev, a CLI tool that uses orchestrator models like Codex 5.5 and Chinese models to generate high-quality, handcrafted datasets for fine-tuning small language models (1B-30B), claiming 10x lower costs and 5x better quality.
Mia-AiLab releases Qwable-3.6-27b, a fully fine-tuned checkpoint of Qwen3.6-27B trained on a cleaned reasoning and instruction dataset and optimized for coding, technical assistance, and structured responses.