Tag
Pyrecall is a new open-source tool that detects catastrophic forgetting during LLM fine-tuning by snapshotting skill scores before and after training, flagging regressions, and rolling back LoRA adapters. It runs fully locally with no external APIs.
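The before/after comparison described above can be sketched in a few lines. This is not Pyrecall's actual API: the function name `find_regressions`, the `tolerance` threshold, and the example scores are all hypothetical, chosen only to illustrate flagging a regression between two snapshots.

```python
def find_regressions(before, after, tolerance=0.02):
    """Return skills whose score dropped by more than `tolerance`.

    `before` and `after` map skill names to scores in [0, 1].
    Skills missing from `after` are skipped rather than flagged.
    """
    regressions = {}
    for skill, old_score in before.items():
        new_score = after.get(skill)
        if new_score is None:
            continue
        drop = old_score - new_score
        if drop > tolerance:
            regressions[skill] = round(drop, 4)
    return regressions


# Hypothetical snapshots taken before and after a fine-tuning run.
before = {"math": 0.71, "coding": 0.64, "summarization": 0.80}
after = {"math": 0.55, "coding": 0.65, "summarization": 0.79}

regressed = find_regressions(before, after)
# A real tool would detach or revert the adapter here; this sketch only decides.
should_roll_back = bool(regressed)
```

With these made-up numbers only `math` exceeds the tolerance, so a rollback would be triggered.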
The article argues that the biggest bottleneck in production AI today is not initial model deployment but the continuous iteration cycle: turning production usage (inference logs, user feedback) into datasets for fine-tuning and redeployment. It calls for integrated feedback loops rather than one-off projects.
SenseNova U1 releases an infographic-specific fine-tune of its U1-8B-MoT base model, achieving significant benchmark improvements in infographic accuracy, chart understanding, and text rendering.
This article explains how to use GRPO to fine-tune an LLM (Qwen3-8B) for reliable JSON structured output, improving schema accuracy from 62% to 82%, surpassing GPT-4.1's 58%.
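As a rough illustration of the idea, GRPO needs a scalar reward per sampled completion and then normalizes rewards within each group of samples. The sketch below is not the article's code: the `REQUIRED` schema, the partial-credit scoring, and the use of the population standard deviation are assumptions made for the example.

```python
import json
import statistics

# Hypothetical target schema: required key -> expected Python type.
REQUIRED = {"name": str, "age": int}


def schema_reward(completion: str) -> float:
    """Score a completion in [0, 1] by how much of the schema it satisfies."""
    try:
        obj = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0  # not parseable as JSON at all
    if not isinstance(obj, dict):
        return 0.0  # valid JSON, but not an object
    matched = sum(1 for key, typ in REQUIRED.items() if isinstance(obj.get(key), typ))
    return matched / len(REQUIRED)


def group_advantages(rewards):
    """Normalize rewards within one sampled group: (r - mean) / std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # Every sample scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


completions = ['{"name": "Ada", "age": 36}', '{"name": "Ada"}', "not json"]
rewards = [schema_reward(c) for c in completions]
advantages = group_advantages(rewards)
```

Completions that satisfy more of the schema than their group's average get a positive advantage, which is what pushes the policy toward valid output.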
ART (Art-based Reinforcement Training) enables parameter-efficient fine-tuning of frozen multimodal LLMs by optimizing raw visual input via gradient backpropagation, achieving performance comparable to LoRA while supporting pre-compiled computational graphs for high-throughput engines like vLLM.
This paper introduces DualSelect, a coupled framework for selecting task samples and safety references jointly to preserve safety during LLM fine-tuning without losing task utility. The method improves safety by at least 5 points over existing baselines on 1B–8B LLMs.
ConvMemory v2 is a recall-preserving reranker that reorders the top-10 candidates from ConvMemory v1 using a fine-tuned cross-encoder, improving MRR on the LoCoMo benchmark while preserving recall.
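Reordering a fixed top-k list cannot change which items are in it, which is why recall at k is preserved while MRR can improve. A minimal sketch, assuming a generic scoring function in place of the fine-tuned cross-encoder (`rerank_top_k`, `reciprocal_rank`, and the toy scores are illustrative names and values, not ConvMemory's API):

```python
def rerank_top_k(candidates, score, k=10):
    """Reorder only the first k candidates by descending score; leave the rest as-is."""
    head = sorted(candidates[:k], key=score, reverse=True)
    return head + candidates[k:]


def reciprocal_rank(ranked, relevant):
    """Return 1/rank of the first relevant item, or 0.0 if none is present."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


# Toy first-stage ranking and stand-in reranker scores.
first_stage = ["a", "b", "c", "d"]
toy_scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 1.0}

reranked = rerank_top_k(first_stage, toy_scores.get, k=3)
rr_before = reciprocal_rank(first_stage, {"b"})
rr_after = reciprocal_rank(reranked, {"b"})
```

Item `d` sits outside the top 3 and is never promoted, even though its score is highest; that is the trade-off of reranking a fixed candidate set.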
ParaBridge is an on-policy self-distillation method that bridges the gap between paralinguistic perception and dialogue behavior in speech language models, significantly improving safety and empathy without external rewards.
This paper investigates sequential fine-tuning of LLaMA-3.1-8B for automated essay scoring using a curriculum aligned with discourse structure, showing improved coherence and performance compared to independent or randomized training.
OpenRTLSet introduces the largest fully open-source dataset for hardware design with over 131,000 Verilog code samples, enabling fine-tuning of LLMs for Verilog code generation.
This paper introduces Program-based Posterior Training (PPT), a method that uses LLM-generated probabilistic programs to create distributional targets for fine-tuning inductive reasoning, improving estimation accuracy and calibration on held-out tasks and human-alignment benchmarks.
DV-DPO is a method for fine-tuning Qwen2.5-7B on domain-specific tasks using only ~$3 in API calls and no human labelers. Using adversarial cross-examination, it reaches 96% of Claude Haiku's composite performance.
This work introduces stationary representations learned via d-Simplex fixed classifiers to keep models compatible across sequential fine-tuning, enabling continuous retrieval services without reprocessing. The approach combines cross-entropy and contrastive losses to capture higher-order dependencies.
A new CLI tool for Google Colab enables GPU/TPU provisioning, remote script execution, and interactive REPL access from the terminal, with built-in agent skills for automated tasks like fine-tuning models.
This paper presents a deployment-focused study comparing LoRA fine-tuning of 24 model variants (270M–8B parameters) for merchant information extraction from financial transaction strings. The authors find that smaller models like Qwen 3.5 4B achieve 96.6% F1, within 0.35 points of the 8B baseline, while offering significant reductions in latency and cost.
OmniMem introduces a modality-aware memory allocation and perturbation-aware selection strategy for streaming audio-visual LLMs, achieving 2–4% absolute accuracy gains over compression baselines on long-video benchmarks.
This paper proposes a trait-space monitoring method that detects emergent misalignment in LLMs during supervised fine-tuning by tracking representational drift in activation space. It achieves a 0.990 AUROC with low false-positive and false-negative rates, outperforming unsupervised baselines.
A post-hoc method reduces spurious correlations in fine-tuned LLMs by truncating the tail of the SVD of the weight update matrix. It reduces the spurious-group gap by up to 5x with less than 2pp accuracy loss, without retraining or group labels.
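The core operation, dropping the smallest singular directions of the weight update, can be sketched with NumPy. How the method chooses the cutoff is not stated in the summary, so `keep_rank` here is a hypothetical hand-set parameter, and `truncate_update` is an illustrative name rather than the authors' code.

```python
import numpy as np


def truncate_update(w_base, w_finetuned, keep_rank):
    """Keep only the top `keep_rank` singular directions of the fine-tuning update."""
    delta = w_finetuned - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    s_kept = s.copy()
    s_kept[keep_rank:] = 0.0  # zero out the tail of the spectrum
    # (u * s_kept) scales each column of u by its singular value.
    return w_base + (u * s_kept) @ vt


rng = np.random.default_rng(0)
w_base = rng.normal(size=(6, 4))
w_finetuned = w_base + rng.normal(size=(6, 4))

w_edited = truncate_update(w_base, w_finetuned, keep_rank=2)
```

Because the edit only touches the stored weights, it needs neither retraining nor group labels, which matches the post-hoc framing above.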
Omi Health's founder fine-tuned NVIDIA's Parakeet TDT 0.6B for medical ASR and released the open-weights model Omi Med STT v1, which achieves a competitive medical WER while running locally on Mac, CUDA, or CPU.
DeNovoSWE is a large-scale dataset for training code agents to generate entire software repositories from documentation, using a sandboxed agentic workflow and difficulty-aware filtering. Fine-tuning Qwen3-30B-A3B on it boosts performance on the BeyondSWE-Doc2Repo benchmark from 5.8% to 47.2%.