fine-tuning

#fine-tuning

Pyrecall: an open-source tool for detecting catastrophic forgetting during LLM fine-tuning [P]

Reddit r/MachineLearning · 2026-06-10

Pyrecall is a new open-source tool that detects catastrophic forgetting during LLM fine-tuning by snapshotting skill scores before and after training and flagging regressions. It can also roll back LoRA adapters, and it runs fully locally with no external APIs.
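
The snapshot-and-compare idea can be sketched in a few lines of Python. This is a minimal illustration, not Pyrecall's actual API: the function name `find_regressions`, the skill names, the scores, and the 0.05 tolerance are all hypothetical.

```python
from typing import Dict, List


def find_regressions(before: Dict[str, float],
                     after: Dict[str, float],
                     tolerance: float = 0.05) -> List[str]:
    """Return the skills whose score dropped by more than `tolerance`."""
    return sorted(
        skill for skill, base in before.items()
        if base - after.get(skill, 0.0) > tolerance
    )


# Hypothetical skill scores measured before and after fine-tuning.
before = {"math": 0.71, "coding": 0.64, "summarization": 0.80}
after = {"math": 0.52, "coding": 0.66, "summarization": 0.78}

regressed = find_regressions(before, after)
# A non-empty list is the signal to consider rolling back the adapter.
should_roll_back = bool(regressed)
```

Here only `math` falls by more than the tolerance, so it is the only skill flagged.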

The biggest AI bottleneck today with deployment layer is model iteration

Reddit r/artificial · 2026-06-10

The post argues that the biggest bottleneck in production AI today is not initial model deployment but the continuous iteration cycle: turning production usage (inference logs, user feedback) into datasets for fine-tuning and redeployment. It highlights the need for integrated feedback loops rather than one-off projects.

SenseNova U1 dropped an infographic-specific finetune

Reddit r/LocalLLaMA · 2026-06-10

SenseNova has released an infographic-specific fine-tune of its U1-8B-MoT base model, reporting significant benchmark improvements in infographic accuracy, chart understanding, and text rendering.

@akshay_pachaar: https://x.com/akshay_pachaar/status/2064700531600458093

X AI KOLs Following · 2026-06-10 Cached

This article explains how to use GRPO to fine-tune an LLM (Qwen3-8B) for reliable JSON structured output, improving schema accuracy from 62% to 82%, surpassing GPT-4.1's 58%.
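
As a rough sketch of the two ingredients involved, the Python below pairs a rule-based reward for JSON output with GRPO's group-relative advantage, which normalizes each reward against the other samples drawn for the same prompt. The schema, the reward values (1.0 / 0.5 / 0.0), and the function names are assumptions for illustration; the reward design used in the thread is not given here.

```python
import json
from statistics import mean, pstdev
from typing import List

REQUIRED_KEYS = {"name", "age"}  # hypothetical schema


def json_reward(completion: str) -> float:
    """1.0 for a JSON object with all required keys, 0.5 for other valid JSON, else 0.0."""
    try:
        obj = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    if isinstance(obj, dict) and REQUIRED_KEYS.issubset(obj):
        return 1.0
    return 0.5


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of completions for the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three hypothetical completions sampled for one prompt.
group = ['{"name": "Ada", "age": 36}', '{"name": "Ada"}', "name: Ada"]
rewards = [json_reward(c) for c in group]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average get a positive advantage and are reinforced; those below it get a negative one.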

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

Hugging Face Daily Papers · 2026-06-10 Cached

ART (Art-based Reinforcement Training) enables parameter-efficient fine-tuning of frozen multimodal LLMs by optimizing raw visual input via gradient backpropagation, achieving performance comparable to LoRA while supporting pre-compiled computational graphs for high-throughput engines like vLLM.

Two to Tango: Coupled Task-Reference Selection for Safe LLM Fine-tuning

arXiv cs.LG · 2026-06-10 Cached

This paper introduces DualSelect, a coupled framework for selecting task samples and safety references jointly to preserve safety during LLM fine-tuning without losing task utility. The method improves safety by at least 5 points over existing baselines on 1B–8B LLMs.

ConvMemory v2: A Recall-Preserving Top-10 Evidence Reranker for Conversational Memory Retrieval

arXiv cs.CL · 2026-06-10 Cached

ConvMemory v2 is a recall-preserving reranker that reorders the top-10 candidates from ConvMemory v1 using a fine-tuned cross-encoder, improving MRR on the LoCoMo benchmark while preserving recall.
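
Why reordering a fixed top-10 can raise MRR without touching recall is easy to see in a small Python sketch. The candidate IDs and scores below are made up, and `rerank` stands in for any scorer (for example a cross-encoder); none of this is ConvMemory's code.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant item, or 0.0 if none is present."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of relevant items found in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant) if relevant else 0.0


def rerank(candidates: List[str], scores: Dict[str, float]) -> List[str]:
    """Reorder candidates by descending score without adding or dropping any."""
    return sorted(candidates, key=lambda d: scores[d], reverse=True)


first_stage = [f"m{i}" for i in range(10)]   # hypothetical top-10 from the retriever
relevant = {"m6", "m42"}                     # m42 was missed by the first stage
scores = {d: 0.1 for d in first_stage}
scores["m6"] = 0.9                           # hypothetical reranker score

reranked = rerank(first_stage, scores)
```

The reranked list contains exactly the same ten items, so recall@10 is unchanged, while the relevant item moves from position 7 to position 1. MRR is the mean of this reciprocal rank over all queries.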

ParaBridge: Bridging Paralinguistic Perception and Dialogue Behavior in Speech Language Models

arXiv cs.CL · 2026-06-10 Cached

ParaBridge is an on-policy self-distillation method that bridges the gap between paralinguistic perception and dialogue behavior in speech language models, significantly improving safety and empathy without external rewards.

The Order Matters: Sequential Fine-Tuning of LLaMA for Coherent Automated Essay Scoring

arXiv cs.CL · 2026-06-10 Cached

This paper investigates sequential fine-tuning of LLaMA-3.1-8B for automated essay scoring using a curriculum aligned with discourse structure, showing improved coherence and performance compared to independent or randomized training.

OpenRTLSet: A Fully Open-Source Dataset for Large Language Model-based Verilog Module Design

arXiv cs.CL · 2026-06-10 Cached

OpenRTLSet introduces the largest fully open-source dataset for hardware design with over 131,000 Verilog code samples, enabling fine-tuning of LLMs for Verilog code generation.

Using Probabilistic Programs to Train Inductive Reasoning in Large Language Models

arXiv cs.CL · 2026-06-10 Cached

This paper introduces Program-based Posterior Training (PPT), a method that uses LLM-generated probabilistic programs to create distributional targets for fine-tuning inductive reasoning, improving estimation accuracy and calibration on held-out tasks and human-alignment benchmarks.

Fine-tuned Qwen2.5-7B to 96% of Claude Haiku on a domain-specific task using ~$3 of API calls and zero human labelers

Reddit r/LocalLLaMA · 2026-06-10

The post presents DV-DPO, a method for fine-tuning Qwen2.5-7B on a domain-specific task using only ~$3 in API calls and zero human labelers. Using adversarial cross-examination, it reaches 96% of Claude Haiku's composite performance.

A Stationary (and Therefore Compatible) Representation is All You Need

Hugging Face Daily Papers · 2026-06-10 Cached

This paper introduces stationary representations, learned via d-Simplex fixed classifiers, that keep models compatible across sequential fine-tuning, enabling continuous retrieval services without reprocessing. It combines cross-entropy and contrastive losses to capture higher-order dependencies.

@_philschmid: Google Colab CLI and Skills are out. Full Colab runtimes from your terminal. - GPU/TPU provisioning (colab --gpu A100) …

X AI KOLs Following · 2026-06-09 Cached

A new CLI tool for Google Colab enables GPU/TPU provisioning, remote script execution, and interactive REPL access from the terminal, with built-in agent skills for automated tasks like fine-tuning models.

How Small Can You Go? LoRA Fine-Tuning 270M-8B Models for Merchant Information Extraction in Financial Transactions

arXiv cs.AI · 2026-06-09 Cached

This paper presents a deployment-focused study comparing LoRA fine-tuning of 24 model variants (270M–8B parameters) for merchant information extraction from financial transaction strings. The authors find that smaller models like Qwen 3.5 4B achieve 96.6% F1, within 0.35 points of the 8B baseline, while offering significant reductions in latency and cost.

OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs

arXiv cs.AI · 2026-06-09 Cached

OmniMem introduces a modality-aware memory allocation and perturbation-aware selection strategy for streaming audio-visual LLMs, achieving 2-4% absolute accuracy gains over compression baselines on long-video benchmarks.

Trait-space Monitoring for Emergent Misalignment During Supervised Finetuning

arXiv cs.LG · 2026-06-09 Cached

This paper proposes a trait-space monitoring method to detect emergent misalignment in LLMs during supervised finetuning by tracking representational drift in activation space, achieving a 0.990 AUROC with low false positive and false negative rates, outperforming unsupervised baselines.

Shortcuts in the Tail: Debiasing via Post-Hoc Spectral Compression of Fine-Tuning Updates

arXiv cs.LG · 2026-06-09 Cached

A post-hoc method reduces spurious correlations in fine-tuned LLMs by truncating the tail of the SVD of the weight update matrix. It reduces the spurious-group gap by up to 5x with less than 2pp accuracy loss, without retraining or group labels.
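
The truncation described above can be written compactly. In this sketch, $W_0$ (pre-trained weights), $W_{\mathrm{ft}}$ (fine-tuned weights), and the retained rank $k$ are notation introduced here for illustration; how the paper chooses $k$ is not stated in this summary.

```latex
\Delta W = W_{\mathrm{ft}} - W_0 = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0,
\\[4pt]
\widehat{W} = W_0 + \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top},
\qquad k < r .
```

The discarded terms $i = k+1, \dots, r$ are the low-energy "tail" of the update, so the edit needs only the two checkpoints and no retraining.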

I fine-tuned Parakeet 0.6B for medical ASR — open weights, local Mac/CUDA/CPU

Reddit r/LocalLLaMA · 2026-06-09

The founder of Omi Health fine-tuned NVIDIA's Parakeet TDT 0.6B for medical ASR and released Omi Med STT v1, an open-weights model that achieves a competitive medical WER while running locally on Mac, CUDA, or CPU.

DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch

Hugging Face Daily Papers · 2026-06-09 Cached

DeNovoSWE is a large-scale dataset for training code agents to generate entire software repositories from documentation, using a sandboxed agentic workflow and difficulty-aware filtering. Fine-tuning Qwen3-30B-A3B on it boosts performance on the BeyondSWE-Doc2Repo benchmark from 5.8% to 47.2%.
