data-efficient

#data-efficient

Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation

arXiv cs.CL · 4d ago

The paper introduces OPDLM, a method that transforms autoregressive language models into diffusion language models via on-policy distillation, requiring 15x to 7000x fewer training tokens while retaining knowledge from the original model.

Which Anatomy Matters Under Limited Labels? A Data-Efficient Anatomy-Aware Benchmark for Cardiac Pathology Prediction

arXiv cs.AI · 4d ago

This paper presents a data-efficient anatomy-aware benchmark for cardiac pathology prediction on the ACDC MRI dataset, showing that under limited labels, anatomical representation matters more than model complexity.

Learning Robust and Task-Invariant Functional Representation from fMRI through Siamese Self-Supervised Learning

arXiv cs.LG · 2026-05-29

This paper introduces BrainSimSiam, a lightweight self-supervised framework using siamese networks to learn robust fMRI representations from positive-only pairs, achieving strong performance on downstream tasks even with limited data.

Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free

arXiv cs.CL · 2026-05-19

This paper proposes a retrieval-based approach for multi-label legal annotation that uses frozen embedding models to retrieve labels via k-nearest neighbors, achieving competitive accuracy, high data efficiency, and eliminating label hallucination by design.
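
To make the retrieval idea concrete, here is a minimal sketch of k-nearest-neighbor label retrieval. It is not the paper's implementation: the toy 3-dimensional vectors stand in for the output of a frozen embedding model, and the names `knn_labels`, `INDEX`, and the `min_votes` threshold are hypothetical choices made for illustration. Because labels are only ever copied from indexed examples, the predictor cannot emit a label outside the known label set.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_labels(query, index, k=3, min_votes=2):
    """Return labels carried by at least `min_votes` of the k nearest neighbors.

    `index` is a list of (embedding, set_of_labels) pairs. The voting rule is
    an assumption for this sketch, not taken from the paper.
    """
    neighbors = sorted(index, key=lambda item: cosine(query, item[0]), reverse=True)[:k]
    votes = Counter(label for _, labels in neighbors for label in labels)
    return sorted(label for label, count in votes.items() if count >= min_votes)


# Toy index: made-up embeddings paired with made-up label sets.
INDEX = [
    ([1.0, 0.0, 0.0], {"contract", "liability"}),
    ([0.9, 0.1, 0.0], {"contract"}),
    ([0.0, 1.0, 0.0], {"privacy"}),
    ([0.1, 0.9, 0.1], {"privacy", "consent"}),
    ([0.8, 0.2, 0.1], {"contract", "liability"}),
]

predicted = knn_labels([0.95, 0.05, 0.0], INDEX, k=3)
print(predicted)  # ['contract', 'liability']
```

Adding a new label in this setup only requires adding labeled examples to the index; nothing is retrained.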

FrameSkip: Learning from Fewer but More Informative Frames in VLA Training

Hugging Face Daily Papers · 2026-05-13

FrameSkip is a data-layer frame selection method that improves Vision-Language-Action (VLA) policy training by prioritizing high-importance frames based on action variation and visual-coherence metrics, achieving a macro-average success rate of 76.15% across three benchmarks while using only 20% of unique frames.
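
As a rough illustration of frame selection by action variation, the sketch below scores each frame by how much the action changes from the previous step and keeps the top 20%. This is an assumption-heavy simplification, not FrameSkip itself: the scoring rule, the function names, and the toy trajectory are hypothetical, and the visual-coherence metric mentioned in the summary is not modeled at all.

```python
import math


def action_variation(actions):
    """Score frame t by the L2 distance between action t and action t-1.

    The first frame has no predecessor and gets a score of 0.0.
    """
    scores = [0.0]
    for prev, cur in zip(actions, actions[1:]):
        scores.append(math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur))))
    return scores


def select_frames(actions, keep_ratio=0.2):
    """Return the indices of the highest-variation frames, in temporal order."""
    scores = action_variation(actions)
    n_keep = max(1, round(len(actions) * keep_ratio))
    ranked = sorted(range(len(actions)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:n_keep])


# Toy trajectory of 2-D actions with two abrupt changes (at t=4 and t=7).
ACTIONS = [
    [0.0, 0.0], [0.0, 0.0], [0.1, 0.0], [0.1, 0.0], [1.0, 0.5],
    [1.0, 0.5], [1.0, 0.5], [0.2, 0.5], [0.2, 0.5], [0.2, 0.5],
]

kept = select_frames(ACTIONS)
print(kept)  # [4, 7]
```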

Hint Tuning: Less Data Makes Better Reasoners

arXiv cs.CL · 2026-05-12

This paper introduces 'Hint Tuning,' a data-efficient method that reduces token usage in reasoning models by calibrating reasoning depth based on problem difficulty. It achieves significant token reduction (24–66%) on models like Qwen3-Thinking and DeepSeek-R1-Distill using only 1K self-annotated samples.
