efficient-training

#efficient-training

ARIA: Adaptive Region-Based Importance Allocation for Conditional Diffusion Distillation

arXiv cs.LG ↗ · yesterday Cached

This paper introduces ARIA, a framework that adaptively allocates training effort across regions of the conditioning space for distilling conditional diffusion models, improving performance on unseen and underrepresented conditions.

@andimarafioti: Can a VLM see without a vision encoder? We trained one for $100, inspired by Gemma 4 12B. Latency on an M3 Pro MacBook:…

X AI KOLs Timeline ↗ · 6d ago Cached

Researchers trained a vision-language model without a vision encoder for only $100, inspired by Gemma 4 12B, achieving a 30% reduction in end-to-end latency on an M3 Pro MacBook.

Implicit Reasoning for Large Language Model-based Generative Recommendation

arXiv cs.CL ↗ · 2026-06-15 Cached

This paper proposes PauseRec, a lightweight implicit reasoning paradigm for LLM-based generative recommendation that outperforms explicit chain-of-thought methods while significantly reducing training and inference costs.

LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization

arXiv cs.CL ↗ · 2026-06-10 Cached

Proposes LC-QAT, a 2-bit, weight-only, vector-quantization-aware training framework for LLMs that uses a learned affine mapping to enable end-to-end training, achieving state-of-the-art results with only 0.1% to 10% of the training data.

DOG-DPO: Dynamic Optimization in Geometry for Safety Alignment

arXiv cs.LG ↗ · 2026-06-09 Cached

DOG-DPO is a training-free data selection framework that treats preference pairs as structured geometric signals, decomposing multi-dataset preference geometry into anchor and residual subspaces to select diverse subsets for safety alignment. It achieves strong utility-robustness trade-offs using only 11% of preference pairs across six safety benchmarks.

MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training

Hugging Face Daily Papers ↗ · 2026-06-07 Cached

MaskAlign proposes a token-subset representation alignment method that improves diffusion transformer training by reducing reliance on complete token sets and maintaining stable alignment under perturbations.

AI directly in DRAM: The Float Detox – How Pure Logic Unleashes the Future of Learning

Reddit r/artificial ↗ · 2026-06-02

BIN16 replaces all floating-point operations with boolean operations (XNOR+popcount) for neural network training and inference, enabling direct computation in off-the-shelf DRAM with zero floats, gradients, or hyperparameter tuning. It achieves 82% accuracy on MNIST in a single epoch, using only 220 lines of C.
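
The XNOR+popcount idea mentioned above can be illustrated with a minimal sketch. This is not BIN16's code (which the post says is written in C); it only shows the standard identity that the dot product of two vectors of +1/-1 values equals twice the number of agreeing positions minus the vector length. The helper names `pack_bits` and `binary_dot` are hypothetical, chosen for illustration.

```python
def pack_bits(signs):
    """Pack a list of +1/-1 values into an int; bit i is 1 when signs[i] is +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n, using only bit operations."""
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 where the two signs match
    matches = bin(agree).count("1")     # popcount: number of matching positions
    return 2 * matches - n              # matches contribute +1, mismatches -1


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
result = binary_dot(pack_bits(a), pack_bits(b), len(a))

# The bitwise result agrees with the ordinary arithmetic dot product.
assert result == sum(x * y for x, y in zip(a, b))
```

Because each multiply-accumulate collapses into one XNOR and one bit count, a whole machine word of weights can be processed per operation, which is the property that makes float-free designs of this kind attractive.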

Token-weighted Direct Preference Optimization with Attention

arXiv cs.CL ↗ · 2026-05-22 Cached

Proposes AttentionPO, a token-weighted direct preference optimization method that uses attention from the LLM itself to estimate token weights, improving alignment performance on AlpacaEval, MT-Bench, and ArenaHard without requiring a separate reward model.

HRM Seems To Be Going Off Right Now

Reddit r/LocalLLaMA ↗ · 2026-05-19 Cached

Sapient Intelligence has released HRM-Text, a 1B-parameter text generation model trained on only 0.04 trillion (40B) tokens at a cost of approximately $1,000. It surpasses much larger models trained on 100 to 1,000 times more data on multiple reasoning benchmarks, which the post presents as the start of a new paradigm for AI training.

New SOTA 1B model? HRM-text

Reddit r/LocalLLaMA ↗ · 2026-05-19 Cached

HRM-text is a 1B-parameter hierarchical reasoning language model from Sapient Intelligence. It reasons in an internal latent space and outperforms most models of the same size at an extremely low training cost.

@Sapient_Int: Introducing HRM-Text. An ultra-lean 1B-parameter reasoning language model designed to deliver strong general performanc…

X AI KOLs Timeline ↗ · 2026-05-18 Cached

Sapient Intelligence introduces HRM-Text, a 1B-parameter reasoning language model trained on only 40B tokens with a budget of $1,000, achieving competitive performance while drastically reducing data and compute requirements.

Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time

arXiv cs.CL ↗ · 2026-05-18 Cached

This paper introduces OP-Mix, a data mixing algorithm that uses low-rank adapters trained on the current model to cheaply simulate candidate data mixtures, enabling efficient and unified data mixing across pretraining, continual midtraining, and continual instruction tuning. OP-Mix consistently finds near-optimal mixtures while using a fraction of the compute of baselines, improving pretraining perplexity by 6.3% and reducing compute by 66-95% in continual learning settings.

microsoft/Lens-Turbo

Hugging Face Models Trending ↗ · 2026-05-15 Cached

Microsoft releases Lens, a 3.8B-parameter foundational text-to-image model with efficient training and fast high-resolution generation, featuring dense-caption pre-training and mixed-resolution learning.

EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

arXiv cs.CL ↗ · 2026-05-15 Cached

EndPrompt proposes a method for extending the context window of large language models using only short training sequences, by anchoring a terminal prompt with target-length positional indices. It achieves strong benchmark results with substantially less computation than full-length fine-tuning.

microsoft/Lens

Hugging Face Models Trending ↗ · 2026-05-15 Cached

Microsoft releases Lens, a 3.8B-parameter foundational text-to-image model designed for efficient training and fast high-resolution generation, achieving competitive quality with reduced compute.

GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training

arXiv cs.AI ↗ · 2026-05-14 Cached

GRACE proposes a gradient-aligned method that scores individual reasoning steps to select the most valuable data for post-training, achieving 108.8% of full-data performance with only 20% of the data.

@berryxia: Moonshot AI founder Yang Zhilin recently released a 40-minute video. Born in 1992, valedictorian of Tsinghua CS undergrad, PhD from CMU, co-author of Transformer-XL and XLNet, former researcher at Google Brain and Meta, he calmly deconstructs Kimi K2 in front of the camera...

X AI KOLs Timeline ↗ · 2026-05-14

Moonshot AI founder Yang Zhilin released a 40-minute video detailing the training process of the Kimi K2 model, which cost only $4.6 million. In an 8-model real-time programming competition, Kimi K2 took first place, defeating GPT-5.5 and others, demonstrating how a small team can overturn the traditional compute-stacking paradigm through architecture optimization.

jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition

arXiv cs.CL ↗ · 2026-05-12 Cached

This paper introduces jina-embeddings-v5-omni, a suite of multimodal embedding models that extend text embeddings to image, audio, and video using frozen-tower composition. The method trains only 0.35% of the total weights, maintaining text geometry while achieving competitive state-of-the-art performance with significantly lower computational cost.

CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

Hugging Face Daily Papers ↗ · 2026-05-11 Cached

This paper introduces CapVector, a method that decouples auxiliary training objectives from standard supervised finetuning in Vision-Language-Action models. By extracting transferable capability vectors and applying orthogonal regularization, it enhances model performance and generalization while significantly reducing computational overhead.

Long Context Pre-Training with Lighthouse Attention

Hugging Face Daily Papers ↗ · 2026-05-07 Cached

Lighthouse Attention is a training-only hierarchical selection-based attention algorithm that reduces computational complexity for long sequence training of causal transformers, enabling faster pre-training with competitive final loss after a recovery phase.
