efficient-training

GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training

arXiv cs.AI · 4h ago

GRACE proposes a gradient-aligned method that scores individual reasoning steps to select the most valuable data for post-training, achieving 108.8% of full-data performance with only 20% of the data.
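
A minimal sketch of what gradient-aligned data selection could look like, assuming each training example is scored by the cosine similarity between its gradient and a target (for example, validation) gradient, and the top 20% are kept. This is an illustration only and not the paper's algorithm: GRACE scores individual reasoning steps, whereas this toy scores whole examples, and `select_by_alignment`, `keep_fraction`, and the gradient values are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def select_by_alignment(example_grads, target_grad, keep_fraction=0.2):
    """Return sorted indices of the examples whose gradients align best with target_grad."""
    scores = [cosine(g, target_grad) for g in example_grads]
    k = max(1, round(keep_fraction * len(example_grads)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Toy per-example gradients (hypothetical values, not taken from the paper).
example_grads = [
    [1.0, 0.0],   # fully aligned with the target
    [0.0, 1.0],   # orthogonal to the target
    [-1.0, 0.0],  # opposed to the target
    [0.9, 0.1],   # nearly aligned
    [0.5, 0.5],   # partially aligned
]
target_grad = [1.0, 0.0]

selected = select_by_alignment(example_grads, target_grad, keep_fraction=0.2)
print(selected)
```

In practice the gradients would come from a model and would usually be projected to a low dimension before scoring; the ranking logic stays the same.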

@berryxia: Moonshot AI founder Yang Zhilin recently released a 40-minute video. Born in 1992, valedictorian of his Tsinghua CS undergraduate class, PhD from CMU, co-author of Transformer-XL and XLNet, and former researcher at Google Brain and Meta, he calmly deconstructs Kimi K2 in front of the camera...

X AI KOLs Timeline · 6h ago

Moonshot AI founder Yang Zhilin released a 40-minute video detailing how the Kimi K2 model was trained, at a cost of only $4.6 million. Kimi K2 took first place in a real-time programming competition among eight models, beating GPT-5.5 and others. The post presents this as evidence that a small team can overturn the traditional compute-stacking paradigm through architecture optimization.

jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition

arXiv cs.CL · 2d ago

This paper introduces jina-embeddings-v5-omni, a suite of multimodal embedding models that extend text embeddings to image, audio, and video using frozen-tower composition. The method trains only 0.35% of the total weights, which preserves the text embedding geometry while reaching performance competitive with the state of the art at significantly lower computational cost.
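
To make the 0.35% figure concrete, here is a small sketch of how a trainable-weight fraction is computed when most components are frozen. The component names and parameter counts are hypothetical and chosen only so the arithmetic lands on 0.35%; they are not the model's actual sizes.

```python
def trainable_fraction(components):
    """components maps a name to (parameter_count, is_trainable)."""
    total = sum(count for count, _ in components.values())
    trainable = sum(count for count, is_trainable in components.values() if is_trainable)
    return trainable / total

# Hypothetical parameter counts: frozen towers plus a small trainable part.
components = {
    "text_tower": (600_000, False),   # frozen
    "image_tower": (300_000, False),  # frozen
    "audio_tower": (96_500, False),   # frozen
    "connectors": (3_500, True),      # the only trained weights
}

fraction = trainable_fraction(components)
print(f"{fraction:.2%} of weights are trainable")
```

Because the text tower's weights never change in this setup, embeddings of text inputs are identical before and after training, which is one way to read "maintaining text geometry".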

CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

Hugging Face Daily Papers · 3d ago

This paper introduces CapVector, a method that decouples auxiliary training objectives from standard supervised finetuning in Vision-Language-Action models. By extracting transferable capability vectors and applying orthogonal regularization, it enhances model performance and generalization while significantly reducing computational overhead.

Motif-Video 2B: Technical Report

Hugging Face Daily Papers · 2026-04-14

Motif-Video 2B is a 2B-parameter text-to-video generation model that achieves 83.76% on VBench, surpassing Wan2.1 14B with 7x fewer parameters. It was trained on fewer than 10M clips using fewer than 100,000 H200 GPU hours. The model uses a specialized architecture with shared cross-attention and a three-part backbone that separates prompt alignment, temporal consistency, and detail refinement.

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

Papers with Code Trending · 2024-03-20

LlamaFactory is a unified framework that enables efficient fine-tuning of over 100 large language models via a web-based interface, eliminating the need for coding.
