Hugging Face

Articles from Hugging Face

Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel

Hugging Face Blog · 11h ago

NVIDIA NeMo AutoModel builds on Hugging Face Transformers v5 to deliver 3.4-3.7x higher training throughput and 29-32% less GPU memory when fine-tuning Mixture-of-Experts models, with no code changes beyond a single import.

Introducing the FFASR Leaderboard: Benchmarking ASR in the Real World

Hugging Face Blog · yesterday

Introduces the FFASR Leaderboard, an open, community-driven benchmark for evaluating automatic speech recognition models under realistic far-field acoustic conditions, highlighting the significant performance gap between near-field and far-field scenarios.

Build real agentic apps using CUGA: two dozen working examples on a lightweight harness

Hugging Face Blog · yesterday

IBM introduces CUGA, an open-source agent harness that handles the plumbing for state, tool calls, and orchestration, so developers can focus on defining tools and prompts. The article showcases two dozen single-file example apps built with CUGA, demonstrating how it eliminates repetitive framework setup.

InSight: Self-Guided Skill Acquisition via Steerable VLAs

Hugging Face Daily Papers · 2d ago

InSight presents a framework for autonomous skill acquisition in vision-language-action (VLA) models. It enables steerability at the primitive-action level and uses a VLM-guided data flywheel to generate demonstrations, accomplishing manipulation tasks such as block flipping and pouring without human demonstrations.

FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation

Hugging Face Daily Papers · 2d ago

FLAT proposes a method to decode explicit triangle splats directly from video diffusion latents for geometrically accurate 3D scene generation. It introduces a ray-centered rotation parameterization and a product window function to improve gradient flow, achieving better geometric accuracy than prior feedforward methods while supporting real-time rendering.

DiffusionBench: On Holistic Evaluation of Diffusion Transformers

Hugging Face Daily Papers · 2d ago

Researchers introduce NanoGen, a unified framework for training and evaluating diffusion transformers, and propose DiffusionBench, a holistic benchmark combining ImageNet class-conditional and text-to-image generation to better assess progress in generative modeling.

FLUX3D: High-Fidelity 3D Gaussian Generation with Diffusion-Aligned Sparse Representation

Hugging Face Daily Papers · 2d ago

FLUX3D introduces a framework for high-fidelity image-to-3D Gaussian Splatting generation by enhancing representation learning and cross-modal alignment with diffusion-aligned structured latents and a sparse-structure-aware diffusion transformer, achieving state-of-the-art results.

OpenThoughts-Agent: Data Recipes for Agentic Models

Hugging Face Daily Papers · 2d ago

This paper introduces OpenThoughts-Agent, an open-source data curation pipeline for training agentic language models, achieving a 44.8% average accuracy across seven benchmarks and outperforming prior open datasets through systematic experiments.

World Value Models for Robotic Manipulation

Hugging Face Daily Papers · 2d ago

The paper presents the World Value Model (WVM), a generalist robotic value model that combines world models with value estimation to accurately assess task progression and improve robotic policy learning from mixed-quality data, achieving state-of-the-art results on standard benchmarks and on a new suboptimal-data benchmark.

Are Text-to-Image Models Inductivist Turkeys? A Counterfactual Benchmark for Causal Reasoning

Hugging Face Daily Papers · 2d ago

This paper introduces CF-World, a counterfactual benchmark to evaluate whether text-to-image models rely on causal reasoning or mere pattern matching. Experiments show all models degrade sharply in counterfactual settings, suggesting their understanding is limited to tightly coupled visual-textual patterns rather than genuine causal reasoning.

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

Hugging Face Daily Papers · 2d ago

Introduces Holistic Data Scheduler (HDS), a reinforcement learning-based framework that dynamically adjusts data mixtures during LLM pre-training using a multi-objective reward function. HDS reaches the target perplexity in 44% fewer iterations and yields a 7.2% improvement on MMLU.

ReMMD: Realistic Multilingual Multi-Image Agentic Verification for Multimodal Misinformation Detection

Hugging Face Daily Papers · 2d ago

ReMMD introduces a realistic multilingual multi-image agentic verification framework for multimodal misinformation detection, including a benchmark (ReMMDBench) with 500 samples and 2,756 images, and an agent (ReMMD-Agent) that achieves superior veracity performance with reduced costs.

DREAM: Dense Retrieval Embeddings via Autoregressive Modeling

Hugging Face Daily Papers · 2d ago

DREAM trains dense retrieval embeddings by using autoregressive language model attention to supervise query-document similarity, eliminating the need for labeled data. It consistently outperforms baselines on BEIR and RTEB benchmarks across model scales.

NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?

Hugging Face Daily Papers · 2d ago

NatureBench is a cross-disciplinary benchmark of 90 scientific tasks from Nature publications, designed to evaluate AI coding agents' ability to achieve genuine discovery. Current agents succeed mainly through methodological translation, not scientific innovation.

FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning

Hugging Face Daily Papers · 2d ago

FlowR2A proposes a novel method that combines dense reward supervision with dynamic proposal generation using a flow-matching decoder for multimodal driving planning, achieving state-of-the-art results on the NAVSIM benchmarks.

Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning

Hugging Face Daily Papers · 2d ago

This paper proposes the EDV framework, which uses multiple heterogeneous agents in execute-distill-verify stages to build reliable experiences for LLM agents, preventing self-confirmatory errors and improving performance on long-horizon benchmarks.

Experimenting with the proposed Cross-Origin Storage API in Transformers.js

Hugging Face Blog · 2d ago

This guest post explores the proposed Cross-Origin Storage API to improve caching of AI model resources in Transformers.js, enabling efficient reuse across origins while maintaining privacy and integrity for in-browser inference.

Shipping huggingface_hub every week with AI, open tools, and a human in the loop

Hugging Face Blog · 2d ago

Hugging Face describes how they built a weekly release pipeline for their huggingface_hub library using AI, open-source tools, and human oversight, enabling faster and more reliable releases.

Qwen/Qwen-AgentWorld-35B-A3B

Hugging Face Models Trending · 2d ago

Qwen releases Qwen-AgentWorld-35B-A3B, a native language world model that simulates agentic environments across seven domains via long chain-of-thought reasoning. The model is trained with a three-stage pipeline and supports MCP, Search, Terminal, SWE, Android, Web, and OS interactions.

PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters

Hugging Face Blog · 2d ago

PP-OCRv6 is the latest generation of PaddleOCR's universal OCR model family, offering three tiers from 1.5M to 34.5M parameters, supporting 50 languages, and achieving significant accuracy improvements over previous versions.
