Articles from Hugging Face
IBM introduces CUGA, an open-source agent harness that handles plumbing for state, tool calls, and orchestration, allowing developers to focus on defining tools and prompts. The article showcases two dozen single-file example apps built with CUGA, demonstrating how it eliminates repetitive framework setup.
Hugging Face describes how it built a weekly release pipeline for its huggingface_hub library using AI, open-source tools, and human oversight, enabling faster and more reliable releases.
PP-OCRv6 is the latest generation of PaddleOCR's universal OCR model family, offering three tiers from 1.5M to 34.5M parameters, supporting 50 languages, and achieving significant accuracy improvements over previous versions.
Arbor introduces explicit geometric control for 3D asset generation by using constraint meshes (hull, avoidance, touch regions) to condition latent generation, improving spatial constraint adherence without sacrificing object quality.
HAKARI-Bench is a lightweight benchmark for comparing retrieval methods across multiple configurations and languages, enabling efficient model selection and performance analysis. Its results correlate highly with full benchmarks such as MTEB while being faster to run.
MeshFlow introduces an equivariant optimal-transport flow matching model for direct triangle mesh generation, achieving state-of-the-art quality with an approximately 18x inference speedup over autoregressive methods.
Foresight is a failure detection framework for long-horizon robotic manipulation that uses action-conditioned world model latents and functional conformal prediction to monitor trajectories, trained only with final task labels. It demonstrates state-of-the-art performance across simulation and real robot tasks.
The blog post describes using local open-weight models like Gemma and Qwen in an agent harness to automatically triage issues and pull requests in the OpenClaw repository, enabling real-time notifications without relying on costly closed API models.
KaLM-Reranker-V1 is a fast reranker that decouples query and passage computation using an encoder-decoder architecture with Matryoshka embedding pooling and cross-attention, achieving state-of-the-art reranking performance on BEIR and competitive results on multilingual benchmarks.
VESFlow is a training-free safety method for flow matching-based text-to-image generation that edits velocity fields to ensure safe output while maintaining prompt integrity.
EnterpriseClawBench presents a benchmark for enterprise agents based on real-world workplace sessions, offering 852 reproducible tasks and evaluation metrics that go beyond a single performance score.
This paper argues that language model agents should assist causal discovery workflows by providing contextual support and explanations rather than generating causal conclusions, and introduces the causal-learn+ platform to demonstrate this principle.
DR-MV3D presents a map-grounded learning framework with dense rewards to improve multi-view 3D visual question answering through global map construction, view-trajectory planning, and egocentric grounding.
SelfCompact is a scaffolding approach that lets language models autonomously decide when and how to compact long agent traces, achieving better performance with reduced token costs compared to fixed-interval methods.
PhoneBuddy combines real and mock app environments to train open models for agentic phone use, achieving a 45.33% task success rate on real phones through mixed reinforcement learning and showing that mock-app training complements real-app training.
CLI-Universe is a synthesis engine that generates verifiable terminal-agent tasks via a multi-dimensional capability taxonomy and evidence-guided research, producing a distilled dataset of 6,000 trajectories. Fine-tuning Qwen3-32B on this dataset achieves 33.4% on Terminal-Bench 2.0, setting a new state of the art for open-source models at or below 32B parameters.
Unlimited OCR introduces Reference Sliding Window Attention to eliminate growing memory consumption in long-sequence OCR tasks, enabling efficient transcription of multiple pages in a single forward pass.
This paper introduces Tapered Language Models (TLMs), an architectural principle that allocates more parameters to earlier layers and fewer to later layers, consistently improving perplexity and downstream performance across multiple architectures without extra cost.
UniverSat introduces a Universal Patch Encoder for Vision Transformers that enables robust, sensor-agnostic spatial feature extraction across diverse Earth Observation data types, achieving strong results on classification and segmentation benchmarks.
Tmax introduces a simplified RL training recipe for terminal agents, achieving state-of-the-art performance with a 9B-parameter model using a novel data generation taxonomy and an expanded open-source dataset.