Hugging Face

Articles from Hugging Face

CyberSecQwen-4B: Why Defensive Cyber Needs Small, Specialized, Locally-Runnable Models

Hugging Face Blog · 19h ago

CyberSecQwen-4B is a small, specialized 4B-parameter model fine-tuned for defensive cybersecurity tasks. It is designed to run locally on a single GPU, which addresses privacy, cost, and air-gapped deployment needs.

EMO: Pretraining mixture of experts for emergent modularity

Hugging Face Blog · 21h ago

Allen AI releases EMO, a mixture-of-experts model where modular structure emerges naturally from data, enabling use of just 12.5% of experts for a task while maintaining near full-model performance.

HiDream-ai/HiDream-O1-Image

Hugging Face Models Trending · yesterday

HiDream-ai has open-sourced HiDream-O1-Image (8B), a unified image generative foundation model built on a Pixel-level Unified Transformer (UiT) that natively handles text-to-image, image editing, and subject-driven personalization at up to 2048×2048 resolution without external VAEs or disjoint text encoders. It debuted at #8 in the Artificial Analysis Text to Image Arena and is positioned as a leading open-weights text-to-image model.

MedQA: Fine-Tuning a Clinical AI on AMD ROCm — No CUDA Required

Hugging Face Blog · yesterday

A tutorial and project demonstrating LoRA fine-tuning of Qwen3-1.7B on AMD MI300X using ROCm for clinical question answering, providing a CUDA-free alternative for medical AI development.

EMO: Pretraining Mixture of Experts for Emergent Modularity

Hugging Face Daily Papers · 2d ago

EMO is a Mixture-of-Experts model that enables modular deployment by grouping tokens from similar domains under shared experts. It performs comparably to standard MoEs while tolerating substantial expert pruning: keeping 25% of the experts retains 99% of performance.

PianoCoRe: Combined and Refined Piano MIDI Dataset

Hugging Face Daily Papers · 2d ago

PianoCoRe is a large-scale piano MIDI dataset unifying and refining open-source corpora with 250,046 performances of 5,625 pieces by 483 composers, featuring note-level alignments for music information retrieval and including a MIDI quality classifier and alignment refinement pipeline.

GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs

Hugging Face Daily Papers · 2d ago

GeoStack introduces a geometric framework to compose independently trained domain experts in Vision-Language Models without catastrophic forgetting, achieving constant-time inference and a 10x reduction in geometric error.

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

Hugging Face Daily Papers · 2d ago

StraTA proposes strategic trajectory abstraction for long-horizon LLM agents, using hierarchical GRPO-style rollout with diverse strategy sampling and critical self-judgment to improve sample efficiency and final performance over frontier models and prior RL baselines.

When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels

Hugging Face Daily Papers · 2d ago

This paper introduces a framework for validating comparative LLM safety scoring without ground-truth labels, using an 'instrumental-validity chain' to establish deployment evidence. It demonstrates the method using a local-first tool called SimpleAudit on Norwegian safety packs and compares models like Borealis and Gemma 3.

Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance

Hugging Face Daily Papers · 2d ago

This paper introduces Sparkle, a new dataset and benchmark for instruction-guided video background replacement, addressing the lack of high-quality training data in this domain. It proposes a scalable pipeline with decoupled guidance to generate realistic foreground-background interactions.

The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models

Hugging Face Daily Papers · 2d ago

This research paper investigates how Large Language Models encode social role granularity as a structured latent dimension. It demonstrates that this 'Granularity Axis' is consistent across architectures like Qwen3 and Llama-3, and can be causally manipulated via activation steering.

Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes

Hugging Face Daily Papers · 2d ago

This paper introduces an auto-research framework using specialist agents to iteratively refine training recipes through an empirical loop of code execution and feedback. The system autonomously improves performance on tasks like Parameter Golf and NanoChat without human intervention by leveraging lineage feedback.

MARBLE: Multi-Aspect Reward Balance for Diffusion RL

Hugging Face Daily Papers · 2d ago

This paper introduces MARBLE, a gradient-space optimization framework for multi-reward reinforcement learning fine-tuning of diffusion models, which harmonizes policy gradients without manual weighting.

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

Hugging Face Daily Papers · 2d ago

This paper introduces ScaleLogic, a framework demonstrating that RL training compute scales as a power law with reasoning depth in LLMs. It highlights that logical expressiveness is key to improving downstream transfer and training efficiency.

AI Co-Mathematician: Accelerating Mathematicians with Agentic AI

Hugging Face Daily Papers · 2d ago

This paper introduces the AI Co-Mathematician, a workbench that uses agentic AI to support mathematicians in open-ended research tasks like ideation and theorem proving. Early tests show the system achieving state-of-the-art results on hard problem-solving benchmarks, including a 48% score on FrontierMath Tier 4.

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

Hugging Face Daily Papers · 2d ago

Skill1 is a unified framework that trains a single policy to co-evolve skill selection, utilization, and distillation using a shared task-outcome objective. Experiments on ALFWorld and WebShop show it outperforms existing baselines in complex task environments.

SkillOS: Learning Skill Curation for Self-Evolving Agents

Hugging Face Daily Papers · 2d ago

This paper introduces SkillOS, a reinforcement learning framework that enables LLM agents to learn long-term skill curation policies for self-evolution, improving performance and generalization across tasks.

Continuous Latent Diffusion Language Model

Hugging Face Daily Papers · 2d ago

Cola DLM is a hierarchical latent diffusion language model that uses text-to-latent mapping and conditional decoding to achieve efficient, non-autoregressive text generation.

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

Hugging Face Daily Papers · 2d ago

UniPool introduces a shared expert pool architecture for Mixture-of-Experts models, reducing parameter growth with depth while improving efficiency and performance over standard MoE baselines.

Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling

Hugging Face Daily Papers · 2d ago

This paper introduces DeScore, a video reward model that decouples reasoning and scoring processes to improve training efficiency and generalization. It addresses the limitations of existing discriminative and generative reward models by using a 'think-then-score' paradigm with multimodal large language models.
