unified-model

#unified-model

Audio Interaction Model

Hugging Face Daily Papers ↗ · 2026-06-03 Cached

This paper introduces Audio-Interaction, a unified streaming audio model that combines offline task execution with real-time audio instruction following in an end-to-end framework. It proposes SoundFlow for the perceive-decide-respond loop and reports competitive performance across benchmarks.

Representation Forcing for Bottleneck-Free Unified Multimodal Models

Hugging Face Daily Papers ↗ · 2026-05-29 Cached

Introduces Representation Forcing (RF), a technique that enables unified multimodal models to perform both perception and generation end-to-end without external VAE latent spaces, matching state-of-the-art VAE-based models in image generation while improving understanding.

Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models

Hugging Face Daily Papers ↗ · 2026-05-29 Cached

Lumos-Nexus is a training-efficient video generation framework that uses a two-stage design with a lightweight generator for training and a high-capacity pretrained generator for inference, achieving enhanced visual fidelity through Unified Progressive Frequency Bridging.

Unified Neural Scaling Laws

Hugging Face Daily Papers ↗ · 2026-05-25 Cached

Presents a unified neural scaling law that accurately models deep neural network scaling across multiple dimensions including parameters, dataset size, training steps, and compute, validated across diverse architectures and tasks.

UniT: Unified Geometry Learning with Group Autoregressive Transformer

Hugging Face Daily Papers ↗ · 2026-05-20 Cached

UniT is a unified feed-forward model for geometry perception using a Group Autoregressive Transformer that integrates multiple paradigms (online/offline, multi-modal, long-horizon) while maintaining metric-scale accuracy via scale-adaptive loss and queue-style KV caching. It achieves state-of-the-art performance on ten benchmarks spanning seven tasks.

Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

Hugging Face Daily Papers ↗ · 2026-05-20 Cached

Uni-Edit proposes using intelligent image editing as a single general task to simultaneously improve unified multimodal models' understanding, generation, and editing capabilities, with an automated data synthesis pipeline creating complex editing instructions.

Lance: Unified Multimodal Modeling by Multi-Task Synergy

Hugging Face Daily Papers ↗ · 2026-05-18 Cached

Lance is a unified multimodal model that leverages a dual-stream mixture-of-experts architecture and collaborative multi-task training to achieve strong performance in understanding, generation, and editing of both images and videos, outperforming existing open-source unified models.

bytedance-research/Lance

Hugging Face Models Trending ↗ · 2026-05-15 Cached

ByteDance Research introduces Lance, a 3B-parameter unified multimodal model trained from scratch on 128 A100 GPUs, capable of image and video understanding, generation, and editing within a single framework.

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

Papers with Code Trending ↗ · 2026-04-27 Cached

Tuna-2 is a unified multimodal model that achieves state-of-the-art performance by processing visual understanding and generation directly from pixel embeddings, eliminating the need for pretrained vision encoders.

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

Hugging Face Daily Papers ↗ · 2026-04-22 Cached

LLaDA2.0-Uni unifies multimodal understanding and generation within a single diffusion-based large language model architecture.

UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision

Papers with Code Trending ↗ · 2026-01-06 Cached

UniCorn is a framework that enables unified multimodal models to self-improve by using a multi-agent system for prompt generation, image creation, and quality evaluation, achieving state-of-the-art results on text-to-image benchmarks like TIIF, WISE, and OneIG-EN.
