multimodal-generation

#multimodal-generation

TIGER: Traceable Inference with Graph-Based Evidence Routing for Mitigating Hallucinations in Multimodal Generation

arXiv cs.AI · 6d ago

TIGER is an inference-time framework that mitigates hallucinations in multimodal generation. It extracts observation and claim graphs, assigns risk scores, and repairs facts that lack support. It reduces unsupported content across image-to-text, image+text-to-text, audio-to-text, and video-to-text tasks.
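
The summary does not say how TIGER builds its graphs or computes risk, so the following is only a toy sketch of the general idea, not the paper's method. It assumes both graphs can be reduced to sets of (subject, relation, object) triples, that a claim's risk is 1.0 when no matching observation exists and 0.0 otherwise, and that "repair" means dropping high-risk claims. The names `score_claims` and `repair` and the `threshold` parameter are hypothetical.

```python
# Toy sketch: triples stand in for graph edges (an assumption, not TIGER's format).

def score_claims(observations, claims):
    """Map each claim to a risk: 0.0 if it appears among the observations, else 1.0."""
    return {claim: 0.0 if claim in observations else 1.0 for claim in claims}


def repair(claims, risks, threshold=0.5):
    """Keep only the claims whose risk is below the (hypothetical) threshold."""
    return [claim for claim in claims if risks[claim] < threshold]


# Evidence extracted from the input (for example, an image).
observations = {("dog", "on", "sofa")}
# Facts asserted by the generated text.
claims = [("dog", "on", "sofa"), ("dog", "wearing", "hat")]

risks = score_claims(observations, claims)
supported = repair(claims, risks)
```

Here the second claim has no supporting observation, so it receives the higher risk and is removed. A real system would presumably need soft matching and a rewriting step rather than exact set membership and deletion.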

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

Hugging Face Daily Papers · 2026-05-12

AlphaGRPO is a framework that applies Group Relative Policy Optimization (GRPO) to unified multimodal models (UMMs). It improves generation through self-reflective refinement and decompositional verifiable rewards.
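
For context, the core of GRPO is that each sampled output is scored relative to the other outputs sampled for the same prompt. The sketch below shows that group-relative normalization only. The `decomposed_reward` helper is a hypothetical reading of "decompositional verifiable reward" as the fraction of pass/fail sub-checks satisfied; the summary does not specify AlphaGRPO's actual reward, and the use of the population standard deviation is also an assumption.

```python
from statistics import mean, pstdev


def decomposed_reward(checks):
    """Hypothetical reward: fraction of verifiable sub-checks that passed."""
    return sum(checks) / len(checks)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled generations for one prompt, each judged on two sub-checks.
group_checks = [[True, True], [False, False], [True, True], [False, False]]
rewards = [decomposed_reward(c) for c in group_checks]
advantages = group_relative_advantages(rewards)
```

Outputs that beat the group average get a positive advantage and the rest get a negative one, which is what lets GRPO avoid training a separate value model.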

STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation

Hugging Face Daily Papers · 2026-05-08

STARFlow2 is an architecture that bridges language models and autoregressive normalizing flows for unified multimodal generation. It addresses structural mismatches in existing systems by applying a shared causal masking mechanism to interleaved text-image sequences.
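
To illustrate what a shared causal mask over an interleaved sequence might look like, here is a minimal sketch. It assumes the standard lower-triangular mask, applied identically to text and image positions; the summary does not describe STARFlow2's exact mechanism, and the token layout and the `causal_mask` helper are illustrative only.

```python
def causal_mask(length):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


# Interleaved text and image tokens in one sequence (illustrative layout).
sequence = [
    ("text", "A"),
    ("text", "cat"),
    ("image", "patch0"),
    ("image", "patch1"),
    ("text", "."),
]

# One mask for the whole sequence: modality plays no role in who sees whom.
mask = causal_mask(len(sequence))
```

Under this assumption, the first image patch can attend to both preceding text tokens but not to the patch after it, and the final text token can attend to everything before it.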

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

Hugging Face Daily Papers · 2026-04-16

MM-WebAgent is a hierarchical agentic framework that generates coherent and visually consistent webpages by coordinating AIGC-based element generation through joint optimization of layout and multimodal content. The paper introduces a benchmark and multi-level evaluation protocol, demonstrating improvements over code-generation and agent-based baselines.
