Tag
TIGER is an inference-time framework that mitigates hallucinations in multimodal generation by extracting observation and claim graphs and assigning risk scores to repair unsupported facts. It reduces unsupported content across image-to-text, image+text-to-text, audio-to-text, and video-to-text tasks.
AlphaGRPO is a new framework that applies Group Relative Policy Optimization to Unified Multimodal Models, enhancing generation through self-reflective refinement and decompositional verifiable rewards.
STARFlow2 is an architecture that bridges language models and autoregressive normalizing flows for unified multimodal generation. It addresses structural mismatches in existing systems by applying a shared causal masking mechanism to interleaved text-image sequences.
MM-WebAgent is a hierarchical agentic framework that generates coherent and visually consistent webpages by coordinating AIGC-based element generation through joint optimization of layout and multimodal content. The paper introduces a benchmark and multi-level evaluation protocol, demonstrating improvements over code-generation and agent-based baselines.
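For readers unfamiliar with the Group Relative Policy Optimization named in the AlphaGRPO entry, here is a minimal sketch of its core idea: each sampled output is scored against the other outputs in its own group, not against a learned value baseline. The function name, the `eps` term, and the example rewards are illustrative assumptions, not taken from the AlphaGRPO paper, and the sketch leaves out the policy update as well as the paper's self-reflective refinement and decompositional rewards.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per output sampled for the same prompt.
    `eps` (an assumption here) keeps the division safe when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four outputs generated from one prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Outputs that score above the group mean receive a positive advantage and those below it a negative one, so the training signal depends only on how an output ranks within its group.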