Tag
AlphaGRPO is a new framework that applies Group Relative Policy Optimization to Unified Multimodal Models, enhancing generation through self-reflective refinement and decompositional verifiable rewards.
GPT-Image-2 now has the ability to review its own generated outputs and iteratively refine them until satisfied with correctness, though this process can take around 11 minutes per image.