chain-of-thought-reasoning

Adapting Reinforcement Learning with Chain-of-Thought Supervision for Explainable Detection of Hateful and Propagandistic Memes

arXiv cs.CL · 2026-06-16

Proposes a reinforcement learning-based post-training method for thinking-based multimodal large language models, using Group Relative Policy Optimization (GRPO) and chain-of-thought supervision to improve both classification and explanation quality in hateful and propagandistic meme detection. The authors report improvements on the Hateful Memes and ArMeme benchmarks.
