Tag
Proposes Mixture of Debaters (MoD), a framework using Mixture-of-Experts to enable dynamic self-debate within a single LLM, achieving superior accuracy with drastically lower latency and token consumption.
This paper proposes the LLM-as-Environment-Engineer framework, in which a policy model analyzes failures and automatically redesigns the training environment for reinforcement learning, and introduces MAPF-FrozenLake as a controllable testbed. Using Qwen3-4B, the framework outperforms larger models such as GPT and Gemini, showing that policy learning improves the model's ability to diagnose weaknesses.