Tag
This paper identifies a 'routing plateau' phenomenon where diverse LLM routing methods converge to similar accuracy, far below the oracle, due to a predictability bottleneck that limits query-specific routing. It then shows that larger datasets, stronger encoders, and fine-tuning can help break through this plateau.
OpenAI presents a large-scale empirical study of curiosity-driven reinforcement learning without extrinsic rewards across 54 benchmark environments, showing strong performance and investigating the role of feature spaces in prediction-based reward signals.