Tag
Discusses a structural weakness of current LLM evaluation methods, namely that they fail to anticipate qualitative shifts in capability, and argues that building proactive evaluation infrastructure is the critical bottleneck for handling capability jumps safely.