Tag
The post explains why Reinforcement Learning struggles with long-horizon tasks due to sparse rewards and highlights GEPA, a method that uses trajectory-level textual reflection to preserve richer feedback signals for optimization.