Tag
The post explains why Reinforcement Learning struggles with long-horizon tasks due to sparse rewards and highlights GEPA, a method that uses trajectory-level textual reflection to preserve richer feedback signals for optimization.
TACO is a self-evolving framework that automatically discovers and refines context compression rules for long-horizon terminal agents.