reward-engineering

@akshay_pachaar: Karpathy's prediction about RL is coming true now! He called reward functions unreliable and argued that a single rewar…

X AI KOLs Following · 6d ago

OpenPipe's ART framework addresses Karpathy's critique of reward functions in RL through RULER, which lets a reward be defined in natural language and scored by an LLM instead of being engineered by hand.
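
The general idea of an LLM-scored, natural-language reward can be sketched in a few lines of Python. This is a minimal illustration, not ART's or RULER's actual API: `build_judge_prompt`, `llm_rewards`, and `stub_judge` are hypothetical names, the prompt wording and JSON reply format are assumptions, and the judge is a fixed stub standing in for a real LLM call.

```python
import json
from typing import Callable, List


def build_judge_prompt(rubric: str, candidates: List[str]) -> str:
    """Format a plain-language rubric and the candidate outputs for a judge model."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    return (
        "Score each candidate from 0 to 1 against the rubric.\n"
        f"Rubric: {rubric}\n"
        f"Candidates:\n{numbered}\n"
        "Reply with a JSON list of numbers, one per candidate."
    )


def llm_rewards(
    rubric: str,
    candidates: List[str],
    judge: Callable[[str], str],
) -> List[float]:
    """Turn the judge's reply into one reward per candidate, clamped to [0, 1]."""
    reply = judge(build_judge_prompt(rubric, candidates))
    scores = json.loads(reply)
    if len(scores) != len(candidates):
        raise ValueError("judge returned a different number of scores than candidates")
    return [min(1.0, max(0.0, float(s))) for s in scores]


def stub_judge(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; always returns the same scores."""
    return "[0.2, 0.9, 1.4]"


rewards = llm_rewards(
    "Be concise and polite.",
    ["first answer", "second answer", "third answer"],
    stub_judge,
)
print(rewards)
```

The rubric is the only task-specific input here, which is the point of the approach: changing the reward means editing a sentence rather than rewriting a scoring function. Clamping guards against a judge that replies outside the requested range.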
