@andykonwinski: 3 take-aways from chatting w/ top AI researchers last month: - Evals are the “source code” of AI agents (48:35) - BigAI…

X AI KOLs Following 06/25/26, 05:49 PM News

ai-research evaluations reinforcement-learning pedagogy academia industry conference-takeaways

Summary

Summary of three key takeaways from conversations with leading AI researchers at CAISconf, covering the importance of evaluations for AI agents, the trade-offs between industry and academia, and a novel pedagogical RL approach.

Original Article

Cached at: 06/26/26, 04:12 PM

3 take-aways from chatting w/ top AI researchers last month:

Evals are the “source code” of AI agents (48:35)
BigAI labs: $ + anonymous impact. Academia -> impact w/ your own individual voice (44:00)
Pedagogical RL: agent solves problem it knows the answer to (unintuitive!); reward solutions that don’t take shortcuts, then distill them into a student model. (53:40)

Laude Institute (@LaudeInstitute): At @CAISconf last month, @andykonwinski sat down with researchers on the conference floor – @matei_zaharia @istoica05 @lateinteraction @dawnsongtweets @gneubig @pgasawa @JonSaadFalcon @heathercmiller @ryanmart3n @alexgshaw @profjoeyg @swyx and Ioannis Ioannidis – to talk

Similar Articles

@pauliusztin_: Every day, 100+ people ask me, "How can I learn AI evals?" I copy-paste these 11 links (every time): 1. AI evals & obse…

Demystifying evals for AI agents

@zodchiii: Three Anthropic engineers just spent 16 minutes on what makes AI agents actually succeed in production. If the people w…

@OpenAI: Let’s talk about evals. We’re always looking for better ways to measure and forecast model progress, especially as benc…

@levie: Almost all AI model and agent progress is downstream from evals. Open weights post training for specific domains comes …
