tau2-bench

Step 3.7 Flash open weights dropped TODAY and the agent reliability numbers are actually interesting

Reddit r/artificial · 2026-05-29

Step 3.7 Flash, an open-weight 198B sparse MoE model, claims 98% agent reliability on tau2-bench across all difficulty levels. Its raw capability is middling, but its multi-step consistency is strong.
