Tag
Introduces BenchBench, a benchmark that tests AI models' ability to create effective benchmarks for other models, with GPT 5.2 being the only successful winner so far while frontier models like GPT 5.5 and Opus 4.6 struggled.
A new study reanalyzing old footage shows that beluga whales exhibit behavioral hallmarks of mirror self-recognition, a test of self-awareness, adding them to a short list of species that pass the test.
This paper proposes a hierarchical multi-agent reference architecture called HANA for achieving Level 4/5 autonomous networks. It integrates agent self-awareness to harmonize strategic governance with reflexive fault recovery, validated in a 5G Core environment achieving 86% reduction in Mean Time to Repair.