MIT professor Gabriele Farina is advancing AI decision-making by combining game theory with machine learning, building on his earlier work on Cicero, the AI agent that plays the negotiation board game Diplomacy.
RefereeBench is the first large-scale benchmark for evaluating whether video MLLMs can reliably act as multi-sport referees, comprising 925 curated sports videos and 6,475 QA pairs. Evaluations of state-of-the-art models show that current MLLMs fall short (≤60% accuracy): despite their general video understanding capabilities, they struggle to apply sport-specific rules and to ground events in time.
Analysis of a recurring failure pattern in production AI systems, in which technically correct decisions become contextually wrong as their underlying assumptions shift. The author frames this as the 'Formalisation Trap', where meaning gets locked into outdated structures.
PangeAI is a product offering instant, agent-driven spatial analysis and decision-making capabilities.
MIT researchers introduce SEED-SET, a framework that uses LLMs to proactively evaluate the ethical alignment of autonomous systems in high-stakes scenarios such as power distribution, addressing gaps left by static testing methods.