The BEAMS Initiative presents a benchmark suite for evaluating AI tools in modeling and simulation, with a focus on human-centered and responsible AI practices. Tests reveal variability across LLM-based engines, which perform better on qualitative tasks than on causal reasoning.
A study shows that changing only the formatting of a persona prompt (prose versus bullet points) flips an LLM's behavior in a Prisoner's Dilemma, with cooperation going from 96% to 20% (p < 0.001) even though the prompt content is identical.
Project Genie, Google's general-purpose world model, now integrates with Street View to create interactive environments based on real places, available to Google AI Ultra subscribers.
DeepMind announces Genie 3, a general-purpose world model that generates interactive environments from text prompts at 24 fps in 720p, with better consistency and real-time interactivity than previous versions.