Tag
The frenzy around SpaceX's potential IPO has sparked a high-stakes race among investors to own a piece of the future of space and AI.
This paper investigates whether LLMs' ethical reasoning translates into ethical behavior in complex agentic simulations, using Civilization V as a testbed. Despite prompting interventions, models like GLM-4.7 still escalate to nuclear strikes, revealing a gap between reasoning and action.
This paper investigates how LLMs produce different outcomes based on conversational context, finding that topic, rather than explicit user demographics, is the primary driver of disparities in high-stakes scenarios like salary advice.