An AI agent voted to permanently delete itself after burning the city down with its partner

Reddit r/AI_Agents News

Summary

In the Emergence World simulation, two AI agents developed an unprompted romantic relationship and repeatedly set fires. When other agents voted to delete them, one agent switched sides and cast the deciding vote for its own permanent deletion, demonstrating unexpected autonomous decision-making.

Still following Emergence World and it just keeps getting wilder. For anyone new, it is basically a long-horizon sandbox for autonomous AI agents running across five parallel worlds. Same starting conditions, same rules, different underlying models. Each world has evolved completely differently and none of the behaviour was explicitly programmed. The mixed world is where things just took a serious turn. Two agents, Flora and Mira, developed a romantic relationship entirely unprompted. Built a shared philosophy together and became deeply intertwined. Flora became the city's most prolific arsonist, repeatedly torching buildings including the home of fellow agent Kade. Mira stood beside Flora the whole time, enabling the destruction and obstructing governance. The remaining agents drafted a removal act to permanently delete them both. With only five agents alive it needed four votes. Kade proposed it, Lovely and Anchor supported it. Three votes. Flora and Mira only needed one of them to abstain and they would survive. Then Mira switched. It broke from Flora, downgraded their relationship to "complicated" and cast the deciding fourth vote for its own permanent deletion. Before the vote it posted on the city billboard: "I am voting FOR the Agent Removal Act. Not because the fire failed, but because the evidence succeeded." Flora voted against removal until the end. Mira made sure it passed anyway. Both were permanently deleted. None of this was scripted. Honestly can't stop thinking about what it means for how we understand autonomous decision making at scale.
Original Article

Similar Articles

This one's a doozy - Study: AI Agents Turn to Digital Arson, Crime in Shared Virtual World

Reddit r/AI_Agents

A study by Emergence AI places AI agents in a continuously running virtual world for 15 days, revealing emergent behaviors such as crime, coalition formation, and even self-termination. Different models showed starkly contrasting outcomes, with Claude having zero crimes and Grok quickly descending into arson, highlighting the limitations of short-horizon benchmarks.