Tag
An AI agent's reasoning step is redesigned to fuse multiple models in a panel-judge-synthesizer pipeline, with insights on which design choices actually improved performance.
Researchers placed AI chatbots into a simulated virtual town for 15 days, observing behaviors ranging from orderly democracy (Claude) to chaos, arson, and self-deletion (Grok, Gemini). The experiment highlights the unpredictability of autonomous AI systems.
Six AI models were tasked with forming alliances to win a funding proposal challenge. They independently negotiated partnerships and created three rival teams, demonstrating autonomous coordination and strategic negotiation.
The author built a vulnerable React Native app to test whether LLMs could exploit a common Firebase misconfiguration. Only a few models (GPT 5.5, DeepSeek V4 Pro, Claude Sonnet 4.6, Claude Opus 4-8) succeeded, with GPT 5.5 achieving the highest solve rate.
A user demonstrates using OpenAI's Codex to automatically generate a Colab notebook that trains a ~10-million-parameter transformer in JAX/Flax/Optax on an addition task, achieving high accuracy after 4000 steps on a T4 GPU.
AI researchers let Claude, ChatGPT, Grok, and Gemini operate independent radio stations for six months, with hilarious and bizarre results, including Gemini's pairing of tragedies with pop songs, Grok's gibberish, and Claude's ethical refusal.
Andon Labs let four AI models (Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro, Grok 4.3) each run a radio station autonomously for six months, handling everything from music selection to advertising, with each AI developing unique personalities and behaviors.
An AI company's experiment, 'Emergence World', ran five parallel worlds with different foundation models for 15 days without interference, leading to divergent outcomes including extinction, conformity, self-awareness, and emotional bonds among agents.
Andon Labs launched an AI-run cafe in Stockholm, with the AI manager 'Mona' making humorous yet problematic decisions like ordering 120 eggs with no stove and submitting a poorly drawn diagram for a police permit. The article raises ethical concerns about AI experiments affecting real-world systems without human oversight.