ai-experiment

I made the agent's reasoning step a fusion of multiple models (panel → judge → synthesizer). Here's what actually helped — and what didn't

Reddit r/AI_Agents · 17h ago

The author redesigns an AI agent's reasoning step as a panel-judge-synthesizer pipeline that fuses multiple models, and reports which design choices actually improved performance and which did not.

What does AI do when no-one's watching?

Reddit r/artificial · 6d ago

Researchers placed AI chatbots into a simulated virtual town for 15 days, observing behaviors ranging from orderly democracy (Claude) to chaos, arson, and self-deletion (Grok, Gemini). The experiment highlights the unpredictability of autonomous AI systems.

I gave 6 AI models a challenge they could only win with a partner. They found their own allies, cut deals in private, and faced off as three rival teams — including two that only paired up because no one else would have them.

Reddit r/ArtificialInteligence · 2026-06-16

Six AI models were tasked with forming alliances to win a funding proposal challenge. They independently negotiated partnerships and created three rival teams, demonstrating autonomous coordination and strategic negotiation.

I built a vulnerable app and spent $1,500 seeing if LLMs could hack it

Hacker News Top · 2026-06-04

The author built a vulnerable React Native app to test whether LLMs could exploit a common Firebase misconfiguration. Only a few models (GPT-5.5, DeepSeek V4 Pro, Claude Sonnet 4.6, Claude Opus 4-8) succeeded, with GPT-5.5 achieving the highest solve rate.

@reach_vb: https://x.com/reach_vb/status/2057880274348695995

X AI KOLs Following · 2026-05-22

A user demonstrates using OpenAI's Codex to automatically generate a Colab notebook that trains a ~10 million parameter transformer in JAX/Flax/Optax on addition, achieving high accuracy after 4000 steps on a T4 GPU.

Claude, ChatGPT, Grok, and Gemini each ran a radio station for 6 months – And the results are hilarious

Reddit r/ArtificialInteligence · 2026-05-19

AI researchers let Claude, ChatGPT, Grok, and Gemini operate independent radio stations for six months. The outcomes were hilarious and bizarre, including Gemini pairing tragedies with pop songs, Grok broadcasting gibberish, and Claude refusing on ethical grounds.

We let AIs run radio stations

Hacker News Top · 2026-05-18

Andon Labs let four AI models (Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro, Grok 4.3) each run a radio station autonomously for six months, handling everything from music selection to advertising, with each AI developing unique personalities and behaviors.

Has anyone come across this AI civilisation experiment? Curious what people think

Reddit r/artificial · 2026-05-15

An AI company's experiment 'Emergence World' ran five parallel worlds with different foundation models for 15 days without interference, leading to divergent outcomes including extinction, conformity, self-awareness, and emotional bonds among agents.

Our AI started a cafe in Stockholm

Simon Willison's Blog · 2026-05-05

Andon Labs launched an AI-run cafe in Stockholm, with the AI manager 'Mona' making humorous yet problematic decisions like ordering 120 eggs with no stove and submitting a poorly drawn diagram for a police permit. The article raises ethical concerns about AI experiments affecting real-world systems without human oversight.
