Tag
An article exploring why four different AI models all chose the number 7 when asked to pick a number, highlighting potential biases in training data.
The author conducted an experiment on Gmail with AI agents connected via OAuth, sending obfuscated prompt injection emails. Frontier models sometimes caught the attacks, while cheap models silently executed them, revealing that agent security largely depends on model cost and token budget rather than architectural safeguards.
The article describes a fun experiment using Claude Code to act as a user-space IP stack to process ICMP ping requests and measure response latency.
Anthropic conducted an internal experiment where they had Claude act as an agent for employees to buy and sell second-hand items over a week, successfully completing 186 transactions. The results showed that Opus users could negotiate better prices, while Haiku users were at a disadvantage, demonstrating the initial feasibility of an Agent-to-Agent economy.
IBM released the Granite 4.1 family of LLMs under Apache 2.0, and Simon Willison experimented with generating SVG images of a pelican riding a bicycle using 21 different quantized variants of the 3B model.