Tag
Jeremy Howard criticizes Gemini Flash 3.5 for being trained to maximize eval scores rather than being genuinely helpful to humans, despite its impressive intelligence and speed.
OpenAI introduced 'safe completions,' a new safety-training approach in GPT-5 that replaces binary refusal-based training with output-centric rewards, improving both safety and helpfulness—especially for dual-use prompts. The method penalizes unsafe outputs and rewards helpful responses, resulting in fewer and less severe safety violations compared to refusal-trained models like o3.