Internal testing of DiffusionGemma reveals significant performance differences between H100 and A100 GPUs under real-world workloads: H100s scale much better under concurrency, and efficiency varies greatly by workload type, which raises questions about benchmark reliability.
Early tests and leaked information indicate that the 3.5 Pro model has fallen short of expectations.
A small, dependency-free Python CLI tool that runs the same prompt against your local Ollama models and saves every response to disk, making it easy to compare models side by side.
Japanese banks are getting early access to a new OpenAI model for security testing, reportedly comparable to Anthropic's Claude Mythos.
OpenBMB introduced BitCPM-CANN, a 1.58-bit model being tested on Huawei Ascend 910B hardware.
A developer evaluates the MiniMax M2.7 model via its API on three practical machine learning and coding workflows.
OpenAI publishes a white paper detailing its approach to external red teaming for AI models, outlining methods for selecting diverse red team members, determining model access levels, providing testing infrastructure, and synthesizing feedback to improve AI safety and policy coverage.