Tag
Nolan Lawson argues that AI coding assistants can be used to write high-quality code slowly: multiple models review the code thoroughly and hunt for bugs, with the aim of improving codebase health rather than maximizing output speed.
A practitioner at a company handling ~40k conversations/month describes the bottleneck of manual prompt QA and asks how teams are using automated systems to detect regressions and user frustration in production.
Drizz is a mobile testing tool that autonomously writes, runs, and fixes tests.
Epoch used GPT-5.5 to identify fatal errors in approximately one-third of the FrontierMath benchmark problems, demonstrating the model's ability to sanity-check evaluation benchmarks.
The author consolidates a series of articles on software testing fundamentals, covering topics such as the purpose of testing, assertions, code coverage, and handling flaky tests.
Fabraix is a tool that helps developers identify gaps in their AI agents before users encounter them.
Moonshot AI has open-sourced the Kimi Vendor Verifier (KVV), a tool that helps users verify that inference providers implement open-source models such as Kimi K2 correctly. It uses six critical benchmarks to detect infrastructure-level issues such as KV cache bugs, quantization degradation, and parameter misuse.