Tag
Antirez is consolidating community contributions from DwarfStar to improve Strix Halo support, with final QA and merge expected soon.
The paper investigates whether the performance gains from rewriting retrieved passages in RAG QA pipelines are causally driven by the presence of the gold answer string in the rewritten context, using controlled intervention audits across multiple models and datasets.
A developer shares a workflow using Cursor's Opus 4.8 Max Thinking model with a subagent harness, and introduces a GitHub repository of installable skill files for AI coding agents, including a 'running-bug-review-board' skill that performs live QA testing.
LazyCodex is a tool that automates QA using AI computer use, allowing developers to set up automated testing without manual intervention.
After leaving the workplace, the author wonders whether QA at big companies still works the same way, with a ticket filed after a bug is found. The author argues that a bug report is effectively a prompt for AI, so it may be better to let AI modify the code directly.
ActiveGraph introduces a deterministic, non-generative approach to compiling evidence before semantic memory, achieving 85.6% QA accuracy and 86.2% turn answer-in-context on LongMemEval-S.
Yohei Nakajima ran the LongMemEval benchmark on ActiveGraph, achieving 85.6% QA accuracy and 86.2% turn answer-in-context, demonstrating the effectiveness of event-based agent systems for long-term memory.
This paper proposes claim-selective certification for high-risk medical retrieval-augmented generation (RAG). It decomposes responses into verifiable claims, scores them against evidence, and uses an intent-aware selector to choose an action (full, partial, conflict, or abstain), achieving low unsupported-claim risk and high action accuracy.
A tweet shares a prompt that configures Composer 2.5 to act as a QA engineer, creating test documentation and bug reports for development phases.
Violin is an open-source end-to-end video translation and video Q&A tool, integrating ASR, LLM translation, and TTS. It supports style adjustment and content re-creation, and can answer questions about video content.