Tag
Peter Yang is interviewing Karan4D, co-founder of NousResearch's Hermes, and asks for topic suggestions from the community.
Browser Use v4 introduces a QA skill that allows your agent to test flows, catch bugs, and evaluate UI by clicking around as a user, closing the feedback loop for developers.
A tweet observes that people in front-end, design, and QA roles are losing their jobs, reflecting how frightening the current situation in the tech industry is.
QApilot's CoWork claims to triple mobile automation efficiency without expanding the QA team.
The author describes a voice agent call that was cut off at 600 seconds without warning, and proposes a testing approach for handling the maximum call duration gracefully, including pre-cutoff warnings and state preservation.
The article argues that traditional chatbot QA is broken because it only tests happy paths, and proposes using an AI-powered user simulator that attacks the bot with diverse personas and edge cases to find vulnerabilities before deployment.
Neural_avb releases a lightweight Answer-eq Reward Model for RL training on QA tasks, claiming 80% agreement with an external judge LM and that it is faster than F1/ROUGE/BERTScore.
Antirez is consolidating community contributions from DwarfStar to improve Strix Halo support, with final QA and merge expected soon.
The article discusses using LLMs as automated QA engineers to perform manual testing tasks, such as integration and regression testing, potentially raising the bar for software quality.
The paper investigates whether the performance gains from rewriting retrieved passages in RAG QA pipelines are causally driven by the presence of the gold answer string in the rewritten context, using controlled intervention audits across multiple models and datasets.
A developer shares a workflow using Cursor's Opus 4.8 Max Thinking model with a subagent harness, and introduces a GitHub repository of installable skill files for AI coding agents, including a 'running-bug-review-board' skill that performs live QA testing.
LazyCodex is a tool that automates QA using AI computer use, allowing developers to set up automated testing without manual intervention.
Having left the workplace, the author wonders whether QA at big companies still works the same way, with a ticket filed after a bug is found. Since a bug report can itself be treated as a prompt for AI, the author suggests it might be better to let AI modify the code directly.
ActiveGraph introduces a deterministic, non-generative approach to evidence compilation before semantic memory, achieving 85.6% QA accuracy and 86.2% turn answer-in-context on LongMemEval-S.
Yohei Nakajima ran the LongMemEval benchmark on ActiveGraph, achieving 85.6% QA accuracy and 86.2% turn answer-in-context, demonstrating the effectiveness of event-based agent systems for long-term memory.
This paper proposes claim-selective certification for high-risk medical retrieval-augmented generation (RAG), decomposing responses into verifiable claims and scoring them against evidence to produce actions (full, partial, conflict, abstain) using an intent-aware selector, achieving low unsupported-claim risk and high action accuracy.
A tweet shares a prompt that configures Composer 2.5 to act as a QA engineer, creating test documentation and bug reports for development phases.
Violin is an open-source end-to-end video translation and video Q&A tool, integrating ASR, LLM translation, and TTS. It supports style adjustment and content re-creation, and can answer questions about video content.