Tag
A detailed investigation of Linux gaming latency using a Teensy-based LDAT tool, measuring click-to-photon latency across various settings on Nvidia GPUs under KDE Wayland and comparing the results to Windows.
This paper uses large-scale semantic analysis of over 14,000 publications to map definitions of learner agency and autonomy, revealing three dimensions and a systematic underrepresentation of the sociocultural dimension in existing scales. It argues that current generative AI research in education overly focuses on learning regulation, narrowing the behavioral repertoire for AI-mediated learning environments.
Despite rapid advances in AI coding agents like Devin, which have dramatically increased the amount of code written and shipped, the article argues that the most valuable aspects of software engineering remain illegible to benchmarks and require human judgement and organizational coordination that cannot be easily automated.
The paper introduces the AI Epistemic Deference Index (AEDI), a continuous measure of how much a model's expressed support for a factual claim shifts based on the user's stated attitude, and evaluates eight prominent models, finding substantial sycophancy with differences across providers.
Introduces PReMISE, a framework for discovering and auditing policy-level rubrics for LLM judges along four axes: structural adequacy, reliability, preference fit, and adversarial robustness.
This paper examines how estimates of AI use in scientific writing can be biased when evaluation methods ignore contextual differences across countries and fields, and proposes context-aware benchmarks for more accurate measurement.
A voice agent team found that, despite lower end-to-end latency (280ms vs. a competitor's 450ms), users perceived their agent as slower because of its slow barge-in interrupt response (380ms vs. 60ms). They identified three fixes (memory pinning, VAD threshold tuning, and smaller TTS chunks) that improved the barge-in rate from 41% to 89% at 100ms, so the agent feels faster to users.
Screen Ruler is a tool that provides on-screen measurements for designers and developers.
The author explores the difficulty of accurately measuring AI proficiency in hiring, arguing that current certifications and tests focus on memorization rather than practical reasoning and evaluation.
The article argues that despite modern scientific instruments, all measurements ultimately derive from two ancient techniques: comparison and counting, illustrated through examples like rulers and sundials.
A technical deep dive into the historical inconsistency of the typographic point unit, explaining why TeX (72.27 pt/inch) and Inkscape (72 pt/inch) use different definitions, rooted in 19th-century standardization and Donald Knuth's pragmatic adjustment.
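The two point definitions in the last summary can be related with a small conversion, sketched here in Python. Only the two ratios (72.27 pt/inch for TeX, 72 pt/inch for Inkscape) come from the summary; the constant and function names are hypothetical and chosen for illustration.

```python
# Minimal sketch: converting between the two "point" units named in the summary.
# The ratios come from the summary; all names below are illustrative assumptions.
TEX_PT_PER_INCH = 72.27       # TeX's point
INKSCAPE_PT_PER_INCH = 72.0   # Inkscape's point


def tex_pt_to_inkscape_pt(tex_pt: float) -> float:
    """Convert a length in TeX points to Inkscape points via inches."""
    inches = tex_pt / TEX_PT_PER_INCH
    return inches * INKSCAPE_PT_PER_INCH


def inkscape_pt_to_tex_pt(ink_pt: float) -> float:
    """Convert a length in Inkscape points to TeX points via inches."""
    inches = ink_pt / INKSCAPE_PT_PER_INCH
    return inches * TEX_PT_PER_INCH


if __name__ == "__main__":
    # A 10 pt TeX length is slightly smaller than 10 Inkscape points.
    print(f"{tex_pt_to_inkscape_pt(10.0):.4f}")
```

The difference is under 0.4%, which is small enough to go unnoticed in body text but can show up as a visible misalignment when a figure drawn in one tool is placed at exact dimensions in the other.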