measurement

#measurement

PReMISE: Policy Rubrics as Measurement Specifications for LLM Judges

arXiv cs.AI · 3d ago

Introduces PReMISE, a framework for discovering and auditing policy-level rubrics for LLM judges along four axes: structural adequacy, reliability, preference fit, and adversarial robustness.

AI evaluation may bias perceptions: The importance of context in interpreting academic writing

arXiv cs.CL · 2026-05-27

This paper examines how estimates of AI use in scientific writing can be biased when evaluation methods ignore contextual differences across countries and fields, and proposes context-aware benchmarks for more accurate measurement.

Our voice agent's p99 was 280ms. Competitor's was 450ms. Users said ours felt slower. We measured why.

Reddit r/AI_Agents · 2026-05-26

A voice agent team found that, despite a lower p99 end-to-end latency (280ms vs. the competitor's 450ms), users perceived their agent as slower because of poor barge-in (interrupt) handling (380ms vs. 60ms). They identified three fixes (memory pinning, VAD threshold tuning, and smaller TTS chunks) that improved the barge-in rate at 100ms from 41% to 89%, after which users perceived the agent as faster.
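The two kinds of metric in this summary, a tail percentile and a rate at a threshold, can be computed from the same latency samples but answer different questions. The sketch below is illustrative only: it assumes "rate at 100ms" means the share of barge-ins handled within 100ms, the sample values are made up, and the function names are hypothetical rather than taken from the post.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latencies (ms)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def rate_within(samples, threshold_ms):
    """Fraction of samples at or below threshold_ms (assumed meaning of 'rate at 100ms')."""
    return sum(1 for s in samples if s <= threshold_ms) / len(samples)


if __name__ == "__main__":
    # Made-up barge-in latencies in milliseconds, for illustration only.
    barge_in_ms = [40, 55, 70, 90, 95, 120, 150, 300, 380, 400]
    print("p99 (ms):", percentile(barge_in_ms, 99))
    print("share within 100ms:", rate_within(barge_in_ms, 100))
```

A percentile summarizes the slow tail, while the rate at a threshold reports how often the system meets a fixed responsiveness target, which is closer to what the post says users noticed.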

Screen Ruler

Product Hunt · 2026-05-23

Screen Ruler is a tool that provides on-screen measurements for designers and developers.

AI proficiency is becoming a hiring requirement but we still have no real way to measure it

Reddit r/ArtificialInteligence · 2026-05-22

The author explores the difficulty of accurately measuring AI proficiency in hiring, arguing that current certifications and tests focus on memorization rather than practical reasoning and evaluation.

All the Fancy Measuring Devices Used in Science Rely on Two Stone-Age Techniques

Wired · 2026-05-22

The article argues that, however sophisticated modern scientific instruments are, all measurements ultimately derive from two ancient techniques, comparison and counting, illustrated through examples like rulers and sundials.

Points are a weird and inconsistent unit of measure

Lobsters Hottest · 2026-05-13

A technical deep dive into the historical inconsistency of the typographic point unit, explaining why TeX (72.27 pt/inch) and Inkscape (72 pt/inch) use different definitions, rooted in 19th-century standardization and Donald Knuth's pragmatic adjustment.
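As a rough illustration of the discrepancy described above, the sketch below converts a length from TeX points to 72-per-inch points. The two points-per-inch figures come from the summary; the constant and function names are hypothetical, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

# Points per inch under each convention, as stated in the summary above.
TEX_PT_PER_INCH = Fraction(7227, 100)   # TeX: 72.27 pt per inch
INKSCAPE_PT_PER_INCH = Fraction(72)     # Inkscape: 72 pt per inch


def tex_pt_to_inkscape_pt(tex_pt):
    """Convert a length in TeX points to 72-per-inch points (hypothetical helper)."""
    inches = Fraction(tex_pt) / TEX_PT_PER_INCH
    return inches * INKSCAPE_PT_PER_INCH


if __name__ == "__main__":
    # A 10 pt TeX length is slightly smaller when expressed in 72-per-inch points.
    print(float(tex_pt_to_inkscape_pt(10)))
```

The difference is under half a percent, which is easy to miss on a single glyph but accumulates over a full page of layout.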
