Tag
This paper presents an LLM-based method for stance detection in scientific discourse, specifically distinguishing realist from instrumentalist stances in Bayesian cognitive science articles. The approach combines theory-driven coding, expert annotations, and prompt optimization to achieve high reliability.
This paper introduces a scale-conditioned evaluation protocol for agent memory, analyzing how reliability degrades as irrelevant sessions accumulate. It identifies specific failure regimes and usable-scale boundaries across different memory interfaces and LLMs.
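As a rough illustration of what a scale-conditioned evaluation could look like, the sketch below groups per-query outcomes by the number of irrelevant sessions present, computes accuracy at each scale, and reports the largest tested scale up to which accuracy stays at or above a chosen threshold. This is not the paper's protocol: the function names, the threshold-based definition of a "usable-scale boundary", and the outcome data are all assumptions made up for this example.

```python
from typing import Dict, List, Optional


def accuracy_by_scale(results: Dict[int, List[bool]]) -> Dict[int, float]:
    """Fraction of correct answers at each scale.

    `results` maps a scale (hypothetically, the number of irrelevant
    sessions stored in memory) to per-query correctness outcomes.
    Scales with no outcomes are skipped.
    """
    return {
        scale: sum(outcomes) / len(outcomes)
        for scale, outcomes in sorted(results.items())
        if outcomes
    }


def usable_scale_boundary(acc: Dict[int, float], threshold: float) -> Optional[int]:
    """Largest tested scale such that accuracy at that scale and at every
    smaller tested scale is at least `threshold`.

    Returns None if even the smallest tested scale falls below the threshold.
    This definition is an assumption for illustration only.
    """
    boundary = None
    for scale in sorted(acc):
        if acc[scale] < threshold:
            break
        boundary = scale
    return boundary


# Made-up outcomes for illustration; not results from the paper.
results = {
    0: [True, True, True, True],
    10: [True, True, True, False],
    100: [True, False, True, False],
    1000: [False, False, True, False],
}

acc = accuracy_by_scale(results)
boundary = usable_scale_boundary(acc, threshold=0.75)
print(acc)
print(boundary)
```

Conditioning on scale in this way keeps a single aggregate score from hiding the point at which a memory interface stops being dependable. The threshold is a free parameter that would need to be chosen per application.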