Tag
The article describes the author's journey building a word game that requires pure deduction without guessing, and how information theory helped overcome the challenge of generating solvable puzzles.
SciR is a new controllable benchmark for evaluating LLMs on scientific reasoning, including deduction, induction, and causal abduction, with parametric control over extraction difficulty and inference difficulty. Tests show that raising difficulty along either axis degrades performance across models, and that reasoning models such as DeepSeek-R1 outperform instruct models on inference.