abstract-reasoning

#abstract-reasoning

World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning

Hugging Face Daily Papers · 2d ago

This paper proposes Privileged-Future On-Policy Self-Distillation (PF-OPSD) for controlled concrete reasoning, combining world models' visual simulation with language models' abstract reasoning to improve prediction accuracy and robustness on two new benchmarks.

GraphARC: A Comprehensive Benchmark for Graph-Based Abstract Reasoning

arXiv cs.AI · 3d ago

GraphARC is a new benchmark for abstract reasoning on graph-structured data, extending the ARC paradigm to graphs. Evaluations of state-of-the-art language models reveal a comprehension-execution gap and performance degradation on larger instances, highlighting scaling challenges.

A2RBench: An Automatic Paradigm for Formally Verifiable Abstract Reasoning Benchmark Generation

Hugging Face Daily Papers · 2026-05-17

This paper introduces A2RBench, an automated pipeline that generates formally verifiable abstract reasoning benchmarks for LLMs and uses cycle consistency to ensure each task has a unique solution. It reports that current LLMs significantly underperform humans on 3D reasoning tasks.
