Many-Shot CoT-ICL: Making In-Context Learning Truly Learn

Hugging Face Daily Papers 05/13/26, 12:00 AM Papers

Summary

This paper investigates many-shot chain-of-thought in-context learning for reasoning tasks, showing that standard many-shot scaling rules do not transfer and proposing Curvilinear Demonstration Selection (CDS), a demonstration-ordering method that achieves up to a 5.42 percentage-point gain.

In-context learning (ICL) adapts large language models (LLMs) to new tasks by conditioning on demonstrations in the prompt without parameter updates. With long-context models, many-shot ICL can use dozens to hundreds of examples and achieve performance comparable to fine-tuning, yet current understanding of its scaling behavior is largely derived from non-reasoning tasks. We study many-shot chain-of-thought in-context learning (CoT-ICL) for reasoning and show that standard many-shot rules do not transfer. Across non-reasoning and reasoning-oriented LLMs and across non-reasoning and reasoning tasks, we find: (i) a setting-dependent scaling effect, where increasing the number of CoT demonstrations is unstable for non-reasoning LLMs and benefits mainly reasoning-oriented LLMs; (ii) similarity-based retrieval helps on non-reasoning tasks but fails on reasoning, since semantic similarity poorly predicts procedural (i.e., CoT) compatibility; and (iii) an order-scaling effect, where performance variance grows with more CoT demonstrations. We interpret these behaviors by viewing many-shot CoT-ICL as in-context test-time learning rather than scaled pattern matching, and suggest two principles: (i) demonstrations should be easy for the target model to understand, and (ii) they should be ordered to support a smooth conceptual progression. Guided by these principles, we propose Curvilinear Demonstration Selection (CDS), a simple ordering method that yields up to a 5.42 percentage-point gain on geometry with 64 demonstrations. Overall, our results reframe the long context window from a retrieval buffer into a structured curriculum for in-context test-time learning.
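To make finding (ii) concrete, the following is a minimal sketch of what similarity-based retrieval for CoT-ICL usually looks like: rank a demonstration pool by cosine similarity to the query embedding, keep the top k, and concatenate them ahead of the target question. Everything here is an assumption for illustration (the function names, the prompt template, and the hand-written 2-D embeddings); it is not taken from the paper, and a real setup would obtain embeddings from an encoder model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec, pool, k):
    """Return the k demonstrations whose embeddings are most similar to the query."""
    ranked = sorted(pool, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    return ranked[:k]


def build_cot_prompt(demos, question):
    """Place (question, reasoning, answer) demonstrations ahead of the target question."""
    blocks = [
        f"Question: {d['question']}\nReasoning: {d['cot']}\nAnswer: {d['answer']}"
        for d in demos
    ]
    blocks.append(f"Question: {question}\nReasoning:")
    return "\n\n".join(blocks)


# Toy pool with made-up 2-D embeddings (hypothetical values, illustration only).
POOL = [
    {"question": "Area of a 3x4 rectangle?", "cot": "Multiply the sides: 3 * 4 = 12.",
     "answer": "12", "embedding": [1.0, 0.1]},
    {"question": "Perimeter of a square with side 5?", "cot": "Four equal sides: 4 * 5 = 20.",
     "answer": "20", "embedding": [0.9, 0.3]},
    {"question": "What is 17 + 25?", "cot": "Add the numbers: 17 + 25 = 42.",
     "answer": "42", "embedding": [0.1, 1.0]},
]

selected = retrieve_top_k([1.0, 0.0], POOL, k=2)
prompt = build_cot_prompt(selected, "Area of a 2x6 rectangle?")
```

The ranking depends only on how close the questions are in embedding space. Nothing in it checks whether the retrieved reasoning chains use a procedure that fits the target problem, which is the gap the abstract points to when it says semantic similarity poorly predicts CoT compatibility.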

Original Article

Cached at: 05/14/26, 04:16 AM

Source: https://huggingface.co/papers/2605.13511 Published on May 13

Submitted by Cindy (https://huggingface.co/ttchungc) on May 14

Abstract

Many-shot in-context learning for reasoning tasks exhibits different scaling behaviors than non-reasoning tasks, with demonstration ordering and selection significantly impacting performance.

In-context learning (ICL) adapts large language models (LLMs) to new tasks by conditioning on demonstrations in the prompt without parameter updates. With long-context models, many-shot ICL can use dozens to hundreds of examples and achieve performance comparable to fine-tuning, yet current understanding of its scaling behavior is largely derived from non-reasoning tasks. We study many-shot chain-of-thought in-context learning (CoT-ICL) for reasoning and show that standard many-shot rules do not transfer. Across non-reasoning and reasoning-oriented LLMs and across non-reasoning and reasoning tasks, we find: (i) a setting-dependent scaling effect, where increasing the number of CoT demonstrations is unstable for non-reasoning LLMs and benefits mainly reasoning-oriented LLMs; (ii) similarity-based retrieval helps on non-reasoning tasks but fails on reasoning, since semantic similarity poorly predicts procedural (i.e., CoT) compatibility; and (iii) an order-scaling effect, where performance variance grows with more CoT demonstrations. We interpret these behaviors by viewing many-shot CoT-ICL as in-context test-time learning rather than scaled pattern matching, and suggest two principles: (i) demonstrations should be easy for the target model to understand, and (ii) they should be ordered to support a smooth conceptual progression. Guided by these principles, we propose Curvilinear Demonstration Selection (CDS), a simple ordering method that yields up to a 5.42 percentage-point gain on geometry with 64 demonstrations. Overall, our results reframe the long context window from a retrieval buffer into a structured curriculum for in-context test-time learning.
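The abstract does not describe how CDS computes its ordering, so the sketch below is explicitly not CDS. It only illustrates the two ideas behind finding (iii) and principle (ii) under simple assumptions: a nearest-neighbour chain stands in for "smooth conceptual progression", and the spread of a score over random orderings stands in for order sensitivity. The 1-D coordinates, the function names, and the use of path length as the score are all hypothetical; in an actual experiment the score would be task accuracy obtained by running the model on each ordering.

```python
import math
import random
import statistics


def path_length(order, vectors):
    """Sum of distances between consecutive demonstrations; smaller means smoother."""
    return sum(math.dist(vectors[a], vectors[b]) for a, b in zip(order, order[1:]))


def greedy_smooth_order(vectors, start=0):
    """Nearest-neighbour chain: repeatedly append the closest unused demonstration."""
    remaining = set(range(len(vectors))) - {start}
    order = [start]
    while remaining:
        last = vectors[order[-1]]
        # Ties are broken by index so the result is deterministic.
        nxt = min(remaining, key=lambda i: (math.dist(last, vectors[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def order_spread(score_fn, n_items, n_orders, seed=0):
    """Mean and standard deviation of score_fn over random orderings."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_orders):
        order = list(range(n_items))
        rng.shuffle(order)
        scores.append(score_fn(order))
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical 1-D coordinates, e.g. a difficulty or concept position per demonstration.
VECTORS = [[0.0], [5.0], [1.0], [4.0], [2.0], [3.0]]

smooth = greedy_smooth_order(VECTORS, start=0)
smooth_len = path_length(smooth, VECTORS)
mean_len, std_len = order_spread(
    lambda order: path_length(order, VECTORS), len(VECTORS), n_orders=50
)
```

On this toy data the greedy chain visits the demonstrations in increasing coordinate order, which is the shortest possible path, while random orderings jump back and forth. Swapping the path-length lambda for a function that evaluates the model on a prompt built in the given order would turn `order_spread` into a direct measurement of the order-scaling effect at a chosen number of demonstrations.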

Get this paper in your agent:

hf papers read 2605.13511

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

ACIL: Auto Chain of Thoughts for In-Context Learning

Revisiting Chain-of-Thought Reasoning under Limited Supervision: Semi-supervised Chain-of-Thought Learning

LC-ICL: Label-Guided Contrastive In-Context Learning for Robust Information Extraction

Self-Improving In-Context Learning
