Distilling Examples into Task Instructions: Enhanced In-Context Learning for Real-World B2B Conversations

arXiv cs.CL Papers

Summary

This paper introduces the Call Playbook dataset for classifying real-world B2B conversations and proposes methods to distill examples into compact, interpretable task instructions, achieving a 99% reduction in token usage and up to a 7% improvement in macro-averaged AUC over traditional in-context learning.
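The headline metric here is macro-averaged AUC. The sketch below is minimal and rests on two assumptions, since the abstract does not say how the averaging is done: each task is scored as a binary problem, and "macro-averaged" means the unweighted mean of per-task ROC AUCs. The function names and toy scores are hypothetical. The per-task values match what `sklearn.metrics.roc_auc_score` returns, including half credit for tied scores.

```python
from typing import Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Tied scores count as half a win, matching the Mann-Whitney formulation.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(per_task: dict[str, tuple[Sequence[int], Sequence[float]]]) -> float:
    """Unweighted mean of per-task AUCs, so every task counts equally regardless of size."""
    aucs = [binary_auc(labels, scores) for labels, scores in per_task.values()]
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Toy labels and model scores for two hypothetical tasks (not data from the paper).
    results = {
        "task_a": ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),
        "task_b": ([1, 0, 0, 1], [0.6, 0.6, 0.3, 0.8]),
    }
    print(macro_auc(results))  # task_a = 1.0, task_b = 0.875, mean = 0.9375
```

Macro-averaging stops a large or easy task from dominating the score, which matters when the five tasks differ in size or difficulty.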

arXiv:2606.15641v1 Announce Type: new

Abstract: In-context learning (ICL) is the standard method for low-resource classification, yet its efficacy in specialized domains remains largely unexplored. We address the challenge of classifying semantically complex, multi-party B2B conversations, where traditional ICL encounters significant limitations, especially as context length increases due to the concatenation of multiple few-shot examples. We introduce the Call Playbook dataset, featuring five classification tasks derived from real-world B2B conversations targeting core sales concepts. To bridge the gap between performance and practical utility, we propose novel knowledge extraction methods that distill verbose examples into compact, interpretable representations of structured classification criteria and precise task descriptions. Our approach achieves a 99% reduction in token usage and improves macro-averaged AUC by up to 7% over traditional ICL. Notably, it remains robust as context grows, unlike advanced token compression baselines which degrade by over 9 F1 points. Importantly, our framework enables direct refinement of classification logic, addressing critical needs for transparency, efficiency, and user interaction in real-world NLP applications.
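The abstract contrasts two ways of prompting. Traditional ICL concatenates every few-shot transcript in front of the query. The distilled approach replaces those transcripts with a short task description plus explicit classification criteria. The sketch below only illustrates why the second prompt is far shorter. It is not the authors' extraction method, which the abstract does not detail. The `DistilledInstruction` class, the example task wording, and the criteria are all hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
from dataclasses import dataclass, field


@dataclass
class DistilledInstruction:
    """Hypothetical container for a distilled task: a short description plus explicit criteria."""
    task_description: str
    criteria: list[str] = field(default_factory=list)

    def render(self) -> str:
        # One line for the task, then one bullet per criterion the model should apply.
        bullets = "\n".join(f"- {c}" for c in self.criteria)
        return f"Task: {self.task_description}\nClassification criteria:\n{bullets}"


def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Traditional ICL: concatenate every (transcript, label) pair in front of the query."""
    shots = [f"Transcript:\n{text}\nLabel: {label}" for text, label in examples]
    return "\n\n".join(shots + [f"Transcript:\n{query}\nLabel:"])


def distilled_prompt(instruction: DistilledInstruction, query: str) -> str:
    """Distilled prompt: the compact instruction replaces all of the raw examples."""
    return f"{instruction.render()}\n\nTranscript:\n{query}\nLabel:"


def count_tokens(text: str) -> int:
    # Whitespace split is only a rough proxy for a real model tokenizer.
    return len(text.split())


def token_reduction(baseline: str, compact: str) -> float:
    """Fraction of baseline tokens saved by the compact prompt (0.99 means 99%)."""
    return 1.0 - count_tokens(compact) / count_tokens(baseline)


if __name__ == "__main__":
    # Toy stand-ins for long multi-party call transcripts (not data from the paper).
    long_call = "Rep: thanks for joining. " * 150
    examples = [(long_call, "YES"), (long_call, "NO")] * 4
    query = "Buyer: we need sign-off from finance before Q3."
    instruction = DistilledInstruction(
        task_description="Decide whether the call identifies the economic decision maker.",
        criteria=[
            "A participant names who controls the budget.",
            "Mentions of approval steps alone are not sufficient.",
        ],
    )
    baseline = few_shot_prompt(examples, query)
    compact = distilled_prompt(instruction, query)
    print(f"token reduction: {token_reduction(baseline, compact):.1%}")
```

The few-shot prompt grows linearly with the number and length of examples. The distilled prompt stays fixed in size. Because the criteria are plain strings, a user could also edit or append one directly, in the spirit of the "direct refinement of classification logic" the abstract mentions.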

Cached at: 06/16/26, 11:49 AM

Source: [https://arxiv.org/abs/2606.15641](https://arxiv.org/abs/2606.15641)
[View PDF](https://arxiv.org/pdf/2606.15641)


## Submission history

From: Guy Rotman

**[v1]** Sun, 14 Jun 2026 07:07:52 UTC (472 KB)
