Distilling Examples into Task Instructions: Enhanced In-Context Learning for Real-World B2B Conversations
Summary
This paper introduces the Call Playbook dataset for classifying real-world B2B conversations and proposes methods to distill examples into compact, interpretable task instructions, achieving 99% token reduction and up to 7% AUC improvement over traditional in-context learning.
Source: [https://arxiv.org/abs/2606.15641](https://arxiv.org/abs/2606.15641)
[View PDF](https://arxiv.org/pdf/2606.15641)
> Abstract: In-context learning (ICL) is the standard method for low-resource classification, yet its efficacy in specialized domains remains largely unexplored. We address the challenge of classifying semantically complex, multi-party B2B conversations, where traditional ICL encounters significant limitations, especially as context length increases due to the concatenation of multiple few-shot examples. We introduce the Call Playbook dataset, featuring five classification tasks derived from real-world B2B conversations targeting core sales concepts. To bridge the gap between performance and practical utility, we propose novel knowledge extraction methods that distill verbose examples into compact, interpretable representations of structured classification criteria and precise task descriptions. Our approach achieves a 99% reduction in token usage and improves macro-averaged AUC by up to 7% over traditional ICL. Notably, it remains robust as context grows, unlike advanced token compression baselines which degrade by over 9 F1 points. Importantly, our framework enables direct refinement of classification logic, addressing critical needs for transparency, efficiency, and user interaction in real-world NLP applications.
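
To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It compares a traditional few-shot prompt (examples concatenated into the context) with a prompt that carries only a short list of classification criteria, and reports the relative token saving. Everything here is an assumption for illustration: the names `Example`, `build_icl_prompt`, `build_instruction_prompt`, and `token_reduction` are hypothetical, the criteria are hand-written rather than extracted by a model, and tokens are approximated by whitespace-separated words rather than a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Example:
    text: str   # a conversation transcript (hypothetical placeholder data)
    label: str  # the gold label for the classification task


def count_tokens(text: str) -> int:
    """Whitespace word count: a crude stand-in for a real tokenizer."""
    return len(text.split())


def build_icl_prompt(task: str, examples: list[Example], query: str) -> str:
    """Traditional ICL: concatenate every labeled example before the query."""
    shots = "\n\n".join(
        f"Conversation: {e.text}\nLabel: {e.label}" for e in examples
    )
    return f"{task}\n\n{shots}\n\nConversation: {query}\nLabel:"


def build_instruction_prompt(task: str, criteria: list[str], query: str) -> str:
    """Distilled variant: replace the examples with compact criteria."""
    rules = "\n".join(f"- {c}" for c in criteria)
    return f"{task}\n\nCriteria:\n{rules}\n\nConversation: {query}\nLabel:"


def token_reduction(baseline: str, compact: str) -> float:
    """Fraction of tokens saved by `compact` relative to `baseline`."""
    base = count_tokens(baseline)
    if base == 0:
        raise ValueError("baseline prompt is empty")
    return 1 - count_tokens(compact) / base


if __name__ == "__main__":
    task = "Decide whether pricing was discussed in the call. Answer yes or no."
    # Synthetic long transcripts standing in for real conversations.
    shots = [
        Example("rep: let us walk through the quote line by line. " * 200, "yes"),
        Example("rep: today we only cover onboarding steps. " * 200, "no"),
    ]
    # Hand-written criteria; in the paper these would come from knowledge extraction.
    criteria = [
        "Label yes if a price, discount, or quote is mentioned.",
        "Label no if the call covers only logistics or onboarding.",
    ]
    query = "rep: shall we go over the quote?"
    icl = build_icl_prompt(task, shots, query)
    distilled = build_instruction_prompt(task, criteria, query)
    print(f"token reduction: {token_reduction(icl, distilled):.1%}")
```

The saving grows with the length and number of examples, since the criteria list stays fixed in size while the few-shot prompt scales with every added transcript. The criteria are also plain text, so a user can edit them directly, which is the kind of refinement the abstract refers to.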
## Submission history
From: Guy Rotman [[view email](https://arxiv.org/show-email/69457733/2606.15641)] **[v1]** Sun, 14 Jun 2026 07:07:52 UTC (472 KB)
## Similar Articles
Conv-to-Bench: Evaluating Language Models Via User-Assistant Dialogues In Code Tasks
Conv-to-Bench is a multi-stage framework that automatically transforms multi-turn user-assistant dialogues into structured, verifiable requirement checklists for evaluating large language models on code tasks, achieving near-perfect alignment with human-authored benchmarks at lower computational cost.
Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents
This paper evaluates context engineering configurations for LLM agents in enterprise tool-use workflows, showing that summarization with selective pruning achieves 91.6% accuracy while reducing token usage by over 60% compared to full-context baselines.
Enhancing Target-Guided Proactive Dialogue Systems via Conversational Scenario Modeling and Intent-Keyword Bridging
This paper proposes a method to enhance target-guided proactive dialogue systems by jointly modeling user profiles and domain knowledge as conversational scenarios and employing intent-keyword bridging to predict future dialogue turns.
From History to State: Constant-Context Skill Learning for LLM Agents
This paper introduces 'constant-context skill learning,' a framework that moves procedural knowledge from prompts into model weights to reduce token usage and improve privacy for LLM agents. The method achieves strong performance on benchmarks like ALFWorld and WebShop while significantly reducing inference costs.
What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents
This paper presents the first systematic study of credit assignment in multi-turn LLM agents, introducing SERL, a selective environment-reweighted learning framework. SERL uses environment feedback to sharpen the RL objective on causally relevant actions, achieving 90.0% and 80.1% success rates on ALFWorld and WebShop respectively.