Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines
Summary
This paper introduces temporal semantic caching and MCP workflow optimizations for agentic plan-execute pipelines, achieving up to 30.6x speedup on cache hits and 1.67x overall speedup on the AssetOpsBench industrial benchmark.
Source: https://huggingface.co/papers/2605.20630
Abstract
Industrial asset operations workflows face latency challenges due to complex coordination needs, addressed through novel caching and workflow optimization techniques that improve execution speed while maintaining correctness in parameter-rich environments.
Industrial asset operations workflows are latency-sensitive because a single user query may require coordination over sensor data, work orders, failure modes, forecasting tools, and domain-specific agents. We evaluate this problem on AssetOpsBench (AOB), an industrial agent benchmark whose plan-execute pipeline exposes repeated overhead from tool discovery, LLM planning, MCP tool execution, and final summarization. Existing LLM caching techniques such as KV-cache reuse and embedding-based semantic caching were designed for chatbot serving and break down when output validity depends on time, asset, or sensor parameters. We propose two complementary optimization layers for AOB plan-execute pipelines: a temporal semantic cache and a set of MCP workflow optimizations combining disk-backed tool-discovery caching and dependency-aware parallel step execution. MCP workflow optimizations corresponded to a 1.67x speedup and reduced median end-to-end latency by about 40.0% while the temporal-cache benchmark achieved a median of 30.6x speedup on cache hits. Beyond the speedup, our results expose a concrete failure mode of pure semantic caching for parameter-rich industrial queries, providing a critical analysis of how caching choices interact with evaluation correctness in MCP-backed agent benchmarks.
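The abstract does not describe how the temporal semantic cache is built, so the following is only a minimal sketch of the general idea it points at: a cached answer is reused only when the resolved parameters (asset, sensor, time window) match exactly and the query text is semantically close. All names here (`Params`, `TemporalSemanticCache`, the toy `embed` function, the 0.8 threshold, the `chiller-6` asset) are hypothetical and not taken from the paper or from AssetOpsBench.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Resolved query parameters (hypothetical schema for this sketch)."""
    asset_id: str
    sensor: str
    window_start: str  # ISO-8601 date, kept as a string for simplicity
    window_end: str


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class TemporalSemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cutoff
        self._entries = {}  # Params -> list of (embedding, answer)

    def put(self, query, params, answer):
        self._entries.setdefault(params, []).append((embed(query), answer))

    def get(self, query, params):
        """Return a cached answer only if the parameters match exactly
        and the query text is similar enough; otherwise return None."""
        q = embed(query)
        best, best_score = None, self.threshold
        for emb, answer in self._entries.get(params, []):
            score = cosine(q, emb)
            if score >= best_score:
                best, best_score = answer, score
        return best


cache = TemporalSemanticCache()
week_1 = Params("chiller-6", "temperature", "2020-06-01", "2020-06-07")
week_2 = Params("chiller-6", "temperature", "2020-06-08", "2020-06-14")
cache.put("average temperature of chiller 6 last week", week_1, "answer A")

# A paraphrase with the same resolved window can reuse the cached answer.
hit = cache.get("what was the average temperature of chiller 6 last week", week_1)
# Identical text, but "last week" now resolves to a different window: a miss.
miss = cache.get("average temperature of chiller 6 last week", week_2)
```

In this sketch a cache keyed on text similarity alone would have returned the stale `week_1` answer for the second lookup, which is the kind of failure the abstract attributes to pure semantic caching when output validity depends on time, asset, or sensor parameters.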
Similar Articles
Code execution with MCP: Building more efficient agents
This article from Anthropic explores how integrating code execution with the Model Context Protocol (MCP) can improve the efficiency of AI agents. It addresses challenges like token overload from tool definitions and intermediate results, proposing code execution as a solution to reduce latency and costs.
Stateful Inference for Low-Latency Multi-Agent Tool Calling
This paper presents a stateful inference architecture for multi-agent tool calling that reuses KV cache across turns and employs speculative decoding, achieving 2.1x-4.2x speedup over vLLM and SGLang on agentic workflows.
Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching
This paper proposes a memory-augmented multi-agent architecture using nested learning, continuum memory systems, and semantic caching to mitigate hallucination in LLM pipelines, achieving significant reductions in factual errors while improving operational efficiency.
Building a Scalable Ingestion Pipeline with Temporal (Part 1)
This blog post describes the architecture for a scalable ingestion pipeline using Temporal to handle crawling, extracting, chunking, and embedding customer documentation from various sources, emphasizing durability, statefulness, and concurrency control.
Improving token efficiency in GitHub Agentic Workflows (12 minute read)
GitHub improved token efficiency in their agentic workflows by logging token usage via an API proxy and building daily optimization workflows, reducing overhead from unused MCP tool registrations.