Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

Hugging Face Daily Papers 05/20/26, 12:00 AM Papers

Summary

This paper introduces temporal semantic caching and MCP workflow optimizations for agentic plan-execute pipelines, achieving up to 30.6x speedup on cache hits and 1.67x overall speedup on the AssetOpsBench industrial benchmark.

Industrial asset operations workflows are latency-sensitive because a single user query may require coordination over sensor data, work orders, failure modes, forecasting tools, and domain-specific agents. We evaluate this problem on AssetOpsBench (AOB), an industrial agent benchmark whose plan-execute pipeline exposes repeated overhead from tool discovery, LLM planning, MCP tool execution, and final summarization. Existing LLM caching techniques such as KV-cache reuse and embedding-based semantic caching were designed for chatbot serving and break down when output validity depends on time, asset, or sensor parameters. We propose two complementary optimization layers for AOB plan-execute pipelines: a temporal semantic cache and a set of MCP workflow optimizations combining disk-backed tool-discovery caching and dependency-aware parallel step execution. MCP workflow optimizations corresponded to a 1.67x speedup and reduced median end-to-end latency by about 40.0% while the temporal-cache benchmark achieved a median of 30.6x speedup on cache hits. Beyond the speedup, our results expose a concrete failure mode of pure semantic caching for parameter-rich industrial queries, providing a critical analysis of how caching choices interact with evaluation correctness in MCP-backed agent benchmarks.
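The failure mode described above can be illustrated with a minimal sketch. It is not the paper's implementation: the class `TemporalSemanticCache`, the toy bag-of-words `embed` function, the 0.8 similarity threshold, the 300-second TTL, and the `(asset, sensor, time window)` parameter tuple are all assumptions made for illustration. The idea shown is that a cache hit requires semantic similarity and an exact match on the parameters that determine output validity, plus an unexpired entry.

```python
import math
import time
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real sentence encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class Entry:
    vector: Counter
    params: tuple      # hypothetical key: (asset_id, sensor, resolved_time_window)
    answer: str
    expires_at: float  # entries are ignored once this time has passed


@dataclass
class TemporalSemanticCache:
    threshold: float = 0.8      # assumed similarity cutoff
    ttl_seconds: float = 300.0  # assumed time-to-live
    entries: list = field(default_factory=list)

    def put(self, query, params, answer, now=None):
        now = time.time() if now is None else now
        self.entries.append(
            Entry(embed(query), params, answer, now + self.ttl_seconds)
        )

    def get(self, query, params, now=None):
        now = time.time() if now is None else now
        qv = embed(query)
        for e in self.entries:
            if e.expires_at <= now:
                continue  # stale: the underlying data may have changed
            if e.params != params:
                continue  # similar wording, but different asset/sensor/window
            if cosine(qv, e.vector) >= self.threshold:
                return e.answer
        return None


cache = TemporalSemanticCache()
key_6 = ("pump_6", "temperature", "week_20")  # hypothetical asset and window
key_9 = ("pump_9", "temperature", "week_20")
cache.put("average temperature for pump 6 last week", key_6, "cached answer", now=0.0)

# Paraphrase with identical parameters: served from the cache.
hit = cache.get("average temperature for pump 6 last week please", key_6, now=10.0)
# Near-identical wording, different asset: a pure semantic cache would
# return the pump 6 answer here; the parameter gate rejects it.
miss = cache.get("average temperature for pump 9 last week", key_9, now=10.0)
```

In this sketch the two queries about pump 6 and pump 9 differ by one token, so similarity alone cannot tell them apart. A production system would also need to resolve relative expressions such as "last week" to absolute time ranges before building the key.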

Original Article

Cached at: 05/21/26, 06:20 AM

Source: https://huggingface.co/papers/2605.20630

Abstract

Industrial asset operations workflows face latency challenges due to complex coordination needs, addressed through novel caching and workflow optimization techniques that improve execution speed while maintaining correctness in parameter-rich environments.
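One of the workflow optimizations the paper names, dependency-aware parallel step execution, can be sketched as follows. This is an assumption-laden illustration rather than the authors' code: the function `run_plan`, the step names, and the `deps` mapping are hypothetical, and the plan is assumed to be acyclic (a cycle would deadlock, since no cycle detection is done).

```python
import asyncio


async def run_plan(steps, deps, run_step):
    """Run each step as soon as all of its prerequisites have finished.

    steps:    list of step ids
    deps:     dict mapping a step id to the list of step ids it depends on
    run_step: async callable (step, {dependency: result}) -> result
    Assumes the dependency graph is acyclic.
    """
    tasks = {}

    async def run(step):
        # Wait only for this step's own prerequisites; steps that do not
        # depend on each other proceed concurrently.
        inputs = {d: await tasks[d] for d in deps.get(step, [])}
        return await run_step(step, inputs)

    # Task bodies start only once control returns to the event loop, so the
    # tasks dict is fully populated before any step looks up a dependency.
    for s in steps:
        tasks[s] = asyncio.create_task(run(s))
    results = await asyncio.gather(*tasks.values())
    return dict(zip(tasks, results))


async def fake_tool(step, inputs):
    """Hypothetical stand-in for a tool call: sleeps, then reports its inputs."""
    await asyncio.sleep(0.01)
    return f"{step}({','.join(sorted(inputs))})"


# Hypothetical plan: two independent lookups feed one summary step.
plan_steps = ["sensors", "work_orders", "summary"]
plan_deps = {"summary": ["sensors", "work_orders"]}
plan_results = asyncio.run(run_plan(plan_steps, plan_deps, fake_tool))
```

Here `sensors` and `work_orders` overlap in time, while `summary` starts only after both complete. A sequential executor would instead pay the latency of every step in turn.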

Similar Articles

Code execution with MCP: Building more efficient agents

Stateful Inference for Low-Latency Multi-Agent Tool Calling

Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching

Building a Scalable Ingestion Pipeline with Temporal (Part 1)

Improving token efficiency in GitHub Agentic Workflows (12 minute read)
