A Benchmark and Framework for Evaluating Next Action Predictions in Spreadsheets

Hugging Face Daily Papers 06/11/26, 12:00 AM Papers

benchmark spreadsheet next-action-prediction llm fine-tuning evaluation

Summary

This paper introduces a benchmark for predicting spreadsheet user actions, addressing challenges in edit history availability and complex action spaces through manual curation and online evaluation methodology.

Predictive code completion greatly accelerates how quickly developers work. In spreadsheets, despite being much more common, such auto-completion features are virtually non-existent. To address this gap, we introduce a benchmark for systems that observe a sequence of user actions in a spreadsheet and predict future actions. Two challenges are (1) the absence of edit histories in public spreadsheet corpora and (2) the complex space of spreadsheet actions (spatial, temporal, composite). To address (1), we manually curate 52 sequences of 12K actions that recreate spreadsheets from public corpora, seeded by parametrized heuristics and LLM refinement. To address (2), we propose an online evaluation that expects a prediction after each user action, accepts or rejects that prediction, updates the future actions upon acceptance, and repeats this until the target spreadsheet is obtained. We use multiple baseline predictors (including zero-shot LLMs, fine-tuned SLMs, and classical models) and analyze different properties that our benchmark teaches us, including but not limited to: properties of saved actions and false positives, efficiency, effect of user profiles, effect of triggers, and effect of context.
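The online evaluation described above can be illustrated with a minimal sketch. This is not the authors' framework: the `Action` tuple, the `is_acceptable` rule (accept a prediction only if every predicted action is still among the future actions), the `fill_down_predictor` baseline, and the statistic names are all hypothetical simplifications chosen for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical simplified action: write integer `value` into (column, row).
Action = Tuple[str, int, int]
Predictor = Callable[[List[Action]], Optional[List[Action]]]


def is_acceptable(prediction: List[Action], remaining: List[Action]) -> bool:
    """Assumed acceptance rule: every predicted action must still be pending."""
    pool = list(remaining)
    for action in prediction:
        if action not in pool:
            return False
        pool.remove(action)
    return True


def evaluate_online(actions: List[Action], predictor: Predictor) -> Dict[str, int]:
    """Replay a sequence, asking for one prediction after each user action."""
    remaining = list(actions)      # future actions still needed to reach the target
    history: List[Action] = []     # everything applied so far
    stats = {"user_actions": 0, "saved_actions": 0, "false_positives": 0}
    while remaining:
        # The simulated user performs the next pending action.
        history.append(remaining.pop(0))
        stats["user_actions"] += 1
        prediction = predictor(history)
        if not prediction:
            continue
        if is_acceptable(prediction, remaining):
            # Accepted: apply the prediction and drop it from the future actions.
            for action in prediction:
                remaining.remove(action)
                history.append(action)
            stats["saved_actions"] += len(prediction)
        else:
            # Rejected: the prediction is counted as a false positive.
            stats["false_positives"] += 1
    return stats


def fill_down_predictor(history: List[Action]) -> Optional[List[Action]]:
    """Toy baseline: continue an arithmetic series down a column."""
    if len(history) < 2:
        return None
    (c1, r1, v1), (c2, r2, v2) = history[-2], history[-1]
    if c1 == c2 and r2 == r1 + 1:
        return [(c2, r2 + 1, v2 + (v2 - v1))]
    return None


demo_actions = [("A", 1, 10), ("A", 2, 20), ("A", 3, 30), ("A", 4, 40), ("B", 1, 5)]
stats = evaluate_online(demo_actions, fill_down_predictor)
print(stats)
```

In this toy run the predictor saves one action (filling A3) and produces one false positive (suggesting A5, which the target spreadsheet does not contain). A real benchmark would likely need a richer action space and acceptance criteria than this sketch assumes.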

Original Article

Cached at: 06/18/26, 03:58 PM

Source: https://huggingface.co/papers/2606.13802

Abstract

A benchmark for predicting spreadsheet user actions is introduced, addressing challenges in edit history availability and complex action spaces through manual curation and online evaluation methodology.

Predictive code completion greatly accelerates how quickly developers work. In spreadsheets, despite being much more common, such auto-completion features are virtually non-existent. To address this gap, we introduce a benchmark for systems that observe a sequence of user actions in a spreadsheet and predict future actions. Two challenges are (1) the absence of edit histories in public spreadsheet corpora and (2) the complex space of spreadsheet actions (spatial, temporal, composite). To address (1), we manually curate 52 sequences of 12K actions that recreate spreadsheets from public corpora, seeded by parametrized heuristics and LLM refinement. To address (2), we propose an online evaluation that expects a prediction after each user action, accepts or rejects that prediction, updates the future actions upon acceptance, and repeats this until the target spreadsheet is obtained. We use multiple baseline predictors (including zero-shot LLMs, fine-tuned SLMs, and classical models) and analyze different properties that our benchmark teaches us, including but not limited to: properties of saved actions and false positives, efficiency, effect of user profiles, effect of triggers, and effect of context.

Similar Articles

PreAct-Bench: Benchmarking Predictive Monitoring in LLMs

BehaviorBench: Modeling Real-World User Decisions from Behavioral Traces

TabClaw: An Interactive and Self-Evolving Agent for Spreadsheet Manipulation and Table Reasoning

From Heuristics to Analytics: Forecasting Effort and Progress in Online Learning

ForecastBench-Sim: A Simulated-World Forecasting Benchmark
