How AI Agents Reshape Knowledge Work: Autonomy, Efficiency, and Scope

arXiv cs.AI 06/08/26, 04:00 AM Papers
ai-agents knowledge-work autonomy productivity perplexity empirical-study
Summary
This study uses production data from Perplexity to compare AI agents versus conversational assistants, finding that agents reduce completion time by 87% and costs by 94% while expanding the scope and quality of knowledge work.
arXiv:2606.07489v1 Announce Type: new Abstract: Frontier AI systems are bridging the gap between intelligence and utility by shifting from conversational assistants to autonomous agents that execute tasks end to end. Using production data from Perplexity's Search and Computer products, we study this transition by examining how AI agents accelerate and reshape knowledge work. Three key empirical findings emerge. First, using sessions with near-identical initial query pairs as natural experiments for the same underlying task attempted with both products, Computer performs 26 minutes of autonomous work per user session, versus 33 seconds for Search. Computer automates task decomposition and execution that Search users might otherwise manually orchestrate and implement. As a result, Computer shifts follow-up query distribution toward higher-order work such as verification and extension. Autonomy also increases execution quality, with per-query dissatisfaction rates 55% lower on Computer than on Search. Second, due to its autonomy advantage, Computer reduces completion time from 269 to 36 minutes on matched tasks, lowering estimated time and cost by 87% and 94%, respectively, compared to humans equipped with Search alone. Third, Computer changes the scope of work that users attempt: Computer queries more often cross occupational boundaries, require higher-order cognition, draw on broader expertise, take the form of composite tasks that bundle interdependent subtasks into a single query, and unlock work activities that are essentially absent from Search usage among the same users. Together, the evidence indicates that AI agents accelerate workflows, enhance output quality, reduce costs, and expand the breadth and depth of automated work.
Cached at: 06/08/26, 09:15 AM
# How AI Agents Reshape Knowledge Work: Autonomy, Efficiency, and Scope

Correspondence to Jeremy Yang ([email protected]) and Jerry Ma ([email protected]).
Source: [https://arxiv.org/html/2606.07489](https://arxiv.org/html/2606.07489)

###### Abstract

Frontier AI systems are bridging the gap between intelligence and utility by shifting from conversational assistants to autonomous agents that execute tasks end to end\. Using production data from Perplexity’s Search and Computer products, we study this transition by examining how AI agents accelerate and reshape knowledge work\. We adopt an individual\-level task\-based framework where agents have a higher fixed delegation cost but a lower marginal execution cost per step\. This framework predicts that agent access expands the affordable task frontier toward weakly higher\-value tasks and weakly increases realized value; when the pre\-agent budget binds, surplus and the value\-to\-cost ratio also weakly increase\. Turning to the data, three key empirical findings emerge\. First, using sessions with near\-identical initial query pairs as natural experiments for the same underlying task attempted with both products, Computer performs 26 minutes of autonomous work per user session, versus 33 seconds for Search\. Computer automates task decomposition and execution that Search users might otherwise manually orchestrate and implement\. As a result, Computer shifts follow\-up query distribution toward higher\-order work such as verification and extension\. Autonomy also increases execution quality, with per\-query dissatisfaction rates 55% lower on Computer than on Search\. Second, due to its autonomy advantage, Computer reduces completion time from 269 to 36 minutes on matched tasks, lowering estimated time and cost by 87% and 94%, respectively, compared to humans equipped with Search alone\. Third, Computer changes the scope of work that users attempt: Computer queries more often cross occupational boundaries, require higher\-order cognition, draw on broader expertise, take the form of composite tasks that bundle interdependent subtasks into a single query, and unlock work activities that are essentially absent from Search usage among the same users\. 
Together, the evidence indicates that AI agents accelerate workflows, enhance output quality, reduce costs, and expand the breadth and depth of automated work\.

## 1 Introduction

A central question in the economics of AI is how it reshapes knowledge work \(agrawal2026economics\)\. As model capabilities advance, AI products are closing the gap between intelligence and utility, changing how they are integrated into real\-world workflows and creating new sources of value and new structures of work\. AI and user behavior are also co\-evolving, producing a shifting landscape of what AI can do, how AI is used, and what its downstream economic impact is\.

Over the last few years, frontier products have progressed from conversational assistants to copilots to agents\. Conversational assistants \(e\.g\., chatbots\) primarily support isolated information exchange with limited context or ability to act\. Copilots embed these capabilities into existing tools and workflows, co\-working with users to complete tasks within those tools’ interfaces\. Agents go further: they connect across a wider range of tools in the backend and return completed artifacts with little human involvement\. The shift is from AI as a conversational assistant to AI as an end\-to\-end work execution engine, characterized by greater autonomy and deeper integration into the user’s entire digital environment\.

We use data from Perplexity to study the implications of this transition by comparing how knowledge work is completed with conversational assistants versus agents\. As background, Figure [1](https://arxiv.org/html/2606.07489#S1.F1) situates Perplexity’s product portfolio within a two\-dimensional space of autonomy and context\. Autonomy captures the extent to which a system can plan and execute actions on behalf of the user with minimal human intervention\. Context integration captures the extent to which the system can read from and write to the user’s environment, including external tools and connected services\. We use three Perplexity products to illustrate the broader landscape:

- Perplexity Search represents the baseline\. Released in 2022, Perplexity Search introduced the *answer engine* product category: it allows a user to ask a question and receive a cited, synthesized answer from a knowledge base comprising billions of documents\.
- Comet Assistant represents an advance in both autonomy and context\. In 2025, Perplexity released the Comet web browser\. Its flagship feature, Comet Assistant, is an agent that helps users access knowledge and perform work within their browser\. Comet Assistant makes human\-AI integration more continuous by moving interactions into the application layer, where much knowledge work already occurs, allowing AI to co\-work with the user by reasoning over and acting upon open\-world web environments\.
- Perplexity Computer advances even further across both autonomy and context\. Released in 2026, Computer is a general\-purpose agent orchestration system that performs work across increasingly broad environments and long horizons\. A Computer user specifies an outcome, and the system autonomously searches, browses, codes, creates documents, accesses external services, delegates work to subordinate agents, and persists in these efforts until the outcome is fulfilled via real\-world actions or deliverables\.

![Refer to caption](https://arxiv.org/html/2606.07489v1/x1.png)

Figure 1: AI product progression by autonomy and workflow\-context integration\. Perplexity’s Search represents the baseline for information retrieval and synthesis; Comet Assistant introduces deeper context integration and execution on top of an interactive browser interface; Computer combines long\-horizon asynchronous execution with even deeper and broader context integration as an agent orchestrator\.

Our paper provides the first field evidence on task\-level economic implications of the shift from conversational assistants to agent orchestration across a wide spectrum of knowledge work\. We begin by defining a simple individual\-level task\-based framework to highlight the key economic forces and ground our empirical analysis\. Each task is indexed by its required step count, with each step representing an atomic unit of work and longer tasks generating weakly greater value\. The model centers on a shift in the cost structure: relative to conversational assistants, agents reduce marginal costs per step by replacing manual execution with autonomous implementation, but impose higher fixed costs through delegation and verification\. The framework predicts that agent access expands the affordable task frontier toward weakly higher\-value tasks and weakly increases total realized value; when the pre\-agent budget binds, gains in total surplus and the aggregate value\-to\-cost ratio follow as corollaries\. We then connect the framework to our empirical setting by mapping Search and Computer to the conversational assistant and agent categories respectively\. We organize the empirical analysis around four themes:

1. Adoption \(Section [5](https://arxiv.org/html/2606.07489#S5)\)\. Computer grew rapidly: cumulative queries reached 84× of their first\-week total over the three\-month study window \(February 27 through May 27, 2026\)\. A random sample of 100,000 classified queries further characterizes the use\-case distribution: Research & Analysis \(25\.8%\) and Document & Asset Creation \(18\.6%\) dominate, with structured artifacts \(e\.g\., documents, websites, codebases, spreadsheets\) accounting for roughly a third of expected outputs\.
2. Autonomy \(Section [6](https://arxiv.org/html/2606.07489#S6)\)\. Because the same users interact with both products over the same period, we leverage matched sessions as natural experiments to control for user and task heterogeneity\. In 10,000 session pairs with near\-identical initial queries \(cosine similarity \> 0\.99\), Computer performs 26 minutes of autonomous planning and execution per session versus 33 seconds for Search, a 48× increase in machine work\. Classification of follow\-up queries from 1,000 matched multi\-turn sessions reveals that Computer replaces manual directives with task verification and extension\. Higher autonomy is achieved without sacrificing quality: on the next\-turn dissatisfaction signal, Computer elicits medium\-to\-high dissatisfaction on 1\.3% of queries versus 2\.9% for Search, a 55% reduction\.
3. Efficiency \(Section [7](https://arxiv.org/html/2606.07489#S7)\)\. On the same matched sessions, a human equipped with Search alone takes an average of 269 minutes to complete a task\. Replacing manual execution with automated implementation, the Computer \+ Human workflow reduces average task completion time to 36 minutes, lowering time and cost by 87% and 94%, respectively\. A breakeven analysis shows that a Search\-aided human professional would need to complete all manual steps in under 20 minutes to match the cost of Computer \+ Human\. A sensitivity analysis further confirms robustness to variation in human\-time estimates\. These results are cross\-validated through an independent LLM\-driven procedure and user interviews\.
4. Scope \(Section [8](https://arxiv.org/html/2606.07489#S8)\)\. Autonomous execution also expands the range of work users attempt\. *Horizontally*: Based on a sample of 8,000 users across 8 occupation clusters and all their queries, Computer queries venture outside users’ primary occupation more often than Search queries from the same users\. This pattern holds across all 8 occupation clusters, with an average gap of 9 pp\. *Vertically*: Task difficulty also differs\. A classification of 10,000 Computer and Search queries from a sample of 5,000 dual\-product users suggests that:
   - \(a\) Computer queries are more cognitively complex: 71% abstract non\-routine tasks versus 53% for Search; 76% higher\-order Bloom cognition versus 55%; Create\-level work accounts for 50% of Computer queries versus 26% of Search\.
   - \(b\) Computer queries draw upon a broader set of competencies: each Computer query requires substantive expertise in an average of 2\.40 distinct O\*NET Knowledge domains versus 1\.74 for Search \(\+38%\), with Computer nearly three times as likely as Search to require three or more domains \(51% vs\. 17%\)\.
   - \(c\) Computer composes more tasks into a single query: at the task\-activity level, Computer queries engage 2\.95 of O\*NET’s Generalized Work Activities on average versus 2\.24 for Search \(\+32%\) and 4\.01 of Intermediate Work Activities versus 2\.87 \(\+40%\); the gap widens at finer grains, with 59% more Detailed Work Activities \(3\.64 vs\. 2\.29\) and 60% more occupation\-specific Task Statements \(3\.81 vs\. 2\.38\) engaged per query\.
   - \(d\) Computer unlocks new task possibilities for users: 23% of Computer queries involve at least one O\*NET Task Statement that never appears in the same users’ Search queries\. The shares are smaller at coarser grains \(5% for Detailed Work Activities, under 1% for Intermediate and Generalized Work Activities\), indicating that Computer’s distinctiveness lies in fine\-grained executional work rather than coarse topical range\. These shares also increase as the tolerance threshold is relaxed\.
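The headline percentages in the list above can be reproduced from the rounded figures quoted there. The following is a minimal arithmetic sketch, not part of the original analysis; the helper names `pct_reduction` and `pct_increase` are hypothetical. The machine-work ratio computed from the rounded durations comes out slightly below the reported 48×, presumably because the published durations are themselves rounded (an assumption).

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction when moving from `before` to `after`."""
    return 100.0 * (before - after) / before


def pct_increase(base: float, new: float) -> float:
    """Percentage increase of `new` relative to `base`."""
    return 100.0 * (new - base) / base


# Inputs are the rounded figures quoted in the list above.
time_saving = pct_reduction(269, 36)       # completion time in minutes
dissat_drop = pct_reduction(2.9, 1.3)      # dissatisfaction rate in percent
machine_work_ratio = (26 * 60) / 33        # autonomous work, seconds vs seconds
knowledge_gain = pct_increase(1.74, 2.40)  # O*NET Knowledge domains per query

print(f"time saving:        {time_saving:.1f}%")
print(f"dissatisfaction:    {dissat_drop:.1f}% lower")
print(f"machine-work ratio: {machine_work_ratio:.1f}x")
print(f"knowledge domains:  +{knowledge_gain:.1f}%")
```

The 94% cost reduction cannot be recomputed this way because the underlying cost inputs are not given in this section.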

Together, these findings suggest that autonomous task execution accelerates existing workflows, improves quality, reduces costs, and expands the range of work users undertake\. By automating the generative components of tasks that require specialties, agents make it easier for users to branch into domains outside their core competency and take on tasks that are costly to produce but relatively easy to verify\. As individual workers absorb tasks that previously spanned occupational boundaries and expertise levels, the findings also indicate a reduction in coordination costs, with broader implications for occupational and organizational structure\.

The paper proceeds as follows\. Section [2](https://arxiv.org/html/2606.07489#S2) situates our contribution relative to prior work on AI’s productivity impact, autonomous agent capabilities, and task recomposition\. Section [3](https://arxiv.org/html/2606.07489#S3) develops an individual\-level task\-based conceptual framework to derive welfare predictions and motivate the empirical analysis\. Section [4](https://arxiv.org/html/2606.07489#S4) describes the samples drawn from Perplexity’s Search and Computer products over the February–May 2026 study period\. Sections [5](https://arxiv.org/html/2606.07489#S5), [6](https://arxiv.org/html/2606.07489#S6), [7](https://arxiv.org/html/2606.07489#S7), and [8](https://arxiv.org/html/2606.07489#S8) present the four empirical themes in turn: adoption growth and use cases; autonomy gains on matched sessions; reductions in task time and cost; and expansion in the scope of work\. Section [9](https://arxiv.org/html/2606.07489#S9) discusses limitations and implications\. Proofs, a numerical example illustrating the propositions, and supplementary analysis and user\-interview materials are collected in the Appendix\.

## 2 Related Work

##### Productivity impact of AI assistants\.

A growing body of experimental evidence documents the productivity effects of generative AI assistants\. For instance, noy2023experimental find that ChatGPT reduces writing time by 40% and raises output quality by 18% in a randomized experiment with 453 professionals, with the largest gains for lower\-ability workers\. brynjolfsson2025generative study 5,172 customer support agents and report a 14% increase in issues resolved per hour, again with disproportionate gains for novice workers\. dellacqua2023navigating find that BCG consultants using GPT\-4 improve performance by up to 40% on tasks within the model’s capability frontier, but perform worse on tasks beyond it, a “jagged frontier” that underscores the importance of task–tool fit\. At larger scale, cui2024effects run three field experiments with 4,867 software developers and find that GitHub Copilot increases completed tasks by 26%, with junior developers benefiting most\. A notable counterpoint comes from becker2025measuring, whose randomized trial finds that experienced open\-source developers using AI tools were 19% slower, suggesting that productivity gains may depend on task familiarity and developer expertise\. Relatedly, vendraminelli2025genai document a “GenAI wall effect” in which AI assistance fails to close the performance gap between occupational insiders and outsiders, highlighting limits to horizontal expertise transfer through interactive tools\. Moving beyond controlled experiments, tamkinmccrory2025productivity estimate productivity gains directly from large\-scale Claude usage logs, finding 80% time savings across a broader set of tasks\. These studies focus on humans working interactively with an AI assistant that augments each step\. Our setting differs because Computer replaces the interactive loop with asynchronous delegation\.

##### From assistance to autonomous AI agents\.

The progression from interactive assistants to autonomous agents has been driven by advances in tool use and multi\-step reasoning\. schick2023toolformer demonstrate that language models can learn to invoke external tools in a self\-supervised manner; yao2023react show that interleaving reasoning traces with action steps improves task completion on knowledge\-intensive and decision\-making tasks\. kwa2025measuring introduce the “time horizon” metric \(the task duration at which agents achieve 50% success rate\) and estimate the frontier at roughly 12 human\-hours, doubling approximately every seven months\. Beyond capability measurements, production deployments are beginning to reveal agent usage and impact in real\-world settings\. For instance, sarkar2026agents analyzes tens of thousands of Cursor users around the rollout of an agentic coding mode and finds that companies merge 39% more pull requests under the agent default, with experienced developers shifting effort from typing code to planning and supervising the agent\. mccain2026measuring analyze millions of human\-agent interactions in Anthropic’s Claude Code and find that autonomous turn durations at the 99\.9th percentile nearly doubled from 25 to 45 minutes between October 2025 and January 2026, with experienced users granting agents full autonomy in over 40% of sessions\. demirer2026writing combine usage data from over 100,000 GitHub developers across successive generations of coding tools and find that autocomplete, interactive coding agents, and autonomous coding agents raise coding activity by 40%, 140%, and 180% respectively; these task\-level gains, however, attenuate sharply down the production chain \(to 50% for projects and 30% for releases\), consistent with human bottlenecks limiting how much reaches shipped software\. yang2025adoption study the adoption and use of Perplexity’s Comet agentic browser and find that early users concentrate on productivity\- and learning\-related use cases\. Our study complements this work by comparing user interaction with an autonomous agent versus a conversational assistant, and by extending downstream impact analysis beyond coding to a wide range of knowledge work\.

##### Occupational exposure and task recomposition\.

Several studies move beyond individual\- and task\-level analysis to estimate, at the macro level, which occupations and tasks are most exposed to AI automation\. For instance, eloundou2024gpts find that approximately 80% of the U\.S\. workforce could have at least 10% of their tasks affected by LLMs, with higher\-wage occupations more exposed\. felten2023occupational construct an AI occupational exposure index updated for language\-model capabilities and find similar patterns of broad exposure concentrated in white\-collar work\. These exposure analyses assess potential displacement but do not measure how task composition actually changes once AI tools are adopted\. Usage\-based measurements are beginning to fill this gap: for instance, appel2026economic analyze millions of Claude conversations mapped to occupational tasks and find that augmentation slightly outpaces full automation\. massenkoff2026labor introduce a usage\-based measure of AI exposure to document a capability\-deployment gap across occupations\. A complementary theoretical literature emphasizes that automation can both displace existing tasks and create new ones, with the net labor\-market impact depending on the balance between displacement and reinstatement forces \(acemoglu2019automation\)\. Recent work further shows why task\-level exposure measures can miss important complementarities in how work is organized\. For instance, gans2026ring extend the O\-ring model of multi\-step production to show that the return to automation is limited by the human bottlenecks in the process\. garicano2026weak model jobs as bundles of tasks and show that weak bundles of easily separable tasks face stronger displacement pressure, whereas strong bundles are more resilient\. Our analysis provides direct evidence on task recomposition as users shift from conversational assistants to agents: usage tends to expand horizontally into other occupations and vertically into more complex work\.

## 3 Conceptual Framework

This section develops a simple task\-based framework at the individual\-worker level\. Its aim is to derive partial\-equilibrium predictions about how autonomous execution affects task completion under minimal assumptions and to ground the empirical analysis\. To keep the theory product\-agnostic, we refer to two modes: a conversational mode and an agent mode\. We keep the framework general and provide a concrete numerical example in Appendix [B](https://arxiv.org/html/2606.07489#A2)\.

### 3\.1 Task primitives

Consider a finite set of candidate task opportunities indexed by $j \in \{1, \dots, J\}$\.

Tasks are ordered by the number of steps or subtasks required for completion:

$$0 < s_1 \leq s_2 \leq \cdots \leq s_J,$$

which we call the task’s *step count*\. A step is an atomic unit of work \(e\.g\., lookup, calculation, code execution, synthesis\)\. We make four assumptions on the value and cost structure\.

##### Assumption 1: Higher\-step tasks have weakly higher value\.

Task values are weakly increasing in step count:

$$0 < v_1 \leq v_2 \leq \cdots \leq v_J.$$

This implies that longer tasks create weakly greater value\. Value is zero if no task is completed\.

##### Assumption 2: Task value is realized upon full completion\.

Task value $v_j$ is realized only upon completion of all $s_j$ steps; partially completed tasks generate no value\. Equivalently, task opportunities are indivisible: a user attempts task $j$ in full or not at all\.

##### Assumption 3: Agent has higher fixed per\-task cost than Conversational\.

Each mode has a fixed per\-task cost:

$$0 < f_{\text{Conversational}} < f_{\text{Agent}}.$$

$f_{\text{Conversational}}$ captures the cost of formulating the initial conversational prompt, typically a single question\. $f_{\text{Agent}}$ captures the cost of delegating a task to an agent: specifying a well\-scoped objective and later reviewing the delegated output\. The sign $f_{\text{Agent}} > f_{\text{Conversational}}$ reflects a simple observation about agent user experience: directing an autonomous system typically requires more specification and verification than issuing a simple one\-shot query\.

##### Assumption 4: Agent has lower marginal per\-step cost than Conversational\.

Each step incurs a mode\-specific marginal cost $m_t > 0$, and the agent mode is cheaper per step than the conversational mode:

$$0 < m_{\text{Agent}} < m_{\text{Conversational}}.$$

Under the conversational mode the user must plan each step, issue it, read the response, and execute; the per\-step cost includes human planning, interpretation, and execution overhead\. Under the agent mode the system plans and executes autonomously, leaving the user only the one\-time cost of delegation and review\. Equivalently, $m_{\text{Agent}}$ is the *delegated\-execution* cost per step, $m_{\text{Conversational}}$ is the *human\-in\-the\-loop* cost per step, and their gap is the autonomy premium\.

##### Total cost\.

Combining the primitives, the total cost to complete task $j$ under mode $t$ is

$$C(s_j; t) = f_t + m_t s_j.$$

For toolkit $\mathcal{T} \subseteq \{\text{Conversational}, \text{Agent}\}$, define the effective completion cost

$$c_j^{\mathcal{T}} \equiv \min_{t \in \mathcal{T}} C(s_j; t).$$
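The cost primitives above translate directly into code. The sketch below is illustrative only: the `Mode` class, the function names, and the numerical parameters are hypothetical, chosen solely to satisfy Assumptions 3 and 4, and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Mode:
    name: str
    fixed: float     # f_t: fixed per-task cost
    marginal: float  # m_t: marginal per-step cost


# Hypothetical values: the agent has the higher fixed cost (Assumption 3)
# and the lower marginal cost (Assumption 4).
CONVERSATIONAL = Mode("Conversational", fixed=2.0, marginal=4.0)
AGENT = Mode("Agent", fixed=10.0, marginal=1.0)


def total_cost(steps: float, mode: Mode) -> float:
    """C(s; t) = f_t + m_t * s."""
    return mode.fixed + mode.marginal * steps


def effective_cost(steps: float, toolkit: Iterable[Mode]) -> Tuple[float, str]:
    """Cheapest completion cost over the modes in the toolkit, with the mode name."""
    best = min(toolkit, key=lambda mode: total_cost(steps, mode))
    return total_cost(steps, best), best.name


both = [CONVERSATIONAL, AGENT]
print(effective_cost(2, both))  # short task: conversational mode is cheaper
print(effective_cost(4, both))  # longer task: agent mode is cheaper
```

With a conversational-only toolkit, `effective_cost` reduces to `total_cost` under the conversational mode, which corresponds to the pre-agent setting used later in the section.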

### 3\.2 Cost properties

The following property of the cost function follows directly from Assumptions 3–4\. Proofs are collected in Appendix [A](https://arxiv.org/html/2606.07489#A1)\.

###### Lemma 1 \(Agent is preferred for more complex tasks\)\.

Under Assumptions 3–4, for any task that could be routed to either mode, the user strictly prefers the conversational mode to the agent mode whenever the step count falls below a positive threshold,

$$s < s^{\ast} \equiv \frac{f_{\text{Agent}} - f_{\text{Conversational}}}{m_{\text{Conversational}} - m_{\text{Agent}}} > 0,$$

and strictly prefers the agent mode above it\.

Lemma [1](https://arxiv.org/html/2606.07489#Thmlem1) establishes that the agent’s dominance is confined to tasks with enough steps to amortize its higher delegation overhead\. Even when the agent mode is strictly cheaper per step, its higher fixed delegation cost dominates on short tasks\. The conversational mode is therefore preferred in the low\-$s$ region \(e\.g\., quick lookups, single\-turn clarifications, one\-off factual questions\), while the agent mode is preferred in the high\-$s$ region\.
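The threshold follows from comparing the two linear cost functions: the conversational mode is cheaper exactly when f_Conversational + m_Conversational·s < f_Agent + m_Agent·s, which rearranges to s < s* because m_Conversational − m_Agent > 0. A small numerical sketch, using hypothetical parameter values and a hypothetical function name, checks the crossover on both sides:

```python
def crossover_steps(f_conv: float, m_conv: float,
                    f_agent: float, m_agent: float) -> float:
    """Step count s* at which both modes cost the same (Lemma 1)."""
    if not 0 < f_conv < f_agent:
        raise ValueError("Assumption 3 requires 0 < f_conv < f_agent")
    if not 0 < m_agent < m_conv:
        raise ValueError("Assumption 4 requires 0 < m_agent < m_conv")
    return (f_agent - f_conv) / (m_conv - m_agent)


# Hypothetical parameters, not estimates from the paper.
F_CONV, M_CONV = 2.0, 4.0
F_AGENT, M_AGENT = 10.0, 1.0

s_star = crossover_steps(F_CONV, M_CONV, F_AGENT, M_AGENT)


def conv_cost(s: float) -> float:
    return F_CONV + M_CONV * s


def agent_cost(s: float) -> float:
    return F_AGENT + M_AGENT * s


print(f"s* = {s_star:.3f}")
print("s = 2:", "conversational" if conv_cost(2) < agent_cost(2) else "agent")
print("s = 4:", "conversational" if conv_cost(4) < agent_cost(4) else "agent")
```

Under these assumed values the crossover lies between two and three steps, so any task with three or more steps would be routed to the agent mode.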

### 3\.3 Optimal task selection

Let $B > 0$ denote the user’s resource endowment\. Let $a_j \in \{0, 1\}$ indicate whether task $j$ is attempted\. If task $j$ is attempted, the user also selects a mode from the toolkit that minimizes the cost of task $j$\. By Assumption [2](https://arxiv.org/html/2606.07489#S3.SS1.SSS0.Px2), the user then pays the total mode\-dependent cost $c_j^{\mathcal{T}}$ and generates value $v_j$; otherwise nothing is paid and no value is realized\. The user therefore solves a standard 0\-1 knapsack problem

$$\max_{\{a_j\}_{j=1}^{J}} \sum_{j=1}^{J} a_j v_j$$

subject to the aggregate resource budget

$$\sum_{j=1}^{J} a_j c_j^{\mathcal{T}} \leq B.$$

We describe a standard dynamic programming solution in Appendix [B](https://arxiv.org/html/2606.07489#A2)\.
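For intuition, the selection problem can be solved with the textbook 0-1 knapsack recursion. The sketch below is not the procedure from Appendix B, which is not reproduced here; it assumes integer costs and an integer budget, and all task values and cost parameters are hypothetical.

```python
from typing import List, Tuple


def solve_knapsack(values: List[int], costs: List[int],
                   budget: int) -> Tuple[int, List[int]]:
    """Return (max total value, sorted 0-based indices of chosen tasks)."""
    n = len(values)
    # best[i][b] = max value using only the first i tasks with budget b
    best = [[0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, c = values[i - 1], costs[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # skip task i
            if c <= b and best[i - 1][b - c] + v > best[i][b]:
                best[i][b] = best[i - 1][b - c] + v  # attempt task i
    # Backtrack: a change in value between rows means task i was attempted.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return best[n][budget], sorted(chosen)


# Hypothetical primitives: step counts, values, and integer cost parameters.
steps = [1, 2, 4, 8]
values = [10, 18, 30, 60]
F_CONV, M_CONV = 2, 4
F_AGENT, M_AGENT = 10, 1
BUDGET = 30

pre_costs = [F_CONV + M_CONV * s for s in steps]
post_costs = [min(F_CONV + M_CONV * s, F_AGENT + M_AGENT * s) for s in steps]

w_pre, a_pre = solve_knapsack(values, pre_costs, BUDGET)
w_post, a_post = solve_knapsack(values, post_costs, BUDGET)
print("pre: ", w_pre, a_pre)    # conversational-only toolkit
print("post:", w_post, a_post)  # conversational-plus-agent toolkit
```

In this toy instance the eight-step task is unaffordable under the conversational-only toolkit and enters the selected set once the agent mode is available, while a shorter task exits.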

### 3\.4 Predictions

Let $a^{\ast}_{\mathcal{T},j}$ denote the optimal attempt decision for task $j$ under toolkit $\mathcal{T}$, and write

$$W^{\mathcal{T}} \equiv \sum_{j=1}^{J} a^{\ast}_{\mathcal{T},j} v_j, \qquad K^{\mathcal{T}} \equiv \sum_{j=1}^{J} a^{\ast}_{\mathcal{T},j} c_j^{\mathcal{T}},$$

$$\Pi^{\mathcal{T}} \equiv W^{\mathcal{T}} - K^{\mathcal{T}}, \qquad A^{\mathcal{T}} \equiv \{j : a^{\ast}_{\mathcal{T},j} = 1\}$$

for the total realized value, total realized cost, total surplus \(value net of cost\), and selected task set in the optimum under $\mathcal{T}$\. We use *pre* and *post* for the conversational\-only and conversational\-plus\-agent toolkits, respectively\. For shorthand, write effective costs in both periods as

$$c_j^{\text{pre}} \equiv C(s_j; \text{Conversational}), \qquad c_j^{\text{post}} \equiv \min\{C(s_j; \text{Conversational}), C(s_j; \text{Agent})\}.$$

Let the induced upper affordability endpoints be

$$u^{\text{pre}} \equiv \max\{j : c_j^{\text{pre}} \leq B\}, \qquad u^{\text{post}} \equiv \max\{j : c_j^{\text{post}} \leq B\}$$

with the convention $u = 0$ when no task is individually affordable\.

We derive several predictions from the knapsack problem, with proofs in Appendix [A](https://arxiv.org/html/2606.07489#A1)\.

###### Proposition 1 \(Affordable value frontier expands\)\.

Adding agent access weakly expands the set of individually affordable tasks; therefore, the highest individually affordable value weakly rises:

$$u^{\text{post}} \geq u^{\text{pre}}, \quad v_{u^{\text{post}}} \geq v_{u^{\text{pre}}}.$$

This indicates that the agent mode unlocks weakly higher\-value tasks that are not feasible under the conversational mode\.
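A short sketch of the affordability endpoints makes Proposition 1 concrete. The function name and all numbers are hypothetical; because the post-agent cost of every task is the minimum over a larger toolkit, it can never exceed the pre-agent cost, so the endpoint cannot fall.

```python
from typing import List


def frontier_index(costs: List[float], budget: float) -> int:
    """Largest 1-based task index j with cost_j <= budget; 0 if none is affordable."""
    affordable = [j for j, c in enumerate(costs, start=1) if c <= budget]
    return max(affordable, default=0)


# Hypothetical primitives (tasks ordered by step count, values weakly increasing).
steps = [1, 2, 4, 8]
values = [10, 18, 30, 60]
F_CONV, M_CONV = 2, 4
F_AGENT, M_AGENT = 10, 1
BUDGET = 30

pre_costs = [F_CONV + M_CONV * s for s in steps]
post_costs = [min(F_CONV + M_CONV * s, F_AGENT + M_AGENT * s) for s in steps]

u_pre = frontier_index(pre_costs, BUDGET)
u_post = frontier_index(post_costs, BUDGET)

# Highest individually affordable value (0 when nothing is affordable).
v_pre = values[u_pre - 1] if u_pre else 0
v_post = values[u_post - 1] if u_post else 0

print("u_pre =", u_pre, "value =", v_pre)
print("u_post =", u_post, "value =", v_post)
```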

###### Proposition 2 \(Total value expands\)\.

Adding agent access weakly increases total realized value: $W^{\text{post}} \geq W^{\text{pre}}$\.

###### Corollary 1 \(Total surplus expands\)\.

When the conversational\-only chosen bundle exhausts the aggregate budget, adding agent access weakly increases total surplus:

$$\Pi^{\text{post}} \geq \Pi^{\text{pre}}.$$

###### Corollary 2 \(Value\-to\-cost ratio expands\)\.

When the conversational\-only chosen bundle exhausts the aggregate budget, adding agent access weakly increases the aggregate value\-to\-cost ratio:

$$\frac{W^{\text{post}}}{K^{\text{post}}} \;\geq\; \frac{W^{\text{pre}}}{K^{\text{pre}}}.$$

###### Proposition 3 \(Surplus change decomposes into intensive, entry, and exit margins\)\.

For any pre\- and post\-agent selected task sets, the surplus change can be written as

$$
\begin{aligned}
\Delta\Pi \equiv \Pi^{\text{post}} - \Pi^{\text{pre}}
&= \underbrace{\sum_{j \in A^{\text{pre}} \cap A^{\text{post}}} \big(c_j^{\text{pre}} - c_j^{\text{post}}\big)}_{\text{intensive: cost saving}} \\
&\quad + \underbrace{\sum_{j \in A^{\text{post}} \setminus A^{\text{pre}}} \big(v_j - c_j^{\text{post}}\big)}_{\text{extensive: surplus from entry}} \\
&\quad - \underbrace{\sum_{j \in A^{\text{pre}} \setminus A^{\text{post}}} \big(v_j - c_j^{\text{pre}}\big)}_{\text{extensive: surplus from exit}}.
\end{aligned}
$$

The intensive term is cost savings on retained tasks and is weakly non\-negative\. The entry and exit terms are the net surplus from newly attempted and no\-longer\-attempted tasks\.
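Because the decomposition is an accounting identity, it can be checked numerically on any pair of task sets. The sketch below uses hypothetical values, costs, and selected sets (0-based indices); the sets are illustrative and are not claimed to be knapsack optima.

```python
from typing import List, Set

# Hypothetical task values and effective costs in each period.
values = [10, 18, 30, 60]
pre_costs = [6, 10, 18, 34]    # conversational-only toolkit
post_costs = [6, 10, 14, 18]   # conversational-plus-agent toolkit

# Illustrative selected task sets.
A_pre: Set[int] = {0, 2}
A_post: Set[int] = {2, 3}


def surplus(selected: Set[int], costs: List[int]) -> int:
    """Total value net of total cost for the selected tasks."""
    return sum(values[j] - costs[j] for j in selected)


delta = surplus(A_post, post_costs) - surplus(A_pre, pre_costs)

# Intensive margin: cost savings on tasks attempted in both periods.
intensive = sum(pre_costs[j] - post_costs[j] for j in A_pre & A_post)
# Entry margin: net surplus of newly attempted tasks, at post-agent costs.
entry = sum(values[j] - post_costs[j] for j in A_post - A_pre)
# Exit margin: net surplus of dropped tasks, at pre-agent costs.
exit_term = sum(values[j] - pre_costs[j] for j in A_pre - A_post)

print("delta surplus:", delta)
print("intensive + entry - exit:", intensive + entry - exit_term)
```

The intensive term is non-negative by construction, since the post-agent cost of a retained task is the minimum over a larger toolkit.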

Task value is unobserved in our data, so the value, surplus, ratio, and decomposition results are not directly testable\. The empirical analysis therefore focuses on observable cost and scope outcomes\. Section [7](https://arxiv.org/html/2606.07489#S7) tests cost structure assumptions and cost reduction\. Section [8](https://arxiv.org/html/2606.07489#S8) tests whether lower costs unlock more complex attempted work\.

## 4 Data

Perplexity Computer was released on February 25, 2026\. Our main analysis covers the 3\-month post\-launch period February 27 through May 27, 2026 and draws on 7 samples\. We follow the privacy procedure used in yang2025adoption to ensure that no raw queries are exposed to human analysts and all results are reported in highly aggregated forms\. In our empirical setting, Search instantiates the conversational mode and Computer instantiates the agent mode in the conceptual framework\. We describe each of the 7 samples analyzed in the empirical sections below\.

##### Adoption (Section [5](https://arxiv.org/html/2606.07489#S5)).

We use two samples for adoption analyses\. The first is the full universe of Computer and Search queries over the 90\-day post\-launch window\. For Search query growth, we split the user base into users who issued at least one Computer query in the window \(“Computer users”\) and users who did not \(“non\-Computer users”\)\. The second sample is a random draw of 100,000 Computer queries from the same window, each labeled by an LLM along two dimensions: primary task category and subject\-matter domain\.

##### Autonomy (Section [6](https://arxiv.org/html/2606.07489#S6)).

The autonomy analysis rests on a matched-pair design that compares Computer and Search outcomes on sessions with near-identical initial user queries.¹ We begin by identifying dual-product users (those who issued at least one Computer initial query, defined as the first message of a session, and at least one Search initial query during the window) and draw 100,000 such users. To ensure that every matched Computer query exercises Computer’s full agentic capabilities, we require the Computer session to invoke at least one execution (“do”) tool such as code execution, browser actions, file creation, and external API calls that go beyond information retrieval and synthesis. Queries that do not invoke a “do” tool are functionally similar to Search and excluded from the Computer side. For each sampled user we collect every qualifying Computer initial query and up to the 100 most recent Search initial queries, embed each query into a dense vector representation, and compute pairwise cosine similarity within the user. We then perform one-to-one greedy matching, retaining each Computer query’s top-ranked Search neighbor, and keep pairs with similarity > 0.99.² From the resulting pool of near-identical matches we randomly sample 10,000 pairs (multiple pairs per user are possible). For the follow-up query analysis, we additionally draw a 1,000-pair subsample restricted to pairs where both sessions have ≥ 2 turns, and classify these into a 10-category taxonomy.

¹ A session (or thread) consists of one or more related turns that share a common context. Within a session, the AI retains prior messages, tool outputs, and intermediate state, enabling coherent reasoning across turns rather than treating each message in isolation.

² We set the threshold to make the matched pairs as similar as possible while allowing for minor punctuation and formatting differences, such as whitespace and line breaks.
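The within-user matching step can be sketched as follows. This is a minimal illustration under assumptions, not the production pipeline: the embedding model is not specified in the text, so the inputs here are plain lists of floats, and the function names and the tie-breaking order are hypothetical. The sketch sorts all Computer/Search candidate pairs by cosine similarity and greedily keeps the best remaining pair, so that each query is matched at most once.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_match(
    computer_vecs: Sequence[Sequence[float]],
    search_vecs: Sequence[Sequence[float]],
    threshold: float = 0.99,
) -> List[Tuple[int, int, float]]:
    """One-to-one greedy matching of one user's Computer and Search queries.

    Returns (computer_index, search_index, similarity) for pairs whose
    similarity strictly exceeds the threshold.
    """
    candidates = [
        (cosine_similarity(c, s), i, j)
        for i, c in enumerate(computer_vecs)
        for j, s in enumerate(search_vecs)
    ]
    # Highest similarity first; ties keep their original enumeration order.
    candidates.sort(key=lambda item: item[0], reverse=True)

    used_computer, used_search = set(), set()
    pairs = []
    for sim, i, j in candidates:
        if sim <= threshold:
            break  # all remaining candidates are below the cutoff
        if i in used_computer or j in used_search:
            continue  # each query may appear in at most one pair
        used_computer.add(i)
        used_search.add(j)
        pairs.append((i, j, sim))
    return pairs


# Hypothetical 2-d "embeddings" for a single user.
computer_queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
search_queries = [[2.0, 0.0], [0.0, 3.0], [1.0, 0.0]]
matched = greedy_match(computer_queries, search_queries)
```

With the toy vectors, the first two Computer queries each find a Search neighbor pointing in the same direction, while the third has no neighbor above the 0.99 cutoff and is dropped.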

##### Efficiency (Section [7](https://arxiv.org/html/2606.07489#S7)).

The efficiency analysis reuses the 10,000 matched query pairs from the autonomy sample, augmented with execution-cost and human-time data. For the Search + Human counterfactual we use two independent human-time estimates. The tool-based estimate sums human-equivalent minutes for each “do” tool invocation observed in the Computer thread; “search” tools are treated as already provided by Search and contribute zero human minutes. The LLM-based estimate is an independent validation: for each of the 10,000 pairs we prompt an LLM with the query text only, describing the Search + Human counterfactual, and elicit a total human-time estimate per task session. Both estimates are converted to cost by combining model cost with human labor cost at domain-specific hourly wages (BLS Occupational Employment and Wage Statistics, May 2025 release, mapped to the 18 query domains). A third, user-reported estimate comes from 45-minute semi-structured interviews with 25 active Computer users (6 enterprise, 19 consumer) recruited from those with ≥ 5 historical queries; participants self-report their pre-Computer workflow time and cost displacement.

##### Scope (Section [8](https://arxiv.org/html/2606.07489#S8)).

Scope analyses use two paired samples drawn from the same dual-user universe. The *horizontal* sample identifies each user’s primary occupation cluster by mapping every Search query in the window to one of the 8 most common clusters in our data (Digital Technology, Financial Services, Healthcare & Human Services, Education, Public Service & Safety, Management & Entrepreneurship, Marketing & Sales, Arts & Design), then assigning the mode cluster as the user’s primary occupation. We restrict to users active in both products and then sample exactly 1,000 users per cluster, yielding 8,000 users in total. For each sampled user we then pull all Search and Computer queries over the full 90-day window; each Computer query is assigned a destination cluster by a direct single-label LLM call (8 clusters + “Other”), while each Search session is mapped deterministically via its topic domain. The *vertical* sample is a paired query-level draw from a fixed set of 5,000 dual-product users: for each user we randomly sample one do-gated Computer initial query and one Search initial query within the window, yielding 10,000 queries from the same users, a paired within-user comparison. Each query is then LLM-classified along four axes: cognitive complexity (Bloom’s Revised Taxonomy and routine versus abstract task types), required knowledge breadth (which O\*NET Knowledge domains the query requires), task-activity composability (which O\*NET Generalized Work Activities, Intermediate Work Activities, Detailed Work Activities, and occupation-specific Task Statements the query engages), and new tasks unlocked (work activities that appear in the user’s Computer queries but are essentially absent from the same user’s Search queries).

## 5 Adoption

##### Growth\.

Figure [2](https://arxiv.org/html/2606.07489#S5.F2) shows Computer’s growth trajectory in the three months following launch, using Search growth as a baseline for an established product. Each series is reported as a cumulative running total and indexed to its own week-1 cumulative total (Feb 27–Mar 5 = 1×). Cumulative Computer queries reached 84× by May 27. Over the same window, cumulative Search queries from Computer users reached 14×, slightly above non-Computer users at 12×.
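The indexing convention used for these growth series can be sketched in a few lines. The function name and the weekly counts below are hypothetical and serve only to show the computation: each series is turned into a running total and then divided by its own first-week cumulative value, so every series starts at 1×.

```python
from itertools import accumulate
from typing import List, Sequence


def indexed_cumulative(weekly_counts: Sequence[float]) -> List[float]:
    """Cumulative running total, indexed so that week 1 equals 1.0."""
    cumulative = list(accumulate(weekly_counts))
    if not cumulative or cumulative[0] <= 0:
        raise ValueError("week-1 count must be positive to serve as the index base")
    base = cumulative[0]
    return [total / base for total in cumulative]


# Hypothetical weekly query counts for one series.
growth_index = indexed_cumulative([10, 20, 30])
```

Because the index is cumulative, a reading such as 84× describes total queries to date relative to the first week, not the ratio of the latest week's volume to the first week's.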

The faster growth of Search among Computer users could in principle reflect either complementarity or selection; we isolate the causal effect with a matched difference-in-differences design that compares Computer adopters to non-adopters exactly matched on subscription tier, primary search topic, and pre-period Search intensity. Computer adoption increases daily Search queries by 1.05, with positive estimates in all search topics and across alternative staggered-adoption estimators. Appendix [D](https://arxiv.org/html/2606.07489#A4) reports the full design, robustness checks, and intensive-margin estimates.

![Refer to caption](https://arxiv.org/html/2606.07489v1/x2.png) Figure 2: Cumulative adoption growth, with each series indexed to its own week-1 cumulative total (Feb 27–Mar 5 = 1×). Cumulative Computer queries reached 84×. Cumulative Search queries from Computer users reached 14×, slightly above non-Computer users at 12×.
##### What do people use Computer for?

To characterize the task distribution, we classify a random sample of 100,000 Computer queries along two dimensions: primary task category and subject-matter domain (Figure [3](https://arxiv.org/html/2606.07489#S5.F3)). Research & Analysis is the most common task category (25.8%), followed by Document & Asset Creation (18.6%). Capabilities & Product Discovery accounts for 5.3%–6.0% of queries and is decreasing over time as users transition from exploration to production.³ Subject-matter domains are broadly distributed across knowledge work: Software & IT leads (13.8%), followed by Finance & Investing (10.8%), Marketing & Sales (7.6%), General Business Operations (7.0%), Healthcare & Life Sciences (6.8%), Education & Academia (5.9%), Legal & Compliance (5.5%), and Media & Creative (5.1%).

³ The label appears on both classification axes: 5.3% of queries have it as their *task category*, and 6.0% have it as their *subject-matter domain*. The two sets overlap but are not identical, since task and domain are classified independently.

![Refer to caption](https://arxiv.org/html/2606.07489v1/x3.png) Figure 3: Use case distribution of Computer queries along two classification dimensions. Left: Task category: Research & Analysis (25.8%) and Document & Asset Creation (18.6%) dominate. Right: Subject-matter domain: usage is broadly distributed across 15+ subject areas. Within each panel, categories are sorted by share (descending), with Other placed last. Percentages may not sum exactly to 100% due to rounding.

## 6 Autonomy

A core design principle of Computer is to complete tasks autonomously, executing multi\-step workflows with minimal human intervention\. To measure how well it achieves this, we compare Computer sessions to Search sessions on the same initial queries from the same users\. While both query and session \(a thread of related queries\) can serve as proxies for a task unit, Search and Computer may induce different interaction patterns on the same task\. We therefore use session as the primary unit of analysis, and also report query\-level results where relevant\.

### 6.1 Method

A direct comparison between Computer and Search is complicated by endogenous task selection: users may send different types of queries to each product, as predicted by the sorting behavior in Lemma [1](https://arxiv.org/html/2606.07489#Thmlem1). Ideally, we would compare outcomes for the same task performed with Computer versus Search. We therefore exploit user-generated natural experiments in which the same user submits near-identical initial queries (first message in a session) to both products.

We identify users who submitted at least one initial query to both Computer and Search between February 27 and May 27, 2026. From these, we sample 100,000 users. We restrict the Computer side to initial queries that invoked at least one *execution* tool (e.g., browser actions, code execution, file writes, drive uploads, external connector calls), so that every matched Computer session actually performs autonomous work rather than returning a chat-style response. For each sampled user, we collect all qualifying initial Computer queries and up to 100 most recent initial Search queries. We embed each query and compute pairwise cosine similarity within each user’s query sets. We then perform one-to-one greedy matching and sample 10,000 near-identical pairs (similarity > 0.99) drawn from 8,357 unique users, which provides a clean control for task content by comparing effectively the same task attempted through both products. The task domain of a session is labeled by the primary domain of the initial query.

### 6.2 Results

We first compare autonomy at the matched\-session level, measuring execution time, the rate of model pauses and user stops, and the number of connector calls to external apps\.

##### Execution time\.

The defining feature of Computer’s autonomy is the machine work it performs between user turns. For Computer, we compute per-turn wall-clock time⁴ from the user’s submission timestamp to the last LLM-response end within the turn, summed across turns, and capped at three hours per session to reduce the influence of outliers⁵; this captures both model reasoning and downstream tool-execution time. For Search, we similarly use the end-to-end server latency from query receipt to last token summed over turns (covering retrieval, reasoning, and generation).

⁴ Many Computer sessions involve parallel task execution, so wall-clock time reflects user experience and underestimates total machine time.

⁵ Less than 5% of Computer sessions exceed the three-hour limit due to a combination of rare long-running jobs, recurring jobs, system retries, and other edge cases.
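The Computer-side runtime measure described above can be sketched as follows. This is an illustration under assumptions: the function name and the representation of a turn as a `(submitted_at, last_response_end)` pair of timestamps in seconds are hypothetical, and only the sum-then-cap logic comes from the text.

```python
from typing import Iterable, Tuple

SESSION_CAP_SECONDS = 3 * 60 * 60  # three-hour cap per session, as stated in the text


def computer_session_runtime(
    turns: Iterable[Tuple[float, float]],
    cap_seconds: float = SESSION_CAP_SECONDS,
) -> float:
    """Sum per-turn wall-clock time over a session, then cap the session total.

    Each turn is (submitted_at, last_response_end), both in seconds.
    """
    total = 0.0
    for submitted_at, last_response_end in turns:
        if last_response_end < submitted_at:
            raise ValueError("a turn cannot end before the user submitted it")
        total += last_response_end - submitted_at
    return min(total, cap_seconds)


# Hypothetical sessions: a two-turn session and a single very long turn.
two_turn_runtime = computer_session_runtime([(0.0, 600.0), (1000.0, 1900.0)])
capped_runtime = computer_session_runtime([(0.0, 20000.0)])
```

Idle time between turns (here, the gap from 600 to 1000 seconds) is not counted, and the cap applies to the session total rather than to each turn.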

Averaging across matched pairs, Computer sessions run 26 minutes of wall-clock execution, versus 33 seconds for Search, a 48× ratio. An average Computer session contains 5.3 queries versus 2.8 for Search, yielding a per-query runtime gap of 25×. The gap is also visible at the distribution level (Figure [4](https://arxiv.org/html/2606.07489#S6.F4)): Computer and Search per-session runtimes barely overlap, with Search concentrated near 10–30 seconds and Computer spread across roughly 5 minutes to over an hour (median 9 minutes vs. 14 seconds, a 40× gap).

The execution-time ratio also varies substantially by domain (Figure [5](https://arxiv.org/html/2606.07489#S6.F5)), reflecting the difference in the nature of tasks across domains. Local shows the largest gap (75×), followed by Politics (67×), Finance (64×), and Business (60×). Science (26×) and Education (27×) show the smallest ratios, because Search’s per-turn responses already suffice for common tasks like concept explanation. Technology and Business, the two largest categories by volume, show 58× and 60× ratios, with Computer averaging 27 and 31 minutes versus 28 and 31 seconds for Search.

![Refer to caption](https://arxiv.org/html/2606.07489v1/x4.png) Figure 4: Distribution of per-session machine execution time for Computer vs. Search. Each session contributes one value: the total machine execution time summed across all turns. Computer is per-turn wall-clock from user submission to the last LLM-response end, summed across turns and capped at three hours; Search is the session’s turn count times its per-turn end-to-end streaming time. The two distributions barely overlap: Search is a tight mass near 10–30 seconds, while Computer is a wide, right-skewed distribution centered near 9 minutes, with a 40× gap in medians (9m vs. 14s).

![Refer to caption](https://arxiv.org/html/2606.07489v1/x5.png) Figure 5: Level of autonomy: average machine execution time per session across 18 domains, for Computer vs. Search. Computer sessions are restricted to those that invoked at least one execution tool. The ratio column reports Computer/Search execution time per session. Across all domains, Computer performs roughly 26–75× as much machine work per session as Search. Domains are sorted by the Computer/Search ratio (descending), with Other placed last.
##### Model pauses and user stops\.

At the query level, 13% of Computer queries invoke at least one pause\-for\-user tool during execution, versus 0\.3% of Search queries; at the session level, the rates rise to 38% versus 0\.8%\. Because Computer performs longer\-running actions, it more often pauses to request critical input from the user, ensuring that the final output matches the user’s intent\. At the session level, pauses are dominated by approval prompts \(24\.2% of sessions\), followed by open clarifying questions \(16\.9%\), structured input requests \(2\.3%\), and file\-upload waits \(0\.7%\)\. Despite Computer’s longer runtimes, user\-initiated interruptions are similar across products: 3\.7% of Computer sessions contain at least one user stop event versus 3\.4% of Search sessions\. Users are no more likely to abandon a long autonomous Computer run than a short Search response, suggesting that the autonomous execution is largely trusted to run to completion once launched\.

##### Connector calls\.

Another indicator of autonomy is whether Computer chains many external-tool calls via Model Context Protocol (MCP) or an Application Programming Interface (API) endpoint into a single session, work that a Search user would otherwise operate manually through separate apps. 7.9% of Computer sessions invoke at least one connector call versus 1.8% of Search sessions, a 4× gap, and the mean number of connector calls per session is 1.19 for Computer vs. 0.10 for Search, a 12× ratio. For requests in which a connector is invoked, Computer fires 15.0 calls per session on average versus 5.5 for Search. At the pair level, of the 914 pairs in which at least one side used a connector, only Computer used one in 80% (735 pairs). The pattern is most extreme for Finance (23% Computer sessions vs. 1.2% Search), Technology (10% vs. 3%), and Business (9% vs. 2%). This pattern suggests that Computer’s autonomy also operates with greater context integration over a larger input-output space, enabling it to retrieve data from and take actions across a broader set of external services.

##### What do follow\-up turns contain?

To characterize how user interaction differs with greater autonomy, we examine the content of follow-up turns in the matched sessions. We sample 1,000 matched pairs where both products required multiple turns and classify all 15,507 follow-up queries (7,093 on the Search side, 8,414 on the Computer side) with an LLM against a single 10-category taxonomy (Table [1](https://arxiv.org/html/2606.07489#S6.T1)). Three patterns emerge. First, the overall propensity toward task advancement is near-identical across products (drill-downs, new subtasks, and extensions together account for 52.7% of Computer follow-ups versus 52.9% for Search), but its composition shifts: relative to Search, Computer substitutes extensions for drill-downs, with drill-downs falling from 23.4% to 22.0% (−1.4 pp) and extensions rising from 12.5% to 14.2% (+1.7 pp).⁶ Because Computer returns a more complete deliverable up front, its users spend relatively fewer follow-ups clarifying the result and more extending it. Second, Computer users spend slightly more of their follow-ups reviewing and revising autonomous output: revision and verification together account for 24.6% of Computer follow-ups versus 23.6% for Search (+1.0 pp). Third, Search is heavier on lightweight continuation: confirmations, format-delivery, and retry requests together account for 11.6% of Search follow-ups versus 9.9% for Computer (−1.7 pp), short directives that Computer integrates into its initial run. Manual data inputs are essentially identical (8.5% vs. 8.4%), as expected when the matched-pair design controls task content. Matched Computer and Search sessions therefore contain human turns for different reasons: Search turns reflect shorter digest-and-execute loops, whereas Computer turns reflect longer review-and-extend loops.

⁶ The two differ in what they ask the system to do: a *drill-down* is an information-seeking question about the existing output (“why did you choose this?”, “how does this compare?”), which keeps the user in a clarification loop, whereas an *extension* requests a new component that builds on the delivered output (“now also add X”), pushing the artifact further.

| Category | Search (%) | Computer (%) | Δ (pp) |
|---|---|---|---|
| **Task advancement** | | | |
| Drill-down | 23.4 | 22.0 | −1.4 |
| New subtask | 17.0 | 16.5 | −0.5 |
| Extension | 12.5 | 14.2 | +1.7 |
| *Subtotal* | *52.9* | *52.7* | *−0.2* |
| **Output review / iteration** | | | |
| Revision | 13.9 | 14.1 | +0.2 |
| Verification | 9.7 | 10.5 | +0.8 |
| *Subtotal* | *23.6* | *24.6* | *+1.0* |
| **Manual input** | | | |
| Data input | 8.4 | 8.5 | +0.1 |
| **Short directives** | | | |
| Confirmation | 6.8 | 6.5 | −0.3 |
| Format-delivery | 3.4 | 2.7 | −0.7 |
| Retry | 1.4 | 0.7 | −0.7 |
| *Subtotal* | *11.6* | *9.9* | *−1.7* |
| **Other** | | | |
| Unclassified | 3.5 | 4.3 | +0.8 |
| Total queries | 7,093 | 8,414 | |

Table 1: Follow-up query composition for Computer vs. Search in the 1,000-pair multi-turn sample classified against a single 10-category taxonomy. Group subtotals (italics) sum the constituent categories. Task advancement is near-identical overall (52.7% vs. 52.9%), but its composition shifts from drill-downs toward extensions for Computer (extensions 14.2% vs. 12.5%; drill-downs 22.0% vs. 23.4%); Computer is also slightly higher on output review/iteration overall (24.6% vs. 23.6%), while Search is heavier on short directives (11.6% vs. 9.9%). Manual data inputs are matched (≈ 8.5%), consistent with matched-pair design controlling task content. Columns may not sum exactly to 100% due to rounding.
##### User satisfaction\.

Lastly, we investigate whether higher autonomy is associated with lower user satisfaction. Higher autonomy could in principle come at a quality cost: users might issue more dissatisfied follow-ups to Computer’s autonomous output than to Search’s chat response. We test this directly by scoring each response’s observable user dissatisfaction on the scale {zero, low, mid, high} based on what the user does next (re-asks, corrections, error reports, retries, etc.). The dissatisfaction signal is restricted to multi-turn sessions. Computer elicits less dissatisfaction at every level (Table [2](https://arxiv.org/html/2606.07489#S6.T2)): the mid+high rate is 1.3% for Computer versus 2.9% for Search, and any dissatisfaction (low+mid+high) is 10.8% versus 16.6%. Computer’s autonomous execution thus increases autonomy and response quality simultaneously.
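As a quick arithmetic check, the relative reductions implied by these rates can be worked out directly. The only inputs are the four percentages quoted in the text; the rounding and the reading of the abstract's "55% lower" figure as the mid+high comparison are our inference, not a statement from the paper.

$$
\frac{2.9 - 1.3}{2.9} \approx 0.55, \qquad \frac{16.6 - 10.8}{16.6} \approx 0.35 .
$$

That is, the mid+high dissatisfaction rate is roughly 55% lower on Computer in relative terms, while the rate of any dissatisfaction is roughly 35% lower.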

Table 2: Next-turn dissatisfaction rates for matched Computer vs. Search responses among pairs with multiple turns. Each response is scored on the scale {zero, low, mid, high} based on what the user does next (re-asks, corrections, error reports, retries, etc.). Computer’s autonomous responses elicit measurably lower dissatisfaction than Search’s chat responses across every level of the signal. Columns may not sum exactly to 100% due to rounding.

## 7 Efficiency

Computer’s autonomous execution augments human labor with machine computation, shifting execution effort from manual work to oversight. To quantify this shift, we estimate the human and compute time and cost required to complete each matched task under two regimes: (1) Search + Human, where Search handles information retrieval and processing but the human must manually perform all non-research actions, and (2) Computer + Human, where the model executes the full workflow and the human provides only oversight. We also test the assumptions and lemmas in Section [3](https://arxiv.org/html/2606.07489#S3).

### 7.1 Method

It is challenging to estimate the human time spent on a task, so we triangulate via three independent approaches: tool\-based estimate, LLM estimate, and user self\-reports from interviews\. The first two approaches are based on production data\.

##### Tool\-based estimate\.

We classify Computer’s tool calls into two categories based on what a Search user would still need to do manually (Table [3](https://arxiv.org/html/2606.07489#S7.T3)):

- “Search” tools: research actions that Search already handles. These include `search_web`, `fetch_url`, `search_vertical`, `navigate` (browsing), `read` (document analysis), and result-submission tools (`submit_answer`, `submit_analysis`, `submit_result`).
- “Do” tools: actions the human must execute themselves, since Search provides only information. These include, for instance, `bash` (terminal commands), `write` (create files), `edit` (modify files), `js_repl` (run code), `computer` in browser-task mode (web app interaction), `call_external_tool` (connector calls), and `share_file` (export deliverables).

For each of the 10,000 matched Computer sessions, we sum the estimated human-equivalent minutes from “Do” tool calls (Table [3](https://arxiv.org/html/2606.07489#S7.T3)). This represents the manual work a Search user would still need to perform after receiving Search’s answer. Computer’s human time is fixed at 10 minutes of oversight per task (e.g., writing the prompt and reviewing the output). For cost, we combine model cost with human labor cost. Human labor cost uses Bureau of Labor Statistics Occupational Employment and Wage Statistics (BLS OEWS, May 2025, the most recent available) mean hourly wages mapped to each query domain (Table [4](https://arxiv.org/html/2606.07489#S7.T4)).
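The tool-based estimate can be sketched as a lookup-and-sum over a session's tool calls. The per-call minutes below are the values listed in Table 3; the function names, the input format (a mapping from tool name to call count plus a separate click count for the `computer` browser tool), and the choice to raise an error on unlisted tools are assumptions made for this illustration.

```python
from typing import Mapping

# Human-equivalent minutes per "Do" tool call (values from Table 3).
DO_TOOL_MINUTES = {
    "bash": 5.0,
    "write": 15.0,
    "edit": 10.0,
    "js_repl": 15.0,
    "browser_task": 10.0,
    "call_external_tool": 5.0,
    "deploy_website": 5.0,
    "start_server": 5.0,
    "find": 3.0,
    "system_diagnostic": 3.0,
    "share_file": 2.0,
}
# "Search" tools are already covered by Search and charged zero minutes.
SEARCH_TOOLS = {
    "search_web", "fetch_url", "search_vertical", "navigate", "read",
    "submit_answer", "submit_analysis", "submit_result",
}
BROWSER_MINUTES_PER_CLICK = 0.5  # `computer` (browser) is charged per click


def search_plus_human_minutes(tool_calls: Mapping[str, int], browser_clicks: int = 0) -> float:
    """Manual minutes a Search user would still need for one Computer session."""
    minutes = BROWSER_MINUTES_PER_CLICK * browser_clicks
    for tool, n_calls in tool_calls.items():
        if tool in SEARCH_TOOLS:
            continue  # no manual time charged
        if tool not in DO_TOOL_MINUTES:
            # Assumption of this sketch: fail loudly rather than guess a time.
            raise ValueError(f"no per-call estimate for tool: {tool}")
        minutes += DO_TOOL_MINUTES[tool] * n_calls
    return minutes


def total_cost(human_minutes: float, hourly_wage: float, model_cost: float) -> float:
    """Human labor cost at a domain wage plus model cost, in dollars."""
    return human_minutes / 60.0 * hourly_wage + model_cost


# Hypothetical session: 4 web searches, 2 shell commands, 1 new file, 1 export, 6 clicks.
session_calls = {"search_web": 4, "bash": 2, "write": 1, "share_file": 1}
manual_minutes = search_plus_human_minutes(session_calls, browser_clicks=6)
```

In the hypothetical session, the four web searches contribute nothing, and the remaining calls add up to 30 manual minutes, which `total_cost` converts to dollars at a chosen hourly wage.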

| Tool | Manual equivalent for a Search user | Min/call |
|---|---|---|
| **“Search” tools: already covered by Search** | | |
| `search_web` | Run a web search | 0 |
| `fetch_url` | Open a URL and read its contents | 0 |
| `search_vertical` | Query a domain-specific index (news, academic) | 0 |
| `navigate` | Click through linked pages | 0 |
| `read` | Read and extract from a document | 0 |
| `submit_*` | Compose the final written answer | 0 |
| **“Do” tools: the human must execute** | | |
| `bash` | Run a terminal command | 5 |
| `write` | Author a new file from scratch | 15 |
| `edit` | Locate and modify text in an existing file | 10 |
| `js_repl` | Write and execute a code snippet | 15 |
| `computer` (browser) | Click/type in a web application | 0.5/click |
| `browser_task` | Complete a multi-step browser workflow | 10 |
| `call_external_tool` | Issue an API call (auth, request, parse) | 5 |
| `deploy_website` | Deploy code to a host or server | 5 |
| `start_server` | Spin up a local dev server or service | 5 |
| `find` | Locate a file or string in a project | 3 |
| `system_diagnostic` | Inspect system state, check logs | 3 |
| `share_file` | Export and deliver a file | 2 |

Table 3: Tool classification used in the tool-based estimate. “Search” tools mirror capabilities Search already provides, so no manual time is charged. “Do” tools require the human counterfactual user to act on Search’s research output; per-call minute estimates approximate the time a skilled professional would spend performing the equivalent action by hand.

Table 4: Hourly wage estimates used for cost calculations. Wages are BLS OEWS mean hourly wages (rounded to whole dollars), May 2025 release, mapped from BLS major occupation groups to the 18 query domains. The same domain-specific wage is applied to human time spent with Search and Computer. Rows are sorted by hourly wage (descending), with Other placed last.
##### LLM\-based estimate\.

The tool\-time mapping relies on two key assumptions: 1\) accurate per\-tool human\-equivalent time estimates and 2\) humans follow the same procedure as Computer\. As an independent validation, we estimate the total human time required for each of the 10,000 matched pairs, given only the query text\. The prompt describes the Search \+ Human counterfactual: a skilled professional receives research answers from Search but must perform all execution steps manually\. The LLM returns a time estimate per task\.

##### User\-reported estimate\.

Both production-data approaches fix the counterfactual to Search + Human execution, which may not capture all realistic pre-agent workflows. To capture a richer counterfactual, we conducted 45-minute semi-structured interviews with 25 active Computer users (6 enterprise, 19 consumer), recruited from those with at least 5 historical queries. Participants walked through specific completed tasks, described their pre-Computer workflow for each, and estimated the time that workflow would have taken; we also probed for cost displacement and recurring use cases. Self-reports are subject to recall and selection bias, but unlike the matched-pair estimates they reflect the counterfactual users actually would have chosen rather than the Search + Human baseline. These interviews are summarized by theme in Appendix [F](https://arxiv.org/html/2606.07489#A6).

### 7.2 Results

##### Tool\-based estimates\.

Using the tool-based approach, Computer reduces both time and cost substantially across all 18 domains (Figure [6](https://arxiv.org/html/2606.07489#S7.F6), Table [5](https://arxiv.org/html/2606.07489#S7.T5)). The mean Search + Human task requires 121–596 minutes of manual execution (269 minutes overall), compared to 25–48 minutes for Computer + Human (36 minutes overall). Computer saves 79–92% of task time (overall 87%), with reductions largest for the most labor-intensive domains such as Programming (596 vs. 48 min, 92% saved), Technology (280 vs. 37 min, 87%), and Social Media (224 vs. 34 min, 85%). The cost savings are similarly large: 87–96% (overall 94%). Cost savings exceed time savings because domain-specific wages amplify the effect: in high-wage domains like Programming ($58/hr), a 92% time reduction yields a 96% cost reduction. Despite Computer’s higher model costs ($4–10 per task versus $0.05 for Search), human labor dominates: model costs account for less than 0.1% of the Search + Human total, while they constitute 30–61% of Computer + Human’s cost.

![Refer to caption](https://arxiv.org/html/2606.07489v1/x6.png) Figure 6: Task efficiency: time and cost for Search + Human vs. Computer + Human. Search + Human is dominated by human time and cost so the combined total is reported. Left: total time to complete. Right: total cost. Despite higher model costs, Computer + Human saves 87–96% on total cost because it shifts expensive human execution toward machine computation and oversight. Domains are sorted by the Search + Human / Computer + Human time ratio (descending), with Other placed last.
##### Breakeven and sensitivity analysis\.

How fast would a human need to perform the execution steps for Search + Human to match Computer + Human’s total cost? Setting the two costs equal yields a breakeven threshold of 14–24 minutes across all domains (median 18 minutes). A professional would need to run all commands, edit all files, and navigate all web applications in under 20 minutes to match Computer + Human’s total cost.
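One way to read the breakeven calculation, assuming the cost model described in the Method section (human minutes priced at an hourly wage, plus model cost), is sketched below. The function name and the example inputs are hypothetical, and the paper's exact computation may differ in detail. Equating Search + Human cost with Computer + Human cost and solving for the human execution time gives the oversight time plus the model-cost difference converted into minutes at the domain wage.

```python
def breakeven_minutes(
    hourly_wage: float,
    agent_model_cost: float,
    search_model_cost: float = 0.05,
    oversight_minutes: float = 10.0,
) -> float:
    """Human execution minutes at which Search + Human cost equals Computer + Human cost.

    Solves  wage * t / 60 + search_model_cost
          = wage * oversight_minutes / 60 + agent_model_cost   for t.
    """
    if hourly_wage <= 0:
        raise ValueError("hourly wage must be positive")
    extra_model_cost = agent_model_cost - search_model_cost
    return oversight_minutes + 60.0 * extra_model_cost / hourly_wage


# Hypothetical inputs: a $60/hr wage and a $6.05 agent model cost per task.
example_breakeven = breakeven_minutes(hourly_wage=60.0, agent_model_cost=6.05)
```

Under this reading, the threshold rises with the agent's model cost and falls with the wage, which is consistent with higher-wage domains having tighter breakeven times.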

The result is robust to two human time assumptions. First, even if our per-tool human-equivalent time estimates are overstated by 16×, Computer + Human retains a cost advantage on average (8× in the tightest domain). Second, the 10-minute human oversight assumption can be inflated by 26× (to 260 minutes) before Computer + Human’s cost advantage disappears on average (12× in the tightest domain). Figure [14](https://arxiv.org/html/2606.07489#A3.F14) in Appendix [C.1](https://arxiv.org/html/2606.07489#A3.SS1) plots how the cost advantage erodes as we make the assumptions increasingly conservative. The time advantage is robust to the same assumptions, though it is tighter than the cost advantage: it survives per-tool overstatement to 7× overall (5× in the tightest domain) and oversight inflation to 24× overall (11× in the tightest domain), as shown in Figure [15](https://arxiv.org/html/2606.07489#A3.F15) in Appendix [C.2](https://arxiv.org/html/2606.07489#A3.SS2).

##### LLM\-based estimate\.

The LLM approach confirms the tool-based findings (Table [5](https://arxiv.org/html/2606.07489#S7.T5)). The LLM-based estimates yield similar advantages: 79–88% time saved (overall 84%) and 88–94% cost saved (overall 93%) across all 18 domains. The two methods yield task-level estimates of comparable magnitude: mean human time is 269 minutes under the tool-based approach and 227 minutes under the LLM-based approach. LLM estimates include time costs the tool mapping cannot measure, such as planning and digesting Search responses. The tool mapping might also be sensitive to which tools each domain’s tasks invoke, while the LLM judges from query text alone. Overall, the two approaches agree that Computer + Human delivers large savings across every domain: tool-based 79–92% time and 87–96% cost; LLM 79–88% time and 88–94% cost.

Table 5: Percentage of time and cost saved by Computer + Human relative to Search + Human, with multipliers in parentheses (e.g., 94% (16×) means Computer + Human is 94% or 16 times cheaper). Human labor cost uses BLS OEWS May 2025 mean hourly wages. Rows are sorted by tool-based time savings (descending), with Other placed last.
##### User\-reported estimates\.

As a third validation, we draw on semi-structured interviews with 25 Computer users (6 enterprise, 19 consumer) who were asked to describe specific tasks and estimate time saved. Among participants who provided quantifiable before/after comparisons, self-reported speedups range from 5× to over 300×, likely reflecting substantial variation in pre-Computer baselines, with a per-participant median of approximately 25×. Representative examples are summarized in Appendix [F.2](https://arxiv.org/html/2606.07489#A6.SS2).

Next, we connect the cost estimates to Assumptions [3\.1](https://arxiv.org/html/2606.07489#S3.SS1.SSS0.Px3) and [3\.1](https://arxiv.org/html/2606.07489#S3.SS1.SSS0.Px4) and Lemma [1](https://arxiv.org/html/2606.07489#Thmlem1) in Section [3](https://arxiv.org/html/2606.07489#S3)\. We proxy step counts $s$ by the total number of Computer tool calls \(both “Search” and “Do” tools\) per task\.

##### Fixed cost\.

We proxy the per\-task fixed cost by total query characters per session \(summing the initial query and any user\-issued follow\-ups\), which captures prompt\-writing effort across the full task rather than only the opening turn\. \(Footnote 7: This approach does not capture the verification cost for Computer outputs, so its fixed cost might be underestimated\.\) We compare within matched pairs to hold task content fixed; Computer sessions are about 46% longer than Search sessions at the median \(652 vs\. 448 characters\)\. This pair\-controlled gap is consistent with Assumption [3\.1](https://arxiv.org/html/2606.07489#S3.SS1.SSS0.Px3) \($f_{\text{Agent}} > f_{\text{Conversational}} > 0$\): delegating a whole workflow requires more up\-front scoping than issuing a one\-shot question\.

##### Marginal cost\.

Across the matched pairs, total variable cost divided by total tool calls yields \$0\.16 per step for Computer \+ Human and \$2\.05 per step for Search \+ Human, a 13× gap\. The same pattern holds in time: total task execution minutes divided by total tool calls yields 0\.46 minutes per step for Computer \+ Human and 2\.66 minutes per step for Search \+ Human, a 6× gap\. Both comparisons support Assumption [3\.1](https://arxiv.org/html/2606.07489#S3.SS1.SSS0.Px4) \($m_{\text{Conversational}} > m_{\text{Agent}} > 0$\)\.
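
As a quick arithmetic check, the rounded gaps follow directly from the per-step figures quoted in the paragraph above. The variable and function names are illustrative only.

```python
# Per-step figures quoted in the text (Search + Human vs. Computer + Human).
cost_per_step = {"search_human": 2.05, "computer_human": 0.16}  # USD per tool call
time_per_step = {"search_human": 2.66, "computer_human": 0.46}  # minutes per tool call


def gap(per_step):
    """Ratio of the Search + Human figure to the Computer + Human figure."""
    return per_step["search_human"] / per_step["computer_human"]


cost_gap = gap(cost_per_step)
time_gap = gap(time_per_step)
print(f"cost gap ~ {cost_gap:.1f}x, time gap ~ {time_gap:.1f}x")
```

Rounding these two ratios to the nearest integer gives the 13× and 6× gaps reported above.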

##### How Computer’s efficiency advantage scales with task steps\.

To trace how the gap scales with task complexity, we regress the within\-pair wedge between Search \+ Human and Computer \+ Human on $N_{\text{total}}$ tool calls, separately for time and for cost \(Table [6](https://arxiv.org/html/2606.07489#S7.T6)\)\. Both slopes are positive: each additional step widens the time gap by 2\.39 minutes and the cost gap by \$1\.94\. This is the empirical analogue of Lemma [1](https://arxiv.org/html/2606.07489#Thmlem1): because the wedge is strictly increasing in $s$, longer tasks reap larger savings under Computer \+ Human, generating the post\-adoption sorting threshold $s^{\ast} = (f_{\text{Agent}} - f_{\text{Conversational}})/(m_{\text{Conversational}} - m_{\text{Agent}})$\. \(Footnote 8: Using independent LLM\-based estimates of time and cost to construct the wedge yields consistently positive and significant results\.\)

Table 6: Within\-pair wedges \(Search \+ Human minus Computer \+ Human\) regressed on total Computer tool\-call count $s = N_{\text{total}}$ across matched pairs\. Each column reports a separate regression: column \(1\) the time wedge in minutes; column \(2\) the cost wedge in dollars\. Heteroskedasticity\-robust standard errors in parentheses\. Significance: $^{*}\,p<0.10$; $^{**}\,p<0.05$; $^{***}\,p<0.01$\.
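
The linear cost model behind Lemma 1 can be sketched in a few lines. The marginal costs below are the per-step dollar figures quoted in the text; the two fixed costs are hypothetical placeholders, since the paper proxies fixed cost by query characters rather than dollars. The function names are illustrative.

```python
def total_cost(steps, fixed, marginal):
    """Linear task cost C(s) = f + m * s."""
    return fixed + marginal * steps


def sorting_threshold(f_agent, f_conv, m_agent, m_conv):
    """Step count s* above which the agent mode is strictly cheaper."""
    if not (f_agent > f_conv > 0 and m_conv > m_agent > 0):
        raise ValueError("requires f_agent > f_conv > 0 and m_conv > m_agent > 0")
    return (f_agent - f_conv) / (m_conv - m_agent)


M_AGENT, M_CONV = 0.16, 2.05  # USD per step, quoted in the text
F_AGENT, F_CONV = 6.0, 1.0    # hypothetical fixed costs, USD per task

s_star = sorting_threshold(F_AGENT, F_CONV, M_AGENT, M_CONV)


def wedge(steps):
    """Conversational cost minus agent cost at a given step count."""
    return total_cost(steps, F_CONV, M_CONV) - total_cost(steps, F_AGENT, M_AGENT)


for s in (1, 2, 3, 10, 50):
    print(f"s={s:>2}  wedge={wedge(s):+.2f} USD")
print(f"s* = {s_star:.2f} steps")
```

With these placeholder fixed costs the wedge changes sign between two and three steps and grows linearly afterwards, which is the qualitative pattern the positive regression slopes describe.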

## 8 Scope

The previous sections use within\-user, within\-task comparison to show Computer’s autonomous execution saves time and cost\. Now we turn to within\-user, cross\-task analysis to show that Computer also enables users to expand their work scopes at the query level\. Because Computer sessions contain more queries on average, session\-level estimates further amplify the query\-level differences\. We examine two dimensions:*horizontal expansion*\(whether users work across multiple occupations\) and*vertical expansion*\(whether users attempt more complex tasks\)\.

### 8\.1 Horizontal expansion: cross\-occupation work

#### 8\.1\.1 Method

##### Occupation inference\.

We assign Computer users to one of the 8 most common occupation clusters in the data based on their pre\-Computer Search behavior\. The clusters are based on the National Career Clusters Framework used in O\*NET\. \(Footnote 9: We drop Hospitality/Events/Tourism, Advanced Manufacturing, Energy & Natural Resources, Construction, Supply Chain & Transportation, and Agriculture because these clusters cover primarily physical, hands\-on work that is largely outside the scope of digital knowledge work and account for negligible query volume on both products\.\) For each user, we map the topic domain and subdomain of each Search query to a standardized occupation cluster \(e\.g\., Programming → Digital Technology; Finance → Financial Services; Health → Healthcare\)\. Multi\-cluster domains such as Business are split by subdomain \(e\.g\., Business/Marketing → Marketing & Sales; Business/Management → Management & Entrepreneurship\)\. The user’s occupation is defined as the cluster containing their most frequently queried domain and subdomain\. To ensure within\-user comparison, we restrict the sample to users who actively use both Computer and Search during the study period\. From each occupation cluster, we draw 1,000 dual\-product users, yielding 8,000 users in total\. Computer queries are then mapped to a destination cluster directly via an LLM single\-label classification into the same 8\-cluster taxonomy\. \(Footnote 10: For both Computer and Search queries, those that do not belong to any of these clusters are classified as “Other” and excluded\.\)

##### Cross\-occupation task share\.

For each user, we compute the fraction of their queries that fall outside their primary occupation cluster\. Each Computer and Search query is mapped to an occupation cluster and compared to the user’s assigned cluster\.
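
The two steps above, inferring a primary occupation and computing the cross-occupation share, can be sketched as follows. The mapping table covers only the examples named in the text, the subdomain labels in the sample data are hypothetical, and the function names are illustrative rather than the authors' implementation.

```python
from collections import Counter

# Domain/subdomain -> cluster rules; only the examples given in the text.
CLUSTER_MAP = {
    ("Programming", None): "Digital Technology",
    ("Finance", None): "Financial Services",
    ("Health", None): "Healthcare",
    ("Business", "Marketing"): "Marketing & Sales",
    ("Business", "Management"): "Management & Entrepreneurship",
}


def to_cluster(domain, subdomain=None):
    """Try the subdomain-specific rule first, then the domain-level rule."""
    return CLUSTER_MAP.get((domain, subdomain),
                           CLUSTER_MAP.get((domain, None), "Other"))


def primary_occupation(search_pairs):
    """Cluster of the user's most frequent (domain, subdomain) Search pair."""
    (domain, subdomain), _count = Counter(search_pairs).most_common(1)[0]
    return to_cluster(domain, subdomain)


def cross_occupation_share(query_clusters, primary):
    """Fraction of non-"Other" queries that fall outside the primary cluster."""
    kept = [c for c in query_clusters if c != "Other"]
    if not kept:
        return None
    return sum(c != primary for c in kept) / len(kept)


# Hypothetical user: (domain, subdomain) pairs for Search, cluster labels for Computer.
search_pairs = ([("Finance", "Investing")] * 3 + [("Programming", "Python")] * 2
                + [("Business", "Marketing")])
computer_clusters = ["Digital Technology", "Financial Services",
                     "Marketing & Sales", "Other", "Digital Technology"]

primary = primary_occupation(search_pairs)
search_share = cross_occupation_share([to_cluster(d, s) for d, s in search_pairs], primary)
computer_share = cross_occupation_share(computer_clusters, primary)
print(primary, search_share, computer_share)
```

Averaging these per-user shares within each occupation cluster would give the kind of cluster-level comparison reported in the results.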

#### 8\.1\.2 Results

##### Cross\-occupation task share\.

Across clusters, users consistently work outside their primary occupation at higher rates on Computer than on Search \(Figure [7](https://arxiv.org/html/2606.07489#S8.F7)\)\. Across the eight occupation clusters, Computer’s cross\-occupation share averages 59%, compared to 50% for Search, a 9 pp increase\. The effect is largest for Management & Entrepreneurship \(\+19 pp\), Digital Technology \(\+13 pp\), Arts & Design \(\+12 pp\), and Healthcare & Human Services \(\+10 pp\)\. Education \(\+6 pp\) and Marketing & Sales \(\+5 pp\) shift modestly, while Public Service & Safety \(\+2 pp\) and Financial Services \(\+1 pp\) are nearly flat\. The pattern is also corroborated by interviews in Appendix [F\.5](https://arxiv.org/html/2606.07489#A6.SS5)\.

![Refer to caption](https://arxiv.org/html/2606.07489v1/x7.png)

Figure 7: Cross\-occupation task share by inferred occupation\. Computer users work outside their primary occupation more frequently than when using Search, particularly in Management \(\+19 pp\), Digital Technology \(\+13 pp\), Arts & Design \(\+12 pp\), and Healthcare \(\+10 pp\)\. Clusters are sorted by the Computer − Search gap in pp \(descending\)\.

##### Cross\-occupation destinations\.

Figures[8](https://arxiv.org/html/2606.07489#S8.F8)and[9](https://arxiv.org/html/2606.07489#S8.F9)break down the share of each cluster’s queries directed at occupations other than the user’s own\. The patterns differ markedly between products\. For Search, Digital Technology acts as a near\-universal sink: it is the top out\-of\-occupation destination for all clusters except itself, consistent with a workflow dominated by ubiquitous technical lookups\. For Computer, the flows are more distributed across multiple hubs\. Digital Technology, Marketing & Sales, Financial Services, and Management form a tightly connected core: Financial Services users primarily turn to Digital Technology \(15%\); Management users turn to Digital Technology \(16%\) and Public Service \(14%\); Marketing users turn to Digital Technology \(14%\); and Digital Technology users split between Financial Services \(12%\) and Marketing \(11%\)\. Outside this core, Healthcare users distribute their out\-of\-occupation queries across Public Service \(10%\) and Digital Technology \(10%\); Education users turn primarily to Healthcare \(12%\); Public Service users turn to Financial Services \(11%\) and Education \(9%\); and Arts & Design users split between Digital Technology and Education \(10% each\)\. The contrast, visualized in Figure[9](https://arxiv.org/html/2606.07489#S8.F9)as a single dominant hub on the Search side versus a multi\-hub pattern on the Computer side, suggests that Search cross\-occupation queries are looking up technical facts adjacent to one’s field, while Computer cross\-occupation queries are delegating work that would normally require a specialist in a different field\.

![Refer to caption](https://arxiv.org/html/2606.07489v1/x8.png)

Figure 8: Cross\-occupation task destinations: $P(\text{task cluster} \mid \text{primary occupation})$ for Search \(left\) and Computer \(right\) across the 8 occupation clusters\. Each row sums to 100%; the diagonal is the share of work that stays within the user’s primary occupation\. Color intensity is on a log scale\. Computer’s diagonal is consistently weaker than Search’s, and its off\-diagonal mass is spread across multiple destination clusters rather than concentrated on Digital Technology\.

![Refer to caption](https://arxiv.org/html/2606.07489v1/x9.png)

Figure 9: Cross\-occupation task flows on a ring of the 8 occupation clusters\. Each node shows the share of that cluster’s tasks that are out of the user’s primary occupation; curved arcs show each cluster’s top\-3 outgoing destinations, with line width proportional to $P(\text{cluster } Y \mid \text{primary} = X)$ for $Y \neq X$ and line type indicating the rank\. Lines are colored by the destination cluster\. Left: Search concentrates flows into Digital Technology as the universal top destination\. Right: Computer spreads flows across Marketing, Management, Financial Services, and other executional destinations\.

### 8\.2 Vertical expansion: task complexity

The horizontal analysis above shows that Computer usage spans more occupations\. Another question is whether Computer queries also attempt more demanding tasks\. We measure task complexity along four axes:*cognitive level*,*knowledge breadth*,*task composability*, and*new tasks unlocked*\.

#### 8\.2\.1 Method

We sample 10,000 queries \(5,000 initial Computer queries and 5,000 initial Search queries\) from a fixed set of 5,000 dual\-product users \(one Computer query and one Search query per user, both drawn at random from each user’s sessions in the study period\)\. The Computer side is gated to sessions that invoked at least one execution \(“do”\) tool\. Each query is classified by an LLM along four axes using established taxonomies:

- Cognitive level: Bloom’s Revised Taxonomy \(anderson2001taxonomy\): six levels of cognitive demand, from *Remember* \(factual recall\) through *Understand*, *Apply*, *Analyze*, *Evaluate*, to *Create* \(producing novel output\)\. We group these into lower\-order \(Remember, Understand, Apply\) and higher\-order \(Analyze, Evaluate, Create\)\. Each query is assigned to the highest level required to complete the request\.
- Cognitive level: Task type \(autor2003skill\): *abstract* \(non\-routine cognitive work requiring judgment, creativity, or strategic reasoning\) versus *routine* \(rule\-following tasks such as fact lookup, format conversion, or template\-based writing\)\.
- Knowledge breadth: for each query, the minimal set of O\*NET Knowledge areas \(onet\) whose substantive expertise is required to complete the task well\. O\*NET, maintained by the U\.S\. Department of Labor, defines 33 Knowledge areas \(e\.g\., Economics and Accounting, Design, Medicine and Dentistry, Law and Government\) that are used to characterize the knowledge requirements of nearly every U\.S\. occupation\. The classifier is instructed to return a *minimal* set: a domain counts only if doing the task well requires real knowledge in that area, not if the topic merely appears\. \(Footnote 11: For instance, a query about “the 2008 financial crisis” does not require Economics expertise if it is a simple lookup, but “analyze this company’s 10\-K and flag accounting irregularities” does\.\) For each query we record the list of domains and its cardinality \(the *breadth* metric\)\.
- Task composability: each query is classified against the O\*NET activity hierarchy at four nesting levels: 37 *Generalized Work Activities* \(GWAs, the broadest top\-level groupings such as “Getting Information” or “Thinking Creatively”\); 332 *Intermediate Work Activities* \(IWAs, occupation\-agnostic behaviors such as “Analyze business or financial data” or “Create visual designs or displays”\); 2,087 *Detailed Work Activities* \(DWAs, e\.g\., “Analyze financial data to detect irregularities or trends”\); and 18,796 occupation\-specific *Task Statements* \(TSs, e\.g\., the Chief Executive task “Direct or coordinate an organization’s financial or budget activities”\)\. To avoid mechanically inflating counts via parent\-to\-child rollouts, GWAs are classified directly against the 37\-label catalog; IWAs are then classified only from the candidates implied by the chosen GWAs; and similarly for DWAs and TSs\. For each query we record the engaged set at every level and its cardinality\.
- New tasks unlocked: beyond counting activities per query, we treat each product’s classified activity inventory as a set and ask which activities Computer attempts but Search essentially does not\. At each O\*NET nesting level \(GWA, IWA, DWA, TS\), the *only\-Computer set* at threshold $k$ is the set of activities with more than $k$ occurrences in the Computer queries and at most $k$ occurrences in the Search queries from the same users\. The strictest case \($k = 0$\) corresponds to activities that appear in Computer but never in Search; relaxing $k$ admits activities Search attempts only rarely\. For each Computer query we record whether it engages at least one only\-Computer activity at each level\.
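
The only-Computer set and the per-query share built on it can be sketched as below. This is a minimal illustration that assumes an activity counts at most once per query; the activity labels, sample data, and function names are hypothetical.

```python
from collections import Counter


def only_computer_set(computer_queries, search_queries, k=0):
    """Activities with more than k Computer occurrences and at most k Search occurrences.

    Each query is an iterable of activity labels; an activity is counted
    once per query (an assumption about how occurrences are tallied).
    """
    computer_counts = Counter(a for q in computer_queries for a in set(q))
    search_counts = Counter(a for q in search_queries for a in set(q))
    return {a for a, n in computer_counts.items()
            if n > k and search_counts[a] <= k}


def share_engaging(computer_queries, only_set):
    """Share of Computer queries that touch at least one only-Computer activity."""
    hits = sum(1 for q in computer_queries if set(q) & only_set)
    return hits / len(computer_queries)


# Hypothetical classified queries at a single O*NET level.
computer = [{"develop_software", "get_info"}, {"create_graphs"},
            {"get_info"}, {"create_graphs", "develop_software"}]
search = [{"get_info"}, {"get_info"}, {"create_graphs"}]

strict = only_computer_set(computer, search, k=0)
relaxed = only_computer_set(computer, search, k=1)
share_strict = share_engaging(computer, strict)
share_relaxed = share_engaging(computer, relaxed)
print(strict, share_strict)
print(relaxed, share_relaxed)
```

Note that raising $k$ relaxes the Search ceiling but also raises the Computer floor, so the set does not have to grow monotonically in $k$.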

#### 8\.2\.2 Results

##### Cognitive level\.

Computer queries are substantially more cognitively complex than Search queries from the same users \(Figure[10](https://arxiv.org/html/2606.07489#S8.F10)\)\. On Bloom’s taxonomy, 76% of Computer queries demand higher\-order cognition \(Analyze, Evaluate, or Create\) compared to 55% for Search, a 21 pp gap\. The difference is concentrated at the top: 50% of Computer queries are classified as*Create*\(producing novel artifacts such as code, reports, or designs\) versus 26% for Search\. Conversely, Search is dominated by lower\-order tasks:*Remember*\-level factual lookups account for 21% of Search queries but only 7% of Computer queries\. On the Autor et al\. task\-type dimension, 71% of Computer queries are*abstract*\(non\-routine\) versus 53% for Search\.

The concentration at*Create*, rather than a uniform upward shift, suggests that autonomous execution specifically unlocks generative work \(writing code, drafting documents, building artifacts\) that users would not attempt through a question\-answer interface\. The middle levels of Bloom’s taxonomy \(Analyze, Evaluate\) show roughly equal shares between products, indicating that the shift is not simply users rephrasing the same intent more ambitiously, but a qualitative change in the type of work delegated\.

![Refer to caption](https://arxiv.org/html/2606.07489v1/x10.png)

Figure 10: Cognitive complexity of Computer vs\. Search queries\. Left: Bloom’s Revised Taxonomy distribution\. Computer queries concentrate at *Create* \(50% vs\. 26%\); Search carries more lower\-order weight at *Remember* \(21% vs\. 7%\)\. Right: Task type \(Autor et al\.\)\. 71% of Computer queries involve abstract, non\-routine cognition vs\. 53% for Search\.

##### Knowledge breadth\.

Beyond being more cognitively demanding, Computer tasks require substantive expertise in more distinct domains \(Figure[11](https://arxiv.org/html/2606.07489#S8.F11)\)\. On average, a Computer task requires substantive expertise in 2\.40 O\*NET Knowledge domains, compared with 1\.74 for Search, a 38% increase\. The shift is also a change in shape, not just location\. Search queries cluster at 1–2 domains \(77% combined\): the typical Search query is a focused lookup across few topics\. Computer queries, by contrast, concentrate at 2–3 domains \(76% combined\) and are nearly three times as likely as Search to require three or more domains \(51% vs\. 17%\)\. These multi\-competency queries are exactly the tasks that in a pre\-agent workflow would require coordination across specialists, such as building a data\-visualization dashboard for financial models \(Design \+ Mathematics \+ Economics and Accounting\)\.

The composition of which domains are invoked also differs sharply between products\. Computer shows large prevalence gains in domains associated with production work: Design \(\+12 pp\), Mathematics \(\+9 pp\), Administration & Management \(\+9 pp\), Computers & Electronics \(\+7 pp\), Economics & Accounting \(\+6 pp\), Communications & Media \(\+4 pp\), and Sales & Marketing \(\+3 pp\)\. Search’s relative strongholds \(Food Production, Sociology & Anthropology, Mechanical, Medicine & Dentistry, and Foreign Language\) are precisely the domains where users tend to look up facts rather than execute on them\.

![Refer to caption](https://arxiv.org/html/2606.07489v1/x11.png)

Figure 11: Required knowledge domains per query\. Left: Distribution of the number of O\*NET Knowledge areas a task requires substantive expertise in\. Search concentrates at 1–2 domains \(77%\); Computer queries concentrate at 2–3 \(76%\) and are nearly 3× as likely to require ≥3 domains \(51% vs\. 17%\)\. Right: Top 10 Knowledge domains by Computer prevalence, with Search shares side\-by\-side\. Computer’s largest gains are in executional/creative domains \(Design, Mathematics, Administration & Management, Economics & Accounting, Computers & Electronics\)\.

##### Task composability\.

Computer queries decompose into more distinct work activities than Search queries from the same users, and the gap widens as the O\*NET hierarchy is resolved more finely \(Figure [12](https://arxiv.org/html/2606.07489#S8.F12), Table [7](https://arxiv.org/html/2606.07489#S8.T7)\)\. At the coarsest GWA level, a typical Computer query engages 2\.95 activities versus 2\.24 for Search \(\+32%\), with 63% of Computer queries engaging three or more GWAs versus 36% of Search queries\. At the IWA level the gap is larger: a typical Computer query engages 4\.01 activities versus 2\.87 for Search, a 40% increase, and 83% of Computer queries engage three or more IWAs versus 60% of Search queries\. Composition sharpens the information\-versus\-execution contrast at every level\. At the GWA level, “Getting Information” is essentially shared \(Computer 58%, Search 56%\), while Computer’s gains concentrate in production\-oriented groupings: “Documenting/Recording Information” \(\+30 pp\), “Thinking Creatively” \(\+24 pp\), and “Analyzing Data or Information” \(\+14 pp\)\. The same contrast appears among IWAs: “Gather information from physical or electronic sources” remains the most prevalent activity on both sides \(Computer 46%, Search 42%\), but Computer’s gains concentrate in activities that produce artifacts or deliverables: “Create visual designs or displays” \(\+18 pp\), “Prepare informational or instructional materials” \(\+16 pp\), “Prepare reports of operational or procedural activities” \(\+13 pp\), and “Analyze business or financial data” \(\+9 pp\)\. The pattern strengthens at finer grains: at the DWA and TS levels, Computer queries engage 59% and 60% more activities than Search, respectively\. The ten most prevalent activities at each level for both products are shown in Tables [9](https://arxiv.org/html/2606.07489#A5.T9)–[12](https://arxiv.org/html/2606.07489#A5.T12) in Appendix [E](https://arxiv.org/html/2606.07489#A5)\.

![Refer to caption](https://arxiv.org/html/2606.07489v1/x12.png)

Figure 12: Per\-query distribution of engaged O\*NET task\-level activities at four nesting depths: GWAs \(37 labels\), IWAs \(332 labels\), DWAs \(2,087 labels\), and TSs \(18,796 labels\)\. Left column: grouped\-bar histogram of per\-query count distributions\. Right column: corresponding empirical CDFs\. Across all four levels, Computer’s distribution shifts right relative to Search: per\-query means are 2\.95 vs\. 2\.24 for GWAs \(\+32%\), 4\.01 vs\. 2\.87 for IWAs \(\+40%\), 3\.64 vs\. 2\.29 for DWAs \(\+59%\), and 3\.81 vs\. 2\.38 for TSs \(\+60%\)\. Search mass concentrates at 1–3 activities while Computer concentrates at 3–5 and beyond\.

Table 7: Task\-activity breadth at four levels of the O\*NET hierarchy\. Search/Computer columns report per\-query means; gap is $(\text{Computer} - \text{Search})/\text{Search}$\. The Computer − Search gap widens monotonically with granularity \(\+32% at the coarse GWA level to \+60% at the fine TS level\), indicating that Computer’s distinctiveness lies in fine\-grained executional work rather than coarse topical range\.
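
The relative gaps can be recomputed from the per-query means quoted in the Figure 12 caption using the gap definition from the Table 7 caption. The dictionary layout and function name are illustrative.

```python
# (Search mean, Computer mean) per query, as quoted in the Figure 12 caption.
means = {
    "GWA": (2.24, 2.95),
    "IWA": (2.87, 4.01),
    "DWA": (2.29, 3.64),
    "TS": (2.38, 3.81),
}


def relative_gap(search_mean, computer_mean):
    """Gap as defined in the Table 7 caption: (Computer - Search) / Search."""
    return (computer_mean - search_mean) / search_mean


gaps = {level: relative_gap(s, c) for level, (s, c) in means.items()}
for level, g in gaps.items():
    print(f"{level}: {g:+.0%}")
```

The computed gaps round to the reported percentages and increase from the coarsest level to the finest.
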
##### New tasks unlocked\.

Proposition [1](https://arxiv.org/html/2606.07489#Thmprop1) shows that agent access weakly expands the affordable task frontier toward weakly higher\-value tasks\. Here, we measure new\-task activity directly\. We treat each product’s classified activity inventory as a set: at threshold $k$, the *only\-Computer set* is the set of activities with more than $k$ occurrences in Computer queries and at most $k$ occurrences in Search queries from the same users\.

Figure [13](https://arxiv.org/html/2606.07489#S8.F13) plots the share of Computer queries that engage at least one only\-Computer activity, swept across $k = 0$–$10$ at the four O\*NET nesting levels\. At the strictest definition \($k = 0$: activities Search never attempts in this sample\), 23% of Computer queries engage at least one only\-Computer TS, 5% engage an only\-Computer DWA, under 1% engage an only\-Computer IWA, and effectively none engage an only\-Computer GWA\. Relaxing the Search ceiling to $k = 5$ raises these to 38%, 18%, 2%, and under 1%, respectively\. At the coarse GWA and IWA levels, Search and Computer cover nearly the same topical surface \(the only\-Computer share stays below 2%\), but at the fine\-grained TS level Computer’s inventory is substantially larger than Search’s, and a sizable share of Computer usage sits in the excess\.

The only\-Computer set concentrates in three capability clusters that align with the IWAs identified above:*software and web development*\(e\.g\., “Develop application\-specific software,” “Develop or maintain internal or external company Web sites”\),*documentation production*\(e\.g\., “Create or revise user instructions, procedures, or manuals,” “Prepare support documentation and training materials”\), and*data visualization and graphics*\(e\.g\., “Create graphs, charts, or other visualizations to convey the results of data analyses,” “Draw and print charts, graphs, illustrations, and other artwork, using computer,” “Develop diagrams or flow charts of system operation”\)\. These are precisely the categories where Computer’s tool use enables artifact delivery rather than description\. Computer’s expansion thus reflects not only a quantitative shift in cognitive demand and competency breadth but a qualitative one: at fine granularity, a meaningful fraction of Computer usage targets work that users essentially never directed at Search\.

![Refer to caption](https://arxiv.org/html/2606.07489v1/x13.png)

Figure 13: Share of Computer queries that engage at least one only\-Computer activity, swept across the Search\-occurrence ceiling $k$ and the four O\*NET nesting levels\. An activity is in the only\-Computer set at threshold $k$ if it appears more than $k$ times in Computer queries and at most $k$ times in Search queries from the same dual\-product users\. At the strictest definition \($k = 0$, Search occurrences must be zero\), 23% of Computer queries land on a Computer\-only TS; at $k = 5$ the share rises to 38% and plateaus near 41% by $k = 7$\. The GWA and IWA curves both flatten near zero, indicating that Search and Computer cover the same coarse topical surface; the TS curve grows steeply, indicating that Computer’s distinctiveness lies in fine\-grained executional work rather than topical range\.

## 9 Discussion

This paper documents the downstream task\-level economic implications of giving users access to autonomous task execution enabled by AI agents: Computer completes tasks autonomously and with higher quality \(Section[6](https://arxiv.org/html/2606.07489#S6)\), which dramatically reduces the human time and cost required \(Section[7](https://arxiv.org/html/2606.07489#S7)\) and shifts activity toward broader and more cognitively demanding work \(Section[8](https://arxiv.org/html/2606.07489#S8)\)\. We make a few concluding comments and discuss the limitations and directions for future research\.

##### The role of autonomy\.

The autonomy results clarify why the downstream effects occur\. Agents eliminate the manual task\-decomposition and execution loop in conversational sessions and produce higher quality outputs\. The pre\-agent binding constraint on what users attempt is therefore not information access but execution capacity\. When execution is delegated to the agent, the user’s role shifts from operator to supervisor, reallocating time toward higher\-order work such as direction, verification, and task extension\.

##### From speed to scope\.

The existing evidence has largely focused on productivity; our results suggest that a productivity framing might understate the impact of autonomous agents\. Although the estimated time and cost savings are large, the more consequential finding may be scope expansion: users undertake tasks outside their primary domains and at higher levels of complexity\.

##### Limitations\.

We note a few important caveats\. First, the 90\-day observation window \(Feb 27–May 27, 2026\) captures an early\-adoption period in which users are disproportionately power users and paying subscribers\. Whether the patterns generalize to a broader population as the product matures remains an open question\. Second, the matched\-query methodology, while controlling for task content, applies only to the subset of Computer queries with close Search equivalents\. As we show, many Computer queries lack such analogues, and our estimates of autonomy and efficiency may not generalize to these tasks\. Nevertheless, given that Computer queries tend to be more complex, the corresponding gains are likely to be larger for these tasks\. Third, although sessions provide a natural unit for organizing distinct tasks, they are noisy proxies: users may distribute a single task across multiple sessions or, conversely, undertake multiple unrelated tasks within a single session\. Fourth, the efficiency estimates rely on assumed per\-tool human\-equivalent times, human supervision time, and LLM estimates\. Although the breakeven and sensitivity analysis shows robustness to mismeasurement, the absolute magnitudes should be interpreted as approximate\. Fifth, the scope analysis relies heavily on LLM\-based classification, which also introduces measurement error\. However, the magnitude of the gaps suggests that the patterns are unlikely to be driven by classification noise alone\. Lastly, we measure user behavior within the Perplexity ecosystem and do not observe activity outside it; as a result, we may not capture the full scope of users’ workflow and tool use\.

##### Future directions\.

Our conceptual framework and empirical analysis are restricted to the individual\-worker and task levels, and therefore leave open how firms, consumers, or the broader labor market may respond\. A natural next step is to study how these micro\-level changes aggregate to organizational and labor\-market outcomes\. If agents lower barriers to entry across occupational boundaries and expertise levels, and reduce coordination costs, individuals may produce outputs that previously required teams\. The relevant margin then extends beyond productivity on existing tasks to the recomposition of job bundles and team structures\. Future work could link agent usage to downstream workplace outcomes to assess whether agents are directed primarily toward accelerating existing workers, enabling workers to assume cross\-occupational responsibilities, or creating new categories of economically viable work\. Our early evidence points to all three, but firm\-level production and employment data are needed to assess how agents ultimately reshape the bundling of work, the definition of roles, and the structure of teams\.

## References

## Appendix

## Appendix A Proofs for the Conceptual Framework

We collect here the proofs of all lemmas, propositions, and corollaries in Section[3](https://arxiv.org/html/2606.07489#S3)\.

###### Proof of Lemma[1](https://arxiv.org/html/2606.07489#Thmlem1)\(Agent is preferred for more complex tasks\)\.

The user prefers the conversational mode if and only if $C(s;\text{Conversational}) < C(s;\text{Agent})$, i\.e\. $f_{\text{Conversational}} + m_{\text{Conversational}}\,s < f_{\text{Agent}} + m_{\text{Agent}}\,s$\. Rearranging gives $s < (f_{\text{Agent}} - f_{\text{Conversational}})/(m_{\text{Conversational}} - m_{\text{Agent}})$\. The numerator is positive by Assumption [3\.1](https://arxiv.org/html/2606.07489#S3.SS1.SSS0.Px3) and the denominator by Assumption [3\.1](https://arxiv.org/html/2606.07489#S3.SS1.SSS0.Px4), so $s^{\ast}$ is positive and well\-defined\. The strict direction of preference flips at $s = s^{\ast}$\. ∎

###### Proof of Proposition[1](https://arxiv.org/html/2606.07489#Thmprop1)\(Affordable value frontier expands\)\.

By Assumption [3\.1](https://arxiv.org/html/2606.07489#S3.SS1.SSS0.Px2) task opportunities are indivisible, so individual affordability under toolkit $\mathcal{T}$ reduces to the per\-task condition $c_{j}^{\mathcal{T}} \leq B$\. Agent access preserves the conversational mode as an option, so for every task $j$,

$$c_{j}^{\text{post}} = \min\{C(s_{j};\text{Conversational}),\, C(s_{j};\text{Agent})\} \leq C(s_{j};\text{Conversational}) = c_{j}^{\text{pre}}.$$

Therefore $c_{j}^{\text{pre}} \leq B$ implies $c_{j}^{\text{post}} \leq B$\. The pre\-period individually affordable set is a subset of the post\-period individually affordable set, so $u^{\text{post}} \geq u^{\text{pre}}$\.

By the task primitives and Assumption [3\.1](https://arxiv.org/html/2606.07489#S3.SS1.SSS0.Px1), tasks are weakly ordered by step count, and value weakly increases along this order\. If $u^{\text{post}} > u^{\text{pre}}$, then every $j \in \{u^{\text{pre}}+1, \ldots, u^{\text{post}}\}$ is unaffordable under the conversational mode and affordable under the post\-agent toolkit by the definition of the endpoints, so $v_{u^{\text{post}}} \geq v_{u^{\text{pre}}}$\. ∎

###### Proof of Proposition[2](https://arxiv.org/html/2606.07489#Thmprop2)\(Total value expands\)\.

By Assumption [3\.1](https://arxiv.org/html/2606.07489#S3.SS1.SSS0.Px2), realized value $W^{\mathcal{T}} = \sum_{j} a^{\ast}_{\mathcal{T},j}\, v_{j}$ counts only fully completed tasks, so comparing toolkits reduces to comparing their feasible attempt policies\. Let $a^{\ast}_{\text{pre},j}$ be the pre\-period optimum, so

$$\sum_{j=1}^{J} a^{\ast}_{\text{pre},j}\, c_{j}^{\text{pre}} \leq B.$$

Because $c_{j}^{\text{post}} \leq c_{j}^{\text{pre}}$ for every task $j$, the same policy is weakly cheaper under post\-period costs:

$$\sum_{j=1}^{J} a^{\ast}_{\text{pre},j}\, c_{j}^{\text{post}} \leq \sum_{j=1}^{J} a^{\ast}_{\text{pre},j}\, c_{j}^{\text{pre}} \leq B.$$

The post\-period optimizer can therefore imitate the pre\-period policy\. By optimality of $a^{\ast}_{\text{post}}$,

$$W^{\text{post}} \;\geq\; W^{\text{pre}}.$$

∎

###### Proof of Corollary [1](https://arxiv.org/html/2606.07489#Thmcor1) (Total surplus expands).

When the conversational-only chosen bundle exhausts the aggregate budget, $K^{\text{pre}}=B$, and the post-period policy is feasible by construction, so $K^{\text{post}}\leq B=K^{\text{pre}}$. Combined with $W^{\text{post}}\geq W^{\text{pre}}$ from Proposition [2](https://arxiv.org/html/2606.07489#Thmprop2),

$$\Pi^{\text{post}}-\Pi^{\text{pre}}=(W^{\text{post}}-W^{\text{pre}})-(K^{\text{post}}-K^{\text{pre}})\geq 0.$$

∎

###### Proof of Corollary [2](https://arxiv.org/html/2606.07489#Thmcor2) (Value-to-cost ratio expands).

When the conversational-only chosen bundle exhausts the aggregate budget, $K^{\text{pre}}=B>0$, and the post-period budget constraint gives $K^{\text{post}}\leq B=K^{\text{pre}}$. The pre-period selected set is therefore nonempty, and since task values are positive, $W^{\text{pre}}>0$; Proposition [2](https://arxiv.org/html/2606.07489#Thmprop2) then gives $W^{\text{post}}\geq W^{\text{pre}}>0$, so the post-period selected set is also nonempty and $K^{\text{post}}>0$. Therefore

$$\frac{W^{\text{post}}}{K^{\text{post}}}\;\geq\;\frac{W^{\text{pre}}}{K^{\text{post}}}\;\geq\;\frac{W^{\text{pre}}}{K^{\text{pre}}},$$

where the first inequality uses $W^{\text{post}}\geq W^{\text{pre}}$ and the second uses $K^{\text{post}}\leq K^{\text{pre}}$. ∎

###### Proof of Proposition [3](https://arxiv.org/html/2606.07489#Thmprop3) (Surplus decomposition).

Starting from $\Pi^{\mathcal{T}}=W^{\mathcal{T}}-K^{\mathcal{T}}$,

$$\begin{aligned}
\Delta\Pi&=\sum_{j\in A^{\text{post}}}\big(v_j-c_j^{\text{post}}\big)-\sum_{j\in A^{\text{pre}}}\big(v_j-c_j^{\text{pre}}\big)\\
&=\sum_{j\in A^{\text{pre}}\cap A^{\text{post}}}\big(c_j^{\text{pre}}-c_j^{\text{post}}\big)+\sum_{j\in A^{\text{post}}\setminus A^{\text{pre}}}\big(v_j-c_j^{\text{post}}\big)-\sum_{j\in A^{\text{pre}}\setminus A^{\text{post}}}\big(v_j-c_j^{\text{pre}}\big).
\end{aligned}$$

The first term is weakly non-negative because $c_j^{\text{post}}\leq c_j^{\text{pre}}$ for every task. The entry and exit terms depend on selected task values and are not separately signed by these assumptions alone. ∎

## Appendix B Numerical Example via Dynamic Programming

The discrete optimization problem in Section [3](https://arxiv.org/html/2606.07489#S3),

$$\max_{\{a_j\}_{j=1}^{J}}\sum_{j=1}^{J}a_jv_j\qquad\text{s.t.}\qquad\sum_{j=1}^{J}a_jc_j^{\mathcal{T}}\leq B,\qquad a_j\in\{0,1\},$$

is a 0-1 knapsack problem and admits a standard dynamic programming solution. We describe the recurrence first and then illustrate it with a four-task example.

### B.1 Dynamic programming solution

##### Recurrence.

Let $f_{\mathcal{T}}(j,b)$ denote the maximum total value attainable using only tasks $\{1,\ldots,j\}$ with remaining budget $b$, where $b$ runs over a discretization of $[0,B]$ fine enough to resolve the costs $\{c_j^{\mathcal{T}}\}$. The recurrence is

$$f_{\mathcal{T}}(j,b)\;=\;\begin{cases}\max\bigl\{\,f_{\mathcal{T}}(j-1,b),\;\;f_{\mathcal{T}}(j-1,b-c_j^{\mathcal{T}})+v_j\,\bigr\}&\text{if }c_j^{\mathcal{T}}\leq b,\\[4pt] f_{\mathcal{T}}(j-1,b)&\text{if }c_j^{\mathcal{T}}>b,\end{cases}$$

with boundary $f_{\mathcal{T}}(0,b)=0$ for all $b$. The optimal value of the user’s problem is $W^{\mathcal{T}}=f_{\mathcal{T}}(J,B)$. The two arguments of the maximum correspond to whether task $j$ is included: skip it, or take it (gain $v_j$, lose $c_j^{\mathcal{T}}$ from the budget); when $c_j^{\mathcal{T}}>b$ the task does not fit and is skipped. The maximum over the two encodes optimal substructure.

##### Recovering the optimal task set.

The selected set $A^{\mathcal{T}}=\{j:a^{\ast}_{\mathcal{T},j}=1\}$ is recovered by backward tracing. Starting at $(j,b)=(J,B)$, set

$$a^{\ast}_{\mathcal{T},j}\;=\;\begin{cases}1&\text{if }c_j^{\mathcal{T}}\leq b\text{ and }f_{\mathcal{T}}(j,b)=f_{\mathcal{T}}(j-1,b-c_j^{\mathcal{T}})+v_j,\\ 0&\text{otherwise},\end{cases}$$

then update $b\leftarrow b-a^{\ast}_{\mathcal{T},j}\,c_j^{\mathcal{T}}$ and $j\leftarrow j-1$. After $J$ iterations every $a^{\ast}_{\mathcal{T},j}$ is determined and the implied total cost is $K^{\mathcal{T}}=\sum_{j}a^{\ast}_{\mathcal{T},j}\,c_j^{\mathcal{T}}$.
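The recurrence and the backward trace can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes integer costs (a unit discretization of the budget), the function name `knapsack` is hypothetical, and the three-item instance in the usage line is a made-up toy, not data from the paper.

```python
from typing import List, Tuple


def knapsack(values: List[int], costs: List[int], budget: int) -> Tuple[int, List[int]]:
    """0-1 knapsack with integer costs: returns (best value, chosen 1-based task ids)."""
    n = len(values)
    # f[j][b] = best value using tasks 1..j with budget b; row 0 is the boundary f(0, b) = 0.
    f = [[0] * (budget + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        v, c = values[j - 1], costs[j - 1]
        for b in range(budget + 1):
            f[j][b] = f[j - 1][b]  # skip task j
            if c <= b:
                f[j][b] = max(f[j][b], f[j - 1][b - c] + v)  # take task j
    # Backward trace from (J, B): mark task j as taken when the "take" branch attains f[j][b].
    chosen, b = [], budget
    for j in range(n, 0, -1):
        v, c = values[j - 1], costs[j - 1]
        if c <= b and f[j][b] == f[j - 1][b - c] + v:
            chosen.append(j)
            b -= c
    return f[n][budget], sorted(chosen)


# Toy instance (hypothetical numbers): the best bundle is tasks 2 and 3.
print(knapsack([60, 100, 120], [10, 20, 30], 50))  # (220, [2, 3])
```

As in the tracing rule above, ties between taking and skipping a task are resolved toward taking it.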

### B.2 Application to a four-task example

We apply the DP to a stylized four-task example chosen to illustrate the propositions. Tasks are indexed by $j\in\{1,2,3,4\}$ with step counts

$$s_1=1,\qquad s_2=2,\qquad s_3=3,\qquad s_4=10.$$

In this example, the conversational mode alone attempts the middle task set $\{2,3\}$, while the post-agent toolkit attempts the expanded set $\{1,2,3,4\}$. This pattern is illustrative, not a general implication of the assumptions alone.

##### Parameters.

Set $f_{\text{Conversational}}=1$, $m_{\text{Conversational}}=17$, $f_{\text{Agent}}=18$, $m_{\text{Agent}}=1$ (Assumptions [3.1](https://arxiv.org/html/2606.07489#S3.SS1.SSS0.Px3) and [3.1](https://arxiv.org/html/2606.07489#S3.SS1.SSS0.Px4): agent has higher fixed cost but lower marginal cost). Task values are

$$v_1=20,\qquad v_2=50,\qquad v_3=70,\qquad v_4=200,$$

satisfying Assumption [3.1](https://arxiv.org/html/2606.07489#S3.SS1.SSS0.Px1) strictly in this example. The aggregate budget is $B=87$. The induced upper affordability endpoints are $u^{\text{pre}}=3$ and $u^{\text{post}}=4$.

The breakeven threshold from Lemma [1](https://arxiv.org/html/2606.07489#Thmlem1) is

$$s^{\ast}=\frac{f_{\text{Agent}}-f_{\text{Conversational}}}{m_{\text{Conversational}}-m_{\text{Agent}}}=\frac{18-1}{17-1}=\tfrac{17}{16}\approx 1.06.$$

For $s>s^{\ast}$ agent is cheaper; for $s<s^{\ast}$ the conversational mode is cheaper.

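##### Per-task costs.

The mode-specific and effective costs at each $s_j$, computed as $C(s_j;\text{mode})=f_{\text{mode}}+m_{\text{mode}}s_j$ with the effective cost equal to the cheaper of the two modes, are:

| Task $j$ | $s_j$ | $C(s_j;\text{Conversational})$ | $C(s_j;\text{Agent})$ | Effective cost $c_j^{\text{post}}$ |
| --- | --- | --- | --- | --- |
| 1 | 1 | 18 | 19 | 18 |
| 2 | 2 | 35 | 20 | 20 |
| 3 | 3 | 52 | 21 | 21 |
| 4 | 10 | 171 | 28 | 28 |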
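These per-task costs and the breakeven threshold can be recomputed directly from the stated parameters. The sketch below assumes the linear cost form $C(s;\text{mode})=f_{\text{mode}}+m_{\text{mode}}s$ used in the proof of Lemma 1; the names `MODES`, `STEPS`, `cost`, and `breakeven` are illustrative and do not come from the paper.

```python
from fractions import Fraction

# Parameters from the example: (fixed cost f, marginal cost m) for each mode.
MODES = {"Conversational": (1, 17), "Agent": (18, 1)}
STEPS = {1: 1, 2: 2, 3: 3, 4: 10}  # step counts s_j


def cost(s: int, mode: str) -> int:
    """Linear cost C(s; mode) = f_mode + m_mode * s."""
    f, m = MODES[mode]
    return f + m * s


def breakeven() -> Fraction:
    """s* = (f_Agent - f_Conversational) / (m_Conversational - m_Agent)."""
    (f_c, m_c), (f_a, m_a) = MODES["Conversational"], MODES["Agent"]
    return Fraction(f_a - f_c, m_c - m_a)


pre = {j: cost(s, "Conversational") for j, s in STEPS.items()}
agent = {j: cost(s, "Agent") for j, s in STEPS.items()}
post = {j: min(pre[j], agent[j]) for j in STEPS}  # cheapest available mode per task

print(breakeven())  # 17/16
print(pre)    # {1: 18, 2: 35, 3: 52, 4: 171}
print(agent)  # {1: 19, 2: 20, 3: 21, 4: 28}
print(post)   # {1: 18, 2: 20, 3: 21, 4: 28}
```

Only task 1, with $s_1=1<17/16$, keeps the conversational mode in the effective-cost row.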

##### Pre-period optimum (conversational only).

Apply the DP recurrence with $J=4$, $B=87$, and pre-period costs $\{18,35,52,171\}$. Task $4$ is individually unaffordable since $c_4^{\text{pre}}=171>87$, so the recurrence’s second branch removes it from consideration. Among the remaining tasks, $\{1,2,3\}$ would cost $18+35+52=105>87$, while $\{2,3\}$ costs $35+52=87$. The DP yields $f_{\text{pre}}(4,87)=120$ with selected set $A^{\text{pre}}=\{2,3\}$:

$$K^{\text{pre}}=87,\quad W^{\text{pre}}=120,\quad\Pi^{\text{pre}}=33.$$

##### Post-period optimum (conversational + agent).

Apply the DP with post-period costs $\{18,20,21,28\}$. Agent reduces the cost of tasks $\{2,3\}$ from $35+52=87$ to $20+21=41$, freeing $46$ units of budget; that relief exactly pays for the two previously omitted endpoints (task $1$ at $18$ and task $4$ at $28$). All four tasks are jointly affordable:

$$18+20+21+28=87=B.$$

The DP yields $f_{\text{post}}(4,87)=340$ with $A^{\text{post}}=\{1,2,3,4\}$:

$$K^{\text{post}}=87,\quad W^{\text{post}}=340,\quad\Pi^{\text{post}}=253.$$

The mode mix in the post period reflects the sorting threshold: the short task $j=1$ stays on the conversational mode (cost $18<19$), while tasks $j\in\{2,3,4\}$ migrate to agent.
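With only four tasks, the two optima can be cross-checked without the DP by enumerating all $2^4$ bundles. The sketch below is an independent brute-force check, not the dynamic program described in B.1; the helper name `best_bundle` is hypothetical, and the numbers are the values, costs, and budget stated in the example.

```python
from itertools import combinations

VALUES = {1: 20, 2: 50, 3: 70, 4: 200}
PRE_COSTS = {1: 18, 2: 35, 3: 52, 4: 171}   # conversational mode only
POST_COSTS = {1: 18, 2: 20, 3: 21, 4: 28}   # cheaper of the two modes
BUDGET = 87


def best_bundle(costs, values=VALUES, budget=BUDGET):
    """Enumerate every subset of tasks; keep the affordable one with the highest value."""
    best = (0, 0, ())  # (W, K, selected set)
    tasks = sorted(values)
    for r in range(len(tasks) + 1):
        for subset in combinations(tasks, r):
            k = sum(costs[j] for j in subset)
            w = sum(values[j] for j in subset)
            if k <= budget and w > best[0]:
                best = (w, k, subset)
    return best


for label, costs in (("pre", PRE_COSTS), ("post", POST_COSTS)):
    w, k, chosen = best_bundle(costs)
    print(label, chosen, "K =", k, "W =", w, "surplus =", w - k)
# pre (2, 3) K = 87 W = 120 surplus = 33
# post (1, 2, 3, 4) K = 87 W = 340 surplus = 253
```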

##### Illustrating the propositions.

We use this example to illustrate our prior propositions:

- *Lemma [1](https://arxiv.org/html/2606.07489#Thmlem1) (sorting).* Realized post-period choices respect the threshold $s^{\ast}\approx 1.06$: $j=1$ stays on the conversational mode, while $j\in\{2,3,4\}$ go to agent.
- *Proposition [1](https://arxiv.org/html/2606.07489#Thmprop1).* The affordable value frontier expands: task $4$ has $c_4^{\text{pre}}=171>B=87\geq 28=c_4^{\text{post}}$, so $u^{\text{post}}=4>3=u^{\text{pre}}$ and $v_4\geq v_3$.
- *Proposition [2](https://arxiv.org/html/2606.07489#Thmprop2).* Total value rises: $W^{\text{post}}=340\geq 120=W^{\text{pre}}$.
- *Corollary [1](https://arxiv.org/html/2606.07489#Thmcor1).* The cost condition required by the corollary holds, here with equality ($K^{\text{post}}=87=K^{\text{pre}}$), so total surplus rises: $\Pi^{\text{post}}=253\geq 33=\Pi^{\text{pre}}$.
- *Corollary [2](https://arxiv.org/html/2606.07489#Thmcor2).* The aggregate value-to-cost ratio rises: $W^{\text{post}}/K^{\text{post}}=340/87>120/87=W^{\text{pre}}/K^{\text{pre}}$.
- *Proposition [3](https://arxiv.org/html/2606.07489#Thmprop3).* The surplus gain decomposes into an intensive margin of $46$ from cost savings on retained tasks, $(35-20)+(52-21)$, an entry margin of $174$ from adding tasks $1$ and $4$, $(20-18)+(200-28)$, and no exit term; $46+174=220=\Delta\Pi$.
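The decomposition in the last bullet can be verified mechanically. The following sketch applies the three sums from the proof of Proposition 3 to the selected sets of the example; the function name `decompose` is illustrative, and the final assertion cross-checks the result against the direct difference in surplus.

```python
VALUES = {1: 20, 2: 50, 3: 70, 4: 200}
PRE_COSTS = {1: 18, 2: 35, 3: 52, 4: 171}
POST_COSTS = {1: 18, 2: 20, 3: 21, 4: 28}
A_PRE, A_POST = {2, 3}, {1, 2, 3, 4}


def decompose(a_pre, a_post, values, c_pre, c_post):
    """Split the surplus change into intensive, entry, and exit margins."""
    intensive = sum(c_pre[j] - c_post[j] for j in a_pre & a_post)  # retained tasks
    entry = sum(values[j] - c_post[j] for j in a_post - a_pre)     # newly attempted tasks
    exit_ = sum(values[j] - c_pre[j] for j in a_pre - a_post)      # dropped tasks
    return intensive, entry, exit_


intensive, entry, exit_ = decompose(A_PRE, A_POST, VALUES, PRE_COSTS, POST_COSTS)
delta = intensive + entry - exit_

# Cross-check against the direct difference in surplus.
surplus_pre = sum(VALUES[j] - PRE_COSTS[j] for j in A_PRE)
surplus_post = sum(VALUES[j] - POST_COSTS[j] for j in A_POST)
assert delta == surplus_post - surplus_pre

print(intensive, entry, exit_, delta)  # 46 174 0 220
```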

## Appendix C Sensitivity Analysis

The tool-based efficiency estimates (Section [7](https://arxiv.org/html/2606.07489#S7)) rest on two assumptions about human time: the per-tool human-equivalent time estimates (Table [3](https://arxiv.org/html/2606.07489#S7.T3)), and the 10-minute oversight charged to Computer. We trace how Computer + Human’s advantage erodes as we make each assumption increasingly conservative, for both cost (Section [C.1](https://arxiv.org/html/2606.07489#A3.SS1)) and time (Section [C.2](https://arxiv.org/html/2606.07489#A3.SS2)).

In every figure, the $x$-axis is the stress factor, and each panel sweeps one assumption. The left panel deflates the per-tool human-equivalent time estimates by this factor (counterfactual human time = original / factor), decreasing the human time estimate for Search + Human. The right panel inflates the 10-minute oversight assumption (counterfactual human time = original × factor), increasing the human time estimate for Computer + Human. Each thin line is one of the 18 domains; the bold line is the sample-weighted overall.

### C.1 Cost advantage

Figure [14](https://arxiv.org/html/2606.07489#A3.F14) plots the cost saving relative to Search + Human. Computer retains a cost advantage at every level until each curve crosses zero: under the per-tool stress the overall advantage survives to 16× and the tightest domain (Travel) to 8×; under the oversight stress, to 26× overall and 12× in the tightest domain (Consumer Goods).

![Refer to caption](https://arxiv.org/html/2606.07489v1/x14.png)

Figure 14: Computer + Human cost saving (%) versus two stress factors on the human-time assumptions. Left: Per-tool human-equivalent time estimates deflated by the factor on the $x$-axis. Right: The 10-minute oversight assumption inflated by the factor on the $x$-axis. Thin gray lines are the 18 individual domains; the bold teal line is the sample-weighted overall. The dashed line marks zero cost saving (breakeven); markers indicate where the overall and the tightest (earliest-crossing) domain reach it.

### C.2 Time advantage

Figure [15](https://arxiv.org/html/2606.07489#A3.F15) repeats the exercise in wall-clock minutes rather than dollars. Computer retains a time advantage at every level until each curve crosses zero: under the per-tool stress the overall advantage survives to 7× and the tightest domain (Consumer Goods) to 5×; under the oversight stress, to 24× overall and 11× in the tightest domain (Consumer Goods).

![Refer to caption](https://arxiv.org/html/2606.07489v1/x15.png)

Figure 15: Computer + Human time saving (%) versus two stress factors on the human-time assumptions. Left: Per-tool human-equivalent time estimates deflated by the factor on the $x$-axis. Right: The 10-minute oversight assumption inflated by the factor on the $x$-axis. Thin gray lines are the 18 individual domains; the bold teal line is the sample-weighted overall. The dashed line marks zero time saving (breakeven); markers indicate where the overall and the tightest (earliest-crossing) domain reach it.
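The mechanics of the two time-stress sweeps can be sketched with the aggregate figures from the abstract. This is a rough back-of-the-envelope illustration, not the authors' per-domain, sample-weighted calculation. It assumes that the 36-minute Computer figure splits into 26 minutes of autonomous work plus the 10-minute oversight charge, and that the whole 269-minute Search + Human figure scales with the per-tool estimates. All function names are hypothetical.

```python
# Aggregate minutes from the abstract; splitting 36 into 26 + 10 is an assumption.
SEARCH_HUMAN_MIN = 269.0   # Search + Human completion time
AUTONOMOUS_MIN = 26.0      # Computer's autonomous work (assumed share of the 36 minutes)
OVERSIGHT_MIN = 10.0       # human oversight charged to Computer


def saving_per_tool(factor: float) -> float:
    """Time saving when human-equivalent time is deflated: original / factor."""
    computer = AUTONOMOUS_MIN + OVERSIGHT_MIN
    return 1.0 - computer / (SEARCH_HUMAN_MIN / factor)


def saving_oversight(factor: float) -> float:
    """Time saving when the oversight charge is inflated: original * factor."""
    computer = AUTONOMOUS_MIN + OVERSIGHT_MIN * factor
    return 1.0 - computer / SEARCH_HUMAN_MIN


def last_positive_factor(saving, max_factor: int = 100) -> int:
    """Largest integer stress factor at which the saving is still positive."""
    return max(k for k in range(1, max_factor + 1) if saving(k) > 0)


print(round(saving_per_tool(1), 2))            # 0.87 (no stress applied)
print(last_positive_factor(saving_per_tool))   # 7
print(last_positive_factor(saving_oversight))  # 24
```

Under these simplifying assumptions the overall crossings fall between 7× and 8× for the per-tool stress and between 24× and 25× for the oversight stress, in line with the overall time figures quoted in C.2. The per-domain curves and the cost analysis would need inputs that are not reproduced here.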

## Appendix D Cross-Product Complementarity

A natural question at the product level is whether Computer queries are substitutes for or complements to Search queries. The conceptual framework suggests two countervailing forces. On one hand, users could substitute Computer for Search on longer tasks, reducing Search usage. On the other hand, Computer could free up budget to attempt additional shorter tasks where Search has the cost advantage, increasing Search usage. We therefore estimate the adoption effect directly. Although Search queries grew faster among Computer users than among non-Computer users (Figure [2](https://arxiv.org/html/2606.07489#S5.F2)), Computer users may have been more active to begin with.

To isolate the causal effect, we construct a matched sample using exact matching on three dimensions: subscription tier (Pro, Max, free), primary search topic (20 domain categories derived from the user’s most frequent domain in the pre-period Search queries), and pre-period Search intensity (quartile bins). We draw a sample of 100,000 Computer adopters whose first use of Computer falls between February 27 and May 13, and restrict the sample to users with at least one Search query in the pre-adoption period, leaving 61,913 adopters. We then draw an approximately equal-sized control group of non-adopters from the same tier × topic × intensity cells. This design ensures that each adopter is compared to a non-adopter with the same subscription plan, the same primary search interest, and a similar baseline search rate.

We estimate a two-way fixed-effects (TWFE) difference-in-differences model on a balanced user-day panel spanning February 13 to May 27 (104 days):

$$\text{SearchQueries}_{it}\;=\;\alpha_i\;+\;\gamma_t\;+\;\beta\cdot\text{Post}_{it}\;+\;\varepsilon_{it}\tag{1}$$

where $\text{SearchQueries}_{it}$ is the number of Search queries by user $i$ on day $t$, $\alpha_i$ and $\gamma_t$ are user and day fixed effects, and $\text{Post}_{it}$ is an indicator equal to one if day $t$ falls on or after user $i$’s first Computer query (zero for never-adopted controls). Standard errors are clustered at the user level. A positive (negative) $\beta$ indicates complementarity (substitution) between Computer adoption and Search use. We also estimate an intensive-margin specification on the treated subsample:

$$\text{SearchQueries}_{it}\;=\;\alpha_i\;+\;\gamma_t\;+\;\delta\cdot\text{ComputerQueries}_{it}\;+\;\varepsilon_{it}\tag{2}$$

where $\text{ComputerQueries}_{it}$ is the number of Computer queries issued by user $i$ on day $t$. $\delta$ captures the effect of the quantity of Computer queries, not of binary adoption.
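For readers less familiar with the estimator, the point estimate in Equation 1 can be sketched as a within (two-way demeaned) regression on a balanced panel. The toy panel below is entirely made up: four users, six days, noise-free outcomes, and a true effect set to 1.05 only so the output is recognizable. It does not reproduce the paper's data or its user-clustered standard errors, and the function names are hypothetical.

```python
def two_way_demean(x):
    """Subtract user means and day means, then add back the grand mean (balanced panel)."""
    n, t = len(x), len(x[0])
    row = [sum(r) / t for r in x]
    col = [sum(x[i][d] for i in range(n)) / n for d in range(t)]
    grand = sum(row) / n
    return [[x[i][d] - row[i] - col[d] + grand for d in range(t)] for i in range(n)]


def twfe_beta(y, post):
    """Within estimator for y_it = alpha_i + gamma_t + beta * post_it + e_it."""
    yt, pt = two_way_demean(y), two_way_demean(post)
    num = sum(a * b for ry, rp in zip(yt, pt) for a, b in zip(ry, rp))
    den = sum(b * b for rp in pt for b in rp)
    return num / den


# Hypothetical toy panel: two staggered adopters and two never-adopters.
adopt_day = [2, 4, None, None]
post = [[1.0 if a is not None and d >= a else 0.0 for d in range(6)] for a in adopt_day]
alpha = [3.0, 5.0, 2.0, 4.0]              # user fixed effects (made up)
gamma = [0.0, 0.5, 0.2, 0.8, 0.3, 0.6]    # day fixed effects (made up)
true_beta = 1.05                          # illustrative value only
y = [[alpha[i] + gamma[d] + true_beta * post[i][d] for d in range(6)] for i in range(4)]

print(round(twfe_beta(y, post), 6))  # 1.05
```

The demeaning shortcut relies on the panel being balanced, which matches the design described above. With a homogeneous effect and no noise the estimator recovers the true coefficient exactly, whereas heterogeneous effects across adoption cohorts can bias it.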

Table [8](https://arxiv.org/html/2606.07489#A4.T8) reports the regression estimates. In column (1), the TWFE estimate of Computer adoption on daily Search queries is $\hat{\beta}=1.05$ ($p<0.001$): adopting Computer is associated with 1.05 additional Search queries per day. In column (2), each additional Computer query on a given day is associated with $\hat{\delta}=0.019$ additional Search queries ($p<0.01$), a small but significant effect. A concern with the TWFE specification is that already-treated users may serve as implicit controls for later adopters, biasing the estimate when treatment effects vary across cohorts \[goodman-bacon2021\]. Columns (3) and (4) of Table [8](https://arxiv.org/html/2606.07489#A4.T8) check the robustness of the result with alternative estimators. In column (3), we estimate a stacked DiD in which each adoption cohort (early: Feb 27–Mar 24; mid: Mar 25–Apr 18; late: Apr 19–May 13) is compared only to the never-treated control group, with cohort-specific user and day fixed effects. In column (4), we estimate separate DiD regressions for each cohort and aggregate via precision (inverse variance)-weighted averaging. The three approaches yield consistent estimates ($\hat{\beta}=1.05$–$1.12$). The complementarity is also broadly uniform across topics: estimating Equation [1](https://arxiv.org/html/2606.07489#A4.E1) separately within each of the 20 primary search-topic categories yields a positive $\hat{\beta}$ in every category.

Notes: 61,913 Computer adopters exactly matched to 61,786 non-adopters on subscription tier, primary search topic (20 categories), and pre-period Search intensity (quartile bins). Panel: Feb 13 – May 27, 2026 (104 days). Column (2): treated users only, post-adoption days. Column (3): stacked DiD with three cohorts (early, mid, late); each cohort paired with all never-treated controls. Column (4): precision-weighted average of cohort-specific DiD estimates (early: $\hat{\beta}=0.939$; mid: $1.116$; late: $1.278$). Standard errors clustered by user in parentheses. Significance: $^{*}\,p<0.10$; $^{**}\,p<0.05$; $^{***}\,p<0.01$.

Table 8: Effect of Computer adoption on daily Search queries. Column (1): TWFE DiD (Equation [1](https://arxiv.org/html/2606.07489#A4.E1)). Column (2): intensive margin (Equation [2](https://arxiv.org/html/2606.07489#A4.E2)). Columns (3)–(4): robustness to staggered adoption.
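The precision-weighted aggregation used for column (4) follows the standard inverse-variance formula, sketched below. The cohort point estimates are the ones given in the table notes. The standard errors are placeholders chosen only to make the snippet run and are not values from the paper, so the printed average is illustrative rather than a reproduction of column (4).

```python
def precision_weighted(estimates, std_errors):
    """Inverse-variance weighted mean and the standard error of that mean."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    mean = sum(w * b for w, b in zip(weights, estimates)) / total
    return mean, (1.0 / total) ** 0.5


cohort_betas = [0.939, 1.116, 1.278]   # early, mid, late (from the table notes)
cohort_ses = [0.05, 0.05, 0.05]        # placeholder standard errors, NOT from the paper

mean, se = precision_weighted(cohort_betas, cohort_ses)
print(round(mean, 3), round(se, 4))
```

With equal placeholder standard errors the weighted mean reduces to the simple average of the three cohort estimates. Unequal standard errors would pull the average toward the more precisely estimated cohorts.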
## Appendix E Top Activities at Each O\*NET Nesting Level

Tables [9](https://arxiv.org/html/2606.07489#A5.T9)–[12](https://arxiv.org/html/2606.07489#A5.T12) report the ten most prevalent O\*NET Generalized Work Activities (GWAs), Intermediate Work Activities (IWAs), Detailed Work Activities (DWAs), and Task Statements (TSs) by Computer prevalence in the scope sample, with Search prevalence shown alongside. Computer queries are restricted to those that invoked at least one “do” tool; Search queries are matched at the user level (one Search query per user, same 5,000 users). Prevalence is the share of queries that engage at least one instance of that activity, classified by an LLM against the O\*NET catalog.

Table 9: Top 10 O\*NET Generalized Work Activities (GWAs) by Computer prevalence, with Search side-by-side.

Table 10: Top 10 O\*NET Intermediate Work Activities (IWAs) by Computer prevalence, with Search side-by-side.

Table 11: Top 10 O\*NET Detailed Work Activities (DWAs) by Computer prevalence, with Search side-by-side.

Table 12: Top 10 O\*NET Task Statements (TSs) by Computer prevalence, with Search side-by-side.

## Appendix F User Interview Highlights

We conducted 45\-minute semi\-structured interviews with 25 Computer users \(6 enterprise users, 19 consumers\), recruited from active Computer users with at least 5 queries\. Participants walked through specific completed tasks, described their pre\-Computer workflow for each, and estimated the time and cost that workflow would have taken\. We group the findings into five recurring themes, summarizing the patterns across participants rather than attributing claims to individuals\. These self\-reports are subject to recall and selection bias and should be read as corroborative evidence; where participants gave concrete figures, we report them\.

Section [F.1](https://arxiv.org/html/2606.07489#A6.SS1) gathers *quality* reports: self-reports of output quality that complement the next-turn user dissatisfaction signal in Section [6](https://arxiv.org/html/2606.07489#S6). Section [F.2](https://arxiv.org/html/2606.07489#A6.SS2) collects *efficiency* reports with specific before versus after time and cost comparisons, which we use to cross-validate the efficiency estimates in Section [7](https://arxiv.org/html/2606.07489#S7). Section [F.3](https://arxiv.org/html/2606.07489#A6.SS3) covers *recurrent task automation*: automated workflows that users run repeatedly on a fixed schedule. Section [F.4](https://arxiv.org/html/2606.07489#A6.SS4) reports *parallel work*, where asynchronous delegation lets users fire off tasks and continue other activity concurrently. Finally, Section [F.5](https://arxiv.org/html/2606.07489#A6.SS5) documents *scope expansion* (users performing tasks outside their domains of expertise), corroborating the scope analysis in Section [8](https://arxiv.org/html/2606.07489#S8).

### F.1 Quality improvement

Participants across legal, founder, and product/consulting roles reported high output quality on demanding tasks. Legal users described high-quality legal first drafts and output on their hardest accounting task; founders reported technical specification documents praised by their technical leadership; and product/consulting users described polished, presentation-ready slide decks. These self-reports align with the low next-turn dissatisfaction observed in Section [6](https://arxiv.org/html/2606.07489#S6).

### F.2 Efficiency gains

Efficiency claims were the most common and most quantified theme, with reported speedups spanning roughly 5× to 300× (per-participant median ≈25×) and cost reductions of two to three orders of magnitude. The gains clustered by work type:

- Legal work clustered at the high end for drafting: legal research and writing dropped from ≈2 hours to ≈5 minutes (≈24×), drafting from 1–2 days to 5–10 minutes (≈95×), guardianship accounting from a full day to ≈30 minutes (≈16×), and an appraisal comparison report from ≈1 week to ≈15 minutes (≈160×).
- Financial and consulting work showed the largest synthesis gains: a risk-reporting framework went from weeks to under an hour (≈40×), a 1,000-page synthesis from ≈2 weeks to ≈15 minutes (≈320×), and three major analyses from days–weeks to ≈1 hour (≈10–80×).
- Product builds and websites compressed from months to days: a full web product build from 3–6 months to 4–5 days (≈15–35×), a market-ready product delivered with ≈12 hours of human input over a 72-hour window (≈6× leverage), and a full website rebuilt ground-up in ≈2.5 days. Course content dropped from 1–3 weeks to ≈1 day (≈5–15×).
- Cost displacement was largest on work that would otherwise require outside specialists or vendors, where reported spend fell by roughly two to three orders of magnitude (≈120–750× cheaper).

These accounts cross-validate the production-data efficiency estimates in Section [7](https://arxiv.org/html/2606.07489#S7).
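Several of the speedup multiples quoted in the list above are consistent with simple ratios of working time. The sketch below assumes an 8-hour working day and a 5-day working week, a conversion the source does not state, and the helper name `speedup` is hypothetical.

```python
MINUTES_PER_WORKDAY = 8 * 60                    # assumption: 8-hour working day
MINUTES_PER_WORKWEEK = 5 * MINUTES_PER_WORKDAY  # assumption: 5-day working week


def speedup(before_minutes: float, after_minutes: float) -> float:
    """Ratio of the pre-Computer duration to the reported Computer duration."""
    return before_minutes / after_minutes


examples = {
    "legal research and writing": speedup(2 * 60, 5),                  # 2 hours -> 5 minutes
    "guardianship accounting": speedup(MINUTES_PER_WORKDAY, 30),       # full day -> 30 minutes
    "appraisal comparison report": speedup(MINUTES_PER_WORKWEEK, 15),  # 1 week -> 15 minutes
    "1,000-page synthesis": speedup(2 * MINUTES_PER_WORKWEEK, 15),     # 2 weeks -> 15 minutes
}
for task, ratio in examples.items():
    print(f"{task}: {ratio:.0f}x")
# legal research and writing: 24x
# guardianship accounting: 16x
# appraisal comparison report: 160x
# 1,000-page synthesis: 320x
```

Under that working-time convention the four ratios match the reported ≈24×, ≈16×, ≈160×, and ≈320×. Ranges such as "1–2 days" or "weeks" depend on which endpoints are paired, so they are left out.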

### F.3 Recurrent task automation

Several users described moving from one\-off prompting to scheduled, recurring automations\. Reported workflows included periodic leadership reports assembled from cloud documents and posted to a review channel, weekend jobs that summarize a week’s email, calendar, and project\-tracker activity and propose the next week’s priorities, and automated CRM synchronization and lead creation\. A recurring benefit was keeping previously neglected workstreams—email backlogs, receivables collection—continuously up to date rather than handling them in batches\.

### F.4 Parallel work

Because Computer executes autonomously, users described delegating tasks and continuing other work in parallel: firing off batches of asynchronous work and checking back later, setting up multi\-step projects unattended, and submitting tasks before stepping away\. Several characterized the resulting workflow as among the most productive stretches of their careers\.

### F.5 Scope expansion

Users repeatedly described taking on work outside their domain of expertise: handling software and data-engineering tasks, building software modules without a technical background, and preparing legal work without formal legal training. A recurring theme was spanning functions that normally require separate specialists, such as combining accounting, legal, and compliance work. These accounts corroborate the cross-occupation findings in Section [8](https://arxiv.org/html/2606.07489#S8).