@Xudong07452910: This paper makes me feel that our discussion about "AI will replace programmers" might be completely misguided. The core point: The emergence of AI Agents is not about making software engineers more efficient, but about making the very act of "permanently encoding decision logic into software" increasingly unnecessary. The author describes a more fundamental paradigm shift: tra...


Summary

This paper argues that the emergence of AI Agents is not about making programmers more efficient, but fundamentally changes the nature of the software paradigm — code transforms from static artifacts that permanently solidify decision logic into temporary tools dynamically generated by LLMs and discarded after use; the core of software engineering will shift towards designing reliable inference constraint boundaries.

This paper makes me feel that our discussion about "AI will replace programmers" might be completely misguided. Core point: The emergence of AI Agents is not about making software engineers more efficient, but about making the very act of "permanently encoding decision logic into software" increasingly unnecessary. The author describes a more fundamental paradigm shift: The essence of traditional software engineering is that humans "solidify" judgment logic into code — if-else, state machines, algorithms — all ways of "hard-coding" human decisions upfront. But in Agent systems centered on LLMs as reasoning engines, code becomes "a temporary tool, generated on the fly and discarded after use". For each task, the Agent dynamically generates the required code and no longer needs it after execution. Decisions are no longer pre-encoded but are dynamically produced by the LLM's reasoning loop at runtime. This is not an incremental improvement, but a structural replacement of the software production paradigm. The detail I find most noteworthy in this view: This is not just an upgrade of productivity tools, but a change in the very role of the concept of "software". Previously, code was the "center of the system", with the Agent framework as a shell. Now the LLM reasoning loop is the center, and code becomes a temporary aid within the shell. If this trend continues, the core capability of software engineering may no longer be "writing good code", but "designing reliable inference constraint boundaries". Previously we focused on writing maintainable code; in the future we may need to focus more on designing reliable inference boundaries. https://arxiv.org/pdf/2606.05608

Cached at: 06/08/26, 05:18 AM

This paper gives me the feeling that our entire discussion about “AI replacing programmers” might be completely misguided.

Core insight: The emergence of AI Agents is not about making software engineers more efficient—it’s about making the very act of “permanently encoding decision logic into software” increasingly unnecessary.

The author describes a more fundamental paradigm shift: The essence of traditional software engineering is that humans “solidify” decision logic into code—if-else, state machines, algorithms, all ways of “hardcoding” human decisions in advance. But in Agent systems with LLM as the core reasoning engine, code becomes a “temporarily generated, disposable tool”. For each task, the Agent dynamically generates the needed code, executes it, and discards it. Decisions are no longer pre-encoded; they are dynamically produced at runtime by the LLM reasoning loop. This is not incremental improvement, but a structural replacement of the software production paradigm.

I think the most noteworthy detail in this view is: This is not just an upgrade of productivity tools, but a change in the role of the concept of “software” itself. Previously, code was the “center of the system”, and the Agent framework was the shell. Now, the LLM reasoning loop is the center, and code becomes temporary assistance within the shell. If this trend continues, the core capability of software engineering may no longer be “writing good code”, but “designing reliable reasoning constraint boundaries”. Previously we focused on writing maintainable code; in the future, we may need to focus on designing reliable reasoning boundaries.

https://arxiv.org/pdf/2606.05608


How AI Agents Are Fundamentally Restructuring the Software Paradigm

Source: https://arxiv.org/html/2606.05608

Abstract

For over half a century, software engineering has operated on a foundational premise: human engineers decompose problems, encode decision logic into static code, and manually adapt that code as requirements evolve. This paper argues that the emergence of AI agents – systems where large language models serve as the primary reasoning engine, dynamically generating and discarding code as an instrumental resource – constitutes not an incremental improvement but a fundamental restructuring of the software paradigm. Drawing on first-principles analysis of complexity scaling, we formalize the distinction between traditional software (where code is the carrier of decision logic) and agentic systems (where code is ephemeral tooling for an LLM-driven reasoning loop). We trace the historical arc from licensed software to SaaS to what we term Agent-as-a-Service (AaaS), showing that each shift transferred additional complexity away from end-users. We introduce the concept of Agentic Engineering as an emergent discipline – distinct from software engineering in its core object of study, control model, and human role. Through analysis of recent benchmark evidence including SWE-bench Verified, EvoClaw, and LangChain’s multi-agent coordination studies, we demonstrate both the transformative potential of the agentic paradigm and its current limitations. We conclude with a four-stage roadmap toward self-evolving agent ecosystems and concrete recommendations for practitioners navigating this transition.

1 Introduction

Software engineering, as codified at the 1968 NATO Conference [1], was born from a crisis: systems were growing in complexity beyond what ad-hoc programming practices could manage. The discipline’s founding insight was that rigorous methodologies (structured design, modular decomposition, configuration management, systematic testing) could tame this complexity. For five decades, this bet largely paid off. We moved from waterfall to agile, from monoliths to microservices, from manual deployment to CI/CD.

Yet a deeper structural problem persisted. As Brooks observed in The Mythical Man-Month [2], software complexity exhibits a fundamentally different scaling behavior than other engineering domains. Unlike bridges or circuits, software has no manufacturing step: the design is the product. Every new feature, every edge case, every integration point adds to a combinatorial explosion of possible states and interactions that Brooks characterized as “essential complexity”: complexity inherent to the problem itself, not accidental to the implementation.

This paper contends that the emergence of AI agents does not merely offer a new tool within the existing paradigm. Rather, it dissolves the very premise on which software engineering was founded. When a large language model (LLM) [12] can understand a task, decompose it into subtasks, dynamically generate code to execute those subtasks, and discard that code when it’s no longer needed, the role of code changes from the system itself to an ephemeral instrument of reasoning. This shift is as fundamental as the transition from analog circuits to stored-program computers.

We make three central claims:

  1. First-Principles Necessity. The agentic paradigm is not a market preference but an inevitable consequence of complexity scaling laws. Traditional software requires human engineers to explicitly encode every decision; LLM-based agents can navigate complexity non-linearly by outsourcing reasoning to models whose capacity grows with training compute.
  2. Paradigm Shift, Not Optimization. The transition from “AI → Software → Result” to “Agent → Result” eliminates the software artifact as a necessary intermediary, comparable to how SaaS eliminated on-premise installation as a necessary intermediary. We formalize this as the third major paradigm shift in software delivery.
  3. Emergent Discipline. Agentic Engineering is emerging as a distinct practice with its own concepts, tools, and metrics. Its practitioners are not “better programmers” but a fundamentally different role: intent architects, agent coordinators, and outcome auditors.

The remainder of this paper is structured as follows. Section 2 presents a first-principles analysis of traditional software and agent-based systems, including a formal complexity argument. Section 3 traces the historical paradigm shifts in software delivery and positions AaaS as the logical endpoint. Section 4 defines Agentic Engineering as a discipline and contrasts it with traditional software engineering. Section 5 reviews empirical evidence from recent benchmarks, acknowledging both breakthroughs and persistent challenges. Section 6 proposes an evolutionary roadmap. Section 7 concludes with implications for practitioners and the research community.

2 First-Principles Analysis

2.1 The Nature of Traditional Software

We begin with a precise definition.

Definition 2.1 (Traditional Software System).

A traditional software system $S$ is a tuple $S = (C, D, E)$ where:

  • $C$ is a set of computational resources (CPU, memory, I/O);
  • $D$ is a set of deterministic decision rules encoded in source code;
  • $E$ is an execution environment that evaluates $D$ against inputs to produce outputs.

The critical property is that $D$ is static with respect to execution: all decision logic must be explicitly written by human engineers before the system encounters any input.

Under this definition, every feature addition, every bug fix, every adaptation to a changing environment requires a human to (a) understand the change needed, (b) locate the correct position in $D$, (c) modify the logic without introducing regressions, and (d) verify correctness. The cost of each change is a function of the size of $D$ and the density of its internal dependencies.
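As a minimal sketch of Definition 2.1, the decision rules D can be pictured as a fixed list written before any input arrives, with E reduced to a function that evaluates them. The rule names, thresholds, and the `evaluate` function are hypothetical illustrations, not taken from the paper.

```python
from typing import Callable

# D: decision rules, fixed by a human before the system sees any input.
# The predicates and outcome labels are hypothetical examples.
DECISION_RULES: list[tuple[Callable[[float], bool], str]] = [
    (lambda amount: amount < 0, "reject"),
    (lambda amount: amount > 10_000, "manual_review"),
]


def evaluate(amount: float) -> str:
    """E: evaluate the static rules D against one input, first match wins."""
    for predicate, outcome in DECISION_RULES:
        if predicate(amount):
            return outcome
    return "approve"  # default when no rule fires
```

Handling a new case here means a human editing `DECISION_RULES` and re-verifying the whole list, which is the change cost the paragraph above describes.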

2.2 The Complexity Barrier

Brooks [2] distinguished between accidental complexity (artifacts of particular implementations) and essential complexity (inherent to the problem). While decades of advances (higher-level languages, frameworks, automated testing) have systematically reduced accidental complexity, essential complexity remains unbounded. In fact, as systems grow, the interaction surface between components grows combinatorially.

Proposition 2.1 (Complexity Scaling).

For a system with $n$ components, each potentially interacting with any other, the number of possible interaction paths $P(n)$ is bounded by:

$$P(n) \in \Theta(2^{n}) \tag{1}$$

This arises because each of the $\binom{n}{2}$ pairs may or may not have a meaningful interaction, yielding $2^{\binom{n}{2}}$ possible dependency graphs. While real systems do not realize all configurations, the upper bound on complexity grows exponentially, while human cognitive capacity to reason about these interactions is essentially constant.
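To make the counting argument concrete, the following sketch computes the number of component pairs and the number of possible dependency graphs for small n. The function names are illustrative. Note that the graph count equals 2 raised to n(n-1)/2, which is at least 2^n once n is 3 or more.

```python
from math import comb


def pair_count(n: int) -> int:
    """Number of unordered component pairs, C(n, 2)."""
    return comb(n, 2)


def dependency_graph_count(n: int) -> int:
    """Number of graphs when each pair either interacts or does not."""
    return 2 ** comb(n, 2)


if __name__ == "__main__":
    for n in (2, 3, 4, 5, 10):
        print(n, pair_count(n), dependency_graph_count(n))
```

For n = 5 this gives 10 pairs and 1,024 possible graphs; for n = 10 it gives 45 pairs and 35,184,372,088,832 possible graphs.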

This mismatch is the deep structural reason why software projects experience declining marginal productivity as they grow. The traditional response—hierarchical decomposition, modular interfaces, encapsulation—reduces the constant factor but does not change the asymptotic behavior.

2.3 Agentic Systems: A Formal Model

In contrast, an agentic system operates on fundamentally different principles.

Definition 2.2 (AI Agent System).

An AI agent system $A$ is a tuple $A = (M, \mathcal{T}, \mathcal{M}, \Pi)$ where:

  • $M$ is a large language model serving as the reasoning engine;
  • $\mathcal{T}$ is a set of executable tools (code interpreters, APIs, databases, file systems);
  • $\mathcal{M}$ is a memory subsystem (short-term context, long-term vector store);
  • $\Pi$ is a planning mechanism that decomposes user intent into action sequences.

The system operates by iteratively executing: $a_{t} \leftarrow M(s_{t}, \mathcal{M})$, $s_{t+1} \leftarrow \text{exec}(a_{t})$, where $s_{t}$ is the system state at time $t$ and $a_{t}$ is the action chosen by the model.
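The iteration in Definition 2.2 can be sketched as a short loop. This is an illustration under stated assumptions, not the paper's implementation: `stub_model` is a hard-coded stand-in for the LLM M, the two tools in `TOOLS` are toy arithmetic functions, the state is a single integer, and the planning mechanism Π is omitted. Unlike the paper's exec(a_t), the toy tools also receive the current state.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    tool: str          # name of a tool in T, or "finish"
    argument: int = 0


@dataclass
class Memory:
    history: list = field(default_factory=list)


def stub_model(state: int, memory: Memory) -> Action:
    """Hypothetical stand-in for M: chooses the next action at runtime."""
    if state >= 10:
        return Action("finish")
    return Action("add", 3) if state % 2 == 0 else Action("double")


# T: toy tools; each maps (state, argument) to the next state.
TOOLS: dict[str, Callable[[int, int], int]] = {
    "add": lambda state, arg: state + arg,
    "double": lambda state, arg: state * 2,
}


def run_agent(initial_state: int, max_steps: int = 20) -> tuple[int, Memory]:
    state, memory = initial_state, Memory()
    for _ in range(max_steps):
        action = stub_model(state, memory)                   # a_t <- M(s_t, memory)
        if action.tool == "finish":
            break
        state = TOOLS[action.tool](state, action.argument)   # s_{t+1} <- exec(a_t)
        memory.history.append((action.tool, state))
    return state, memory
```

Starting from state 0, the stub chooses add, double, add, double and stops at 18. The `max_steps` cap is a design choice added here so the loop always terminates even if the model never emits "finish".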

The key distinction is that in an agentic system, the decision logic is generated at runtime. The LLM $M$ can dynamically produce code, invoke tools, and adjust its behavior based on intermediate results, none of which was explicitly pre-programmed. The code it generates is not the system; it is a transient artifact, produced and discarded as needed.

This distinction maps cleanly to Karpathy’s “Software 2.0” framework [3], but extends it further. In Karpathy’s formulation, neural networks replace hand-crafted program logic with learned weights. Agentic systems go a step further: the neural network does not merely replace the program; it writes programs on demand, using code as a tool in service of broader reasoning goals. This pattern is consistent with the ReAct framework [9], which demonstrated that interleaving reasoning traces with tool-use actions substantially improves task performance, and with Chain-of-Thought prompting [8], which showed that explicit intermediate reasoning steps unlock latent capabilities in LLMs.

2.4 Why Agents Inevitably Scale Better

Consider a task $T$ whose solution requires reasoning over a space of size $N$. Under the traditional paradigm:

  • A human engineer must mentally traverse this space to identify the solution path.
  • The path must then be encoded as a static program.
  • Human cognitive capacity $C_{H}$ is essentially fixed.
  • Thus, for $N > C_{H}$, the task is infeasible at any realistic cost.

Under the agentic paradigm:

  • The LLM $M$ traverses the space, with effective capacity $C_{M}$ that scales with model size and training compute.
  • The plan $\Pi$ decomposes $T$ into subproblems, each handled independently.
  • Code is generated only for the specific solution path, not for all contingencies.
  • As LLM capabilities improve (as they have, exponentially), $C_{M}$ grows correspondingly.

Thus, the agentic paradigm decouples solution capacity from human cognitive limits. This is not a 10% improvement; it is a qualitative change in what kinds of problems can be economically addressed.

3 From SaaS to AaaS: The Third Paradigm Shift

3.1 Three Generations of Software Delivery

The history of commercial software can be understood as a progressive transfer of complexity away from the end-user. Table 1 summarizes this trajectory.

Table 1: Three Generations of Software Delivery

Each transition follows the same pattern: the party best positioned to absorb complexity absorbs it, and the party least positioned to manage it is liberated from it. SaaS liberated businesses from server rooms; AaaS promises to liberate them from the need to specify how a result should be produced; they need only specify what result they want.

3.2 The Failure of “AI → Software → Result”

The dominant enterprise AI paradigm to date has been AI-augmented development: use LLMs to help human engineers write code faster, within the traditional software lifecycle. We denote this as the “AI → Software → Result” pipeline.

This approach has three structural weaknesses:

  1. Bottleneck persistence. The human engineer remains the critical path for design decisions, architecture, integration testing, and deployment. AI accelerates code generation (a sub-step of implementation) but does not remove the human from any phase.
  2. Complexity ceiling intact. The final deliverable remains a traditional software system $S = (C, D, E)$. Its complexity still scales with the size of $D$, and it still requires human understanding for any modification. AI merely made construction of $D$ somewhat faster.
  3. Iteration latency. Even with AI assistance, any functional change requires traversing the full chain: requirements → design → code → test → deploy. This latency cannot be reduced below human communication and coordination speeds.

3.3 “Agent → Result”: Eliminating the Intermediary

The alternative paradigm eliminates the software artifact as a necessary intermediary:

  1. Human articulates intent and constraints to an agent.
  2. Agent autonomously plans, executes (generating code as needed), validates, and delivers the result.
  3. Human audits the outcome and provides feedback.

In this model, software is not delivered; outcomes are delivered. The agent may generate thousands of lines of code, execute database queries, call external APIs, and produce visualizations, all ephemerally. What persists is the agent’s capability, not its intermediate artifacts. Kumar and Ramagopal [7] capture this distinction precisely: “AI coding agents excel at translating intent into code within a single user-driven session. Agentic engineering operates at a higher level of abstraction: it’s a control plane that orchestrates cross-team workflows, maintains long-term memory across agents, and manages state and traceability across the full software delivery lifecycle.”
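The three-step “Agent → Result” flow can be illustrated with a small sketch in which code is generated for one task, executed, and then dropped, while a human-supplied constraint gates the outcome. Everything here is hypothetical: `stub_generate_code` returns canned source in place of a real model call, and `run_ephemeral` and its `check` parameter are names invented for this example. Running generated code with `exec` is shown only for brevity; a real system would need sandboxing.

```python
from typing import Callable


def stub_generate_code(intent: str) -> str:
    """Hypothetical stand-in for the LLM: returns source code for one task."""
    # A real agent would derive this from `intent`; here the output is canned.
    return "def solve(values):\n    return sum(values) / len(values)\n"


def run_ephemeral(intent: str, values: list[float],
                  check: Callable[[float], bool]) -> float:
    source = stub_generate_code(intent)   # agent generates code for this task
    namespace: dict = {}
    exec(source, namespace)               # unsandboxed; illustration only
    result = namespace["solve"](values)   # agent executes the generated code
    if not check(result):                 # constraint supplied by the human
        raise ValueError("result violates the stated constraint")
    return result                         # only the outcome leaves this call
```

A call such as `run_ephemeral("average the readings", [2.0, 4.0, 6.0], lambda r: 0 <= r <= 10)` returns 4.0. The generated `solve` function lives only in the local `namespace` and is garbage once the call returns, so what the caller keeps is the result and the constraint, not the code.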

4 Agentic Engineering: A New Discipline

4.1 Defining the Field

Agentic Engineering, formally introduced by LangChain in April 2026 [7], is defined as “a multi-agent coordination model where AI agents function as digital team members, each with defined roles, shared memory, and a unified observability layer, to drive software through the entire delivery pipeline, not merely to generate code faster.”

Wang et al. [4] provide a foundational taxonomy of LLM-based agents in software engineering, identifying three core modules; a complementary survey by Guo et al. [13] offers a systematic treatment of multi-agent collaboration patterns and progress in LLM-based multi-agent systems.

[Figure 1 diagram: Perception (multi-modal input processing), Memory (semantic, episodic, procedural), and Action (internal reasoning + external tool use) modules around an LLM Reasoning Core, connected to the External Environment.]

Figure 1: The LLM-based agent framework for software engineering, adapted from Wang et al. [4]. The perception module handles multi-modal input; the memory module maintains semantic, episodic, and procedural knowledge; the action module executes both internal reasoning and external tool invocations. All are orchestrated by the LLM reasoning core.

A concrete realization of this architecture can be observed in Hermes Agent [14], an open-source framework by Nous Research that operationalizes the perception-memory-action model with a distinctive self-evolution mechanism. Its most consequential feature is a closed learning loop: after completing complex tasks, the agent autonomously creates reusable Skil
