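@Xudong07452910: This paper makes me think our debate over whether "AI will replace programmers" may have been pointed in entirely the wrong direction. Core claim: the arrival of AI agents does not make software engineers more productive; it makes the very act of "permanently encoding decision logic into software" less and less necessary. The author describes a more fundamental paradigm change: …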

X AI KOLs Timeline 2026/06/08 03:17 Paper

ai-agents llm software-engineering paradigm-shift code-generation ephemeral-code agentic-computing

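Summary

The paper argues that AI agents do not simply make programmers more efficient; they change the nature of the software paradigm itself. Code shifts from a static artifact that permanently fixes decision logic to a temporary tool that an LLM generates on demand and discards after use, and the core of software engineering moves toward designing reliable reasoning-constraint boundaries.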



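This paper makes me think our debate over whether "AI will replace programmers" may have been pointed in entirely the wrong direction.

Core claim: the arrival of AI agents does not make software engineers more productive; it makes the very act of "permanently encoding decision logic into software" less and less necessary.

The author describes a more fundamental paradigm change. The essence of traditional software engineering is that humans "freeze" their judgment into code: if-else branches, state machines, and algorithms are all ways of hard-coding human decisions ahead of time. In agent systems that use an LLM as the core reasoning engine, however, code becomes "a tool generated on the fly and thrown away after use": for each task the agent generates the code it needs, and once that code has run it is no longer needed. Decisions are no longer encoded in advance; they are produced at runtime by the LLM reasoning loop. This is not an incremental improvement but a structural replacement of the software production paradigm.

The detail I find most noteworthy is that this is not just an upgrade to productivity tooling; the role of "software" as a concept is itself changing. Code used to be "the center of the system," with the agent framework as the shell. Now the LLM reasoning loop is the center, and code is a temporary aid inside the shell. If the trend continues, the core skill of software engineering may no longer be "writing good code" but "designing reliable reasoning-constraint boundaries."

We used to focus on writing maintainable code; from here on we may need to focus more on designing reliable reasoning boundaries.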

https://arxiv.org/pdf/2606.05608

How AI Agents Are Fundamentally Restructuring the Software Paradigm

Source: https://arxiv.org/html/2606.05608

Abstract

For over half a century, software engineering has operated on a foundational premise: human engineers decompose problems, encode decision logic into static code, and manually adapt that code as requirements evolve. This paper argues that the emergence of AI agents – systems where large language models serve as the primary reasoning engine, dynamically generating and discarding code as an instrumental resource – constitutes not an incremental improvement but a fundamental restructuring of the software paradigm. Drawing on first-principles analysis of complexity scaling, we formalize the distinction between traditional software (where code is the carrier of decision logic) and agentic systems (where code is ephemeral tooling for an LLM-driven reasoning loop). We trace the historical arc from licensed software to SaaS to what we term Agent-as-a-Service (AaaS), showing that each shift transferred additional complexity away from end-users. We introduce the concept of Agentic Engineering as an emergent discipline – distinct from software engineering in its core object of study, control model, and human role. Through analysis of recent benchmark evidence including SWE-bench Verified, EvoClaw, and LangChain’s multi-agent coordination studies, we demonstrate both the transformative potential of the agentic paradigm and its current limitations. We conclude with a four-stage roadmap toward self-evolving agent ecosystems and concrete recommendations for practitioners navigating this transition.

1 Introduction

Software engineering, as codified at the 1968 NATO Conference [1], was born from a crisis: systems were growing in complexity beyond what ad-hoc programming practices could manage. The discipline’s founding insight was that rigorous methodologies (structured design, modular decomposition, configuration management, systematic testing) could tame this complexity. For five decades, this bet largely paid off. We moved from waterfall to agile, from monoliths to microservices, from manual deployment to CI/CD.

Yet a deeper structural problem persisted. As Brooks observed in The Mythical Man-Month [2], software complexity exhibits a fundamentally different scaling behavior than other engineering domains. Unlike bridges or circuits, software has no manufacturing step: the design is the product. Every new feature, every edge case, every integration point adds to a combinatorial explosion of possible states and interactions that Brooks characterized as “essential complexity”: complexity inherent to the problem itself, not accidental to the implementation.

This paper contends that the emergence of AI agents does not merely offer a new tool within the existing paradigm. Rather, it dissolves the very premise on which software engineering was founded. When a large language model (LLM) [12] can understand a task, decompose it into subtasks, dynamically generate code to execute those subtasks, and discard that code when it’s no longer needed, the role of code changes from the system itself to an ephemeral instrument of reasoning. This shift is as fundamental as the transition from analog circuits to stored-program computers.

We make three central claims:

1. First-Principles Necessity. The agentic paradigm is not a market preference but an inevitable consequence of complexity scaling laws. Traditional software requires human engineers to explicitly encode every decision; LLM-based agents can navigate complexity non-linearly by outsourcing reasoning to models whose capacity grows with training compute.
2. Paradigm Shift, Not Optimization. The transition from “AI → Software → Result” to “Agent → Result” eliminates the software artifact as a necessary intermediary, comparable to how SaaS eliminated on-premise installation as a necessary intermediary. We formalize this as the third major paradigm shift in software delivery.
3. Emergent Discipline. Agentic Engineering is emerging as a distinct practice with its own concepts, tools, and metrics. Its practitioners are not “better programmers” but a fundamentally different role: intent architects, agent coordinators, and outcome auditors.

The remainder of this paper is structured as follows. Section 2 presents a first-principles analysis of traditional software and agent-based systems, including a formal complexity argument. Section 3 traces the historical paradigm shifts in software delivery and positions AaaS as the logical endpoint. Section 4 defines Agentic Engineering as a discipline and contrasts it with traditional software engineering. Section 5 reviews empirical evidence from recent benchmarks, acknowledging both breakthroughs and persistent challenges. Section 6 proposes an evolutionary roadmap. Section 7 concludes with implications for practitioners and the research community.

2 First-Principles Analysis

2.1 The Nature of Traditional Software

We begin with a precise definition.

Definition 2.1 (Traditional Software System).

A traditional software system $S$ is a tuple $S = (C, D, E)$ where:

• $C$ is a set of computational resources (CPU, memory, I/O);
• $D$ is a set of deterministic decision rules encoded in source code;
• $E$ is an execution environment that evaluates $D$ against inputs to produce outputs.

The critical property is that $D$ is static with respect to execution: all decision logic must be explicitly written by human engineers before the system encounters any input.

Under this definition, every feature addition, every bug fix, every adaptation to a changing environment requires a human to (a) understand the change needed, (b) locate the correct position in $D$, (c) modify the logic without introducing regressions, and (d) verify correctness. The cost of each change is a function of the size of $D$ and the density of its internal dependencies.
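As a small illustration of Definition 2.1 (a sketch, not taken from the paper; the refund policy and the names `RefundRequest` and `decide_refund` are hypothetical), the rule set $D$ can be pictured as logic that is fully written before any input arrives:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefundRequest:
    amount: float
    days_since_purchase: int


def decide_refund(req: RefundRequest) -> str:
    """D: decision rules fixed by a human before any input is seen."""
    if req.days_since_purchase > 30:
        return "reject"
    if req.amount <= 100:
        return "approve"
    return "manual_review"


# E: the execution environment only evaluates D against inputs.
# Supporting a new policy means a human must edit decide_refund itself.
decisions = [
    decide_refund(RefundRequest(amount=50, days_since_purchase=3)),
    decide_refund(RefundRequest(amount=500, days_since_purchase=3)),
]
```

Every case the function does not anticipate falls through to whatever branch happens to match, which is the sense in which $D$ is static with respect to execution.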

2.2 The Complexity Barrier

Brooks [2] distinguished between accidental complexity (artifacts of particular implementations) and essential complexity (inherent to the problem). While decades of advances (higher-level languages, frameworks, automated testing) have systematically reduced accidental complexity, essential complexity remains unbounded. In fact, as systems grow, the interaction surface between components grows combinatorially.

Proposition 2.1 (Complexity Scaling).

For a system with $n$ components, each potentially interacting with any other, the number of possible interaction paths $P(n)$ is bounded by:

$$P(n) \in \Theta(2^{n}) \tag{1}$$

This arises because each of the $\binom{n}{2}$ pairs may or may not have a meaningful interaction, yielding $2^{\binom{n}{2}}$ possible dependency graphs. Real systems do not realize all configurations, but the upper bound on complexity grows exponentially, whereas human cognitive capacity to reason about these interactions is essentially constant.

This mismatch is the deep structural reason why software projects experience declining marginal productivity as they grow. The traditional response—hierarchical decomposition, modular interfaces, encapsulation—reduces the constant factor but does not change the asymptotic behavior.
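The pair-counting argument can be tabulated directly. The sketch below (function names are illustrative, not from the paper) computes the number of component pairs $\binom{n}{2}$ and the resulting number of possible dependency graphs $2^{\binom{n}{2}}$, printing $2^{n}$ alongside for scale:

```python
from math import comb


def pair_count(n: int) -> int:
    """Number of unordered component pairs, C(n, 2)."""
    return comb(n, 2)


def dependency_graph_count(n: int) -> int:
    """Number of undirected dependency graphs on n labeled components."""
    return 2 ** pair_count(n)


# Columns: n, pairs, possible dependency graphs, 2**n
for n in (2, 4, 8, 16):
    print(n, pair_count(n), dependency_graph_count(n), 2 ** n)
```

Since $2^{\binom{n}{2}} = 2^{n(n-1)/2}$, the graph count passes $2^{n}$ from $n = 4$ onward, so even a handful of components already yields far more configurations than a person can enumerate.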

2.3 Agentic Systems: A Formal Model

In contrast, an agentic system operates on fundamentally different principles.

Definition 2.2 (AI Agent System).

An AI agent system $A$ is a tuple $A = (M, \mathcal{T}, \mathcal{M}, \Pi)$ where:

• $M$ is a large language model serving as the reasoning engine;
• $\mathcal{T}$ is a set of executable tools (code interpreters, APIs, databases, file systems);
• $\mathcal{M}$ is a memory subsystem (short-term context, long-term vector store);
• $\Pi$ is a planning mechanism that decomposes user intent into action sequences.

The system operates by iteratively executing $a_{t} \leftarrow M(s_{t}, \mathcal{M})$, $s_{t+1} \leftarrow \text{exec}(a_{t})$, where $s_{t}$ is the system state at time $t$ and $a_{t}$ is the action chosen by the model.

The key distinction is that in an agentic system, the decision logic is generated at runtime. The LLM $M$ can dynamically produce code, invoke tools, and adjust its behavior based on intermediate results, none of which was explicitly pre-programmed. The code it generates is not the system; it is a transient artifact, produced and discarded as needed.
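The loop in Definition 2.2 can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: `stub_model` stands in for the LLM $M$, the `run_code` and `finish` action types are invented for the example, and the restricted `exec` call is not a real sandbox.

```python
from typing import Callable

State = dict
Action = dict


def run_agent(model: Callable[[State, list], Action],
              execute: Callable[[State, Action], State],
              state: State, max_steps: int = 10) -> tuple[State, list]:
    """Iterate a_t <- M(s_t, memory); s_{t+1} <- exec(a_t)."""
    memory: list = []
    for _ in range(max_steps):
        action = model(state, memory)
        if action["type"] == "finish":
            break
        state = execute(state, action)
        memory.append((action, state))
    return state, memory


def stub_model(state: State, memory: list) -> Action:
    # Hypothetical stand-in for the LLM: a real system would call a model here.
    if "result" in state:
        return {"type": "finish"}
    return {"type": "run_code", "code": "result = sum(data) / len(data)"}


def execute(state: State, action: Action) -> State:
    # Run the generated snippet in a scratch namespace and keep only its result.
    # Limiting builtins narrows what the snippet can call; it is NOT a sandbox.
    scope = {"data": state["data"]}
    exec(action["code"], {"__builtins__": {"sum": sum, "len": len}}, scope)
    return {**state, "result": scope["result"]}


final_state, trace = run_agent(stub_model, execute, {"data": [2, 4, 6]})
```

The generated snippet exists only inside `execute`; what survives the step is the updated state and the memory entry, which mirrors the claim that code is a transient artifact.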

This distinction maps cleanly to Karpathy’s “Software 2.0” framework [3], but extends it further. In Karpathy’s formulation, neural networks replace hand-crafted program logic with learned weights. Agentic systems go a step further: the neural network does not merely replace the program; it writes programs on demand, using code as a tool in service of broader reasoning goals. This pattern is consistent with the ReAct framework [9], which demonstrated that interleaving reasoning traces with tool-use actions substantially improves task performance, and with Chain-of-Thought prompting [8], which showed that explicit intermediate reasoning steps unlock latent capabilities in LLMs.

2.4 Why Agents Inevitably Scale Better

Consider a task $T$ whose solution requires reasoning over a space of size $N$. Under the traditional paradigm:

• A human engineer must mentally traverse this space to identify the solution path.
• The path must then be encoded as a static program.
• Human cognitive capacity $C_{H}$ is essentially fixed.
• Thus, for $N > C_{H}$, the task is infeasible at any realistic cost.

Under the agentic paradigm:

• The LLM $M$ traverses the space, with effective capacity $C_{M}$ that scales with model size and training compute.
• The plan $\Pi$ decomposes $T$ into subproblems, each handled independently.
• Code is generated only for the specific solution path, not for all contingencies.
• As LLM capabilities improve (as they have, rapidly, in recent years), $C_{M}$ grows correspondingly.

Thus, the agentic paradigm decouples solution capacity from human cognitive limits. This is not a 10% improvement; it is a qualitative change in what kinds of problems can be economically addressed.

3 From SaaS to AaaS: The Third Paradigm Shift

3.1 Three Generations of Software Delivery

The history of commercial software can be understood as a progressive transfer of complexity away from the end-user. Table 1 summarizes this trajectory.

Table 1: Three Generations of Software Delivery

Each transition follows the same pattern: the party best positioned to absorb complexity absorbs it, and the party least positioned to manage it is liberated from it. SaaS liberated businesses from server rooms; AaaS promises to liberate them from the need to specify how a result should be produced; they need only specify what result they want.

3.2 The Failure of “AI → Software → Result”

The dominant enterprise AI paradigm to date has been AI-augmented development: use LLMs to help human engineers write code faster, within the traditional software lifecycle. We denote this as the “AI → Software → Result” pipeline.

This approach has three structural weaknesses:

1. Bottleneck persistence. The human engineer remains the critical path for design decisions, architecture, integration testing, and deployment. AI accelerates code generation (a sub-step of implementation) but does not remove the human from any phase.
2. Complexity ceiling intact. The final deliverable remains a traditional software system $S = (C, D, E)$. Its complexity still scales with the size of $D$, and it still requires human understanding for any modification. AI merely made construction of $D$ somewhat faster.
3. Iteration latency. Even with AI assistance, any functional change requires traversing the full chain: requirements → design → code → test → deploy. This latency cannot be reduced below human communication and coordination speeds.

3.3 “Agent → Result”: Eliminating the Intermediary

The alternative paradigm eliminates the software artifact as a necessary intermediary:

1. Human articulates intent and constraints to an agent.
2. Agent autonomously plans, executes (generating code as needed), validates, and delivers the result.
3. Human audits the outcome and provides feedback.
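The three steps above can be sketched as a small pipeline. This is a hedged illustration under stated assumptions: `Intent`, `Outcome`, `deliver`, the two constraint functions, and `stub_agent` are all hypothetical names, and the agent is a stub rather than a real planner.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Intent:
    goal: str
    constraints: list[Callable[[dict], bool]] = field(default_factory=list)


@dataclass
class Outcome:
    result: dict
    violations: list[str]

    @property
    def accepted(self) -> bool:
        return not self.violations


def deliver(intent: Intent, agent: Callable[[str], dict]) -> Outcome:
    """Step 2 (agent produces a result), then an automated pre-audit for step 3."""
    result = agent(intent.goal)
    violations = [c.__name__ for c in intent.constraints if not c(result)]
    return Outcome(result, violations)


def within_budget(result: dict) -> bool:
    return result.get("cost_usd", 0) <= 5.0


def no_external_writes(result: dict) -> bool:
    return not result.get("external_writes", [])


def stub_agent(goal: str) -> dict:
    # Placeholder for an agent that plans, generates code, and executes it.
    return {"summary": f"done: {goal}", "cost_usd": 1.2, "external_writes": []}


intent = Intent("summarize Q2 churn", [within_budget, no_external_writes])
outcome = deliver(intent, stub_agent)
```

Only the outcome and the list of violated constraints are returned; whatever code the agent wrote along the way is not part of the deliverable, and the human audit works from the constraint report.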

In this model, software is not delivered; outcomes are delivered. The agent may generate thousands of lines of code, execute database queries, call external APIs, produce visualizations, all ephemerally. What persists is the agent’s capability, not its intermediate artifacts. Kumar and Ramagopal [7] capture this distinction precisely: “AI coding agents excel at translating intent into code within a single user-driven session. Agentic engineering operates at a higher level of abstraction: it’s a control plane that orchestrates cross-team workflows, maintains long-term memory across agents, and manages state and traceability across the full software delivery lifecycle.”

4 Agentic Engineering: A New Discipline

4.1 Defining the Field

Agentic Engineering, formally introduced by LangChain in April 2026[7], is defined as “a multi-agent coordination model where AI agents function as digital team members—each with defined roles, shared memory, and a unified observability layer—to drive software through the entire delivery pipeline, not merely to generate code faster.”

Wang et al. [4] provide a foundational taxonomy of LLM-based agents in software engineering, identifying three core modules; a complementary survey by Guo et al. [13] offers a systematic treatment of multi-agent collaboration patterns and progress in LLM-based multi-agent systems.

[Figure 1 diagram labels: Perception (multi-modal input processing); Memory (semantic, episodic, procedural); Action (internal reasoning + external tool use); LLM Reasoning Core; External Environment.]

Figure 1: The LLM-based agent framework for software engineering, adapted from Wang et al. [4]. The perception module handles multi-modal input; the memory module maintains semantic, episodic, and procedural knowledge; the action module executes both internal reasoning and external tool invocations. All are orchestrated by the LLM reasoning core.

A concrete realization of this architecture can be observed in Hermes Agent [14], an open-source framework by Nous Research that operationalizes the perception-memory-action model with a distinctive self-evolution mechanism. Its most consequential feature is a closed learning loop: after completing complex tasks, the agent autonomously creates reusable Skills (parameterized procedural modules) that self-improve during subsequent use, automatically patching themselves when found insufficient. Cross-session episodic memory is realized through FTS5-backed conversation search with LLM summarization, enabling the agent to accumulate experiential knowledge over time. The framework’s subagent delegation mechanism further demonstrates early multi-agent coordination in a widely deployed production system.
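The create, use, detect-weakness, self-patch cycle can be illustrated with a toy skill library. This sketch does not reflect Hermes Agent's actual API: `Skill`, `SkillLibrary`, and the date-normalizing skill are hypothetical, and the `patcher` callback stands in for what would be an LLM-generated fix.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Run = Callable[[str], Optional[str]]


@dataclass
class Skill:
    name: str
    run: Run
    version: int = 1


class SkillLibrary:
    def __init__(self) -> None:
        self.skills: dict[str, Skill] = {}

    def create(self, name: str, run: Run) -> None:
        self.skills[name] = Skill(name, run)

    def use(self, name: str, task: str, patcher: Callable[[Run], Run]) -> str:
        skill = self.skills[name]
        result = skill.run(task)
        if result is None:                  # weakness detected
            skill.run = patcher(skill.run)  # self-patch (an LLM call in a real agent)
            skill.version += 1
            result = skill.run(task)
        if result is None:
            raise RuntimeError(f"skill {name!r} still cannot handle {task!r}")
        return result


def parse_iso(task: str) -> Optional[str]:
    # Initial skill: only understands YYYY-MM-DD.
    return task if task.count("-") == 2 else None


def add_slash_support(old_run: Run) -> Run:
    def patched(task: str) -> Optional[str]:
        result = old_run(task)
        if result is None and task.count("/") == 2:
            day, month, year = task.split("/")
            return f"{year}-{month}-{day}"
        return result
    return patched


library = SkillLibrary()
library.create("normalize_date", parse_iso)
first = library.use("normalize_date", "2026-06-04", add_slash_support)
second = library.use("normalize_date", "04/06/2026", add_slash_support)
```

The first call succeeds with the original skill; the second triggers a patch and bumps the version, so later calls in either format reuse the improved skill.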

4.2 Contrasting Agentic and Traditional Engineering

Table 2 maps the key dimensions of difference between the two paradigms.

Table 2: Traditional Software Engineering vs. Agentic Engineering

| Dimension | Traditional SE | Agentic Engineering |
| --- | --- | --- |
| Core artifact | Source code (static) | Agent system (dynamic) |
| Control center | Human engineer | LLM reasoning engine |
| Decision mechanism | Pre-designed logic | Runtime-generated reasoning |
| Development cycle | Linear (design → code → test) | Autonomous iterative loop |
| Human role | Code author | Intent architect, coordinator, auditor |
| Complexity ceiling | Human cognition ($O(1)$) | Model capacity (growing with compute) |
| Output unit | Functioning software | Delivered outcomes |
| Error handling | Programmer-defined | Model-adaptive |
| Evolution | Manual refactoring | Self-modification |

4.3 The Human Role Reimagined

Perhaps the most consequential shift is in the human role. In the traditional paradigm, human value was measured by the ability to produce correct, efficient code. In the agentic paradigm, code-generation skill becomes commoditized. The new human differentiators are:

• Intent articulation. The ability to specify goals with sufficient clarity and constraint that agents can operate autonomously without producing unintended outcomes.
• Architectural oversight. Understanding at the system level how multiple agents should coordinate, what memory should be shared, and where human judgment must intervene.
• Quality calibration. Defining what “good” looks like and building evaluation frameworks that agents can use for self-correction.
• Ethical governance. Ensuring agent behavior aligns with organizational values, legal requirements, and societal expectations.

We believe the implications for individual practitioners are profound: as agentic capabilities mature, the productivity multiplier for those who master agent orchestration will far exceed the traditional “10x engineer” benchmark—not through faster typing, but through the ability to coordinate swarms of agents toward complex outcomes. The ceiling is not fixed; it rises with each advance in model capability and orchestration infrastructure.

5 Empirical Evidence and Current Limitations

5.1 Breakthrough Results

The empirical record provides strong evidence for the agentic thesis. We highlight four representative data points.

SWE-bench Verified. Ma et al. [5] demonstrated that Lingma SWE-GPT 72B, an open development-process-centric model, resolves 30.20% of GitHub issues on SWE-bench Verified, approaching GPT-4o’s 31.80% while being fully open. The 72B result represents a 22.76% relative improvement over Llama 3.1 405B, a model nearly 6× larger. Notably, even the 7B variant resolved 18.20%, suggesting that small models can perform meaningful automated software engineering when trained on process data rather than static code alone.
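The relative-improvement figure can be unpacked with a short calculation. The sketch assumes the usual definition, (new − base) / base, and back-calculates the baseline it implies; that implied baseline is derived here for illustration and is not a number quoted in the text above.

```python
def relative_improvement(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


def implied_baseline(new: float, rel_improvement_pct: float) -> float:
    """Baseline score implied by a reported relative improvement."""
    return new / (1.0 + rel_improvement_pct / 100.0)


swe_gpt_72b = 30.20   # resolve rate (%), from the text
gpt_4o = 31.80        # resolve rate (%), from the text

llama_405b_implied = implied_baseline(swe_gpt_72b, 22.76)  # derived, about 24.6
gap_to_gpt_4o = gpt_4o - swe_gpt_72b                       # in percentage points
```

The distinction matters when reading such claims: the gap to GPT-4o is stated in percentage points, whereas the 22.76% figure is relative to the baseline's own score.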

Multi-Agent Coordination. Kumar and Ramagopal [7] report results from a pilot study deploying coordinated agent swarms across 20+ enterprise debugging workflows. The coordinated agent system reduced root-cause identification time by 93%, saving over 200 engineering hours in a single month. Critically, these gains came not from better individual agents but from orchestration: the ability to maintain shared context across agents, to parallelize investigation, and to cross-validate findings.

Self-Evolution. Hermes Agent [14], an open-source framework by Nous Research with over 179,000 GitHub stars, provides the most complete realization of the self-evolution principle in a production system. Its architecture implements a closed learning loop: after completing complex tasks, the agent autonomously creates reusable “Skills”, parameterized procedural modules that capture successful strategies. Critically, these skills self-improve during use: when a skill is invoked and found lacking, the agent patches it automatically, accumulating refinements over successive interactions. This pattern (create, use, detect weakness, self-patch) operates without human intervention, embodying precisely the self-evolution dynamic that distinguishes agentic systems from traditional software. Cross-session continuity is maintained through FTS5-backed conversation search with LLM summarization, enabling the agent to recall and build upon prior experiences. The framework’s subagent delegation mechanism further demonstrates early multi-agent coordination in a widely deployed system.
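To make "FTS5-backed conversation search" concrete, here is a minimal sketch using SQLite's FTS5 extension through Python's standard `sqlite3` module. The table name `turns`, its columns, and the sample rows are invented for illustration and say nothing about Hermes Agent's real schema; the LLM summarization step is omitted, and the example assumes the local SQLite build includes FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# session_id is stored but not indexed for full-text search.
conn.execute("CREATE VIRTUAL TABLE turns USING fts5(session_id UNINDEXED, content)")
conn.executemany(
    "INSERT INTO turns (session_id, content) VALUES (?, ?)",
    [
        ("s1", "Fixed the flaky login test by raising the retry timeout"),
        ("s2", "Refactored the billing module and added invoices export"),
        ("s3", "Login page timeout traced to a misconfigured proxy"),
    ],
)


def search_turns(query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Full-text search over past conversation turns, best matches first."""
    rows = conn.execute(
        "SELECT session_id, content FROM turns "
        "WHERE turns MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


hits = search_turns("login AND timeout")
```

In an agent, the matching turns from earlier sessions would then be condensed and placed back into the model's context, which is how prior experience becomes available to a new task.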

Generalization. Wang et al. [4] catalog hundreds of studies applying LLM-based agents across the full software lifecycle: requirements analysis, architecture design, code generation, testing, debugging, deployment, and maintenance. The breadth of coverage suggests that the agentic pattern is not limited to narrow tasks but generalizes across software engineering activities.

5.2 Persistent Challenges

Despite rapid progress, significant challenges remain. The EvoClaw benchmark [6] provides the most sobering data. Deng et al. constructed a benchmark requiring agents to perform continuous software evolution: not isolated issue fixes but sustained development across commit histories, where each change must preserve system integrity and where errors accumulate. Their key finding:

“Overall performance scores drop significantly from >80% on isolated tasks to at most 38% in continuous settings, exposing agents’ profound struggle with long-term maintenance and error propagation.” [6]

This reveals four core challenges:

1. Context drift. As codebases grow beyond the effective context window, agents lose coherent understanding of system-wide invariants and dependencies.
2. Error propagation. A small error in an early commit cascades into compounding failures in subsequent work, and agents lack robust mechanisms for detecting and recovering from these chains.
3. Technical debt awareness. Agents do not currently model the long-term costs of their design decisions; they optimize for immediate task completion without considering maintainability.
4. Verification fidelity. Automated testing remains incomplete; agents can pass tests while introducing subtle semantic errors that only manifest under novel inputs.
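A back-of-the-envelope model shows why error propagation is so costly in sustained work. It rests on a strong simplifying assumption that is not from EvoClaw: each commit succeeds independently with the same probability, and a single failure is never recovered. The function names are illustrative.

```python
import math


def sustained_success(per_step_success: float, steps: int) -> float:
    """Probability that every one of `steps` sequential changes is correct."""
    return per_step_success ** steps


def steps_until_below(per_step_success: float, threshold: float) -> int:
    """Smallest number of steps at which sustained success drops below threshold."""
    return math.ceil(math.log(threshold) / math.log(per_step_success))


after_50_commits = sustained_success(0.98, 50)   # roughly 0.36
half_life = steps_until_below(0.98, 0.5)         # 35 commits
```

Under these assumptions, an agent that is right 98% of the time per commit keeps a fully correct history across 50 commits only about a third of the time, which is why detection and recovery mechanisms matter more than marginal gains in per-step accuracy.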

Figure 2 visualizes the performance cliff that EvoClaw reveals.

[Figure 2 plot: “Performance Degradation in Continuous Evolution (EvoClaw)”. x-axis: Evaluation Setting (Isolated Tasks, Continuous Evolution); y-axis: Success Rate (%), 0 to 100; bar values 82 and 38, annotated “–54% drop”.]

Figure 2: Agent performance on the EvoClaw benchmark [6]. When evaluated on continuous software evolution (requiring sustained development across commits with error accumulation), success rates collapse from over 80% to at most 38%. Data based on evaluation of 12 frontier models across 4 agent frameworks.

5.3 The Gap Analysis

The gap between isolated-task performance (>80%) and continuous-evolution performance (at most 38%) quantifies the distance between current agent capability and the threshold for fully autonomous software engineering. This gap is not fundamental; it reflects limitations in context management, memory architecture, and verification mechanisms that are active areas of research. But it serves as an important calibration: agentic engineering is real and transformative today as an augmentation paradigm, but will require several more years of concentrated research before fully autonomous software development becomes reliable in production settings.
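The size of the gap depends on how it is measured. Using the two bar values shown in Figure 2 (82 and 38), the short calculation below separates the absolute gap in percentage points from the relative drop; the function names are illustrative.

```python
def absolute_gap(isolated: float, continuous: float) -> float:
    """Gap in percentage points between the two settings."""
    return isolated - continuous


def relative_drop(isolated: float, continuous: float) -> float:
    """Drop expressed as a percentage of the isolated-task score."""
    return (isolated - continuous) / isolated * 100.0


gap_points = absolute_gap(82.0, 38.0)     # 44 percentage points
drop_percent = relative_drop(82.0, 38.0)  # about 53.7, shown as 54% in Figure 2
```

So the "54% drop" annotation is a relative figure: more than half of the isolated-task success rate disappears in the continuous setting, while the absolute difference is 44 points.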

6 Evolutionary Roadmap

Based on current capabilities and trajectories, we propose a four-stage roadmap for the evolution of agentic engineering. Table 3 summarizes.

Table 3: Four-Stage Evolution of Agentic Engineering

6.1 Stage I: Tool-Augmented (2023–2025)

The current dominant mode. Agents serve as assistants within human-led workflows. The breakthrough has been in coding: models can generate, explain, and debug code at near-expert level for well-scoped tasks. The limitation is that the human must still decompose problems, design architecture, and verify correctness.

6.2 Stage II: Single-Task Autonomous (2025–2027)

Agents begin to own complete tasks from specification to deployment. Systems like Devin and OpenHands demonstrate that agents can autonomously navigate codebases, implement features, and submit pull requests. The human shifts from “doing” to “specifying what to do and verifying what was done.”

6.3 Stage III: Multi-Agent Teams (2026–2029)

Specialized agents coordinate as teams, mirroring human engineering organizations. A “product manager agent” translates business requirements into technical specifications; “architect agents” design system structure; “developer agents” implement components; “QA agents” test and validate. Shared memory and observability become critical infrastructure. The LangChain pilot [7] represents an early validation of this pattern.
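The role pipeline described here can be sketched with plain functions that read and write a shared memory. Everything in the example is a hypothetical stand-in: the role functions return canned values where real agents would call a model, and `run_team` is not an API from LangChain or any other framework.

```python
from typing import Callable

SharedMemory = dict
RoleAgent = Callable[[SharedMemory], None]


def product_manager(memory: SharedMemory) -> None:
    memory["spec"] = f"spec for: {memory['requirement']}"


def architect(memory: SharedMemory) -> None:
    memory["design"] = ["api", "storage"]


def developer(memory: SharedMemory) -> None:
    memory["components"] = {name: f"impl of {name}" for name in memory["design"]}


def qa(memory: SharedMemory) -> None:
    # Validate that every designed component was actually implemented.
    memory["qa_passed"] = set(memory["components"]) == set(memory["design"])


def run_team(requirement: str, roles: list[tuple[str, RoleAgent]]) -> SharedMemory:
    memory: SharedMemory = {"requirement": requirement, "trace": []}
    for role_name, agent in roles:
        agent(memory)
        memory["trace"].append(role_name)  # minimal observability record
    return memory


team = [("pm", product_manager), ("architect", architect),
        ("developer", developer), ("qa", qa)]
result = run_team("export invoices as CSV", team)
```

Each role depends only on what earlier roles left in shared memory, and the trace records who acted in what order, which is the smallest possible version of the shared-memory and observability infrastructure the stage calls for.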

6.4 Stage IV: Self-Evolving Ecosystems (2028+)

Agents gain the ability to improve their own architectures, spawn specialized sub-agents for new problem domains, and adapt to environmental changes without human intervention. At this stage, the distinction between “software” and “agent” dissolves entirely: the agent is the system, and it evolves continuously. Human involvement shifts to meta-level governance: setting ethical boundaries, defining value functions, and ensuring alignment.

7 Implications and Recommendations

7.1 For Practitioners

The transition to agentic engineering demands a deliberate re-skilling strategy:

1. Shift from code production to intent engineering. The most valuable skill is no longer writing code efficiently but articulating tasks with sufficient clarity, context, and constraints that agents can execute them correctly.
2. Build agent orchestration competence. Understanding how to decompose work across agents, manage shared memory, and design evaluation rubrics will differentiate effective practitioners.
3. Invest in observability infrastructure. Agent systems require fundamentally different monitoring than traditional software. Tracing an agent’s reasoning chain, detecting hallucinations, and measuring outcome quality demand new tooling.
4. Adopt a “human-in-the-loop, agent-in-the-driver’s-seat” posture. The most effective model today is neither fully autonomous nor fully human-driven. Agents should own execution; humans should own intent, critical judgment, and ethical oversight.
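Recommendations 3 and 4 can be combined in a small sketch: every step the agent attempts is recorded in a trace, and actions flagged as sensitive wait for a human decision. The class and field names (`SupervisedRun`, `TraceEvent`) and the "deploy" rule are assumptions made for the example, not an existing tool.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TraceEvent:
    step: int
    thought: str
    action: str
    approved: bool


@dataclass
class SupervisedRun:
    needs_approval: Callable[[str], bool]   # which actions require a human
    ask_human: Callable[[str], bool]        # returns the human's decision
    events: list[TraceEvent] = field(default_factory=list)

    def attempt(self, thought: str, action: str, do: Callable[[], None]) -> bool:
        approved = (not self.needs_approval(action)) or self.ask_human(action)
        self.events.append(TraceEvent(len(self.events) + 1, thought, action, approved))
        if approved:
            do()
        return approved


performed: list[str] = []
run = SupervisedRun(
    needs_approval=lambda action: action.startswith("deploy"),
    ask_human=lambda action: False,  # stub reviewer that declines everything
)
run.attempt("tests are green", "open_pull_request", lambda: performed.append("pr"))
run.attempt("ship it", "deploy_production", lambda: performed.append("deploy"))
```

The agent stays in the driver's seat for routine steps, while the trace keeps both the reasoning and the approval decision for later audit.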

7.2 For Researchers

Several open problems emerge with particular urgency:

1. Long-context state management. As EvoClaw demonstrates, agents lose coherence over extended development sequences. Architectures for compressing, indexing, and retrieving relevant context at scale are critical.
2. Verification in open-ended settings. Current benchmarks test isolated correctness; real-world systems require guarantees of safety, reliability, and maintainability over time. New verification frameworks that capture these temporal dimensions are needed.
3. Agent alignment at scale. As agents become more autonomous and are composed into teams, ensuring that their collective behavior aligns with human values becomes both more important and more difficult.
4. Economic models. How should agentic services be priced? Outcome-based pricing (per resolved issue, per deployed feature) may replace subscription and usage-based models, but the incentive structures and risk allocation need careful analysis.

7.3 For Organizations

Organizations should begin preparing for the agentic transition now:

1. Identify agent-ready workflows. Not all software work is equally amenable to agent automation. Tasks with clear success criteria, well-defined scope, and existing test infrastructure are ideal starting points.
2. Invest in evaluation frameworks. The quality of agent output depends critically on the quality of the evaluation signal. Organizations should build test suites that go beyond correctness to measure robustness, maintainability, and alignment with business intent.
3. Redesign team structures. As individual productivity multiplies through agent leverage, team topologies must evolve. Smaller teams of “agent orchestrators” may replace larger teams of developers, with corresponding shifts in hiring, promotion, and career development.
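An evaluation framework that goes "beyond correctness" can start as a weighted rubric with a minimum bar per dimension. The weights, the floor, and the threshold below are arbitrary illustrative choices, and the four dimension names simply mirror the list in recommendation 2.

```python
WEIGHTS = {
    "correctness": 0.4,
    "robustness": 0.2,
    "maintainability": 0.2,
    "business_alignment": 0.2,
}


def rubric_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-dimension scores, each in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


def accept(scores: dict[str, float], threshold: float = 0.75, floor: float = 0.4) -> bool:
    """Accept only if the overall score clears the threshold and no dimension collapses."""
    return rubric_score(scores) >= threshold and min(scores.values()) >= floor


balanced = {"correctness": 1.0, "robustness": 0.5,
            "maintainability": 0.5, "business_alignment": 1.0}
brittle = {"correctness": 1.0, "robustness": 0.2,
           "maintainability": 0.8, "business_alignment": 1.0}
```

The per-dimension floor is the design choice worth noting: the `brittle` output scores higher on average than `balanced`, yet it is rejected because passing tests cannot compensate for very low robustness.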

8 Conclusion

This paper has argued that the emergence of AI agents constitutes a paradigm shift in software, not a tool upgrade. The transition from “AI → Software → Result” to “Agent → Result” eliminates the static software artifact as a necessary intermediary, just as SaaS eliminated on-premise installation and cloud eliminated physical infrastructure before it.

The shift is grounded in first principles. Traditional software requires human engineers to encode all decision logic explicitly; the complexity of this task grows exponentially with system size while human capacity remains fixed. Agentic systems outsource decision-making to LLMs whose capacity scales with training compute, decoupling solution capability from human cognitive limits. This is a qualitative change in what kinds of software problems become economically tractable.

Yet we are still in the early stages. Benchmarks like EvoClaw reveal a stark gap between isolated-task performance and sustained autonomous development. The current moment calls for ambitious but calibrated investment: embrace agentic engineering as the dominant paradigm for augmentation while recognizing that fully autonomous software engineering remains a multi-year research challenge.

Agentic Engineering is emerging as a distinct discipline with its own concepts, tools, and professional identity. Its practitioners will not be programmers who learned new tools but a new kind of professional: intent architects who direct swarms of AI agents toward complex outcomes. The old software engineering is ending; the new one has already begun.

Acknowledgments

The author thanks the open-source community for making research artifacts and benchmarks publicly available, and the teams behind SWE-bench, EvoClaw, and LangChain for their foundational contributions to agent evaluation infrastructure.

References

[1] P. Naur and B. Randell, Eds., Software Engineering: Report on a Conference Sponsored by the NATO Science Committee. Garmisch, Germany: NATO, 1968.
[2] F. P. Brooks, The Mythical Man-Month: Essays on Software Engineering. Reading, MA: Addison-Wesley, 1975. (Anniversary Edition with new chapters, 1995.)
[3] A. Karpathy, “Software 2.0,” Medium, Nov. 2017. [Online]. Available: https://karpathy.medium.com/software-2-0-a64152b37c35 (Accessed: June 4, 2026).
[4] Y. Wang, W. Zhong, Y. Huang, E. Shi, M. Yang, J. Chen, H. Li, Y. Ma, Q. Wang, and Z. Zheng, “Agents in Software Engineering: Survey, Landscape, and Vision,” arXiv preprint arXiv:2409.09030, 2024.
[5] Y. Ma, R. Cao, Y. Cao, Y. Zhang, J. Chen, Y. Liu, Y. Liu, B. Li, F. Huang, and Y. Li, “Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement,” arXiv preprint arXiv:2411.00622, 2024.
[6] G. Deng, Z. Chen, Z. Yu, H. Fan, Y. Liu, Y. Yang, D. Parikh, R. Kannan, L. Cong, M. Wang, Q. Zhang, V. Prasanna, X. Tang, and X. Wang, “EvoClaw: Evaluating AI Agents on Continuous Software Evolution,” arXiv preprint arXiv:2603.13428, 2026.
[7] R. Kumar and P. Ramagopal, “Agentic Engineering: How Swarms of AI Agents Are Redefining Software Engineering,” LangChain Blog, Apr. 2026. [Online]. Available: https://www.langchain.com/blog/agentic-engineering-redefining-software-engineering (Accessed: June 4, 2026).
[8] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou, “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” in Advances in Neural Information Processing Systems (NeurIPS), 2022.
[9] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, “ReAct: Synergizing Reasoning and Acting in Language Models,” in International Conference on Learning Representations (ICLR), 2023.
[10] C. E. Jimenez, J. Yang, A. Wettig, S. Yao, K. Pei, O. Press, and K. Narasimhan, “SWE-bench: Can Language Models Resolve Real-World GitHub Issues?” in International Conference on Learning Representations (ICLR), 2024.
[11] S. Hong, X. Zheng, J. Chen, Y. Cheng, J. Wang, C. Zhang, Z. Wang, S. K. S. Yau, Z. Lin, L. Zhou et al., “MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework,” in International Conference on Learning Representations (ICLR), 2024.
[12] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language Models are Few-Shot Learners,” in Advances in Neural Information Processing Systems (NeurIPS), 2020.
[13] T. Guo, X. Chen, Y. Wang, R. Chang, S. Pei, N. V. Chawla, O. Wiest, and X. Zhang, “Large Language Model based Multi-Agents: A Survey of Progress and Challenges,” in International Joint Conference on Artificial Intelligence (IJCAI), 2024. [Online]. Available: https://arxiv.org/abs/2402.01680
[14] Nous Research, “Hermes Agent: The Self-Improving AI Agent,” 2025–2026. [Online]. Available: https://github.com/NousResearch/hermes-agent; Documentation: https://hermes-agent.nousresearch.com/docs (Accessed: June 4, 2026).
