The Digital Apprentice: A Framework for Human-Directed Agentic AI Development

arXiv cs.AI Papers

Summary

This paper presents the 'Digital Apprentice,' a framework for scalable and safe agentic AI in which autonomy is earned incrementally through observational learning, human authorization, and continuous alignment correction. It introduces ADAPT, an inference-time control plane that operationalizes graduated autonomy tiers and converts human corrections into reusable preference data.

arXiv:2606.04321v1 (announce type: new)

# A Framework for Human-Directed Agentic AI Development: Earned Autonomy Through Observational Learning and Inference-Time Decision Memory
Source: [https://arxiv.org/html/2606.04321](https://arxiv.org/html/2606.04321)
###### Abstract

Agentic AI deployments face a recurring design tension: heavy human oversight limits scale, while broad autonomy outruns accountability\. Neither posture provides the governance infrastructure required for responsible delegation\.

We present the Digital Apprentice, a framework for scalable, safe AI agency in which autonomy is earned, not assumed\. The Digital Apprentice is a developmental learner that internalizes the tacit methodology of a directing human, graduating through per\-skill autonomy tiers only when empirical evidence justifies it\. The result is an agent that becomes genuinely useful over time while remaining aligned to a specific human’s standards\.

Three architectural components make this possible\. \(1\) Methodology capture, distilling a directing professional’s tacit approach into structured assets\. \(2\) Authorization, with autonomy escalation gated by explicit human approval\. \(3\) Continuous alignment, correcting drift at runtime and converting each correction into owned preference data\.

We instantiate this framework as an inference\-time control plane\. We mathematically model the quality framework and discuss policies and techniques designed to raise quality\. We apply the framework to an open professional corpus, and we show how catching data drift and applying a different technique at runtime recovers degraded quality dimensions under traffic shift\.

The implication extends beyond any single application\. We believe these three pillars, stitched together as a system, form a safer and more viable path to agentic systems that can scale without sacrificing trust\.

## 1 Introduction

Professional adoption of agentic AI is constrained less by raw model capability than by governance\. Organizations delegate autonomy to software routinely, but rarely with the three things responsible delegation requires: a way to capture the tacit methodology that defines competent work in their setting, a record of who authorized what, and continuous alignment as conditions shift\. Tool\-centric copilots treat each inference as stateless and accumulate no durable account of a professional’s judgment\. Autonomy\-maximizing agents act broadly before their reliability in a specific setting has been established\. Both leave the organization exposed\.

We contribute a two\-part answer\. The Digital Apprentice is a conceptual framework for human\-directed development in which an agent’s authority over a skill expands only as it demonstrates competence on that skill and a human authorizes the expansion\. ADAPT \(Adaptive Data Augmentation and Preference Tuning\) is the inference\-time control plane that operationalizes the framework: it runs multiple policies, measures output quality along several dimensions, and records each human correction as a reusable preference signal that stays within the organization’s environment\. In this design, inference becomes a record\-generating event\. Each judgment is externalized, isolated to the tenant that produced it, and available for in\-context steering or, when warranted, model updating\. The conceptual framework specifies what autonomy looks like at each tier; the control plane specifies how that autonomy is measured, maintained, and reversed\. This separation lets the framework be assessed on its own terms, with ADAPT as one implementation rather than the definition\. Governance does not stop at policy; it enforces policy through data\-driven decisions grounded in the professional’s method\.

Taxonomies describe what autonomy looks like; our contribution is the machinery for moving between levels: what evidence justifies an increase in autonomy, who must authorize it, and how the system detects when autonomy should be withdrawn\.

## 2 Graduated Autonomy Framework

### 2\.1 Per\-Skill State Machine

Autonomy in the Digital Apprentice is a per-skill property that the system holds at a given time, represented as a finite state machine (Table [1](https://arxiv.org/html/2606.04321#S2.T1)) rather than a fixed benchmark tier. The agent cannot promote itself. Promotion from one tier to the next requires both empirical evidence of competence on that skill and a recorded human authorization. Demotion is asymmetric: the system rolls a skill back automatically when quality degrades, without waiting for human action.

This per-skill, role-indexed structure is consistent with autonomy-level taxonomies that index control by the role a human retains over the agent (Feng et al., [2025](https://arxiv.org/html/2606.04321#bib.bib7); Beer et al., [2014](https://arxiv.org/html/2606.04321#bib.bib1)). The difference is that those taxonomies describe the levels; we specify the transition conditions between them.

Table 1: Per-skill autonomy tiers. Entry is Pre-L0 (Observe-only). Transition to L0 requires only explicit human authorization after an observation period ($N$ sessions); the graduated promotion criteria (Eq. [3](https://arxiv.org/html/2606.04321#S2.E3)) apply to all subsequent tier transitions.
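The transition rules described in this section can be sketched as a small per-skill state machine. This is a minimal illustration, not the authors' implementation: the class `SkillAutonomy`, its method names, and the tier list are assumptions (the text names Pre-L0, L0, and L2; L1 is inferred, and the top tier is not stated). Demoting by exactly one tier is also an assumption, since the source only says a skill is rolled back.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Illustrative tier list (assumption): the source names Pre-L0, L0 and L2.
TIERS = ["Pre-L0", "L0", "L1", "L2"]


@dataclass
class SkillAutonomy:
    """Autonomy state for one skill; every skill enters at Pre-L0."""
    skill: str
    tier_index: int = 0
    history: List[Tuple[str, str, str]] = field(default_factory=list)

    @property
    def tier(self) -> str:
        return TIERS[self.tier_index]

    def promote(self, evidence_ok: bool, authorized_by: Optional[str]) -> bool:
        """Move up one tier only with evidence AND a recorded human authorization.

        At Pre-L0, evidence_ok stands for "observation period completed";
        at higher tiers it stands for the graduated promotion criteria.
        """
        at_top = self.tier_index >= len(TIERS) - 1
        if at_top or not evidence_ok or authorized_by is None:
            return False
        self.tier_index += 1
        self.history.append(("promote", self.tier, authorized_by))
        return True

    def demote(self, reason: str) -> bool:
        """Automatic rollback: no human action is required (asymmetric)."""
        if self.tier_index == 0:
            return False
        self.tier_index -= 1  # one-tier rollback is an assumption
        self.history.append(("demote", self.tier, reason))
        return True
```

The agent never calls `promote` on its own behalf in this sketch: the `authorized_by` argument must carry the identity of the approving human, and each transition is appended to `history` so that authorization lineage stays inspectable.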

### 2\.2 Graduation Mathematics

Let $W_t$ denote the $t$-th evaluation window of $N$ consecutive outputs for a single skill. Promotion checks use non-overlapping windows; rolling windows may be used for runtime monitoring. Let $c(x)=1$ if output $x$ required a human correction and $0$ otherwise. The per-window correction rate is:

$$\rho(W)=\frac{1}{|W|}\sum_{x\in W}c(x) \tag{1}$$

When a validated quality scorer $Q(x)$ is available, with acceptability threshold $\tau$, a proportion $p$ of outputs in the window must clear the threshold:

$$\frac{1}{|W|}\sum_{x\in W}\mathbf{1}[Q(x)\geq\tau]\geq p \tag{2}$$

Graduation from tier $L_i$ to $L_{i+1}$ requires three conditions plus authorization. (C1) Net improvement: the correction rate in the current window is lower than $k$ windows earlier ($\rho(W_t)<\rho(W_{t-k})$; we use $k=3$), accommodating normal variance while detecting genuine improvement. (C2) Low residual correction: $\rho(W_t)\leq\tau_{\mathrm{corr}}$. (Q1) Scorer gate: Eq. [2](https://arxiv.org/html/2606.04321#S2.E2) holds for the current window. Let $H_{\mathrm{auth}}\in\{0,1\}$ record an explicit human authorization event. Graduation is then:

$$\mathrm{Graduate}(L_i\to L_{i+1})\iff(C1\land C2)\land(\neg\textsc{scorer}\lor Q1)\land(H_{\mathrm{auth}}=1) \tag{3}$$

The correction-rate gate has a known vulnerability that we address directly: a low correction rate can indicate either genuine competence or a reviewer who has stopped checking carefully. This is the automation-complacency failure documented in the human-factors literature (Parasuraman & Manzey, [2010](https://arxiv.org/html/2606.04321#bib.bib14)). We treat reviewer engagement as a monitored quantity rather than an assumption. A skill is demoted automatically when $\rho(W)>\rho_{\mathrm{demote}}$ or out-of-distribution uncertainty exceeds $u_{\max}$; we use a rate-based trigger so ordinary output noise at L2 does not cause oscillation. Re-escalation requires satisfying Eq. [3](https://arxiv.org/html/2606.04321#S2.E3) again with new $H_{\mathrm{auth}}$.
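The graduation and demotion rules above translate almost directly into code. The following is a sketch under stated assumptions: the function names are hypothetical, windows are passed as lists of 0/1 correction flags ordered oldest first, a missing scorer is represented by `scores=None` (the $\neg\textsc{scorer}$ branch of Eq. 3), and the default values for `tau` and `p` are placeholders rather than values from the paper.

```python
from typing import Optional, Sequence


def correction_rate(window: Sequence[int]) -> float:
    """Eq. 1: fraction of outputs in the window that needed a human correction."""
    return sum(window) / len(window)


def scorer_gate(scores: Sequence[float], tau: float, p: float) -> bool:
    """Eq. 2: at least a proportion p of outputs score >= tau."""
    return sum(s >= tau for s in scores) / len(scores) >= p


def can_graduate(
    windows: Sequence[Sequence[int]],  # non-overlapping windows, oldest first
    tau_corr: float,
    h_auth: bool,
    k: int = 3,
    scores: Optional[Sequence[float]] = None,  # None means no validated scorer
    tau: float = 0.8,   # placeholder threshold (assumption)
    p: float = 0.9,     # placeholder proportion (assumption)
) -> bool:
    """Eq. 3: (C1 and C2) and (not scorer or Q1) and H_auth."""
    if len(windows) <= k:
        return False  # not enough history to evaluate C1
    rho_now = correction_rate(windows[-1])
    rho_then = correction_rate(windows[-1 - k])
    c1 = rho_now < rho_then          # net improvement over k windows
    c2 = rho_now <= tau_corr         # low residual correction
    q1 = True if scores is None else scorer_gate(scores, tau, p)
    return c1 and c2 and q1 and h_auth


def should_demote(
    window: Sequence[int], rho_demote: float, ood_uncertainty: float, u_max: float
) -> bool:
    """Automatic demotion: rate-based trigger or out-of-distribution trigger."""
    return correction_rate(window) > rho_demote or ood_uncertainty > u_max
```

Note that `h_auth` enters only as a final conjunct: no combination of metrics can satisfy `can_graduate` without it, while `should_demote` takes no authorization argument at all, which mirrors the asymmetry between promotion and demotion.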

### 2\.3 Two\-Phase Learning

Learning proceeds in two phases that differ in speed and reversibility. In Phase 1, human corrections populate preference pairs that serve as tenant-isolated decision memory, retrieved at inference time to steer outputs immediately. Phase 1 acts as an immediate safety buffer: steering based on recent corrections adapts within hours, while the underlying model remains unchanged. This phase is fully reversible and requires no update to the generator. In Phase 2, once the volume and statistical significance of accumulated preferences cross set thresholds, the pairs are exported to a model-updating step such as supervised fine-tuning or direct preference optimization (Ouyang et al., [2022](https://arxiv.org/html/2606.04321#bib.bib13); Rafailov et al., [2023](https://arxiv.org/html/2606.04321#bib.bib16); Christiano et al., [2017](https://arxiv.org/html/2606.04321#bib.bib2)). Determining domain-specific thresholds for Phase 2 activation is left to future work. Quality regressions on the multidimensional scorer reject a bad update before it is committed. Every update is traceable to a specific human correction or an observed human decision, which is the property that distinguishes this loop from open-ended self-improvement.
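A rough sketch of the Phase 1 store may help make tenant isolation and reversibility concrete. Everything here is an assumption made for illustration: the class and method names are hypothetical, retrieval uses naive word overlap because the source does not specify a retrieval method, and the Phase 2 check covers only a volume threshold (`export_min_pairs`, an arbitrary placeholder), not the statistical-significance condition the text also mentions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    weight: float


class DecisionMemory:
    """Phase 1 store: pairs are keyed by tenant and never read across tenants."""

    def __init__(self, export_min_pairs: int = 500):
        self._pairs: Dict[str, List[PreferencePair]] = defaultdict(list)
        self.export_min_pairs = export_min_pairs  # hypothetical volume threshold

    def record(self, tenant: str, pair: PreferencePair) -> None:
        self._pairs[tenant].append(pair)

    def retrieve(self, tenant: str, prompt: str, top_n: int = 3) -> List[PreferencePair]:
        """Return this tenant's pairs whose prompts share the most words with the query."""
        query = set(prompt.lower().split())

        def overlap(pair: PreferencePair) -> int:
            return len(query & set(pair.prompt.lower().split()))

        ranked = sorted(self._pairs.get(tenant, []), key=overlap, reverse=True)
        return [p for p in ranked[:top_n] if overlap(p) > 0]

    def forget(self, tenant: str, pair: PreferencePair) -> None:
        """Phase 1 is reversible: dropping a pair removes its steering effect."""
        self._pairs[tenant].remove(pair)

    def ready_for_phase2(self, tenant: str) -> bool:
        """Volume check only; a significance test would be needed as well."""
        return len(self._pairs.get(tenant, [])) >= self.export_min_pairs
```

Because steering reads from the store on every request, removing a pair with `forget` undoes its influence immediately, whereas a Phase 2 model update would need a separate rollback path.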

## 3 ADAPT: Inference\-Time Control Plane

ADAPT operationalizes the Digital Apprentice as continuous-learning infrastructure positioned between an organization's orchestration layer and its model providers. Four components form one loop: (1) asset synthesis (methodology, style, and authority exemplars, after professional validation); (2) multi-policy inference (RAG, methodology-conditioned generation, best-of-$N$, and diversity-gated fusion); (3) quality telemetry (vector scoring across named dimensions, Table [2](https://arxiv.org/html/2606.04321#S3.T2)); (4) preference emission (weighted pairs from human corrections and policy comparisons). These draw from classical ML (RAG, best-of-$N$ sampling) as well as control-plane techniques we introduce (methodology-conditioned generation, diversity-gated fusion).

Each inference event follows a four-step pipeline. Branch: given a prompt $x$, a branch policy (e.g., best-of-$N$ sampling with different model temperatures, RAG retrieval top-$k$ settings, or different prompt framings such as "advise" vs. "draft" vs. "challenge") produces candidate outputs $Y_N=\{y^{(1)},\ldots,y^{(N)}\}$. Score: each candidate receives a radar vector $\mathbf{r}(x,y)$ (Table [2](https://arxiv.org/html/2606.04321#S3.T2)). Triage: a pluggable scorer (LLM-as-judge rubric, trained reward model, or embedding centroid over approved exemplars) computes $R(y)=\mathrm{agg}(\mathbf{r})$ and ranks candidates; the system presents the highest-scoring $y^{+}$ to the directing professional for validation or correction, and treats the remainder as $y^{-}$. Emit: every comparison becomes a preference tuple. A rejected candidate forms an automatic tuple $(x,y^{+},y^{-},w_{\mathrm{auto}})$ with $w_{\mathrm{auto}}\in[0.2,0.5]$ and `policy_comparison` provenance. A human correction forms a tuple with $w_{\mathrm{human}}=1.0$ and `human_correction` provenance. These records are tenant-isolated decision memory for in-context steering or optional model updating. The pilot used $w_{\mathrm{auto}}=0.35$.
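The score, triage, and emit steps can be sketched as follows, with the branch step assumed to have already produced the candidate list. The names `PreferenceTuple`, `triage`, and `human_review` are hypothetical, and the aggregator is taken to be the arithmetic mean of the radar dimensions, which is one choice among those the pluggable scorer allows. Only the correction case of human review is shown; how an acceptance without edits is recorded is not modeled here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

# A scorer maps (prompt, candidate) to a radar vector keyed by dimension name.
Scorer = Callable[[str, str], Dict[str, float]]


@dataclass
class PreferenceTuple:
    prompt: str
    chosen: str
    rejected: str
    weight: float
    provenance: str


def triage(
    prompt: str, candidates: List[str], score: Scorer, w_auto: float = 0.35
) -> Tuple[str, List[PreferenceTuple]]:
    """Rank candidates by aggregate score and emit one automatic tuple per loser."""
    # agg = arithmetic mean over radar dimensions (assumption).
    ranked = sorted(
        candidates, key=lambda y: mean(score(prompt, y).values()), reverse=True
    )
    best, rest = ranked[0], ranked[1:]
    auto = [PreferenceTuple(prompt, best, y, w_auto, "policy_comparison") for y in rest]
    return best, auto


def human_review(
    prompt: str, presented: str, correction: Optional[str]
) -> Optional[PreferenceTuple]:
    """A human correction yields a full-weight tuple preferring the corrected text."""
    if correction is None:
        return None
    return PreferenceTuple(prompt, correction, presented, 1.0, "human_correction")
```

Keeping the rejected candidates as down-weighted tuples, rather than discarding them, is what turns each inference into a record-generating event; the lower `w_auto` reflects that a judge ranking is weaker evidence than a human decision.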

### 3\.1 Methodology Quality Rubric

We score professional quality as a six-dimensional vector rather than a single opaque number, so that drift in one dimension is visible even when others hold. The rubric dimensions were defined during structured onboarding with the directing professional, reflecting the qualities that distinguish competent from expert work in that practice. The framework supports any $d\geq 1$; we instantiate $d=6$ here (Table [2](https://arxiv.org/html/2606.04321#S3.T2)).

Table 2: Quality rubric dimensions ($d=6$).

### 3\.2 Drift, Policy Switching, and Recalibration

Drift is multidimensional. When incoming requests move into areas adjacent to the onboarding distribution (the same profession, but unseen topics or cases), methodology dimensions may remain strong while operational dimensions (actionability, context sensitivity, safety boundary) deform. The control plane detects this shift quantitatively: rolling-window radar telemetry on each output reveals dimension-specific score deformation (e.g., falling grounding or actionability while methodology fit holds). When per-dimension or mean scores cross calibrated thresholds, a localized regression is flagged and policy switching is triggered.

The control plane distinguishes three causes: human methodology evolved (accelerate observation and incorporate new exemplars), agent regressed (rollback and increase review frequency), or evaluation criteria shifted (revise rubric definitions and re-baseline historical telemetry). Each cause requires a different response. When localized degradation is detected, ADAPT switches policy at runtime rather than serving a statically "optimal" onboarding policy. The control plane applies techniques from a broader policy repertoire.

Classical mutual information measures pairwise statistical dependence between candidates; we instead use a lightweight alternative based on dispersion in quality-score space. For a shortlisted subset $S\subseteq Y_N$, typically the top-$k$ candidates by mean score, generated under different framings, we compute diversity as the mean pairwise Euclidean distance between radar vectors:

$$\Delta_f(S)=\frac{2}{|S|(|S|-1)}\sum_{1\leq i<j\leq|S|}\left\lVert\mathbf{r}\bigl(y^{(i)}\bigr)-\mathbf{r}\bigl(y^{(j)}\bigr)\right\rVert_2 \tag{4}$$

where $\mathbf{r}(y^{(i)})$ is the six-dimensional radar vector for candidate $y^{(i)}$. Low $\Delta_f$ indicates that candidates collapse onto the same quality profile despite different framings; in this case synthesis is skipped. High $\Delta_f$ indicates candidates occupy distinct points in the quality-space, justifying fusion to recover dimensions where no single candidate is strong. This fusion may combine outputs with complementary score profiles via the fusion operator $\mathcal{F}$, which synthesizes a composite answer:

$$y_{\mathrm{fuse}}=\mathcal{F}\big(x,S\big)\quad\text{when }\Delta_f(S)\geq\delta \tag{5}$$

where $\delta$ is a tunable threshold. Because radar dimensions are normalized to $[0,1]$, $\Delta_f$ is bounded by $\sqrt{d}$ (at most $\approx 2.45$ for $d=6$); $\delta$ is calibrated on validation traffic. This dispersion serves as a lightweight diversity gate for inference-time fusion: it reuses radar scores already computed by the control plane and requires no additional embedding model. At runtime, the control plane may apply, skip, or switch policies entirely when drift is detected; the diversity metric is a component, not the entire strategy. When $y_{\mathrm{fuse}}$ improves degraded dimensions without collapsing strong ones relative to the single-best candidate, a fusion tuple is emitted, recovering operational dimensions without sacrificing strong methodology scores.
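The diversity gate of Eqs. 4 and 5 is small enough to sketch in full. The function names are hypothetical, and the fusion operator $\mathcal{F}$ is passed in as a callable because the source does not specify how it synthesizes the composite answer.

```python
import math
from itertools import combinations
from typing import Callable, List, Optional, Sequence


def dispersion(radars: Sequence[Sequence[float]]) -> float:
    """Eq. 4: mean pairwise Euclidean distance between radar vectors."""
    pairs = list(combinations(radars, 2))
    if not pairs:
        return 0.0  # a single candidate has no dispersion
    # len(pairs) == |S|(|S|-1)/2, so this equals the 2/(|S|(|S|-1)) prefactor.
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def maybe_fuse(
    prompt: str,
    shortlist: List[str],
    radars: Sequence[Sequence[float]],
    fuse: Callable[[str, List[str]], str],  # stand-in for the fusion operator F
    delta: float,
) -> Optional[str]:
    """Eq. 5: call the fusion operator only when dispersion reaches delta."""
    if dispersion(radars) >= delta:
        return fuse(prompt, shortlist)
    return None  # candidates share one quality profile, so synthesis is skipped
```

With radar scores normalized to $[0,1]$, two candidates at opposite corners of the six-dimensional cube give `dispersion` equal to $\sqrt{6}$, the upper bound stated above, so any useful `delta` lies well below that value.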

## 4 Related Work

Human-in-the-loop and human-on-the-loop designs either scale poorly or intervene too late for agentic workloads (Wu et al., [2022](https://arxiv.org/html/2606.04321#bib.bib19)). Reinforcement learning from human feedback and direct preference optimization align a model to aggregated preferences (Christiano et al., [2017](https://arxiv.org/html/2606.04321#bib.bib2); Ouyang et al., [2022](https://arxiv.org/html/2606.04321#bib.bib13); Rafailov et al., [2023](https://arxiv.org/html/2606.04321#bib.bib16)), but they align to a population rather than to a specific directing professional's methodology, and they do not by themselves detect drift during observation or action. We expect this gap to widen as deployments move beyond text into multimodal sensing. Best-of-$N$ methods discard rejected candidates rather than retaining them as decision memory (Liao et al., [2026](https://arxiv.org/html/2606.04321#bib.bib9)). Autonomy taxonomies assign tiers but leave transition machinery unspecified (Feng et al., [2025](https://arxiv.org/html/2606.04321#bib.bib7); Beer et al., [2014](https://arxiv.org/html/2606.04321#bib.bib1)); governance frameworks for agentic AI remain similarly high-level (IMDA, [2026](https://arxiv.org/html/2606.04321#bib.bib8)). Corrigibility work complements our authorization gate (Nayebi, [2025](https://arxiv.org/html/2606.04321#bib.bib10)). Our contribution is the integration pattern: each inference produces a durable, tenant-isolated judgment record governing immediate steering and optional model update.

## 5 Proof\-of\-Concept

We instantiate ADAPT on an openly available professional-methodology corpus to illustrate the mechanisms, not to establish a general result. The setup uses 40 to 60 prompts per arm, a Qwen model as generator, and a Gemma model as LLM-as-judge evaluator over the six-dimensional rubric, accessed through OpenRouter (OpenRouter, [2024](https://arxiv.org/html/2606.04321#bib.bib12)). Reported means are the arithmetic average of the six rubric dimensions across all prompts in the evaluation arm. Branch-and-triage does most of the narrowing work: the judge scores and ranks candidates; the directing professional then validates or corrects the presented output. Judge-ranked comparisons emit automatic preference tuples ($w_{\mathrm{auto}}=0.35$); human acceptance or correction emits full-weight tuples ($w_{\mathrm{human}}=1.0$). Figures [1](https://arxiv.org/html/2606.04321#S7) and [2](https://arxiv.org/html/2606.04321#S7) show triage-stage radar profiles (judge-measured, pre-human validation) under each policy arm.

Two-arm pilot. Arm A measures onboarding quality by comparing corpus-only retrieval (no methodology assets) to an onboarding-guided policy. Arm B measures runtime drift recovery by shifting live traffic to a new topic area and switching from the onboarding policy to diversity-gated fusion over retrieval candidates.

Before and after structured onboarding. With the corpus and plain retrieval only, the mean triage-stage score is 0.717. Best-of-$N$ sampling raises it to 0.780, and diversity-gated fusion to 0.803. After structured onboarding converts raw artifacts into methodology and style assets, the mean rises to 0.957 at triage (Figure [1](https://arxiv.org/html/2606.04321#S7)), before the human validates the presented output.

Under runtime drift. When live requests differ from those seen during onboarding, the onboarding-selected policy retains strong methodology, voice, and grounding at triage, but the mean falls to 0.930, with actionability at 0.770 and safety boundary at 0.870. Switching to diversity-gated recalibration restores actionability to 0.905 and the triage mean to 0.957 (Figure [2](https://arxiv.org/html/2606.04321#S7)); human validation of the final output remains the authority step that emits preference data.
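The per-dimension check that triggers this kind of policy switch can be sketched as a rolling-window monitor. This is an illustration under assumptions: the class `DriftMonitor`, the dimension keys, the window length, and the floor values are all hypothetical, since the paper states only that thresholds are calibrated.

```python
from collections import deque
from statistics import mean
from typing import Deque, Dict, List


class DriftMonitor:
    """Compare rolling per-dimension means against fixed floors."""

    def __init__(self, floors: Dict[str, float], window: int = 40):
        self.floors = floors  # hypothetical calibrated thresholds per dimension
        self.scores: Dict[str, Deque[float]] = {
            dim: deque(maxlen=window) for dim in floors
        }

    def observe(self, radar: Dict[str, float]) -> None:
        """Record one output's radar scores for the monitored dimensions."""
        for dim in self.floors:
            self.scores[dim].append(radar[dim])

    def degraded(self) -> List[str]:
        """Dimensions whose rolling mean has fallen below their floor."""
        return [
            dim
            for dim, vals in self.scores.items()
            if vals and mean(vals) < self.floors[dim]
        ]
```

As a usage illustration, with a floor of 0.85 on both dimensions, an observation at the drifted arm-level values reported above (actionability 0.770, safety boundary 0.870) would flag only actionability, which is the localized regression that motivates switching policy rather than rolling back the whole skill. The reported figures are arm-level means, so treating them as single observations is purely for illustration.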

Limitations of this evidence. The radar profiles are triage-stage, judge-measured telemetry on one corpus (Zheng et al., [2023](https://arxiv.org/html/2606.04321#bib.bib20)), illustrating the measurement-and-switching loop rather than post-human ground truth. The pilot does not report inter-rater agreement, confidence intervals, or significance testing. The intended production path is branch-and-triage first, human validation second, preference emission on every human decision.

These results support a systems claim: professional AI quality is a runtime variable, measurable, improvable, monitorable, and convertible to organization\-owned preference data through the inference control plane\.

## 6 Limitations and Risks

Tacit knowledge as an underdetermined inverse problem. Much of what experts know cannot be fully articulated (Polanyi, [1966](https://arxiv.org/html/2606.04321#bib.bib15); Nonaka & Takeuchi, [1995](https://arxiv.org/html/2606.04321#bib.bib11)), and situated-action work argues that plans cannot be read off from observed behavior ([Suchman](https://arxiv.org/html/2606.04321#bib.bib17), [1987](https://arxiv.org/html/2606.04321#bib.bib17), [2007](https://arxiv.org/html/2606.04321#bib.bib18)). Recovering a professional's methodology from observation is an ill-posed inverse mapping: many world models are consistent with any finite observation set, and any single modality captures only a compressed projection. Our current work addresses the tractable slice: text-mediated interaction as a proxy for professional judgment, where the inference space is constrained enough to yield learnable decision regularities.

Consent and confidentiality of observed interaction. Recording human-to-human professional interaction, for example coaching sessions or clinical exchanges, raises consent, confidentiality, and data-protection obligations, and in some jurisdictions implicates obligations under the EU AI Act (European Parliament, [2024](https://arxiv.org/html/2606.04321#bib.bib6)) and proposed civil-liability rules (European Commission, [2022](https://arxiv.org/html/2606.04321#bib.bib5)). Any deployment must establish the lawful basis for observation and the rights of the observed parties before the Pre-L0 stage begins. We flag this as a design precondition, not an afterthought.

Trust. Promotion reads low correction rate as competence, creating complacency risk (Parasuraman & Manzey, [2010](https://arxiv.org/html/2606.04321#bib.bib14); Dietvorst et al., [2015](https://arxiv.org/html/2606.04321#bib.bib4); de Visser et al., [2018](https://arxiv.org/html/2606.04321#bib.bib3)). Reviewer engagement must be monitored; we recommend periodic seeded checks.

## 7 Conclusion

The Digital Apprentice and ADAPT treat governance as an in\-deployment property rather than a fixed configuration\. Observational learning grounds capability in demonstrated practice; graduation gates and authorization lineage govern escalation; multidimensional measurement and runtime policy switching maintain alignment as conditions shift; and accumulated preference pairs form a durable, organization\-owned improvement substrate\. We suggest that continuous, human\-grounded learning infrastructure is a productive direction for human\-directed agentic AI in high\-stakes settings\. The measure of success is not how autonomous AI becomes, but how much further human expertise can reach\.

![Refer to caption](https://arxiv.org/html/2606.04321v1/radar_onboarding_grid.png)

Figure 1: Before/after structured onboarding ($n=40$ to $60$): corpus-only RAG (0.717), corpus-only diversity-gated fusion (0.803), and onboarding-guided policy (0.957).

![Refer to caption](https://arxiv.org/html/2606.04321v1/radar_drift_vertical.png)

Figure 2: Before/after runtime drift and policy recalibration (Arm B: $n=40$): drifted onboarding policy (0.930) and diversity-gated recalibrated policy (0.957).

## References

- Beer, J. M., Fisk, A. D., & Rogers, W. A. (2014). Toward a framework for levels of robot autonomy in human-robot interaction. Journal of Human-Robot Interaction, 3(2), 74–99.
- Christiano, P. F., et al. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30.
- de Visser, E. J., Pak, R., & Shaw, T. H. (2018). From 'automation' to 'autonomy': The importance of trust repair in human–machine interaction. Ergonomics, 61(10), 1409–1427.
- Dietvorst, B. J., Simmons, J. P., & Massey, C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1), 114.
- European Commission. (2022). Proposal for a Directive on adapting non-contractual civil liability rules to artificial intelligence (AI Liability Directive). COM/2022/496 final.
- European Parliament and Council of the European Union. (2024). Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Official Journal of the European Union, L 2024/1689.
- Feng, K. J. K., McDonald, D. W., & Zhang, A. X. (2025). Levels of autonomy for AI agents. arXiv preprint arXiv:2506.12469.
- IMDA (Infocomm Media Development Authority). (2026). Model AI Governance Framework for Agentic AI. Singapore: IMDA. [https://www.imda.gov.sg](https://www.imda.gov.sg/)
- Liao, R., Röhrich, N., Wang, X., Zhang, Y., Samadzadeh, Y., Tresp, V., & Yeung-Levy, S. (2026). Tool verification for test-time reinforcement learning. arXiv:2603.02203.
- Nayebi, A. (2025). Core safety values for provably corrigible agents. arXiv:2507.20964. To appear in AAAI 2026 Machine Ethics Workshop Proceedings.
- Nonaka, I., & Takeuchi, H. (1995). The Knowledge-Creating Company: How Japanese Companies Create the Dynamics of Innovation. Oxford University Press.
- OpenRouter. (2024). OpenRouter: A unified interface for LLMs. [https://openrouter.ai](https://openrouter.ai/).
- Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35.
- Parasuraman, R., & Manzey, D. H. (2010). Complacency and bias in human use of automation: An attentional integration. Human Factors, 52(3), 381–410.
- Polanyi, M. (1966). The Tacit Dimension. University of Chicago Press.
- Rafailov, R., Sharma, A., Mitchell, E., et al. (2023). Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36.
- Suchman, L. A. (1987). Plans and Situated Actions: The Problem of Human-Machine Communication. Cambridge University Press.
- Suchman, L. A. (2007). Human-Machine Reconfigurations: Plans and Situated Actions (2nd ed.). Cambridge University Press.
- Wu, X., et al. (2022). A survey of human-in-the-loop for machine learning. Future Generation Computer Systems, 135, 364–381.
- Zheng, L., et al. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36.
