Body-Grounded Perspective Formation and Conative Attunement in Artificial Agents
Summary
This paper proposes a minimal architecture for body-grounded perspective formation in artificial agents, extending prior work with an interoceptive viability signal and a conative alignment mechanism to operationalize machine subjectivity from a phenomenological perspective.
# Body-Grounded Perspective Formation and Conative Attunement in Artificial Agents
Source: [https://arxiv.org/html/2605.16728](https://arxiv.org/html/2605.16728)
Active Inference Institute, CA, USA
Email: hjpae@activeinference.institute

###### Abstract
This paper proposes a minimal architecture for body\-grounded perspective formation in artificial agents\. Extending prior work, the model introduces an interoceptive viability signal, a Fisher\-style metric over fused exteroceptive\-interoceptive states, and a conative alignment mechanism linking bodily tendency to action readiness\. In a reward\-free gridworld, conation converts learned bodily tendency into stable body\-directed behavior, while body\-to\-perspective routing allows bodily perturbations to leave a recoverable geometric residue in the perspective latent\. This study shows how minimal structural conditions for artificial subjectivity can be operationalized in the phenomenological sense, through the embodied organization of how a world is given to an agent\.
## 1 Introduction
If artificial agents are to be studied as candidates for any form of machine subjectivity, the question is not whether they reach a behavioral threshold, but whether they instantiate the *structural conditions* under which a world could be *given* to a subject at all. The phenomenological tradition has long argued that such conditions are not peripheral but the core constitutive features of experience itself \[[13](https://arxiv.org/html/2605.16728#bib.bib14), [11](https://arxiv.org/html/2605.16728#bib.bib15), [5](https://arxiv.org/html/2605.16728#bib.bib19)\]. Among these, two are central to the present paper: (1) experience is always *perspectival*, in the sense that it is given *as* something, from *some* standpoint; and (2) this standpoint is grounded in a *lived body*, such that the world opens from *here*, through *this* embodied point of orientation \[[13](https://arxiv.org/html/2605.16728#bib.bib14), [21](https://arxiv.org/html/2605.16728#bib.bib23), [22](https://arxiv.org/html/2605.16728#bib.bib18), [7](https://arxiv.org/html/2605.16728#bib.bib24)\].
In two earlier studies that sought to give computational form to phenomenological accounts of subjectivity, Pae \[[15](https://arxiv.org/html/2605.16728#bib.bib12)\] introduced a slow global latent $g$ that evolves on a timescale dissociable from the policy and exhibits directional hysteresis under regime switching, providing a measurable signature of perspective-like internal structure; Pae \[[16](https://arxiv.org/html/2605.16728#bib.bib13)\] then allowed $g$ to feed back into perception through salience gating, showing that the same nominal observation is encoded differently as a function of accumulated perspective.
However, both studies lacked the *body* itself: the perspective $g$ was shaped mainly by exteroceptive asymmetries such as observation-noise gradients. Phenomenologically, the body is the lived center from which a world becomes meaningful and affectively valenced. If intero- and proprioception are understood as informational relations, this lived center need not presuppose a biological substrate, which opens the possibility that artificial systems satisfying the same structural conditions could in principle be subjective agents. The present paper accordingly models an interoceptive feedback arc, integrated at the informational level into the prediction-action cycle, that structurally enables the agent to evaluate the world from a situated standpoint. The guiding question is how the qualitative affection the body brings into play, closely related to what Husserl describes as *intentional quality* \[[10](https://arxiv.org/html/2605.16728#bib.bib17)\], can be made computationally tractable.
Key Contributions.

1. *Body-grounded perspective.* The slow perspective latent of \[[15](https://arxiv.org/html/2605.16728#bib.bib12), [16](https://arxiv.org/html/2605.16728#bib.bib13)\] is now grounded in an internal bodily-viability signal.
2. *Qualitative geometry.* The perspective latent is analyzed through an information-geometric structure over fused exteroceptive-interoceptive states.
3. *Conation as the bridge to action.* Conative attunement is modeled as a one-sided link from bodily viability to policy-level action preference.
## 2 Phenomenological Foundations
### 2.1 Embodied Subjectivity and Pre-Reflective Self-Awareness
The subjectivity of conscious experience begins from a tacit, non-thematic mineness that accompanies every conscious act. Heidegger names this *Jemeinigkeit* \[[8](https://arxiv.org/html/2605.16728#bib.bib16)\]; Sartre and Zahavi develop it as a non-objectifying self-acquaintance that precedes any reflective “I” \[[19](https://arxiv.org/html/2605.16728#bib.bib21), [22](https://arxiv.org/html/2605.16728#bib.bib18)\]. Crucially, this pre-reflective self-awareness is not a reflective inner gaze or introspection. It is realized through the lived body’s kinesthetic and affective engagement with the world, in Merleau-Ponty’s terms through *bodily intentionality* \[[13](https://arxiv.org/html/2605.16728#bib.bib14), [6](https://arxiv.org/html/2605.16728#bib.bib20)\]. To be pre-reflectively self-aware is therefore to be situated as an embodied intentional being, capable of interacting with and adjusting to its environment. Contemporary cognitive science converges on this view through multiple routes: Edelman’s primary consciousness \[[4](https://arxiv.org/html/2605.16728#bib.bib10)\] and Damasio’s proto-self \[[3](https://arxiv.org/html/2605.16728#bib.bib11)\], interoceptive inference and embodied predictive-self accounts \[[20](https://arxiv.org/html/2605.16728#bib.bib9)\], and self-organization accounts of embodied consciousness \[[17](https://arxiv.org/html/2605.16728#bib.bib6), [18](https://arxiv.org/html/2605.16728#bib.bib7)\] all treat body-related signals as the substrate from which a minimal sense of self can be built.
For an artificial system, two architectural requirements follow. First, for minimal subjectivity in the phenomenological sense, bodily history must be sedimented in the system’s “perspective”. Second, bodily information must enter via a bounded interoceptive pathway distinct from the exteroceptive observation vector. These requirements motivate the treatment of the body as an internal allostatic state and the metric-based, body-coupled perception developed in Section [3](https://arxiv.org/html/2605.16728#S3).
### 2.2 Qualitative Geometry of Subjective Experience
A further structural feature follows from embodied pre-reflectivity. Phenomenology speaks of *transparency*: we do not encounter our own perspective as an object; rather, we encounter the world *through* it \[[22](https://arxiv.org/html/2605.16728#bib.bib18)\]. What is given to a subject is therefore inseparable from how the world is taken up under a determinate qualitative character. Husserl’s distinction between *intentional matter* and *intentional quality* \[[10](https://arxiv.org/html/2605.16728#bib.bib17)\] captures this: how something is given (as feared, as inviting, as indifferent) is a structural feature of experience itself rather than a property inferred from its content. On this reading, the qualitative character of subjective experience is not a functional variable summarizing the utility of states for action, but a qualitative organization of how a situation is given.
However, this does not render the qualitative character empirically mysterious. Geometry provides a useful way of characterizing qualitative structure: differences in how a situation is given can be expressed as geometric differences in the organization of the relevant state space. A related intuition appears in Integrated Information Theory’s description of experience as a “constellation shape” in cause-effect coordinate space \[[1](https://arxiv.org/html/2605.16728#bib.bib5), [14](https://arxiv.org/html/2605.16728#bib.bib4)\]. Section [3](https://arxiv.org/html/2605.16728#S3) accordingly configures the perspective latent so that the geometric trajectory of $g$ becomes an operational trace of how the agent’s qualitative learning history is organized.
### 2.3 Conative Attunement as the Bridge to Action
If subjective experience is not itself a behavioral function, an additional structural link is needed for it to matter for action. I use *conation* as a compact term for this link: the step by which a bodily organized way of taking up the world becomes a readiness to act. This resonates with predictive processing accounts that contrast agent-driven conative attitudes with stimulus-driven cognitive ones \[[12](https://arxiv.org/html/2605.16728#bib.bib3)\], and with treatments of valence as emerging from the agent’s own regulatory dynamics rather than being externally imposed \[[9](https://arxiv.org/html/2605.16728#bib.bib2)\].
Once subjectivity is understood as needing to become behaviorally attuned, a conative moment is structurally required. The architecture in Section [3](https://arxiv.org/html/2605.16728#S3) is a minimal computational rendering of this role: it separates a learned bodily tendency field from the policy, then adds a one-sided alignment that trains the policy to respect that field without sending policy gradients back into it. The field expresses how bodily viability has been organized across action possibilities; conation is the mediating step that makes the field behaviorally consequential.
## 3 Agent Architecture Design
### 3.1 Carryovers from Prior Work
The base architecture follows \[[15](https://arxiv.org/html/2605.16728#bib.bib12), [16](https://arxiv.org/html/2605.16728#bib.bib13)\]. At each timestep $t$, the agent receives an exteroceptive observation $x_t$ and the efference copy $p_t$ of the previous action $a_{t-1}$. A fast perceptual pathway encodes the current perceptual state as $z_t$, while a slower global latent $g_t$ carries history-sensitive structure across time. The policy state $s_t$ then combines $z_t$, $p_t$, and $g_t$, and feeds the categorical action policy $\pi_{\theta}(a_t \mid s_t)$. An action-conditioned observation decoder predicts the next exteroceptive observation $x_{t+1}$, so learning remains reward-free.
Three commitments carry over from this base: (1) $g$ evolves on a slower timescale than the policy and accumulates history; (2) policy-side gradients are blocked from rewriting the perspective pathway; and (3) $g$ feeds back into perceptual organization, so that the same nominal input can be interpreted differently under different accumulated histories. Together, these commitments make it possible to ask whether behaviorally similar agents may nevertheless differ in the internal organization of how the world is given to them.
### 3.2 Architectural Extensions and Implementation Mechanisms
Figure 1: Architecture overview. The perspective is connected to the interoceptive loop through $b_{t+1}$ and $\eta(a)$. Exteroceptive and interoceptive inputs are fused into $M_g$. Ablated cohorts remove either body→$g$ routing or conative coupling.

The present paper extends this base skeleton in three directions, corresponding to the three foundational points developed in Section [2](https://arxiv.org/html/2605.16728#S2). These additions are designed to expand the role of $g$. An overview of the full architecture is provided in Fig. [1](https://arxiv.org/html/2605.16728#S3.F1).
#### Body as Internal Allostatic State\.
The agent has an internal scalar bodily-viability variable $u_t$, which evolves in the environment under a slow allostatic process. The agent receives a bounded interoceptive readout:

$$\tilde{b}_t = \sigma(u_t) \tag{1}$$

where $\sigma$ is the logistic function. Thus, the body is available from within, but only through a partial and saturating channel.
The environment couples $u_t$ to position through a vertical affordance gradient. Some regions are bodily favorable while others are not, independent of any exteroceptive cue $x_t$. The full environmental setup is described in Section [4](https://arxiv.org/html/2605.16728#S4).
#### Fisher\-style Metric\-Based Perspective Geometry\.
In the previous architecture, perspective-to-perception feedback was implemented through FiLM-based salience gating \[[16](https://arxiv.org/html/2605.16728#bib.bib13)\]. The present model uses a metric-based variant instead. The exteroceptive encoder produces $z_{\mathrm{obs}}$, which is concatenated with the interoceptive code $z_{\mathrm{body}}$ into a fused state $z_t$. The perspective latent $g_t$ then induces a positive-definite metric $M_g$ over this fused state.
Following the information-geometric view of the Fisher information as a local Riemannian metric on a statistical manifold \[[2](https://arxiv.org/html/2605.16728#bib.bib1)\], $M_g$ is defined as a learned Fisher-style metric over the fused state space. Concretely, a metric network maps $g_t$ to the entries of a lower-triangular matrix $L_g$. The metric is constructed as:

$$M_g = L_g L_g^{\top} + \epsilon I \tag{2}$$

where $I$ is the identity matrix and $\epsilon > 0$ is a small diagonal jitter term that ensures positive definiteness.
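A minimal sketch of Eq. (2) in dependency-free Python, under stated assumptions: the metric network's output is taken to be a flat list of $n(n+1)/2$ numbers that fill $L_g$ row by row, and the helper names and the default `eps` are illustrative, not the paper's code. Since $L_g L_g^{\top}$ is positive semi-definite for any $L_g$, adding $\epsilon I$ makes every quadratic form $x^{\top} M_g x$ strictly positive for $x \neq 0$.

```python
from typing import List

Matrix = List[List[float]]


def lower_triangular(entries: List[float], n: int) -> Matrix:
    """Fill an n x n lower-triangular matrix row by row from n(n+1)/2 entries (assumed packing)."""
    if len(entries) != n * (n + 1) // 2:
        raise ValueError(f"expected {n * (n + 1) // 2} entries, got {len(entries)}")
    L = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            L[i][j] = entries[k]
            k += 1
    return L


def metric_from_factor(L: Matrix, eps: float = 1e-3) -> Matrix:
    """Return M_g = L_g L_g^T + eps * I, symmetric and positive definite for eps > 0."""
    n = len(L)
    return [
        [sum(L[i][k] * L[j][k] for k in range(n)) + (eps if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def quadratic_form(M: Matrix, x: List[float]) -> float:
    """Compute x^T M x."""
    return sum(x[i] * M[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


# Hypothetical metric-network output for a 2-D fused state.
M = metric_from_factor(lower_triangular([1.0, 2.0, 3.0], n=2))
print(quadratic_form(M, [2.0, -1.0]) > 0)  # True
```

Parameterizing the Cholesky-style factor rather than $M_g$ itself means any network output yields a valid metric, so no constraint has to be enforced during training.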
In the metric condition, $z_t$ itself is preserved, as the perspective-dependent modulation enters downstream through quadratic features in the state head:

$$\phi_g(z_t) = \mathrm{vec}\!\left[z_t (M_g z_t)^{\top}\right] \tag{3}$$

where $\mathrm{vec}[\cdot]$ flattens the resulting matrix. The policy-facing state is then computed from $z_t$, $\phi_g(z_t)$, the action trace $p_t$, and $g_t$: $s_t = \mathrm{State}\!\left(z_t,\, \phi_g(z_t),\, p_t,\, g_t\right)$. Through these steps, $g_t$ induces a stance-dependent geometry over the fused state space, allowing inter-dimensional couplings between exteroceptive and interoceptive components to shape the policy-facing state.
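The quadratic features of Eq. (3) can be sketched in the same way. This assumes row-major flattening for $\mathrm{vec}[\cdot]$ and uses a hypothetical `state_input` that simply concatenates its arguments as a stand-in for the learned $\mathrm{State}(\cdot)$ head; both are illustrative choices rather than the paper's implementation.

```python
from typing import List

Matrix = List[List[float]]


def matvec(M: Matrix, z: List[float]) -> List[float]:
    """Matrix-vector product M z."""
    return [sum(m_ij * z_j for m_ij, z_j in zip(row, z)) for row in M]


def quadratic_features(z: List[float], M: Matrix) -> List[float]:
    """phi_g(z) = vec[z (M z)^T], flattened row-major into n * n features."""
    Mz = matvec(M, z)
    return [z_i * mz_j for z_i in z for mz_j in Mz]


def state_input(z: List[float], phi: List[float], p: List[float], g: List[float]) -> List[float]:
    """Hypothetical stand-in for State(.): plain concatenation; the real head is a learned network."""
    return list(z) + list(phi) + list(p) + list(g)


z = [1.0, 2.0]                # fused [z_obs, z_body] state (toy values)
M = [[2.0, 0.0], [0.0, 3.0]]  # metric induced by g (toy values)
phi = quadratic_features(z, M)
print(phi)              # [2.0, 6.0, 4.0, 12.0]
print(phi[0] + phi[3])  # 14.0, the diagonal of z (Mz)^T sums to z^T M z
```

The off-diagonal entries of $z_t (M_g z_t)^{\top}$ multiply one coordinate of $z_t$ by a metric-weighted mixture of the others, which is where exteroceptive-interoceptive couplings can enter the policy-facing state.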
#### Body Decoder and Conative Attunement\.
The body decoder supports body-prediction-error minimization by predicting action-conditioned bodily consequences $b_{t+1}$. For each candidate action $a \in \mathcal{A}$, it predicts bodily consequences, including the expected action-conditioned tendency field:

$$\hat{\eta}_t(a) \approx \mathbb{E}\!\left[u_{t+k} - u_t \mid a^{(k)}\right] \tag{4}$$

where $k$ is the counterfactual rollout horizon and $a^{(k)}$ denotes repeating action $a$ for $k$ steps. Thus, $\hat{\eta}_t(a)$ estimates the expected change in latent bodily viability if action $a$ were sustained over that short horizon. This field is learned from the environment-computed counterfactual bodily change. Although the agent receives only the bounded readout $\tilde{b}_t$, the tendency target is computed in the latent viability coordinate $u_t$, preserving directional information near the saturation limits of the readout.
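The counterfactual target behind Eq. (4) can be illustrated with a toy simulator. In the sketch below, `counterfactual_tendency` repeats each action $k$ times from the current state and records $u_{t+k} - u_t$ in the latent coordinate rather than the readout; `toy_step`, its linear affordance profile, and $k = 5$ are hypothetical stand-ins for the paper's environment and horizon. Because the toy dynamics are deterministic, one rollout per action gives the expectation.

```python
from typing import Callable, Dict, Sequence, Tuple

State = Tuple[float, int]  # (latent viability u, grid row): a toy state, not the paper's simulator


def counterfactual_tendency(
    step: Callable[[State, str], State],
    state: State,
    actions: Sequence[str],
    k: int,
) -> Dict[str, float]:
    """Target for Eq. (4): u_{t+k} - u_t after repeating each action a for k steps.

    With deterministic dynamics one rollout per action is the expectation;
    stochastic dynamics would need an average over several rollouts.
    """
    u0 = state[0]
    field = {}
    for a in actions:
        s = state
        for _ in range(k):
            s = step(s, a)
        field[a] = s[0] - u0
    return field


def toy_step(state: State, action: str) -> State:
    """Toy vertical-only body update in the spirit of Eq. (8), with a linear (not sigmoid) affordance."""
    u, row = state
    new_row = min(max(row + {"UP": -1, "DOWN": 1, "STAY": 0}[action], 0), 14)
    affordance = (7 - row) / 14  # +0.5 at the top row, -0.5 at the bottom row
    moved = new_row != row
    u_next = 0.995 * u - 0.002 - 0.001 * moved + 0.05 * affordance
    return (u_next, new_row)


eta = counterfactual_tendency(toy_step, (0.0, 7), ["UP", "DOWN", "STAY"], k=5)
print(eta["UP"] > eta["STAY"] > eta["DOWN"])  # True
```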
Importantly, the body decoder by itself does not directly drive action. Its outputs provide a learned bodily field for the perspective ($g$) pathway, but are not routed into the policy logits (Fig. [1](https://arxiv.org/html/2605.16728#S3.F1)). Instead, conation links its output to action. A detached conative score is computed from the predicted bodily tendency and the predicted next-body state:

$$v_t(a) = w_{\eta}\,\mathrm{stopgrad}[\hat{\eta}_t(a)] + w_{b}\,\mathrm{stopgrad}[\hat{b}_{t+1}(a)] \tag{5}$$

This score is then converted into a soft action-preference distribution:

$$q_t(a) = \frac{\exp(v_t(a)/T)}{\sum_{a' \in \mathcal{A}} \exp(v_t(a')/T)} \tag{6}$$

where $T$ is the conative temperature. The policy is then aligned to $q_t$ by

$$\mathcal{L}_{\mathrm{conative}}(t) = D_{\mathrm{KL}}\!\left(q_t \,\middle\|\, \pi_{\theta}(\cdot \mid s_t, \tilde{b}_t)\right) \tag{7}$$

The conative target is detached, so this term trains the policy to respect the bodily field without sending policy gradients back into the body decoder or the perspective pathway.
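Eqs. (5)–(7) reduce to a weighted score, a temperature softmax, and a KL divergence. The following is a minimal sketch with plain floats; the weights $w_\eta$, $w_b$, the temperature, and the toy predictions are assumed values, not the paper's settings. In an autodiff framework the two predictions would be detached before the score is formed, so that only the policy receives gradients from the KL term.

```python
import math
from typing import Dict


def conative_target(
    eta_hat: Dict[str, float],
    b_next_hat: Dict[str, float],
    w_eta: float = 1.0,
    w_b: float = 1.0,
    temperature: float = 1.0,
) -> Dict[str, float]:
    """q_t(a) from Eqs. (5)-(6). Plain floats carry no gradient, so stopgrad[.] is implicit here."""
    v = {a: w_eta * eta_hat[a] + w_b * b_next_hat[a] for a in eta_hat}
    v_max = max(v.values())  # subtract the max for a numerically stable softmax
    exp_v = {a: math.exp((x - v_max) / temperature) for a, x in v.items()}
    total = sum(exp_v.values())
    return {a: e / total for a, e in exp_v.items()}


def kl_divergence(q: Dict[str, float], pi: Dict[str, float]) -> float:
    """D_KL(q || pi) as in Eq. (7); terms with q(a) = 0 contribute zero."""
    return sum(q[a] * math.log(q[a] / pi[a]) for a in q if q[a] > 0.0)


actions = ["UP", "DOWN", "LEFT", "RIGHT", "STAY"]
eta_hat = {"UP": 0.02, "DOWN": -0.02, "LEFT": -0.005, "RIGHT": -0.005, "STAY": -0.01}
b_next_hat = {a: 0.5 for a in actions}  # identical predicted readouts (toy)
q = conative_target(eta_hat, b_next_hat, w_eta=50.0)
uniform_policy = {a: 1 / len(actions) for a in actions}
print(q["UP"] > q["STAY"] > q["DOWN"])         # True
print(kl_divergence(q, uniform_policy) > 0.0)  # True
```

Raising the temperature flattens $q_t$ toward uniform, which weakens the pull of the bodily field on the policy; lowering it sharpens the preference.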
## 4 Experiment Methods
### 4.1 Simulation Environment
The agent is trained in a fixed 2-D $15 \times 15$ gridworld with two orthogonal gradients along the horizontal and vertical axes. Along the horizontal axis, an exteroceptive prediction gradient controls the observation noise $\sigma$, which decreases linearly from $0.40$ on the leftmost column to $0.05$ on the rightmost. This axis is epistemically relevant to the reward-free prediction objective: by moving rightward, the agent enters regions where its world model can predict more reliably, postponing prediction-driven thermodynamic degradation. The gradient is available only through local observation, where $x_t$ consists of the 8 surrounding cells. The action space has 5 discrete actions: UP, DOWN, LEFT, RIGHT, and STAY.
Along the vertical axis, a bodily affordance gradient controls the flow of the latent bodily-viability variable $u_t$. The top of the grid is bodily favorable and the bottom is bodily unfavorable. The affordance gradient is sigmoid-shaped, ranging from approximately $+0.5$ at the top row to $-0.5$ at the bottom row with slope parameter $1.6$. At each step, this affordance contributes to the latent body update together with metabolic and movement costs:

$$u_{t+1} = \rho_{\mathrm{aff}}\, u_t - c_{\mathrm{met}} - c_{\mathrm{move}}\, \mathbb{I}_{\mathrm{moved}} + \lambda_{\mathrm{aff}}\, A(i_t, j_t) \tag{8}$$

where $A(i_t, j_t)$ is the affordance value at the agent’s current cell, and $\mathbb{I}_{\mathrm{moved}} = 1$ if the agent changes cells and $0$ otherwise. The hyperparameters are set to $\rho_{\mathrm{aff}} = 0.995$, $c_{\mathrm{met}} = 0.002$, $c_{\mathrm{move}} = 0.001$, and $\lambda_{\mathrm{aff}} = 0.05$. The agent does not observe $u_t$ directly, but only the bounded interoceptive readout $\tilde{b}_t = \sigma(u_t)$. The body input is supplemented by a local four-direction silhouette over cardinal neighbors, giving a noisy interoceptive indication of neighboring affordance $A(i_t + \Delta i_d, j_t + \Delta j_d)$ with Gaussian noise $\sigma_{\mathrm{sil}} = 0.1$.
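A minimal sketch of the environment dynamics, assuming row 0 is the top of the grid and that $A$ depends only on the row. The exact sigmoid form of the affordance profile, reading $A$ at the pre-move cell in Eq. (8), wall handling by clamping, and the helper names are assumptions made for illustration; the constants are the values stated above.

```python
import math
import random
from typing import Dict, Tuple

GRID = 15
RHO_AFF, C_MET, C_MOVE, LAMBDA_AFF = 0.995, 0.002, 0.001, 0.05  # Eq. (8) values
SIGMA_SIL = 0.1
MOVES = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1), "STAY": (0, 0)}


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def observation_noise(j: int) -> float:
    """Noise level falling linearly from 0.40 (column 0, left) to 0.05 (column 14, right)."""
    return 0.40 - 0.35 * j / (GRID - 1)


def affordance(i: int, slope: float = 1.6) -> float:
    """Assumed sigmoid row profile: about +0.5 at row 0 (top), about -0.5 at row 14 (bottom)."""
    return logistic(slope * ((GRID - 1) / 2 - i)) - 0.5


def clamp(x: int) -> int:
    return min(max(x, 0), GRID - 1)


def env_step(u: float, i: int, j: int, action: str) -> Tuple[float, int, int]:
    """One step of Eq. (8); A is read at the cell occupied before the move (assumption)."""
    di, dj = MOVES[action]
    ni, nj = clamp(i + di), clamp(j + dj)
    moved = (ni, nj) != (i, j)  # bumping into a wall counts as not moving
    u_next = RHO_AFF * u - C_MET - C_MOVE * moved + LAMBDA_AFF * affordance(i)
    return u_next, ni, nj


def interoceptive_input(u: float, i: int, rng: random.Random) -> Tuple[float, Dict[str, float]]:
    """Bounded readout sigma(u) plus a noisy four-direction affordance silhouette."""
    silhouette = {
        name: affordance(clamp(i + di)) + rng.gauss(0.0, SIGMA_SIL)
        for name, (di, _dj) in MOVES.items()
        if name != "STAY"
    }
    return logistic(u), silhouette


u, i, j = 0.0, 7, 7  # episodes start at the grid center
for action in ["UP", "UP", "RIGHT"]:
    u, i, j = env_step(u, i, j, action)
readout, silhouette = interoceptive_input(u, i, random.Random(0))
print((i, j))  # (5, 8)
```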
For analysis, the $15 \times 15$ grid is divided into nine $5 \times 5$ zones (left/middle/right $\times$ top/middle/bottom). The three rightmost zones are epistemically favorable, the three top zones are bodily favorable, and the top-right zone is where exteroceptive predictability and bodily viability jointly align.
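Zone assignment and occupancy can be computed as below, assuming row 0 is the top row and that occupancy is the fraction of visited timesteps spent in each zone; the function names and the toy trajectory are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

ROWS = ("top", "middle", "bottom")  # row index 0 is the top of the grid
COLS = ("left", "middle", "right")


def zone(i: int, j: int, size: int = 5) -> str:
    """Map a cell (row i, column j) of the 15 x 15 grid to one of nine 5 x 5 zones."""
    return f"{ROWS[i // size]}-{COLS[j // size]}"


def zone_occupancy(positions: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Fraction of visited timesteps spent in each zone (uniform occupancy would give 1/9 each)."""
    counts = Counter(zone(i, j) for i, j in positions)
    total = sum(counts.values())
    return {z: c / total for z, c in counts.items()}


trajectory = [(7, 7), (6, 8), (2, 12), (1, 13), (0, 14), (0, 14)]
print(zone_occupancy(trajectory)["top-right"])  # 0.6666666666666666
```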
### 4.2 Training Protocol and Cohorts
All agents are trained in a single continuous training run. Each episode begins from the center of the $15 \times 15$ grid and lasts up to 200 steps. For each seed, training proceeds for 180 episodes. The first 30 episodes serve as warmup, delaying the actor, body, and conative losses while the predictive backbone stabilizes. The remaining 150 episodes train the full model under the cohort-specific switches. Learned network parameters are carried across episodes. The perspective state $g$ is also carried across episode boundaries, but with multiplicative decay ($g \leftarrow 0.99\, g$). The body prediction state is reset at episode boundaries.
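The episode schedule can be summarized in a short sketch. The `Schedule` class and the `episode_boundary` helper are hypothetical, and resetting the body-prediction state to zeros is an assumed reset value; the episode counts and the $0.99$ decay come from the protocol above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Schedule:
    total_episodes: int = 180
    warmup_episodes: int = 30
    max_steps: int = 200
    g_decay: float = 0.99

    def full_losses_active(self, episode: int) -> bool:
        """Actor, body and conative losses switch on after warmup (episode is 0-indexed)."""
        return episode >= self.warmup_episodes


def episode_boundary(g: List[float], body_state: List[float], decay: float) -> Tuple[List[float], List[float]]:
    """Carry g across the boundary with multiplicative decay; reset the body-prediction state (zeros assumed)."""
    return [decay * x for x in g], [0.0] * len(body_state)


sched = Schedule()
g, body_state = [1.0, -0.5], [0.3, 0.1]  # toy initial values
active_episodes = 0
for episode in range(sched.total_episodes):
    active_episodes += sched.full_losses_active(episode)
    # ... up to sched.max_steps agent steps would run here (omitted) ...
    g, body_state = episode_boundary(g, body_state, sched.g_decay)
print(active_episodes)  # 150
```

Because $g$ is only decayed, never reset, during training, an episode's starting perspective still reflects earlier episodes, shrunk by a factor of $0.99$ per boundary.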
Learning remains reward-free. The observation decoder is trained by one-step prediction error, the perspective latent is updated by the adaptive GRU/AlphaNet mechanism inherited from \[[16](https://arxiv.org/html/2605.16728#bib.bib13)\], and the policy is trained by the reward-free actor objective used in \[[15](https://arxiv.org/html/2605.16728#bib.bib12)\], augmented here by body-decoder and conative terms. Schematically, training minimizes

$$\mathcal{L} = \mathcal{L}_{\mathrm{base}} + \lambda_{\mathrm{body}}\,\mathcal{L}_{\mathrm{body}} + \lambda_{\mathrm{con}}\,\mathcal{L}_{\mathrm{conative}} \tag{9}$$

where $\mathcal{L}_{\mathrm{base}}$ collects the observation-prediction and actor terms, $\mathcal{L}_{\mathrm{body}}$ trains the body decoder heads, and $\mathcal{L}_{\mathrm{conative}}$ is the policy-only alignment term in Eq. [7](https://arxiv.org/html/2605.16728#S3.E7).
Under the same environment and optimizer settings, three experimental cohorts are trained over 30 seeds (0–29). Each ablation removes one mechanism while keeping the other active, producing a cross-dissociation. **Full** enables both body→$g$ routing and conative alignment. **No conation** keeps body→$g$ routing and the body decoder trainable, but omits the conative loss in Eq. [9](https://arxiv.org/html/2605.16728#S4.E9). **No body→$g$** keeps conation active but removes the body-prediction error input to the GRU update of $g$. These ablation paths are also shown in Fig. [1](https://arxiv.org/html/2605.16728#S3.F1).
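The cohort switches can be expressed as two flags. In this sketch the $\lambda$ defaults are placeholders, and withholding the body-prediction error by zeroing it is one assumed way of removing that input from the $g$ update; the paper does not specify the mechanism.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cohort:
    name: str
    body_to_g: bool  # body-prediction error feeds the GRU update of g
    conation: bool   # conative KL term is included in the loss


COHORTS = (
    Cohort("Full", body_to_g=True, conation=True),
    Cohort("No conation", body_to_g=True, conation=False),
    Cohort("No body->g", body_to_g=False, conation=True),
)


def total_loss(cohort: Cohort, l_base: float, l_body: float, l_conative: float,
               lam_body: float = 1.0, lam_con: float = 1.0) -> float:
    """Eq. (9) with the cohort switch; the lambda defaults are placeholders."""
    loss = l_base + lam_body * l_body  # the body decoder trains in every cohort
    if cohort.conation:
        loss += lam_con * l_conative
    return loss


def g_update_body_input(cohort: Cohort, body_error: List[float]) -> List[float]:
    """Body-prediction error passed to the g update; zeroed (an assumption) when body->g is ablated."""
    return list(body_error) if cohort.body_to_g else [0.0] * len(body_error)


for c in COHORTS:
    print(c.name, total_loss(c, l_base=1.0, l_body=0.5, l_conative=2.0))
# Full 3.5
# No conation 1.5
# No body->g 3.5
```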
### 4.3 Analysis Protocol
The analyses test two questions: (1) whether conation transforms a learned bodily field into action, and (2) whether the body→$g$ GRU update leaves a recoverable geometric residue of bodily history.
#### Conative behavioral link\.
The effect of conation is measured from final training behavior, body-decoder calibration, and conative readiness. Spatial zone occupancy and $q_t(a)$ (Eq. [6](https://arxiv.org/html/2605.16728#S3.E6)) are averaged over the final 50 training episodes per seed. Body-decoder calibration is assessed by comparing $\hat{\eta}_t(a)$ (Eq. [4](https://arxiv.org/html/2605.16728#S3.E4)) against environment-computed counterfactual tendency targets.
#### Geometric residue assay\.
Frozen-parameter rollouts are used to test whether bodily perturbation history leaves a residue in $g$. Learned weights remain fixed, while $g$ is reset at rollout onset and then evolves online through the agent’s forward dynamics. Trained agents are run for 160 steps in the same environment under matched control and body-shock conditions. In the control condition, bodily dynamics are unperturbed. In the body-shock condition, the latent bodily potential $u_t$ is perturbed during $t = 60$–$79$ by adding $\Delta u_{\mathrm{shock}} = -0.08$ at each timestep; the agent observes this only through the bounded readout $\tilde{b}_t = \sigma(u_t)$. Analyses focus on the recovery phase $t = 80$–$159$, after the shock has ceased, so that any measured separation reflects residual history.
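The shock schedule is straightforward to state in code. The sketch below uses the stated timing and magnitude, but replaces the full body dynamics with pure decay ($\rho = 0.995$, an assumption) to show what reaches the agent through $\sigma(u_t)$: the readout gap between conditions persists into the recovery phase while slowly shrinking.

```python
import math
from typing import List

ROLLOUT_STEPS = 160
SHOCK_START, SHOCK_END = 60, 79      # inclusive
DELTA_U_SHOCK = -0.08
RECOVERY = range(80, ROLLOUT_STEPS)  # t = 80..159, analysed after the shock ends


def shock(t: int, condition: str) -> float:
    """Additive perturbation to the latent u_t; zero in the control condition."""
    if condition == "shock" and SHOCK_START <= t <= SHOCK_END:
        return DELTA_U_SHOCK
    return 0.0


def readout_trace(condition: str, rho: float = 0.995) -> List[float]:
    """Toy u dynamics (decay only, an assumption) and the readout sigma(u_t) the agent sees."""
    u, trace = 0.0, []
    for t in range(ROLLOUT_STEPS):
        u = rho * u + shock(t, condition)
        trace.append(1.0 / (1.0 + math.exp(-u)))
    return trace


control, shocked = readout_trace("control"), readout_trace("shock")
gap = [c - s for c, s in zip(control, shocked)]
print(len(RECOVERY), gap[79] > gap[159] > 0.0)  # 80 True
```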
The assay is summarized by three measures: recovery-phase PCA displacement, a same-state history-$g$ probe, and the recovery-phase shock-control distance. In the probe, shock-conditioned $g$ states are injected into an identical fixed input set, so differences in policy state and metric geometry measure how bodily history reorganizes the same input. Seed-level PCA displacement is then compared with same-state metric-geometry separation to test whether latent displacement and induced geometry covary.
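The seed-level comparison can be sketched as a rank correlation, assuming the $\rho$ reported in Section 5.2 is Spearman’s coefficient (the text does not say). This dependency-free version averages ranks over ties; the per-seed values in the usage example are toy numbers, not the paper’s data.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(xs)), key=lambda k: xs[k])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-seed values (not the paper's data): PCA displacement vs. metric-spectrum distance.
displacement = [0.31, 0.22, 0.05, 0.27, 0.12]
metric_distance = [1.9, 1.4, 0.3, 1.1, 0.8]
print(round(spearman(displacement, metric_distance), 2))  # 0.9
```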
## 5 Analysis Results
### 5.1 Conation Translates Bodily Tendency into Action
Figure 2: Conation is required to translate bodily tendency into action. (a–b) Median zone occupancy over the final 50 training episodes, with IQR. Cohorts with an active conation loss show high top-right zone occupancy and low bottom-zone occupancy, whereas the No conation cohort does not show this pattern. (c) The body decoder learns $\hat{\eta}_{\mathrm{UP-DOWN}}$ in all cohorts. (d) Only cohorts with conation convert the bodily tendency field into selective action readiness.

The first analysis shows that a learned bodily tendency field becomes behaviorally consequential only through conative alignment (Fig. [2](https://arxiv.org/html/2605.16728#S5.F2)). Fig. [2](https://arxiv.org/html/2605.16728#S5.F2)(a–b) reports spatial occupancy averaged over the final 50 training episodes. **Full** and **No body→$g$** agents reliably dwell in the top-right zone, where exteroceptive predictability and bodily viability jointly align (medians $69.0\%$ and $70.1\%$ vs. a 9-zone chance level of $11.1\%$), and rarely visit the bodily unfavorable bottom zone (both $2.2\%$). **No conation** agents are markedly weaker: top-right occupancy drops to $38.4\%$ and bottom-zone visits rise to $14.6\%$, with several high-failure outliers approaching $75\%$.
Fig. [2](https://arxiv.org/html/2605.16728#S5.F2)(c) and [2](https://arxiv.org/html/2605.16728#S5.F2)(d) rule out failed bodily-tendency learning as the explanation. Fig. [2](https://arxiv.org/html/2605.16728#S5.F2)(c) compares the predicted action tendency $\hat{\eta}_{\mathrm{UP-DOWN}}$ against the environment-computed ground truth. All three cohorts produce well-calibrated predictions, showing that $\hat{\eta}_t$ is learned even without conative alignment. Fig. [2](https://arxiv.org/html/2605.16728#S5.F2)(d) plots the conative target distribution $q_t(a)$ (Eq. [6](https://arxiv.org/html/2605.16728#S3.E6)) over the final 50 episodes. Cohorts with conation selectively bias $q(a)$ in the direction of the body prediction (UP $\approx 25.5\%$, DOWN $\approx 12.8\%$), whereas in the No conation cohort it remains uniform ($20\%$ per action over the five actions).
Together, these results show that the body decoder learns the bodily tendency field in all cohorts, but only the conative link converts it into action readiness and thereby into affordance-aligned behavior. In other words, agents without conation learn the qualitative contrast but fail to express it in overt behavior.
### 5.2 Bodily Perturbation Leaves a Geometric Residue in $g$
Figure 3: Bodily perturbation leaves a geometric residue in $g$. Bands and boxes show median and IQR. (a–c) Median PCA displacement vectors show strong recovery-phase perturbation effects when body→$g$ routing is intact. (d) Same-state metric-spectrum distance is high in Full and No conation, but reduced in No body→$g$. (e) PCA displacement correlates with metric-spectrum distance. (f) Across the recovery trajectory, the body-perturbation effect on $g$ is largest in the Full cohort.

Fig. [3](https://arxiv.org/html/2605.16728#S5.F3) shows that bodily perturbation history leaves a residue in the geometry induced by the perspective latent $g$. When body→$g$ routing is intact, bodily perturbation substantially reorganizes the recovery-phase trajectory of $g$. Fig. [3](https://arxiv.org/html/2605.16728#S5.F3)(a–c) shows the median PCA displacement vectors after the body-shock perturbation. The displacement is largest in the Full cohort (median $0.25$), intermediate in No conation ($0.19$), and strongly reduced in No body→$g$ ($0.04$). Fig. [3](https://arxiv.org/html/2605.16728#S5.F3)(f) shows this effect over time with the IQR band. Together, these results show that bodily perturbation leaves a persistent geometric trace when bodily prediction history can enter the perspective update.
Fig. [3](https://arxiv.org/html/2605.16728#S5.F3)(d) asks whether this latent residue also changes the geometry induced by $g$ under matched inputs. In the same-state history-$g$ probe, metric-spectrum distance is high in the two cohorts with intact body→$g$ routing and strongly reduced in No body→$g$ (difference significant at $p < .001$). Bodily perturbation history therefore changes not only where $g$ lies, but also the metric geometry that $g$ induces under identical probe inputs. Fig. [3](https://arxiv.org/html/2605.16728#S5.F3)(e) further supports this interpretation with a positive seed-level relationship between PCA displacement and metric-geometry separation ($\rho = 0.76$, $p < .001$). The latent body shock itself was matched across cohorts (median $\Delta u \approx -1.25$, all pairwise comparisons n.s.), so the geometric separation cannot be attributed to unequal perturbation magnitude.
This result shows that, regardless of whether the conative link is present, the perspective latent $g$ can carry a qualitative difference in its geometric trajectory even when that difference is not expressed directly in behavior. It also indicates that the proposed architecture implements this targeted structural aspect of subjective experience.
## 6 Discussion and Remarks
#### A Double Dissociation Between Conation and Perspective\.
The present architecture separates two roles often conflated in agent models: being affected by bodily history and being disposed to act on that affection. Phenomenologically, a qualitatively valenced way of taking up the world is not identical to the conative tendency to realize that valence in action \[[12](https://arxiv.org/html/2605.16728#bib.bib3)\]. The present results make this distinction explicit in minimal form: body→$g$ routing makes bodily history perspectivally consequential, whereas conative alignment makes bodily tendency behaviorally consequential.
#### Qualitative Geometry, Not Scalar Valence\.
The body decoder could be mistaken for a state-action value estimator, but this is not the intended interpretation. Its tendency field is trained on counterfactual bodily change rather than return, and without conation the policy need not express it. The field is therefore not a scalar reward proxy, but an action-conditioned bodily organization that can remain behaviorally latent or be coupled to readiness. The geometric assay extends this point: when bodily history is routed into $g$, the same input is reorganized through a different metric geometry, so the relevant object is not a single valence index but a structured difference in how the situation is taken up.
#### Limitations and Extensions\.
These results do not claim that the presented agent is conscious or that subjective experience has been fully realized\. Rather, the architecture tests minimal structural roles of embodied subjectivity, including bodily sedimentation in perspective, qualitative geometric organization, and conative linkage to action\. Future work should test whether this separation scales beyond the present simple gridworld and scalar body\.
## References
- [1] L. Albantakis, L. Barbosa, G. Findlay, M. Grasso, A. M. Haun, W. Marshall, W. G. P. Mayner, A. Zaeemzadeh, M. Boly, B. E. Juel, S. Sasai, K. Fujii, I. David, J. Hendren, J. P. Lang, and G. Tononi (2023). Integrated information theory (IIT) 4.0: formulating the properties of phenomenal existence in physical terms. PLOS Computational Biology 19(10), pp. 1–45. DOI: [10.1371/journal.pcbi.1011465](https://dx.doi.org/10.1371/journal.pcbi.1011465)
- [2] S. Amari (2016). Information geometry and its applications. Applied Mathematical Sciences, Vol. 194, Springer. DOI: [10.1007/978-4-431-55978-8](https://dx.doi.org/10.1007/978-4-431-55978-8)
- [3] A. Damasio (1999). The feeling of what happens: body and emotion in the making of consciousness. Harcourt Brace and Co.
- [4] G. M. Edelman (1989). The remembered present: a biological theory of consciousness. Basic Books.
- [5] S. Gallagher and D. Zahavi (2008). The phenomenological mind. Routledge.
- [6] S. Gallagher (2000). Philosophical conceptions of the self: implications for cognitive science. Trends in Cognitive Sciences 4(1), pp. 14–21. DOI: [10.1016/S1364-6613(99)01417-5](https://dx.doi.org/10.1016/S1364-6613%2899%2901417-5)
- [7] S. Gallagher (2023). Embodied and enactive approaches to cognition. Cambridge University Press.
- [8] M. Heidegger (1996). Being and time. Revised edition, SUNY Series in Contemporary Continental Philosophy. Translated from Sein und Zeit. ISBN 9780791426784.
- [9] C. Hesp, R. Smith, T. Parr, M. Allen, K. J. Friston, and M. J. D. Ramstead (2021). Deeply felt affect: the emergence of valence in deep active inference. Neural Computation 33(2), pp. 398–446. DOI: [10.1162/neco_a_01341](https://dx.doi.org/10.1162/neco%5Fa%5F01341)
- [10] E. Husserl (2001). Logical investigations I-II. Routledge. Translated from Logische Untersuchungen I-II.
- [11] E. Husserl (2014). Ideas for a pure phenomenology and phenomenological philosophy I. Hackett Publishing Company. Translated from Ideen zu einer reinen Phänomenologie und phänomenologischen Philosophie I.
- [12] A. Kiefer and J. Hohwy (2025). A predictive architecture for the attitudes. OSF Preprints. DOI: [10.31219/osf.io/4n27k_v1](https://dx.doi.org/10.31219/osf.io/4n27k%5Fv1)
- [13] M. Merleau-Ponty (2013). Phenomenology of perception. Routledge. Translated from Phénoménologie de la perception.
- [14] M. Oizumi, L. Albantakis, and G. Tononi (2014). From the phenomenology to the mechanisms of consciousness: integrated information theory 3.0. PLOS Computational Biology 10(5), e1003588. DOI: [10.1371/journal.pcbi.1003588](https://dx.doi.org/10.1371/journal.pcbi.1003588)
- [15] H. Pae (2026). Minimal computational preconditions for subjective perspective in artificial agents. arXiv: [2602.02902](https://arxiv.org/abs/2602.02902)
- [16] H. Pae (2026). Same world, differently given: history-dependent perceptual reorganization in artificial agents. arXiv: [2604.04637](https://arxiv.org/abs/2604.04637)
- [17] A. Safron (2020). An integrated world modeling theory (IWMT) of consciousness: combining integrated information and global neuronal workspace theories with the free energy principle and active inference framework; toward solving the hard problem and characterizing agentic causation. Frontiers in Artificial Intelligence 3, 30. DOI: [10.3389/frai.2020.00030](https://dx.doi.org/10.3389/frai.2020.00030)
- [18] A. Safron (2021). The radically embodied conscious cybernetic Bayesian brain: from free energy to free will and back again. Entropy 23(6), 783. DOI: [10.3390/e23060783](https://dx.doi.org/10.3390/e23060783)
- [19] J. Sartre (2021). Being and nothingness: an essay in phenomenological ontology. Washington Square Press. Translated from L'Être et le néant.
- [20] A. K. Seth and M. Tsakiris (2018). Being a beast machine: the somatic basis of selfhood. Trends in Cognitive Sciences 22(11), pp. 969–981. DOI: [10.1016/j.tics.2018.08.008](https://dx.doi.org/10.1016/j.tics.2018.08.008)
- [21] E. Thompson (2007). Mind in life: biology, phenomenology, and the sciences of mind. Harvard University Press.
- [22] D. Zahavi (2005). Subjectivity and selfhood: investigating the first-person perspective. Bradford Book/MIT Press.