DTVEM-RE: A Hierarchical Random-Effects Extension of the Differential Time-Varying Effect Model for Person-Specific Multi-Lag Estimation in Intensive Longitudinal Data

arXiv cs.LG Papers

Summary

This paper presents DTVEM-RE, a hierarchical random-effects extension of the Differential Time-Varying Effect Model that estimates person-specific multi-lag coefficients via Hamiltonian Monte Carlo in Stan, addressing a limitation of the original DTVEM which assumed a single group-level lag structure. Simulation and empirical results demonstrate recovery of between-person variance and improvements over hierarchical and non-hierarchical baselines.

arXiv:2606.14116v1 Announce Type: new

Abstract: The Differential Time-Varying Effect Model (DTVEM) of Jacobson et al. (2019) is a popular tool for finding the best time lag in intensive longitudinal data, but it assumes everyone shares the same lag structure. The original authors named fixing this as future work, and it clashes with the premise of modern clinical research, which is that people differ. We present DTVEM-RE, an extension that lets each person have their own lag coefficients, with two versions of the confirmatory step: a discrete-time hierarchical Bayesian VAR in Stan, which pools across people and gives calibrated uncertainty, and a continuous-time per-person Ornstein-Uhlenbeck model in ctsem, which handles unevenly spaced beeps directly. We report four results. A simulation shows the Bayesian version recovers the between-person spread tau_a with bias below 0.01 and coverage of 90 to 93 percent. On the Fisher et al. (2017) EMA dataset (N=40), person-specific lag-1 effects vary by an order of magnitude across three mood items, the Bayesian and GAMM estimates agree closely (r=0.87 to 0.92), and DTVEM-RE gives the best one-step-ahead prediction among four discrete-time methods. A multi-lag version shows all nine tau_k values have credible intervals excluding zero, and the lag where people differ most changes across items, something lag-1-only methods like mlVAR cannot detect. Finally, the two versions agree almost exactly on person-specific lag-1 estimates (r >= 0.995), differing only as shrinkage predicts. DTVEM-RE is, to our knowledge, the first person-specific implementation of DTVEM-style lag detection, and it contains standard DTVEM as a special case.

Cached at: 06/15/26, 09:10 AM

# DTVEM-RE: A Hierarchical Random-Effects Extension of the Differential Time-Varying Effect Model for Person-Specific Multi-Lag Estimation in Intensive Longitudinal Data
Source: [https://arxiv.org/html/2606.14116](https://arxiv.org/html/2606.14116)
###### Abstract

The Differential Time-Varying Effect Model (DTVEM) introduced by Jacobson, Chow, and Newman (2019) has become a widely used tool for identifying optimal lag structures in intensive longitudinal data, pairing a generalized additive mixed model (GAMM) exploratory stage with a state-space vector autoregression (VAR) confirmatory stage. However, DTVEM as published assumes a single group-level lag structure shared across individuals, a limitation that the original authors named as future work and one that conflicts with the idiographic premise of much modern psychopathology research. We present DTVEM-RE, a hierarchical random-effects extension of DTVEM that estimates person-specific multi-lag coefficients with shrinkage toward group-level means via Hamiltonian Monte Carlo in Stan. We report three sets of results. First, a simulation study across four heterogeneity levels confirms clean recovery of the between-person standard deviation parameter ($\tau_a$) with absolute bias below 0.01 and credible interval coverage of 90 to 93 percent at sample sizes typical of EMA studies. Second, a three-item empirical demonstration on the Fisher et al. (2017) ecological momentary assessment dataset ($N=40$ outpatients with generalized anxiety and major depressive disorder) shows that person-specific lag-1 autoregressive effects vary by an order of magnitude across affect items, that hierarchical Bayesian and independent GAMM estimates of person-specific coefficients agree closely ($r=0.87$ to $0.92$), and that DTVEM-RE achieves the best one-step-ahead predictive log-likelihood and root-mean-square error among four methods (DTVEM-RE and three hierarchical and non-hierarchical baselines), albeit by modest margins. Third, a multi-lag extension shows that all nine $\tau_k$ estimates across three items and three lags have 90 percent credible intervals excluding zero, with the lag at which heterogeneity is largest differing across items.
This multi\-lag person\-specific heterogeneity is outside the modeling scope of existing hierarchical VAR methods that estimate random effects only on lag\-1 coefficients, such as mlVAR\. We conclude that DTVEM\-RE provides, to our knowledge, the first principled idiographic implementation of DTVEM\-style lag detection while retaining standard DTVEM as a fixed\-effects special case\.

Keywords: DTVEM; hierarchical state-space model; ecological momentary assessment; idiographic psychopathology; random effects; intensive longitudinal data; mood and anxiety dynamics

## 1 Introduction

### 1\.1 Idiographic dynamics in clinical psychology

The recognition that individuals differ meaningfully in their psychological dynamics has reshaped quantitative clinical research over the past two decades. Molenaar ([2004](https://arxiv.org/html/2606.14116#bib.bib14)) formalized this concern in a non-ergodicity argument, demonstrating that conclusions drawn from between-person variation generally do not transfer to within-person dynamics. The implication is methodologically consequential: standard cross-sectional and group-based longitudinal models can misrepresent the very phenomena they are designed to study when the question concerns processes unfolding within an individual.

Empirical evidence for this position has accumulated rapidly. Fisher et al. ([2017](https://arxiv.org/html/2606.14116#bib.bib6)) provided a particularly influential demonstration in a sample of 40 outpatients with generalized anxiety disorder (GAD), major depressive disorder (MDD), or both, showing that person-specific symptom networks differed substantially from group-aggregated patterns. Fisher et al. ([2018](https://arxiv.org/html/2606.14116#bib.bib7)) formalized this into a generalizability critique, arguing that lack of group-to-individual transferability is a structural threat to human subjects research that cannot be solved through larger samples alone. Subsequent work has built personalized treatment-targeting algorithms on top of idiographic dynamic models (Fernandez et al., [2017](https://arxiv.org/html/2606.14116#bib.bib5); Fisher et al., [2019](https://arxiv.org/html/2606.14116#bib.bib8)), contributing to a broader shift toward person-centered clinical methodology.

### 1\.2 The DTVEM framework and its group\-level limitation

A core question in this research program concerns the temporal architecture of symptom dynamics: at what time interval does one symptom or mood predict another? The Differential Time-Varying Effect Model (DTVEM) introduced by Jacobson et al. ([2019](https://arxiv.org/html/2606.14116#bib.bib11)) addresses this question through an elegant two-stage hybrid procedure.

In Stage 1, an exploratory generalized additive mixed model (GAMM) smooths the autoregressive coefficient across lag distance $\Delta t$, producing a continuous lag-effect curve $\hat{f}(\Delta t)$. Peaks and valleys in this curve, where the 95 percent confidence band excludes zero, identify candidate lags at which dependencies appear statistically reliable. In Stage 2, a vector autoregression (VAR) model is specified using only those candidate lags as predictors, then estimated as a state-space model in OpenMx. The result is a confirmatory model with only the lags that Stage 1 has identified as supported by the data.
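To make the Stage 1 idea concrete, the Python sketch below screens candidate lags in a deliberately simplified way. Instead of a GAMM smooth over $\Delta t$, it fits a separate pooled, no-intercept least-squares slope of $y_{i,t}$ on $y_{i,t-k}$ at each integer lag $k$ and flags lags whose normal-approximation 95% interval excludes zero. This is not the DTVEM implementation; the function names (`pooled_lag_slope`, `candidate_lags`, `simulate_ar1`) are hypothetical, and the sketch only illustrates the "estimate a curve of lag effects, keep the lags whose band excludes zero" logic.

```python
import math
import random  # used only in the usage example below


def pooled_lag_slope(series_by_person, k):
    """Pooled no-intercept OLS slope of y[t] on y[t - k] across all persons.

    series_by_person: list of per-person lists of within-person standardized
    ratings, ordered by beep; None marks a missed beep.
    Returns (slope, lower, upper) with a normal-approximation 95% interval.
    """
    xs, ys = [], []
    for s in series_by_person:
        for t in range(k, len(s)):
            if s[t] is not None and s[t - k] is not None:
                xs.append(s[t - k])
                ys.append(s[t])
    sxx = sum(x * x for x in xs)
    slope = sum(x * y for x, y in zip(xs, ys)) / sxx
    resid_var = sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / (len(xs) - 1)
    se = math.sqrt(resid_var / sxx)
    return slope, slope - 1.96 * se, slope + 1.96 * se


def candidate_lags(series_by_person, max_lag):
    """Lags 1..max_lag whose pooled slope interval excludes zero."""
    flagged = []
    for k in range(1, max_lag + 1):
        _, lower, upper = pooled_lag_slope(series_by_person, k)
        if lower > 0 or upper < 0:
            flagged.append(k)
    return flagged


# Usage: 40 simulated people following AR(1) with coefficient 0.5.
rng = random.Random(1)


def simulate_ar1(a, T):
    y = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - a * a))]  # stationary start
    for _ in range(T - 1):
        y.append(a * y[-1] + rng.gauss(0.0, 1.0))
    return y


data = [simulate_ar1(0.5, 130) for _ in range(40)]
print(candidate_lags(data, 3))
```

Because each lag is regressed on its own, an AR(1) process with coefficient $a$ also shows a marginal lag-2 slope near $a^2$, so a screen like this will typically flag lags 2 and 3 as well. Flagged lags are therefore candidates to be re-estimated jointly in a confirmatory model, not final results.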

DTVEM has been widely adopted across psychopathology, sleep, personality, and digital biomarker research\. Its appeal lies in combining nonparametric flexibility \(Stage 1 makes few assumptions about lag shape\) with parsimonious confirmatory estimation \(Stage 2 fits only the structure the data supports\)\.

However, DTVEM as published assumes that the lag structure $\hat{f}(\Delta t)$ is shared across all individuals in the dataset. The exploratory smoother is fit to pooled data, and the confirmatory VAR estimates a single set of coefficients applied uniformly. The original authors explicitly acknowledged this limitation in their concluding remarks:

> “The present simulation studies did not consider person-specific differences in dynamics or lag structures. Given this, future extensions of the DTVEM model may be able to include random effects within the GAMM framework to model person-specific differences” (Jacobson et al., [2019](https://arxiv.org/html/2606.14116#bib.bib11), p. 311).

This limitation is more than technical\. The premise of idiographic clinical research is that individuals differ in their dynamics\. Applying a method that assumes shared lag structure to a research question fundamentally about heterogeneity is internally inconsistent: it discards the very variation the research is designed to study\. The methodological gap is precisely the one named by the original authors as a priority for future work\.

### 1\.3 The present work

We present DTVEM\-RE \(DTVEM with Random Effects\), a hierarchical extension that estimates person\-specific lag profiles with shrinkage toward a group\-level mean\. The exploratory stage uses factor\-smooth GAMMs to recover individual lag curves with automatic per\-person regularization\. The confirmatory stage embeds those lags in a hierarchical Bayesian state\-space VAR estimated via Hamiltonian Monte Carlo in Stan\.

The contributions are threefold. First, we specify the DTVEM-RE model and provide a Stan implementation, with full code released for replication. Second, we validate parameter recovery through a controlled simulation study, showing that the between-person standard deviation parameter $\tau_a$ is recovered without meaningful bias and with near-nominal credible interval coverage at the sample sizes typical of EMA studies. Third, we apply DTVEM-RE to the Fisher et al. (2017) ecological momentary assessment dataset and demonstrate (a) substantial between-person heterogeneity in lag-1 autoregressive effects across three affect items, (b) modest but consistent predictive improvement over alternative methods in held-out evaluation, and (c) statistically robust between-person heterogeneity at lags 1, 2, and 3 simultaneously in a multi-lag extension, with the lag at which heterogeneity is largest differing across items. This last finding is outside the modeling scope of existing hierarchical VAR methods that place random effects only on lag-1 coefficients.

The paper proceeds as follows\. Section 2 reviews related methods and positions DTVEM\-RE within the existing landscape\. Section 3 specifies the data, the model, the priors, the estimation procedure, and the simulation and held\-out prediction protocols\. Section 4 reports results in four parts: parameter recovery in simulation, the Stage 1 GAMM exploration on Fisher’s data, the lag\-1 Stage 2 fits with cross\-method validation and held\-out prediction, and the multi\-lag extension\. Section 5 discusses implications, limitations, and directions for future work\.

## 2 Related Methods

DTVEM\-RE sits at the intersection of two methodological literatures: hierarchical extensions of VAR models for intensive longitudinal data, and lag\-exploration methods that go beyond fixed lag\-1 specifications\. We briefly review each\.

### 2\.1 Hierarchical lag\-1 methods

The dominant hierarchical extension of VAR models for idiographic psychopathology research is multilevel VAR, or mlVAR (Bringmann et al., [2013](https://arxiv.org/html/2606.14116#bib.bib2); Epskamp et al., [2018](https://arxiv.org/html/2606.14116#bib.bib4)). The framework originated with Bringmann et al. ([2013](https://arxiv.org/html/2606.14116#bib.bib2)), which introduced a multilevel VAR with random effects on lag-1 autoregressive and cross-lagged coefficients for ESM data, and was extended by Epskamp et al. ([2018](https://arxiv.org/html/2606.14116#bib.bib4)) to combine temporal and contemporaneous network estimation. mlVAR is widely used and has informed many of the substantive findings in the network psychopathology literature. As implemented, however, it estimates random effects only on lag-1 coefficients; it is not designed to discover whether between-person heterogeneity differs in magnitude across lag distances.

Group Iterative Multiple Model Estimation (GIMME) and its latent-variable extension LV-GIMME (Gates et al., [2020](https://arxiv.org/html/2606.14116#bib.bib10)) take a different approach to heterogeneity, performing a data-driven search for person-specific structural paths. Lagged paths can be included, and GIMME does not constrain itself to lag-1 paths in principle. However, GIMME is a model-search method rather than a hierarchical-pooling method; it does not place a population distribution on person-specific lag coefficients and does not provide the shrinkage that hierarchical models do at moderate per-person sample sizes.

Multi-VAR (Fisher et al., [2022](https://arxiv.org/html/2606.14116#bib.bib9)) estimates VAR models for multiple subjects simultaneously using adaptive-LASSO penalization, allowing for shared and idiosyncratic transition-matrix elements across persons. The framework supports multiple lags, but it does not place a hierarchical population distribution on person-specific lag coefficients with explicit shrinkage. The mechanism for handling heterogeneity is penalized estimation rather than partial pooling.

### 2\.2 Multi\-lag exploration methods

DTVEM (Jacobson et al., [2019](https://arxiv.org/html/2606.14116#bib.bib11)) is the leading method for nonparametric lag exploration in intensive longitudinal data. Its GAMM-based Stage 1 makes no parametric assumption about the shape of the lag-effect kernel, allowing detection of complex structures including delayed effects, secondary peaks, and oscillatory patterns. However, DTVEM operates at the group level only.

Continuous-time alternatives, including hierarchical ctsem (Driver & Voelkle, [2018](https://arxiv.org/html/2606.14116#bib.bib3)), model the underlying process in continuous time and naturally handle irregular spacing. These methods impose an exponential decay shape by construction, sacrificing the nonparametric flexibility of DTVEM in exchange for principled handling of unequal time intervals. Hierarchical ctsem supports random effects across all model parameters, including the drift matrix that governs temporal dependence, but the lag-effect shape is fixed by the continuous-time formulation rather than estimated nonparametrically.

### 2\.3 The gap DTVEM\-RE fills

To our knowledge, no published method combines nonparametric multi\-lag exploration with hierarchical pooling on person\-specific lag coefficients at each detected lag\. The closest existing approaches each cover one half of this combination:

- Standard DTVEM: multi-lag exploration, group-level only.
- mlVAR: hierarchical pooling, but on lag-1 random effects only.
- LV-GIMME and multi-VAR: support multiple lags and person-specific structure, but handle heterogeneity via model search or penalization rather than hierarchical pooling.
- Hierarchical ctsem: hierarchical and continuous-time, but with exponential decay imposed by the continuous-time formulation.

## 3 Methods

This section unifies the data, model specification, estimation procedure, and simulation design\. The presentation order is substantive material first \(what we analyzed\) followed by methodological material in service of it\. Stage 1 of DTVEM\-RE is a hierarchical GAMM; Stage 2 is a hierarchical Bayesian state\-space VAR\. When the between\-person variance components in Stage 2 are constrained to zero, DTVEM\-RE reduces to standard DTVEM\.

### 3\.1 Data

We use the publicly available Fisher et al. (2017) ecological momentary assessment (EMA) dataset, hosted on the Open Science Framework at [https://osf.io/zefbc/](https://osf.io/zefbc/). The sample comprises 40 outpatients meeting DSM criteria for generalized anxiety disorder (GAD), major depressive disorder (MDD), or both. Each participant completed smartphone-delivered EMA prompts four times daily at pseudo-random intervals within waking hours, for approximately thirty days. Each prompt collected 0 to 100 visual-analog ratings on twenty-eight affect and anxiety items.

After dropping missed prompts, the analytic dataset comprised 4,463 completed beeps across the forty participants, with a median of 113 beeps per person \(range 90 to 151\)\. Two participants had within\-protocol compliance breaks exceeding seventy\-two hours; we segmented these into separate sessions and used only the first session per participant throughout, in order to maintain comparable analytic windows across the sample\. Median spacing between consecutive completed beeps was approximately 4\.3 hours during the day, with overnight gaps of 10 to 12 hours\.

We focus on three items chosen to span affect domains: *down* (negative affect, low arousal; representative of the depression-core cluster), *worried* (negative affect, high arousal; representative of the anxiety-core cluster), and *energetic* (positive activation). Demonstrating the method on three items rather than one guards against conclusions that hold only for a single item. We report all three items in full at every stage of analysis. We did not pre-screen items for the property under study (between-person heterogeneity in lag coefficients); these three were selected on substantive grounds before estimation.

![Figure 1](https://arxiv.org/html/2606.14116v1/ematrajectories.png)

Figure 1: EMA item trajectories for four representative participants (P001, P040, P145, P202) across the full study period. Three of the four items shown (*down*, *worried*, *energetic*) are analyzed throughout this paper; *angry* is included for additional illustration of the qualitative range of dynamics in this dataset. The horizontal axis displays hours from study start; the analyses reported below use sequential beep number as the lag index (see Section 3.2). The four participants were selected to span the range of dynamic profiles in the sample, from high-amplitude and volatile (P001) to low-amplitude and tightly bounded (P145). Participant identifiers are the original IDs from the Fisher et al. (2017) dataset.

Figure [1](https://arxiv.org/html/2606.14116#S3.F1) shows EMA item trajectories for four representative participants, illustrating the qualitative range of dynamics this paper aims to characterize.

### 3\.2 Notation and preprocessing

Let $y_{i,t}$ denote the rating from participant $i$ ($i = 1, \ldots, N = 40$) at beep occasion $t$ ($t = 1, \ldots, T_i$). Within each participant, ratings were z-scored using the participant-specific mean and standard deviation. This within-person standardization removes individual differences in mean level and rating-scale use, allowing the analysis to focus on within-person fluctuation. Standardization additionally allows direct interpretation of AR coefficients across items measured on a common scale.

Lags are constructed using sequential beep number rather than wall-clock time. That is, $y_{i,t-1}$ denotes participant $i$'s rating at the beep occasion immediately preceding $t$, irrespective of whether that interval was a within-day gap of a few hours or an overnight gap of roughly twelve hours. This approximation treats beeps as equally spaced, which is reasonable within waking hours but not across the overnight gap. We adopt this convention because it matches the practice used by Jacobson et al. ([2019](https://arxiv.org/html/2606.14116#bib.bib11)) and the broader EMA modeling literature. The implications are revisited in Section 5. Rows for which any of the requisite lagged values is missing are dropped from the analytic sample. For the lag-1 model this excludes the first within-session observation per participant and any row following a within-session gap; for the multi-lag model fit at lags 1 to 3 the excluded set is larger because $y_{i,t-1}$, $y_{i,t-2}$, and $y_{i,t-3}$ must all be observed.
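The preprocessing described in this section (within-person z-scoring, then lags indexed by sequential beep number with incomplete rows dropped) can be sketched in Python as follows. Representing one participant as a list ordered by scheduled beep, with `None` for a missed beep, is an assumption for illustration, as are the helper names `zscore_within` and `lagged_rows`; this is not the authors' code. Session segmentation at compliance breaks longer than seventy-two hours (Section 3.1) is not shown.

```python
import statistics
from typing import List, Optional, Sequence, Tuple

Series = List[Optional[float]]  # one participant, ordered by beep; None = missed


def zscore_within(series: Series) -> Series:
    """Standardize one participant's ratings by that participant's mean and SD.

    Missed beeps stay None and are ignored when computing the moments.
    Assumes at least two observed, non-identical ratings.
    """
    observed = [v for v in series if v is not None]
    m = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample SD (n - 1 denominator)
    return [None if v is None else (v - m) / sd for v in series]


def lagged_rows(series: Series, lags: Sequence[int]) -> List[Tuple[float, list]]:
    """Return (y_t, [y_{t-k} for k in lags]) using sequential beep number as the lag.

    A row is kept only if y_t and every requested lagged value are observed,
    so the first max(lags) beeps and rows near missed beeps are dropped.
    """
    rows = []
    for t in range(max(lags), len(series)):
        current = series[t]
        past = [series[t - k] for k in lags]
        if current is not None and all(v is not None for v in past):
            rows.append((current, past))
    return rows


# Usage with a toy participant who missed the third beep.
ratings = [50.0, 60.0, None, 70.0, 40.0, 55.0]
z = zscore_within(ratings)
print(len(lagged_rows(z, [1])), len(lagged_rows(z, [1, 2, 3])))  # prints: 3 0
```

The toy case shows why the multi-lag sample shrinks: a single missed beep at occasion $t_0$ removes the rows for $t_0$ through $t_0 + 3$ when lags $\{1, 2, 3\}$ are required, but only the rows for $t_0$ and $t_0 + 1$ in the lag-1 model.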

### 3\.3 Stage 1: Hierarchical exploratory GAMM

Stage 1 uses a generalized additive mixed model (GAMM) to estimate the lag-effect curve as a function of lag distance $\Delta t$. Standard DTVEM Stage 1 fits a single population-level smooth $f(\Delta t)$ to pooled data. We extend it to allow person-specific deviations from this population smooth:

$$
y_{i,t} = \big[f(\Delta t) + b_i(\Delta t)\big] \cdot y_{i,t-\Delta t} + \varepsilon_{i,t}, \qquad \varepsilon_{i,t} \sim \mathcal{N}(0, \sigma^2) \tag{1}
$$

Here $f(\Delta t)$ is the population-level smooth lag-effect curve and $b_i(\Delta t)$ is a person-specific deviation from it. Both are represented via thin-plate regression splines. Implementation uses the factor-smooth basis (`bs = "fs"`) in `mgcv::bam` (Wood, [2017](https://arxiv.org/html/2606.14116#bib.bib15)), which couples all participant-specific smooths through a single shared smoothing parameter, providing automatic per-person shrinkage toward the population smooth: heavy shrinkage where individual data are sparse, less shrinkage where individual data are informative enough to support a distinct shape.

Stage 1 plays two roles in DTVEM\-RE\. First, it provides an exploratory view of which lags carry substantial dependence at the population level and how person\-specific lag profiles deviate from that average\. Second, it provides an independent estimate of person\-specific lag\-1 coefficients that serves as a cross\-method check on Stage 2 posterior estimates\.

We do not use Stage 1 to perform automated lag-set selection in the present paper. Stage 2 is fit with an analyst-specified lag set $\mathcal{K}$, with $\mathcal{K} = \{1\}$ and $\mathcal{K} = \{1, 2, 3\}$ both reported below. Automating the handoff from Stage 1 candidate lags to the Stage 2 lag set is straightforward in principle and follows the design of standard DTVEM, but the additional methodological choices it raises (significance thresholding, multiple-lag selection rules) are kept out of scope here to focus this paper on the hierarchical pooling step.

### 3\.4 Stage 2 \(lag\-1 case\): hierarchical Bayesian state\-space VAR

In the present paper we report results for the univariate special case in which a single item is modeled at a time. The framework extends directly to multivariate VAR by replacing the scalar coefficient $a_i$ with a transition matrix $\mathbf{A}_i$ at each lag, with hierarchical priors on the matrix entries. We restrict attention here to the univariate case to focus the methodological development on the hierarchical pooling step; the multivariate extension is a natural follow-up.

For the single-lag case, Stage 2 places a hierarchical normal distribution over person-specific autoregressive coefficients $a_i$ and a hierarchical log-normal distribution over person-specific residual standard deviations $\sigma_i$:

$$
y_{i,t} \sim \mathcal{N}\big(a_i \cdot y_{i,t-1},\ \sigma_i^2\big) \tag{2}
$$

$$
a_i = \mu_a + \tau_a \cdot a_i^{\text{raw}}, \qquad a_i^{\text{raw}} \sim \mathcal{N}(0, 1) \tag{3}
$$

$$
\sigma_i = \sigma_{\text{mean}} \cdot \exp\big(\sigma_{\text{sd}} \cdot \sigma_i^{\text{raw}}\big), \qquad \sigma_i^{\text{raw}} \sim \mathcal{N}(0, 1) \tag{4}
$$

Equations ([3](https://arxiv.org/html/2606.14116#S3.E3)) and ([4](https://arxiv.org/html/2606.14116#S3.E4)) implement non-centered parameterizations of the person-specific coefficient and the person-specific residual SD. A non-centered parameterization writes each person-level parameter as a deterministic function of its population-level location and scale and a unit-scale auxiliary variable, so that the sampler explores auxiliary variables whose prior does not depend on the scale parameter. This breaks the funnel-shaped posterior geometry that hierarchical models exhibit when the population scale is small and the per-person data are only weakly informative (Betancourt & Girolami, [2015](https://arxiv.org/html/2606.14116#bib.bib1)). In an initial development version of the model, the centered parameterization for $\sigma_i$ produced 127 divergent transitions across four chains on the *energetic* item; the non-centered log-normal reparameterization eliminated all divergences without altering parameter estimates beyond Monte Carlo error.
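As a minimal sketch of Equations (2) to (4), the Python functions below map unit-scale raw draws to person-level parameters (the non-centered transforms) and evaluate the Gaussian data log-likelihood of the lag-1 model. They omit the hyperpriors in Section 3.6 and the standard-normal priors on the raw variables, so they are not a full log-posterior, and the names `person_params` and `loglik_lag1` are hypothetical rather than taken from the authors' Stan program. In a Stan version of this model the sampler would move on `a_raw` and `sigma_raw`, whose scale is fixed at 1 whatever the current value of $\tau_a$ or $\sigma_{\text{sd}}$.

```python
import math
from typing import Sequence, Tuple


def person_params(mu_a: float, tau_a: float, a_raw: Sequence[float],
                  sigma_mean: float, sigma_sd: float, sigma_raw: Sequence[float]):
    """Non-centered transforms of Eqs. (3) and (4): unit-scale raws -> person values."""
    a = [mu_a + tau_a * r for r in a_raw]
    sigma = [sigma_mean * math.exp(sigma_sd * r) for r in sigma_raw]
    return a, sigma


def loglik_lag1(rows_by_person: Sequence[Sequence[Tuple[float, float]]],
                a: Sequence[float], sigma: Sequence[float]) -> float:
    """Gaussian log-likelihood of Eq. (2), summed over persons and rows.

    rows_by_person[i] holds (y_t, y_{t-1}) pairs for person i.
    Written out explicitly (rather than log(pdf)) to avoid underflow.
    """
    total = 0.0
    for rows, a_i, s_i in zip(rows_by_person, a, sigma):
        for y, y_lag in rows:
            z = (y - a_i * y_lag) / s_i
            total += -0.5 * math.log(2.0 * math.pi) - math.log(s_i) - 0.5 * z * z
    return total


# Usage: with tau_a = 0 every person gets the same coefficient mu_a,
# while residual SDs can still differ because sigma_sd > 0.
a, sigma = person_params(0.4, 0.0, [1.2, -0.7], 0.9, 0.2, [0.5, -0.5])
print(a, sigma)
```

Setting `tau_a = 0` collapses every $a_i$ to $\mu_a$, which is the fixed-effects AR(1) special case discussed in Section 3.4.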

The hyperparameters are $\mu_a$, the population-level mean of the AR(1) coefficient; $\tau_a$, the between-person standard deviation of the AR(1) coefficient and the parameter of primary substantive interest; $\sigma_{\text{mean}}$, the population-level residual standard deviation (strictly, the median of the $\sigma_i$ distribution, because $\exp(\sigma_{\text{sd}} \cdot \sigma_i^{\text{raw}})$ has median 1); and $\sigma_{\text{sd}}$, the between-person variation in residual SDs on the log scale. The substantive yield of DTVEM-RE relative to standard DTVEM is captured by $\tau_a$: when $\tau_a = 0$, the AR(1) coefficient reduces to a single fixed effect shared across persons.

### 3\.5 Stage 2 \(multi\-lag case\)

The single-lag model extends to $K$ lags by giving each person a vector of $K$ coefficients with its own hyperparameters:

$$
y_{i,t} \sim \mathcal{N}\Bigg(\sum_{k=1}^{K} a_{i,k} \cdot y_{i,t-k},\ \sigma_i^2\Bigg) \tag{5}
$$

$$
a_{i,k} = \mu_k + \tau_k \cdot a_{i,k}^{\text{raw}}, \qquad a_{i,k}^{\text{raw}} \sim \mathcal{N}(0, 1), \qquad k = 1, \ldots, K \tag{6}
$$

with $\sigma_i$ retaining the log-normal hierarchical specification above. Each lag receives its own population mean $\mu_k$ and its own between-person standard deviation $\tau_k$. This is the design choice that distinguishes DTVEM-RE from a multi-lag extension constraining heterogeneity to be uniform across lags. The separate-$\tau_k$ specification allows the *shape* of between-person heterogeneity across lags to differ from the *shape* of population means across lags, and is what makes the empirical result in Section 4.5 possible.

The raw deviations $a_{i,k}^{\text{raw}}$ are a priori independent across lags in the present specification; we do not place a prior on cross-lag covariance within $\mathbf{a}_i = (a_{i,1}, \ldots, a_{i,K})^\top$. A more flexible specification would estimate a $K \times K$ covariance matrix over the person-level coefficient vector and is a natural extension. We report results from the independent specification throughout and examine empirical cross-lag correlations among posterior means in Section 4.5.
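A generative sketch of Equations (5) and (6) may help clarify the per-lag hyperparameters. Each person's coefficient vector is drawn lag by lag with its own $\mu_k$ and $\tau_k$, with raw deviations drawn independently across lags as in the present specification, and the series is then simulated from the $K$-lag conditional mean. The function name, the burn-in length, the zero starting values, and the fixed per-person `sigma_i` argument are illustrative assumptions. The sketch does not check that the drawn coefficient vector is stationary; with large $\tau_k$ a draw can produce an explosive series, so a real simulation would need to reject or constrain such draws.

```python
import random
from typing import List, Sequence, Tuple


def simulate_multilag_person(mu: Sequence[float], tau: Sequence[float],
                             sigma_i: float, T: int, rng: random.Random,
                             burn_in: int = 200) -> Tuple[List[float], List[float]]:
    """Draw one person's K coefficients via Eq. (6), then T beeps via Eq. (5).

    Raw deviations are drawn independently across lags (no cross-lag covariance).
    Stationarity of the drawn coefficients is NOT checked.
    """
    K = len(mu)
    a_i = [mu[k] + tau[k] * rng.gauss(0.0, 1.0) for k in range(K)]
    y = [0.0] * K  # zero initial values, discarded with the burn-in
    for _ in range(burn_in + T):
        # a_i[k] multiplies y_{t-(k+1)}, so a_i[0] is the lag-1 coefficient
        mean = sum(a_i[k] * y[-1 - k] for k in range(K))
        y.append(rng.gauss(mean, sigma_i))
    return a_i, y[-T:]


# Usage: K = 3 lags with heterogeneity largest at lag 2 (illustrative values only).
rng = random.Random(0)
coefs, series = simulate_multilag_person([0.3, 0.1, 0.05], [0.10, 0.15, 0.05],
                                         0.9, 130, rng)
print([round(c, 3) for c in coefs], len(series))
```

Because $\tau_k$ is free per lag, a draw like the usage example can have its widest person-to-person spread at lag 2 even though the population mean is largest at lag 1, which is the pattern the separate-$\tau_k$ design is meant to detect.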

### 3\.6 Priors and rationale

The model places weakly informative priors on the hyperparameters:

$$
\mu_a \sim \mathcal{N}(0,\ 0.5), \qquad \tau_a \sim \text{Half-}\mathcal{N}(0,\ 0.3) \tag{7}
$$

$$
\mu_k \sim \mathcal{N}(0,\ 0.5), \qquad \tau_k \sim \text{Half-}\mathcal{N}(0,\ 0.3) \quad (k = 1, \ldots, K) \tag{8}
$$

$$
\sigma_{\text{mean}} \sim \mathcal{N}(0.9,\ 0.3), \qquad \sigma_{\text{sd}} \sim \text{Half-}\mathcal{N}(0,\ 0.3) \tag{9}
$$
The $\mathcal{N}(0, 0.5)$ prior on the population means $\mu_a$ and $\mu_k$ is centered at zero (no temporal dependence) with a standard deviation of 0.5 on the within-person z-scored response scale. This places approximately 95% prior mass between $-1$ and $+1$, comfortably covering the stationary range $(-1, 1)$ for an AR(1) coefficient on a standardized series while placing relatively little mass near or beyond the unit-root boundary. The Half-$\mathcal{N}(0, 0.3)$ prior on the between-person SDs $\tau_a$ and $\tau_k$ places approximately 95% prior mass below $0.6$, an upper bound chosen to be substantially larger than the magnitudes of between-person variation reported in the broader EMA affect-dynamics literature (typically well below $0.3$ on standardized AR coefficients) while concentrating prior density near zero to express skepticism toward implausibly large heterogeneity. We note that this prior was chosen before the empirical fits in Section 4.3 and reflects expectations from prior work rather than the observed values in the present sample. The $\mathcal{N}(0.9, 0.3)$ prior on $\sigma_{\text{mean}}$ reflects the expectation that residual SDs on a unit-variance standardized series will be near 1 if the AR(1) coefficient is small and somewhat less if it is large, because a stationary AR(1) process with unit marginal variance has innovation SD $\sqrt{1 - a^2}$ (about 0.92 at $a = 0.4$); the prior accommodates either case.
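The prior-mass statements above are easy to check numerically. The Python snippet below uses only the standard library's `statistics.NormalDist` to compute the $\mathcal{N}(0, 0.5)$ mass on $(-1, 1)$, the Half-$\mathcal{N}(0, 0.3)$ mass below 0.6, and the innovation SD $\sqrt{1 - a^2}$ of a stationary AR(1) series with unit marginal variance, which is one way to see why a $\sigma_{\text{mean}}$ prior centered near 0.9 suits a standardized series with moderate autocorrelation. The helper name `innovation_sd` is hypothetical.

```python
import math
from statistics import NormalDist

# mu_a, mu_k ~ N(0, 0.5): prior mass on the stationary range (-1, 1)
mu_prior = NormalDist(mu=0.0, sigma=0.5)
mass_mu = mu_prior.cdf(1.0) - mu_prior.cdf(-1.0)

# tau ~ Half-N(0, 0.3): the half-normal CDF at x is 2 * (Phi_{0, 0.3}(x) - 0.5)
tau_parent = NormalDist(mu=0.0, sigma=0.3)
mass_tau = 2.0 * (tau_parent.cdf(0.6) - 0.5)


def innovation_sd(a: float) -> float:
    """Innovation SD of a stationary AR(1) whose marginal variance is 1."""
    return math.sqrt(1.0 - a * a)


print(f"P(|mu| < 1)  = {mass_mu:.3f}")
print(f"P(tau < 0.6) = {mass_tau:.3f}")
print(f"sqrt(1 - a^2) at a = 0.4: {innovation_sd(0.4):.3f}")
```

Both prior masses come out at about 0.954 because $\pm 1$ and $0.6$ each sit exactly two prior standard deviations from zero, and the innovation SD at $a = 0.4$ is about 0.917.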

A note on identifiability is warranted. With $N = 40$ participants, estimating a hierarchical between-person SD $\tau_a$ requires care: at small $N$, half-normal priors on hierarchical scale parameters can produce posteriors that are partially prior-driven, especially when the empirical variance is small relative to the prior scale. The simulation study reported in Section 4.1 addresses this directly by examining $\tau_a$ recovery under known generating values across the empirically relevant range. The simulation results show that $\tau_a$ is recovered without meaningful bias and with approximately nominal credible interval coverage at $N = 40$, $T = 130$.

### 3\.7 Estimation

Models are estimated via Hamiltonian Monte Carlo in Stan through the `cmdstanr` interface. Default settings use 4 chains of 1000 warmup plus 1000 sampling iterations, with `adapt_delta = 0.95`. Convergence is assessed via $\hat{R} \leq 1.01$, divergent transition count, and energy Bayesian fraction of missing information (E-BFMI). Person-specific posterior summaries are extracted as posterior means with 90 percent credible intervals throughout. Lag-1 model fits complete in approximately 15 seconds per item on a standard desktop; multi-lag fits at $K = 3$ complete in approximately 2 minutes per item.
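For readers who want to see what the $\hat{R} \leq 1.01$ criterion measures, here is a minimal Python sketch of the classic split-$\hat{R}$ for one scalar parameter: each chain is cut in half, and a pooled variance estimate is compared with the average within-half variance. Recent Stan tooling reports a rank-normalized, folded variant of $\hat{R}$, so values from this simpler version will differ somewhat; the sketch is an illustration of the idea, not a replacement for the reported diagnostic, and `split_rhat` is a hypothetical name.

```python
import random
import statistics
from typing import Sequence


def split_rhat(chains: Sequence[Sequence[float]]) -> float:
    """Classic (non-rank-normalized) split-R-hat for one scalar parameter.

    chains: equal-length sequences of post-warmup draws, one per chain.
    """
    halves = []
    for c in chains:
        n_half = len(c) // 2
        halves.append(list(c[:n_half]))
        halves.append(list(c[n_half:2 * n_half]))  # drops a trailing odd draw
    n = len(halves[0])
    within = statistics.fmean([statistics.variance(h) for h in halves])        # W
    between = n * statistics.variance([statistics.fmean(h) for h in halves])   # B
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


# Usage: 4 well-mixed chains x 1000 draws, matching the sampling settings above.
rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
print(round(split_rhat(mixed), 4))
```

Splitting each chain in half is what lets the statistic catch a chain that drifts during sampling, not only chains stuck in different regions.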

### 3\.8 Simulation study design

We assess parameter recovery for DTVEM-RE under controlled conditions matched to the structure of the empirical data. Synthetic datasets were generated with $N = 40$ persons, $T = 130$ observations each, true population mean $\mu_a = 0.40$, and four between-person SD levels $\tau_a \in \{0, 0.10, 0.20, 0.30\}$. The range spans from the null case (no heterogeneity) through the moderate heterogeneity level observed empirically in Fisher's data ($\tau_a \approx 0.16$) up to a high-heterogeneity stress test.

For each condition, 30 independent replicates were generated. Within each replicate, person-specific true coefficients were sampled from $\mathcal{N}(\mu_a, \tau_a^2)$, constrained to the stationary range $[-0.95, 0.95]$. Person-specific residual standard deviations were sampled from a log-normal distribution with mean 0.9 and small variation, matching the structure assumed by the Stan model. AR(1) data were then generated for 130 time points per person, and the simulated series were demeaned within person (without rescaling) before fitting. The same Stan model used on Fisher's data (non-centered parameterizations on both $a_i$ and $\sigma_i$) was fit to each synthetic dataset.
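One replicate of this design can be sketched in Python as below. Several details are not specified in the text and are labeled assumptions here: the $[-0.95, 0.95]$ constraint is implemented as rejection sampling (a truncated normal) rather than clipping; 0.9 is treated as the log-normal median, which is what $\sigma_{\text{mean}}$ represents in Equation (4), with an assumed log-scale SD of 0.1 standing in for "small variation"; and each series starts at zero with an assumed 100-beep burn-in. The function name `simulate_replicate` is hypothetical.

```python
import math
import random
from typing import List, Tuple


def simulate_replicate(tau_a: float, rng: random.Random, n_persons: int = 40,
                       T: int = 130, mu_a: float = 0.40, sigma_median: float = 0.9,
                       sigma_log_sd: float = 0.1,
                       burn_in: int = 100) -> Tuple[List[List[float]], List[float]]:
    """One synthetic dataset following the Section 3.8 design (assumptions noted)."""
    data, true_a = [], []
    for _ in range(n_persons):
        # Assumption: "constrained to [-0.95, 0.95]" implemented by rejection
        # sampling (a truncated normal), not clipping.
        while True:
            a = rng.gauss(mu_a, tau_a)
            if -0.95 <= a <= 0.95:
                break
        # Assumption: 0.9 is the log-normal median; 0.1 is its log-scale SD.
        sigma = sigma_median * math.exp(sigma_log_sd * rng.gauss(0.0, 1.0))
        y, prev = [], 0.0
        for t in range(burn_in + T):
            prev = a * prev + rng.gauss(0.0, sigma)
            if t >= burn_in:  # keep only post-burn-in beeps
                y.append(prev)
        m = sum(y) / T
        data.append([v - m for v in y])  # demean within person, no rescaling
        true_a.append(a)
    return data, true_a


# Usage: one replicate at the moderate-heterogeneity condition tau_a = 0.20.
data, true_a = simulate_replicate(0.20, random.Random(42))
print(len(data), len(data[0]), round(sum(true_a) / len(true_a), 3))
```

Repeating this 30 times per $\tau_a$ level and fitting the Stan model to each dataset gives the 120 fits summarized in Section 4.1.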

Coverage is computed as the proportion of the 30 replicates per condition in which the true generating value falls within the 90% posterior credible interval. At the null condition $\tau_a=0$, the half-normal prior excludes negative values by construction, so the posterior for $\tau_a$ lies entirely above zero and its credible interval cannot contain the true value. Coverage is therefore undefined for $\tau_a$ in this row, and we report it as N/A.
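The coverage computation reduces to counting replicates whose interval contains the truth. The sketch below assumes equal-tailed (quantile-based) 90% intervals, and the helper names are hypothetical. The final lines show why the null row is reported as N/A. They use the absolute values of normal draws as a stand-in for a strictly positive $\tau_a$ posterior. Because every draw is positive, the lower bound is positive, and a true value of 0 can never be covered.

```python
import numpy as np


def central_interval(draws, level=0.90):
    """Equal-tailed credible interval (lower, upper) from posterior draws."""
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [alpha, 1.0 - alpha])
    return float(lower), float(upper)


def coverage(true_value, lowers, uppers):
    """Share of replicates whose interval [lower, upper] contains true_value."""
    lowers = np.asarray(lowers, dtype=float)
    uppers = np.asarray(uppers, dtype=float)
    return float(np.mean((lowers <= true_value) & (true_value <= uppers)))


# Stand-in for a tau_a posterior at the null: positive draws only (abs of normals).
rng = np.random.default_rng(1)
lo, hi = central_interval(np.abs(rng.normal(0.0, 0.03, size=4000)))
print(lo > 0, coverage(0.0, [lo], [hi]))
```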

### 3\.9 Held\-out prediction protocol

To assess whether DTVEM-RE's parameter estimates translate into improved predictive performance, we conducted a held-out prediction comparison on the *down* item. For each participant, the last 12 beeps (approximately three days of data) were held out as a test set, and the remaining beeps formed the training set. Four methods were fit to the training data and used to produce one-step-ahead predictions for the held-out beeps:

1. Group-only DTVEM: pooled OLS estimating a single autoregressive coefficient shared across persons.
2. Naive per-person DTVEM: separate OLS estimation per participant, with no pooling.
3. Hierarchical simple (mlVAR-style baseline): hierarchical autoregressive model with a random lag-1 coefficient and a single shared residual SD.
4. DTVEM-RE (full): the lag-1 specification given in Section 3.4, with a hierarchical lag coefficient and hierarchical person-specific residual SDs.
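The two non-hierarchical baselines differ only in whether lag pairs are pooled across persons. The sketch below assumes demeaned series, so no intercept is fit, and forms lag pairs within each person's training series only. The function names are hypothetical.

```python
import numpy as np


def ols_ar1(y):
    """No-intercept OLS lag-1 coefficient for one demeaned series."""
    y = np.asarray(y, dtype=float)
    x, z = y[:-1], y[1:]
    return float(x @ z / (x @ x))


def per_person_ar1(series):
    """Naive per-person baseline: a separate coefficient for each person."""
    return np.array([ols_ar1(y) for y in series])


def group_only_ar1(series):
    """Group-only baseline: one coefficient from lag pairs stacked across persons.

    Pairs are formed within each person, so no pair spans two people's data.
    """
    x = np.concatenate([np.asarray(y, dtype=float)[:-1] for y in series])
    z = np.concatenate([np.asarray(y, dtype=float)[1:] for y in series])
    return float(x @ z / (x @ x))
```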

For each method, predictive performance is summarized by the mean and total held-out log-likelihood over the 480 held-out observations (40 participants × 12 beeps) and by the root-mean-square error (RMSE).
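The held-out metrics can be sketched as follows. The sketch assumes that each method supplies a point estimate of each person's lag-1 coefficient and residual SD, and that each held-out beep is predicted from the observed beep immediately before it. This is a plug-in version with hypothetical helper names. For the two Bayesian methods, a fully Bayesian score would average the predictive density over posterior draws before taking the log.

```python
import numpy as np


def gaussian_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x (elementwise)."""
    return -0.5 * np.log(2.0 * np.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2


def heldout_scores(series, a_hat, sigma_hat, n_test=12):
    """Plug-in held-out scores over every person's last n_test beeps.

    series: list of 1-D arrays (training + held-out beeps, in time order)
    a_hat, sigma_hat: one lag-1 coefficient and residual SD per person
    """
    ll, err = [], []
    for y, a, s in zip(series, a_hat, sigma_hat):
        y = np.asarray(y, dtype=float)
        obs = y[-n_test:]
        prev = y[-n_test - 1:-1]   # observed beep immediately before each target
        pred = a * prev            # one-step-ahead mean for a demeaned AR(1)
        ll.append(gaussian_logpdf(obs, pred, s))
        err.append(obs - pred)
    ll, err = np.concatenate(ll), np.concatenate(err)
    return {"mean_ll": float(ll.mean()), "total_ll": float(ll.sum()),
            "rmse": float(np.sqrt(np.mean(err ** 2)))}
```

For the group-only baseline, the same coefficient is passed for every person, for example `[a_group] * 40`.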

## 4 Results

We report results in four parts\. Section 4\.1 reports the simulation study validating parameter recovery\. Section 4\.2 reports the Stage 1 GAMM exploration\. Sections 4\.3 and 4\.4 report the lag\-1 Stage 2 fits and the held\-out prediction comparison against alternative methods\. Section 4\.5 reports the multi\-lag extension\.

### 4\.1 Simulation study: parameter recovery

Table [1](https://arxiv.org/html/2606.14116#S4.T1) summarizes parameter recovery across the 120 simulation fits (4 conditions × 30 replicates each).

Table 1: Simulation recovery results across 4 conditions × 30 replicates. The between-person SD ($\tau_a$) is recovered with negligible bias and approximately nominal 90 percent coverage. The population mean ($\mu_a$) shows a small consistent downward bias of approximately 0.02.

Three findings emerge, visualized in Figure [2](https://arxiv.org/html/2606.14116#S4.F2). First, recovery of $\tau_a$, the parameter of primary methodological interest, is essentially unbiased: absolute bias is at most 0.009 across all conditions with non-zero true heterogeneity. At the null condition, the half-normal prior produces a small positive estimate (0.030) by construction, since posterior mass concentrates near zero but cannot extend below it. Second, $\tau_a$ credible interval coverage is 90 to 93 percent across non-null conditions, closely matching the nominal 90 percent target. This result licenses interpreting the empirical $\tau_a$ estimates in Section 4.3 as reliable indicators of between-person heterogeneity rather than artifacts of the half-normal prior.

Third, $\mu_a$ shows a small consistent downward bias of approximately 0.02, with empirical coverage of 73 to 87 percent. At low-to-moderate heterogeneity, this bias closely matches the leading-order finite-sample prediction $-(1+3a)/T\approx -0.017$ at $a=0.4$, $T=130$ (Marriott & Pope, [1954](https://arxiv.org/html/2606.14116#bib.bib13); Kendall, [1954](https://arxiv.org/html/2606.14116#bib.bib12)). The empirical bias is slightly larger ($-0.036$) at the high-heterogeneity stress condition $\tau_a=0.30$. The bias pushes all persons' estimates in the same downward direction within a sample and diminishes as $T$ grows. Because DTVEM-RE's methodological claims center on between-person variation rather than on the absolute level of the population mean, this bias does not threaten the substantive interpretation.
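The leading-order prediction quoted above is easy to check numerically. The sketch below evaluates $-(1+3a)/T$ and compares it with a small Monte Carlo experiment that fits OLS to demeaned AR(1) series with unit innovation variance. This is a single-person, non-hierarchical check of the classical result, not a rerun of the Stan simulation, and the function names are illustrative.

```python
import numpy as np


def kendall_bias(a, T):
    """Leading-order bias of the mean-corrected OLS AR(1) estimator."""
    return -(1.0 + 3.0 * a) / T


def mc_bias(a=0.4, T=130, reps=2000, burn_in=100, seed=0):
    """Monte Carlo bias of OLS on demeaned AR(1) series (unit innovation SD)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(reps)
    out = np.empty((reps, T))
    for t in range(burn_in + T):
        x = a * x + rng.normal(size=reps)   # advance all replicates one step
        if t >= burn_in:
            out[:, t - burn_in] = x
    y = out - out.mean(axis=1, keepdims=True)   # demean within series
    est = np.sum(y[:, :-1] * y[:, 1:], axis=1) / np.sum(y[:, :-1] ** 2, axis=1)
    return float(est.mean() - a)


print(kendall_bias(0.4, 130))   # about -0.0169
print(mc_bias())                # Monte Carlo estimate for comparison
```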

![Refer to caption](https://arxiv.org/html/2606.14116v1/tau_recovery.png) (a) Recovery of between-person SD ($\tau_a$). Each point is one of 30 simulation replicates; red dashes mark the true generating values.
![Refer to caption](https://arxiv.org/html/2606.14116v1/mu_recovery.png) (b) Recovery of population mean ($\mu_a$). True $\mu_a=0.40$ (red dashes) across all conditions.
![Refer to caption](https://arxiv.org/html/2606.14116v1/coverage.png) (c) Empirical 90% credible interval coverage by parameter and condition; the red dashed line marks nominal 90%.

Figure 2: Parameter recovery from 120 simulation fits (4 conditions × 30 replicates), with $N=40$ persons and $T=130$ observations per person. Panel (a): $\tau_a$ posterior means recover the true generating values with negligible bias across non-null conditions. Panel (b): $\mu_a$ posterior means show a small consistent downward bias of approximately 0.02 relative to the true $\mu_a=0.40$. Panel (c): credible interval coverage is approximately nominal (90%) for $\tau_a$ (orange) across non-null conditions, while $\mu_a$ coverage (blue) falls short of nominal because of the downward bias in panel (b). Coverage is undefined for $\tau_a$ at the null condition $\tau_a=0$ because the half-normal prior excludes negative values.

MCMC diagnostics were uniformly clean across all 120 fits, with zero divergent transitions, no treedepth saturation, and E-BFMI greater than 0.3 for all chains.

### 4\.2 Per\-person GAMM exploration

We first apply the Stage 1 hierarchical GAMM to all three items. The factor-smooth interaction model is fit using `bam` with basis dimension $k=8$ per smooth term. For each participant, the model produces an estimated person-specific lag-effect curve $\hat{f}(\Delta t)+\hat{b}_i(\Delta t)$ over lag distances 1 through 14.
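DTVEM-style Stage 1 models treat lag distance as a covariate along which the effect of the lagged value varies smoothly. The sketch below illustrates the long-format data layout that such a varying-coefficient GAMM is typically fit to. For each person and each lag distance from 1 through 14, it stacks the pairs $(y_t, y_{t-\Delta t})$. It assumes sequentially indexed beeps, as in this analysis. It is not code from the DTVEM package, and the function name is hypothetical. A series of length $T$ contributes $\sum_{k=1}^{14}(T-k)$ rows.

```python
import numpy as np


def stack_lags(series, max_lag=14):
    """Long-format rows (person, lag, y_t, y_{t-lag}) for lags 1..max_lag.

    Lags count beeps, i.e. observations are indexed sequentially.
    """
    rows = []
    for pid, y in enumerate(series):
        y = np.asarray(y, dtype=float)
        for lag in range(1, min(max_lag, len(y) - 1) + 1):
            for t in range(lag, len(y)):
                rows.append((pid, lag, y[t], y[t - lag]))
    return np.array(rows, dtype=float).reshape(-1, 4)
```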

Table [2](https://arxiv.org/html/2606.14116#S4.T2) summarizes the range of person-specific lag-1 effects estimated by Stage 1.

Table 2: Person-specific lag-1 autoregressive effects from per-person factor-smooth GAMMs ($N=40$ participants). For all three items, person-specific effects span an order of magnitude or more, demonstrating that between-person heterogeneity is substantial and not item-specific.

Across all three items, person-specific lag-1 effects range from near zero to above 0.8, and the between-person standard deviation is approximately 0.14 to 0.17. These results indicate that group-level estimation of a single lag-1 effect would mask substantial between-person variation in the actual dynamics, motivating the hierarchical Stage 2 model.

Figure [3](https://arxiv.org/html/2606.14116#S4.F3) displays the resulting person-specific lag-effect curves for the *down* item in two complementary views: an unsorted spaghetti plot (panel a) and a sorted heatmap (panel b). The spaghetti view emphasizes the spread of person-specific curves around the population average. The heatmap view orders participants top to bottom by lag-1 effect magnitude and reveals the gradient structure of person-specific decay profiles across the sample.

![Refer to caption](https://arxiv.org/html/2606.14116v1/person_specific_lag_effects.png) (a) Person-specific lag-effect curves (spaghetti plot). Each blue line is one participant's estimated profile; the red line is the population average.
![Refer to caption](https://arxiv.org/html/2606.14116v1/person_specific_heatmaps.png) (b) The same per-person lag-effect estimates as a heatmap, with participants sorted top to bottom by lag-1 effect magnitude.

Figure 3: Person-specific lag-effect curves for the *down* item, estimated by the Stage 1 hierarchical GAMM with factor-smooth interaction across the 40 participants. Panel (a) shows the heterogeneity unsorted; panel (b) sorts participants by lag-1 magnitude to reveal the gradient structure of person-specific decay profiles.

### 4\.3 Stage 2 hierarchical Bayesian estimation

We next fit the lag-1 DTVEM-RE Stage 2 model to each of the three items. Table [3](https://arxiv.org/html/2606.14116#S4.T3) summarizes the posterior estimates.

Table 3: DTVEM-RE Stage 2 posterior summaries across three EMA items. All credible intervals for $\tau_a$ exclude zero, confirming that person-specific variation in lag-1 dynamics is statistically robust. Person-specific posterior means agree strongly with independent per-person GAMM estimates ($r=0.87$ to $0.92$).

Three findings deserve emphasis. First, all 90 percent credible intervals for $\tau_a$ lie above zero across all three items, with point estimates of 0.12 to 0.16. This is direct statistical evidence that between-person variation in lag-1 dynamics is robust rather than noise. Second, the Stan posterior means of the person-specific lag-1 coefficients agree strongly with the independent per-person GAMM estimates from Stage 1 ($r=0.87$ to $0.92$). This cross-method agreement makes it unlikely that the heterogeneity finding is an artifact of a single statistical approach.

Third, the relative ordering of items by mean persistence is preserved: *down* and *worried* (both negative affect) show higher mean autoregressive coefficients than *energetic* (positive affect). We return to this pattern in the Discussion. Figure [4](https://arxiv.org/html/2606.14116#S4.F4) displays the person-specific lag-1 posterior distributions for the *down* item, sorted by posterior mean. For 39 of the 40 participants, the 90 percent credible interval on the lag-1 coefficient excludes zero, indicating that autoregressive dependence is reliably non-zero for the substantial majority of individuals in the sample.

![Refer to caption](https://arxiv.org/html/2606.14116v1/03_person_caterpillar.png)

Figure 4: Person-specific lag-1 autoregressive posteriors from the Stage 2 hierarchical Bayesian fit on the *down* item. Each point is the posterior mean for one of the 40 participants; vertical bars show 90% credible intervals. Participants are sorted left to right by posterior mean. The horizontal red line marks the population mean $\hat{\mu}_a=0.394$; the dashed line at zero marks the no-dependence reference. Posterior means range from approximately 0.15 to 0.83, and 39 of 40 credible intervals exclude zero, indicating substantive person-specific autoregressive dependence across the sample.

MCMC diagnostics for all three lag-1 fits were clean. The model with non-centered parameterizations on both $a_i$ and $\sigma_i$ produced zero divergent transitions, zero treedepth saturations, and minimum E-BFMI greater than 0.70 across all 12 chains (4 chains × 3 items). Total computation time was approximately 15 seconds per item.

### 4\.4 Held\-out prediction comparison

Following the protocol described in Section 3.9, we held out the last 12 beeps per participant on the *down* item and compared one-step-ahead predictions from four methods: group-only DTVEM, naive per-person DTVEM, a hierarchical simple (mlVAR-style) baseline, and DTVEM-RE in its full form. Performance is summarized in Table [4](https://arxiv.org/html/2606.14116#S4.T4).

Table 4: Held-out predictive log-likelihood for the *down* item across four methods (480 held-out beeps from 40 participants). DTVEM-RE achieves the highest mean and total log-likelihood and the lowest root-mean-square error.

DTVEM-RE attains the highest log-likelihood and the lowest RMSE of the four methods. In a per-participant decomposition, DTVEM-RE outperforms naive per-person OLS for 25 of 40 participants (62.5 percent). The absolute advantage is modest, however: differences across methods are approximately 0.01 log-likelihood units per beep. This is consistent with the underlying structure. At $T\approx 110$ training beeps per person, naive per-person OLS is stable enough that hierarchical pooling delivers only incremental gains. The substantive value of DTVEM-RE in this regime lies in calibrated uncertainty quantification (proper credible intervals on person-specific $a_i$) and in the multi-lag extension reported next, not in large predictive gains.
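A normal-normal approximation shows why pooling helps little at $T\approx 110$. The approximation ignores uncertainty in $\mu_a$, $\tau_a$, and $\sigma_i$, so it is only a heuristic for the Stan model. If a person's OLS estimate has a sampling variance of roughly $(1-a^2)/T$, the partially pooled estimate places weight $w = \tau_a^2/\big(\tau_a^2 + (1-a^2)/T\big)$ on the person's own estimate. With an illustrative $\tau_a = 0.15$ and $a = 0.4$, $w \approx 0.75$ at $T = 110$ but only about $0.45$ at $T = 30$. The function names below are hypothetical.

```python
import numpy as np


def shrinkage_weight(tau, a, T):
    """Weight on a person's own OLS estimate under a normal-normal approximation."""
    se2 = (1.0 - a ** 2) / T   # large-sample sampling variance of OLS AR(1)
    return tau ** 2 / (tau ** 2 + se2)


def partial_pool(a_hat, mu, tau, T):
    """Shrink per-person OLS estimates toward mu (weight evaluated at a = mu)."""
    w = shrinkage_weight(tau, mu, T)
    return w * np.asarray(a_hat, dtype=float) + (1.0 - w) * mu


for T in (30, 110):
    print(T, shrinkage_weight(0.15, 0.4, T))
```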

### 4\.5 Multi\-lag extension: the differentiating result

We next refit DTVEM-RE in its multi-lag form, simultaneously estimating person-specific coefficients at lags 1, 2, and 3 with separate hyperparameters $(\mu_k,\tau_k)$ at each lag. The model specification is given in Section 3.5; computation took approximately 2 minutes per item.
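For reference, the unpooled analogue of the multi-lag model can be written as a per-person least-squares AR($K$) fit. The sketch below builds the lag matrix with columns $y_{t-1},\dots,y_{t-K}$, so the first $K$ observations are lost as targets, and it estimates each person separately. The hierarchical model instead pools the person-specific coefficients toward the lag-specific hyperparameters $(\mu_k,\tau_k)$. The function names are hypothetical, and no intercept is included because the series are assumed to be demeaned.

```python
import numpy as np


def lag_design(y, K=3):
    """Predictors [y_{t-1}, ..., y_{t-K}] and targets y_t for t = K, ..., T-1."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    X = np.column_stack([y[K - k:T - k] for k in range(1, K + 1)])
    return X, y[K:]


def per_person_ar_k(series, K=3):
    """Unpooled least-squares AR(K): one row per person, one column per lag."""
    coefs = []
    for y in series:
        X, z = lag_design(y, K)
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)   # no intercept: demeaned data
        coefs.append(beta)
    return np.vstack(coefs)
```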

Table 5: Multi-lag DTVEM-RE posterior means at lags 1, 2, and 3 for three EMA items. Starred (⋆) values for $\hat{\tau}_k$ have 90 percent credible intervals excluding zero. The credible intervals for all nine $\hat{\tau}_k$ exclude zero, establishing that person-specific variation in autoregressive dynamics is statistically robust at every lag examined.

The key result is in the right half of Table [5](https://arxiv.org/html/2606.14116#S4.T5) and is visualized in Figure [5](https://arxiv.org/html/2606.14116#S4.F5). All nine $\hat{\tau}_k$ values, across three items and three lags, have 90 percent credible intervals that exclude zero. Person-specific variation in lag coefficients is statistically robust at every lag tested, not only at lag 1.

Furthermore, the lag at which heterogeneity is largest differs across items. For *down*, $\hat{\tau}_1=0.111$ is the largest, with heterogeneity declining monotonically at higher lags. For *worried*, $\hat{\tau}_3=0.108$ is the largest, with heterogeneity dipping at lag 2 and rising at lag 3. For *energetic*, $\hat{\tau}_2=0.118$ exceeds $\hat{\tau}_1=0.078$, indicating that individual differences in mood persistence are most pronounced at the two-beep horizon for positive affect.

This non-monotonic structure is outside the modeling scope of hierarchical VAR methods that place random effects only on lag-1 coefficients, such as mlVAR. A method that estimates a single $\tau_a$ at lag 1 would miss the larger heterogeneity at lag 2 for *energetic* and at lag 3 for *worried*. The shape of dynamic heterogeneity, not just its magnitude, varies across affect domains.

Population-level means follow monotonic decay across lags (left half of Table [5](https://arxiv.org/html/2606.14116#S4.T5)), consistent with the Stage 1 GAMM results. The aggregate dynamics are well described by exponential decay at the group level; the novelty of DTVEM-RE is that it reveals heterogeneity in how individuals depart from that aggregate decay. Figure [6](https://arxiv.org/html/2606.14116#S4.F6) displays the person-specific multi-lag profiles for the *down* item, with each blue line tracing one participant's three posterior means and the red line showing the population trajectory.

A natural follow-up question is whether person-specific lag coefficients are correlated across lags. We computed Pearson correlations across the 40 participants between the posterior means at each pair of lags. For *down*: $r_{12}=0.11$, $r_{13}=0.06$, $r_{23}=0.31$. For *worried*: $r_{12}=0.13$, $r_{13}=0.14$, $r_{23}=0.23$. For *energetic*: $r_{12}=0.45$, $r_{13}=0.10$, $r_{23}=0.28$. For the two negative-affect items, lag-1 coefficients are nearly uncorrelated with those at lags 2 and 3 ($r\leq 0.14$). Knowing a person's lag-1 dependence on *down* or *worried* therefore carries little information about their lag-2 or lag-3 dependence. The positive-affect item shows a different pattern: *energetic* exhibits a substantial lag-1/lag-2 correlation ($r=0.45$), indicating that individuals with strong short-horizon dependence on positive activation also tend to be persistent at the next-shortest horizon. This contrast between affect domains is itself a substantive observation enabled by the multi-lag specification and is worth following up in larger samples.

![Refer to caption](https://arxiv.org/html/2606.14116v1/multilag_mu_by_lag.png) (a) Population-level mean coefficients $\hat{\mu}_k$ at lags 1, 2, and 3 for each item. All three items show monotonic decay of the mean autoregressive effect across lags.
![Refer to caption](https://arxiv.org/html/2606.14116v1/multilag_tau_by_lag.png) (b) Between-person SD $\hat{\tau}_k$ at lags 1, 2, and 3 for each item. All nine credible intervals exclude zero. The lag at which heterogeneity is largest differs across items: lag 1 for *down*, lag 2 for *energetic*, and lag 3 for *worried*.

Figure 5: Multi-lag DTVEM-RE posterior summaries for the three EMA items. Points show posterior means; bars show 90% credible intervals. The contrast between the two panels is the central methodological result of this paper: population-level means follow monotonic decay across lags (a), but between-person heterogeneity does not (b). All nine $\hat{\tau}_k$ credible intervals exclude zero, and the lag at which heterogeneity peaks differs by item. Hierarchical methods that place random effects only on lag-1 coefficients are not designed to detect this structure.

![Refer to caption](https://arxiv.org/html/2606.14116v1/multilag_person_profiles_down.png)

Figure 6: Person-specific multi-lag profiles for the *down* item from the multi-lag DTVEM-RE fit. Each blue line connects one participant's posterior mean lag coefficients at lags 1, 2, and 3; the red line shows the population means $(\hat{\mu}_1,\hat{\mu}_2,\hat{\mu}_3)$. The spread of person-specific profiles around the population mean at each lag illustrates the between-person heterogeneity quantified by the $\hat{\tau}_k$ estimates in Figure [5(b)](https://arxiv.org/html/2606.14116#S4.F5.sf2). Most individuals show monotonic decay across lags, but the rate and magnitude of decay vary substantially across the sample.

## 5 Discussion

### 5\.1 Summary of contributions

This paper introduced DTVEM-RE, a hierarchical random-effects extension of the Differential Time-Varying Effect Model that estimates person-specific multi-lag coefficients with shrinkage toward group-level means. We provided three sets of results. First, simulation studies demonstrated clean recovery of the between-person SD parameter $\tau_a$, with absolute bias below 0.01 and near-nominal credible interval coverage at sample sizes typical of EMA studies. Second, the empirical demonstration on the Fisher et al. (2017) dataset showed three things: person-specific lag-1 effects vary by an order of magnitude across affect items, hierarchical Bayesian and independent GAMM estimates agree closely, and DTVEM-RE attains the best one-step-ahead predictive performance among the four methods compared (DTVEM-RE and three hierarchical and non-hierarchical baselines), though by modest margins at the available sample sizes. Most importantly, the multi-lag extension showed that all nine between-person SD estimates across three items and three lags have credible intervals excluding zero. The lag at which heterogeneity is largest also differs across items, a pattern that hierarchical methods placing random effects only on lag 1 are not designed to detect.

### 5\.2 Methodological implications

The principal methodological implication is that the standard practice of estimating a single group-level lag structure in intensive longitudinal data discards substantial between-person variation. For mood and anxiety dynamics in particular, this heterogeneity is large in magnitude (individual lag-1 effects ranging from near zero to above 0.85), statistically reliable (credible intervals excluding zero), and structurally complex (distributed non-monotonically across lags).

DTVEM-RE provides a principled framework for engaging with this heterogeneity. The hierarchical priors enable stable estimation of person-specific coefficients even at the sample sizes typical of EMA studies ($T\approx 100$ to 150 per person). The cross-method agreement between Stan and independent GAMM estimates ($r=0.87$ to $0.92$) provides reassurance that the recovered heterogeneity reflects genuine signal rather than methodological artifact.

### 5\.3 Clinical implications

The present results are exploratory analyses of a single dataset with $N=40$ participants and three affect items, and they do not license direct clinical recommendations. Below we outline what the heterogeneity reported here *could* mean clinically if it were to replicate in larger and more diverse samples, with the conditional emphasis throughout.

The substantive finding most relevant to clinical questions is that the lag at which an individual's affect shows its strongest self-prediction varies meaningfully across people. If this pattern replicates, it implies that the natural time scale of affect dynamics is itself a person-level feature rather than a universal property of the construct being measured. Downstream research could pursue two questions in light of this. First, are person-specific lag profiles stable enough across measurement occasions to function as individual-difference variables? A person's lag profile can inform an intervention or assessment decision only if it is stable enough to act on. Second, do person-specific lag profiles relate to clinically meaningful variables (symptom severity, treatment response, diagnostic boundaries) in ways that aggregate lag estimates obscure?

We emphasize that DTVEM\-RE in its present form is a methodological tool for describing person\-specific lag structure; the question of whether that description has clinical leverage is a separate empirical question and one this paper does not attempt to answer\. Validation in independent samples, assessment of test\-retest reliability, and prospective tests of intervention timing all lie beyond the present scope\. We frame the multi\-lag heterogeneity finding as a candidate phenomenon worth following up rather than as evidence for any specific clinical practice change\.

### 5\.4 Substantive observations

Two substantive patterns emerge from the empirical application that may warrant follow-up, although the current sample size ($N=40$) is small for generalization. First, the two negative affect items in this analysis (*down* and *worried*) show numerically higher mean persistence than the positive affect item (*energetic*). Three items in one sample cannot establish whether this reflects a general asymmetry between negative and positive affect persistence in clinical samples. The pattern is, however, consistent enough with the prior affect dynamics literature to suggest it as a hypothesis worth testing in larger datasets. Second, the lag at which heterogeneity is largest differs across affect domains: depression-related dynamics (*down*) show maximum heterogeneity at the shortest lag, positive affect (*energetic*) at an intermediate lag, and anxiety-related dynamics (*worried*) at the longest lag examined. The small number of items limits the substantive interpretability of these patterns, but the fact that they differ at all is methodologically relevant. Any analysis that assumes heterogeneity is concentrated at a single lag will produce systematically distorted descriptions of items for which that assumption is false.

### 5\.5 Limitations and future work

Several limitations bear acknowledgment. First, the discrete-time assumption inherent in DTVEM-RE treats beeps as approximately equally spaced. Fisher's data have substantial within-day spacing variability (median spacing 4.3 hours, with overnight gaps of 10 to 12 hours), which our analysis approximates by sequential beep numbering. Continuous-time methods such as ctsem (Driver & Voelkle, [2018](https://arxiv.org/html/2606.14116#bib.bib3)) handle irregular spacing natively at the cost of imposing exponential decay. A direct comparison between DTVEM-RE and hierarchical ctsem on the same data would be informative and is a natural next step.
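The cost of sequential beep numbering can be gauged with the standard relation between a first-order continuous-time (Ornstein-Uhlenbeck) process and its discrete-time autoregressive coefficient: $a(\Delta t) = e^{-\theta\Delta t}$ for a mean-reversion rate $\theta > 0$. The sketch below is a back-of-envelope illustration with hypothetical helper names, not a ctsem computation. If $a = 0.4$ at the median 4.3-hour spacing, the same process implies $a \approx 0.10$ across an 11-hour overnight gap. A single per-beep coefficient therefore averages over quite different effective lags.

```python
import numpy as np


def rate_from_ar(a, dt):
    """OU mean-reversion rate implied by AR coefficient a (0 < a < 1) over interval dt."""
    return -np.log(a) / dt


def ar_from_rate(theta, dt):
    """Discrete-time AR coefficient of an OU process with rate theta over interval dt."""
    return np.exp(-theta * dt)


theta = rate_from_ar(0.4, 4.3)   # per hour, if a = 0.4 at the median 4.3-hour spacing
for gap in (2.0, 4.3, 11.0):
    print(gap, float(ar_from_rate(theta, gap)))
```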

Second, the simulation study revealed a small finite-sample downward bias in the population mean $\mu_a$, traceable to the well-characterized small-sample bias in AR(1) estimation (Marriott & Pope, [1954](https://arxiv.org/html/2606.14116#bib.bib13); Kendall, [1954](https://arxiv.org/html/2606.14116#bib.bib12)). Its magnitude tracks the leading-order prediction at low-to-moderate heterogeneity and exceeds it at the highest heterogeneity condition examined (Section 4.1). This bias scales as $O(1/T)$ and would diminish in studies with longer per-person observation periods. For the heterogeneity-focused interpretation of DTVEM-RE results, the bias pushes all persons' estimates in the same downward direction within a sample and, to leading order, preserves their relative ordering. Researchers interested in absolute-level interpretations should apply standard bias corrections.
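One such correction inverts the leading-order bias. Solving $E[\hat a] \approx a - (1+3a)/T$ for $a$ gives $a \approx (T\hat a + 1)/(T-3)$. The sketch below applies this to a single-series OLS point estimate. Using it on a hierarchical posterior mean such as $\hat{\mu}_a$ is only a heuristic, and the function name is hypothetical.

```python
def kendall_corrected(a_hat, T):
    """First-order bias correction: invert E[a_hat] ~= a - (1 + 3a) / T for a.

    Works elementwise on NumPy arrays as well as on plain floats.
    """
    return (T * a_hat + 1.0) / (T - 3.0)


# An OLS estimate shifted down by exactly the leading-order bias maps back to 0.4.
print(kendall_corrected(0.4 - 2.2 / 130, 130))
```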

Third, the held-out prediction advantage of DTVEM-RE over alternatives was modest at $T\approx 110$. Exploratory analysis at $T=30$ showed that very small per-person samples can favor pooled estimation; we note this as a regime where prior choice matters and leave detailed investigation to future work. The regime where DTVEM-RE provides the largest predictive advantage relative to alternatives appears to be moderate per-person sample sizes (approximately $T=50$ to 100), corresponding to typical week-long to month-long EMA designs.

Fourth, this paper has not yet incorporated diagnostic group information from the Fisher sample\. The dataset’s three subgroups \(GAD\-only, MDD\-only, and comorbid\) provide a natural test of whether person\-specific lag profiles distinguish clinical presentations\. We plan to add this analysis in subsequent work, conditional on hand\-coding diagnostic labels from the original publication’s Table 1\.

### 5\.6 Conclusion

DTVEM has become a standard tool for lag detection in intensive longitudinal data, but its group\-level estimation assumption is at odds with the idiographic premise of much of the psychopathology literature that uses it\. DTVEM\-RE addresses this gap by extending both stages of DTVEM with hierarchical random\-effects estimation, while retaining standard DTVEM as a fixed\-effects special case\. Empirical application demonstrates that person\-specific variation in lag\-1 effects is substantial in benchmark clinical data, and that multi\-lag person\-specific heterogeneity is statistically robust at every lag tested\. This combination of nonparametric multi\-lag exploration with hierarchical pooling on person\-specific lag coefficients is, to our knowledge, methodologically new, and directly implements the random\-effects extension named as future work by the original DTVEM authors\.

## Code and data availability

## References

- Betancourt, M., & Girolami, M. (2015). Hamiltonian Monte Carlo for hierarchical models. In S. K. Upadhyay, U. Singh, D. K. Dey, & A. Loganathan (Eds.), Current Trends in Bayesian Methodology with Applications (pp. 79–101). Boca Raton, FL: CRC Press.
- Bringmann, L. F., Vissers, N., Wichers, M., Geschwind, N., Kuppens, P., Peeters, F., Borsboom, D., & Tuerlinckx, F. (2013). A network approach to psychopathology: New insights into clinical longitudinal data. PLoS ONE, 8(4), e60188. [doi:10.1371/journal.pone.0060188](https://doi.org/10.1371/journal.pone.0060188)
- Driver, C. C., & Voelkle, M. C. (2018). Hierarchical Bayesian continuous time dynamic modeling. Psychological Methods, 23(4), 774–799. [doi:10.1037/met0000168](https://doi.org/10.1037/met0000168)
- Epskamp, S., van Borkulo, C. D., van der Veen, D. C., Servaas, M. N., Isvoranu, A.-M., Riese, H., & Cramer, A. O. J. (2018). Personalized network modeling in psychopathology: The importance of contemporaneous and temporal connections. Clinical Psychological Science, 6(3), 416–427. [doi:10.1177/2167702617744325](https://doi.org/10.1177/2167702617744325)
- Fernandez, K. C., Fisher, A. J., & Chi, C. (2017). Development and initial implementation of the Dynamic Assessment Treatment Algorithm (DATA). PLoS ONE, 12(6), e0178806. [doi:10.1371/journal.pone.0178806](https://doi.org/10.1371/journal.pone.0178806)
- Fisher, A. J., Reeves, J. W., Lawyer, G., Medaglia, J. D., & Rubel, J. A. (2017). Exploring the idiographic dynamics of mood and anxiety via network analysis. Journal of Abnormal Psychology, 126(8), 1044–1056. [doi:10.1037/abn0000311](https://doi.org/10.1037/abn0000311)
- Fisher, A. J., Medaglia, J. D., & Jeronimus, B. F. (2018). Lack of group-to-individual generalizability is a threat to human subjects research. Proceedings of the National Academy of Sciences, 115(27), E6106–E6115. [doi:10.1073/pnas.1711978115](https://doi.org/10.1073/pnas.1711978115)
- Fisher, A. J., Bosley, H. G., Fernandez, K. C., Reeves, J. W., Soyster, P. D., Diamond, A. E., & Barkin, J. (2019). Open trial of a personalized modular treatment for mood and anxiety. Behaviour Research and Therapy, 116, 69–79. [doi:10.1016/j.brat.2019.01.010](https://doi.org/10.1016/j.brat.2019.01.010)
- Fisher, Z. F., Kim, Y., Fredrickson, B. L., & Pipiras, V. (2022). Penalized estimation and forecasting of multiple subject intensive longitudinal data. Psychometrika, 87(2), 1–29. [doi:10.1007/s11336-021-09825-7](https://doi.org/10.1007/s11336-021-09825-7)
- Gates, K. M., Fisher, Z. F., & Bollen, K. A. (2020). Latent variable GIMME using model implied instrumental variables (MIIVs). Psychological Methods, 25(2), 227–242. [doi:10.1037/met0000229](https://doi.org/10.1037/met0000229)
- Jacobson, N. C., Chow, S.-M., & Newman, M. G. (2019). The Differential Time-Varying Effect Model (DTVEM): A tool for diagnosing and modeling time lags in intensive longitudinal data. Behavior Research Methods, 51(1), 295–315. [doi:10.3758/s13428-018-1101-0](https://doi.org/10.3758/s13428-018-1101-0)
- Kendall, M. G. (1954). Note on bias in the estimation of autocorrelation. Biometrika, 41(3–4), 403–404. [doi:10.1093/biomet/41.3-4.403](https://doi.org/10.1093/biomet/41.3-4.403)
- Marriott, F. H. C., & Pope, J. A. (1954). Bias in the estimation of autocorrelations. Biometrika, 41(3–4), 390–402. [doi:10.1093/biomet/41.3-4.390](https://doi.org/10.1093/biomet/41.3-4.390)
- Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology, this time forever. Measurement: Interdisciplinary Research and Perspectives, 2(4), 201–218. [doi:10.1207/s15366359mea0204_1](https://doi.org/10.1207/s15366359mea0204_1)
- Wood, S. N. (2017). Generalized Additive Models: An Introduction with R (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC.
