Non-Myopic Active Feature Acquisition via Pathwise Policy Gradients

arXiv cs.LG 05/08/26, 04:00 AM Papers

Summary

This paper introduces NM-PPG, a non-myopic active feature acquisition method that uses pathwise policy gradients to optimize sequential feature acquisition for prediction problems in which features are costly to obtain.

arXiv:2605.05511v1 Announce Type: new Abstract: Active feature acquisition (AFA) considers prediction problems in which features are costly to obtain and the learner adaptively decides which feature values to acquire for each instance and when to stop and predict. AFA can be formulated as a partially observable Markov decision process (POMDP), which naturally admits a sequential decision-making perspective. In this paper, we present non-myopic pathwise policy gradients (NM-PPG), a new AFA method built around this formulation. We introduce a continuous relaxation of the acquisition process that enables pathwise gradients through the full acquisition trajectory, avoiding the high variance of standard score-function policy gradients while allowing end-to-end optimization of a non-myopic acquisition policy. To better align training with deployment, we further develop a straight-through rollout scheme that follows hard feature acquisitions in the forward pass while backpropagating through the corresponding soft relaxation in the backward pass. We stabilize optimization with entropy regularization and staged temperature sharpening. Experiments on both synthetic and real-world datasets demonstrate that NM-PPG yields superior performance relative to state-of-the-art AFA baselines.
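To make the idea of a continuous relaxation with temperature sharpening concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not specify the form of the relaxation, so the tempered softmax, the soft mask update `m + (1 - m) * p`, and all names (`acquisition_probs`, `rollout`, `toy_policy`, `EPS`) are assumptions chosen for illustration. The sketch covers only the forward computation. The pathwise gradients and the straight-through backward pass described in the abstract would need an automatic differentiation library and are not shown.

```python
import math

EPS = 1e-12  # assumed floor that keeps log(1 - mask) finite for acquired features


def acquisition_probs(logits, mask, temperature):
    """Tempered softmax over features, down-weighting already-acquired ones."""
    scores = [(l + math.log(max(1.0 - m, EPS))) / temperature
              for l, m in zip(logits, mask)]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy in nats; a typical entropy-regularization term."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def rollout(policy, n_features, budget, temperature, hard=False):
    """Acquire `budget` features; return the final mask and mean policy entropy."""
    mask = [0.0] * n_features
    total_entropy = 0.0
    for _ in range(budget):
        probs = acquisition_probs(policy(mask), mask, temperature)
        total_entropy += entropy(probs)
        if hard:
            # Hard acquisition: commit to the single most probable feature.
            choice = max(range(n_features), key=probs.__getitem__)
            probs = [1.0 if i == choice else 0.0 for i in range(n_features)]
        # Soft update: each feature moves toward "acquired" by its probability.
        mask = [m + (1.0 - m) * p for m, p in zip(mask, probs)]
    return mask, total_entropy / budget


def toy_policy(mask):
    """Hypothetical state-independent scores for four features."""
    return [2.0, 1.0, 0.5, -1.0]


hard_mask, _ = rollout(toy_policy, 4, budget=2, temperature=1.0, hard=True)
gaps = []
for temperature in (1.0, 0.3, 0.05):  # an illustrative sharpening schedule
    soft_mask, mean_entropy = rollout(toy_policy, 4, 2, temperature)
    gap = max(abs(s - h) for s, h in zip(soft_mask, hard_mask))
    gaps.append(gap)
    print(f"T={temperature}: gap to hard mask={gap:.4f}, entropy={mean_entropy:.4f}")
```

In this toy setting the soft mask moves closer to the hard acquisition mask as the temperature is lowered, which is the intuition behind sharpening the relaxation in stages. A real policy would condition its scores on the feature values observed so far rather than return fixed numbers.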

Cached at: 05/08/26, 07:42 AM

# Non-Myopic Active Feature Acquisition via Pathwise Policy Gradients
Source: [https://arxiv.org/abs/2605.05511](https://arxiv.org/abs/2605.05511)
[View PDF](https://arxiv.org/pdf/2605.05511)

## Submission history

From: Linus Aronsson \[[view email](https://arxiv.org/show-email/190947df/2605.05511)\] **\[v1\]** Wed, 6 May 2026 23:24:54 UTC (850 KB)

Similar Articles

Gradient Extrapolation-Based Policy Optimization

Accelerating Multi-Objective Bayesian Optimisation via Predictive-Gradient Catalysts

Steered Generation via Gradient-Based Optimization on Sparse Query Features

Metric-Gradient Projection for Stable Multi-Agent Policy Learning

Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs
