Non-Myopic Active Feature Acquisition via Pathwise Policy Gradients

arXiv cs.LG Papers

Summary

This paper introduces NM-PPG (non-myopic pathwise policy gradients), an active feature acquisition method that uses pathwise policy gradients to optimize a sequential feature acquisition policy for prediction problems in which features are costly to obtain.

arXiv:2605.05511v1 (announce type: new)

Abstract: Active feature acquisition (AFA) considers prediction problems in which features are costly to obtain and the learner adaptively decides which feature values to acquire for each instance and when to stop and predict. AFA can be formulated as a partially observable Markov decision process (POMDP), which naturally admits a sequential decision-making perspective. In this paper, we present non-myopic pathwise policy gradients (NM-PPG), a new AFA method built around this formulation. We introduce a continuous relaxation of the acquisition process that enables pathwise gradients through the full acquisition trajectory, avoiding the high variance of standard score-function policy gradients while allowing end-to-end optimization of a non-myopic acquisition policy. To better align training with deployment, we further develop a straight-through rollout scheme that follows hard feature acquisitions in the forward pass while backpropagating through the corresponding soft relaxation in the backward pass. We stabilize optimization with entropy regularization and staged temperature sharpening. Experiments on both synthetic and real-world datasets demonstrate that NM-PPG yields superior performance relative to state-of-the-art AFA baselines.


# Non-Myopic Active Feature Acquisition via Pathwise Policy Gradients
Source: [https://arxiv.org/abs/2605.05511](https://arxiv.org/abs/2605.05511)
[View PDF](https://arxiv.org/pdf/2605.05511)

> Abstract: Active feature acquisition (AFA) considers prediction problems in which features are costly to obtain and the learner adaptively decides which feature values to acquire for each instance and when to stop and predict. AFA can be formulated as a partially observable Markov decision process (POMDP), which naturally admits a sequential decision-making perspective. In this paper, we present non-myopic pathwise policy gradients (NM-PPG), a new AFA method built around this formulation. We introduce a continuous relaxation of the acquisition process that enables pathwise gradients through the full acquisition trajectory, avoiding the high variance of standard score-function policy gradients while allowing end-to-end optimization of a non-myopic acquisition policy. To better align training with deployment, we further develop a straight-through rollout scheme that follows hard feature acquisitions in the forward pass while backpropagating through the corresponding soft relaxation in the backward pass. We stabilize optimization with entropy regularization and staged temperature sharpening. Experiments on both synthetic and real-world datasets demonstrate that NM-PPG yields superior performance relative to state-of-the-art AFA baselines.
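
The abstract does not say what form the continuous relaxation takes, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a temperature-scaled softmax over the features not yet acquired, and every function name in it is hypothetical. The forward pass commits to a hard one-hot acquisition, the backward pass uses the Jacobian of the soft relaxation, and lowering the temperature sharpens the soft choice toward the hard one.

```python
import numpy as np


def soft_selection(logits, acquired, temperature):
    """Relaxed choice of the next feature: a softmax over unacquired features.

    Assumes at least one feature is still unacquired.
    """
    scaled = np.where(acquired, -np.inf, logits) / temperature
    scaled = scaled - scaled[~acquired].max()  # shift for numerical stability
    weights = np.exp(scaled)                   # acquired features get weight 0
    return weights / weights.sum()


def hard_selection(soft):
    """Hard acquisition used in the forward pass: one-hot at the arg max."""
    one_hot = np.zeros_like(soft)
    one_hot[np.argmax(soft)] = 1.0
    return one_hot


def straight_through_grad(soft, temperature, grad_wrt_selection):
    """Gradient with respect to the logits under a straight-through rollout.

    grad_wrt_selection is the loss gradient evaluated at the HARD selection;
    it is propagated through the Jacobian of the SOFT (softmax) selection.
    """
    jacobian = (np.diag(soft) - np.outer(soft, soft)) / temperature
    return jacobian @ grad_wrt_selection  # the softmax Jacobian is symmetric


def entropy(soft):
    """Shannon entropy of the soft selection, usable as a regularizer."""
    p = soft[soft > 0]
    return float(-(p * np.log(p)).sum())


if __name__ == "__main__":
    logits = np.array([2.0, 1.0, 0.5, -1.0])           # illustrative policy scores
    acquired = np.array([True, False, False, False])   # feature 0 already observed
    downstream_grad = np.array([0.0, -1.0, 0.3, 0.2])  # made-up loss gradient

    for temperature in (1.0, 0.1):  # a lower temperature sharpens the relaxation
        soft = soft_selection(logits, acquired, temperature)
        hard = hard_selection(soft)
        grad = straight_through_grad(soft, temperature, downstream_grad)
        print(temperature, soft.round(3), hard, grad.round(3), round(entropy(soft), 3))
```

In an automatic differentiation framework the same effect is commonly obtained by computing `soft + stop_gradient(hard - soft)`, which evaluates to `hard` in the forward pass while differentiating as `soft`.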

## Submission history

From: Linus Aronsson [[view email](https://arxiv.org/show-email/190947df/2605.05511)]

**[v1]** Wed, 6 May 2026 23:24:54 UTC (850 KB)
