Non-Myopic Active Feature Acquisition via Pathwise Policy Gradients
Summary
This paper introduces NM-PPG, a non-myopic active feature acquisition (AFA) method that uses pathwise policy gradients to optimize sequential feature acquisition when features are costly to obtain.
# Non-Myopic Active Feature Acquisition via Pathwise Policy Gradients

Source: [https://arxiv.org/abs/2605.05511](https://arxiv.org/abs/2605.05511)

[View PDF](https://arxiv.org/pdf/2605.05511)

> Abstract: Active feature acquisition (AFA) considers prediction problems in which features are costly to obtain and the learner adaptively decides which feature values to acquire for each instance and when to stop and predict. AFA can be formulated as a partially observable Markov decision process (POMDP), which naturally admits a sequential decision-making perspective. In this paper, we present non-myopic pathwise policy gradients (NM-PPG), a new AFA method built around this formulation. We introduce a continuous relaxation of the acquisition process that enables pathwise gradients through the full acquisition trajectory, avoiding the high variance of standard score-function policy gradients while allowing end-to-end optimization of a non-myopic acquisition policy. To better align training with deployment, we further develop a straight-through rollout scheme that follows hard feature acquisitions in the forward pass while backpropagating through the corresponding soft relaxation in the backward pass. We stabilize optimization with entropy regularization and staged temperature sharpening. Experiments on both synthetic and real-world datasets demonstrate that NM-PPG yields superior performance relative to state-of-the-art AFA baselines.

## Submission history

From: Linus Aronsson

**[v1]** Wed, 6 May 2026 23:24:54 UTC (850 KB)
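The abstract names three ingredients without giving their details: a straight-through step (hard choice forward, soft relaxation backward), entropy regularization, and staged temperature sharpening. The sketch below illustrates the general idea with a plain temperature-scaled softmax over feature logits. It is not the paper's implementation. The relaxation, the function names, and the schedule values are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax; a lower temperature gives a sharper distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def straight_through_select(logits, temperature):
    """Forward: hard one-hot choice. Backward: Jacobian of the soft relaxation.

    Assumption: the relaxation is a softmax; the paper may use a different one.
    """
    n = len(logits)
    soft = softmax(logits, temperature)
    k = max(range(n), key=soft.__getitem__)
    hard = [1.0 if i == k else 0.0 for i in range(n)]
    # d soft_i / d logit_j = soft_i * (delta_ij - soft_j) / temperature
    jac = [[soft[i] * ((1.0 if i == j else 0.0) - soft[j]) / temperature
            for j in range(n)] for i in range(n)]
    return hard, soft, jac

def backprop_to_logits(grad_wrt_selection, jac):
    """Chain rule, treating the hard selection as if it were the soft one."""
    n = len(jac)
    return [sum(grad_wrt_selection[i] * jac[i][j] for i in range(n))
            for j in range(n)]

def entropy(probs):
    """Shannon entropy, usable as a regularization bonus on the policy."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def staged_temperature(step, stages=((0, 1.0), (100, 0.5), (200, 0.1))):
    """Hypothetical piecewise-constant schedule: (start_step, temperature) pairs."""
    t = stages[0][1]
    for start, value in stages:
        if step >= start:
            t = value
    return t

# Usage: three candidate features, with a downstream gradient that favors feature 0.
logits = [0.2, 1.5, -0.3]
hard, soft, jac = straight_through_select(logits, staged_temperature(0))
grad = backprop_to_logits([1.0, 0.0, 0.0], jac)
print(hard, grad, entropy(soft))
```

The forward pass commits to a single feature, as it would at deployment, while the gradient still reaches every logit through the soft probabilities. Lowering the temperature in stages shrinks the gap between the soft and hard selections as training proceeds.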
Similar Articles
Gradient Extrapolation-Based Policy Optimization
The article introduces Gradient Extrapolation-Based Policy Optimization (GXPO), a method that approximates multi-step lookahead in RL training for LLMs using only three backward passes. It demonstrates improved reasoning performance on math benchmarks over standard GRPO while maintaining fixed active-phase costs.
Accelerating Multi-Objective Bayesian Optimisation via Predictive-Gradient Catalysts
This paper introduces a general acceleration mechanism for multi-objective Bayesian optimisation that uses Gaussian process predictive gradients as auxiliary signals to augment existing acquisition functions, enabling faster convergence to the global Pareto set under limited evaluation budgets.
Steered Generation via Gradient-Based Optimization on Sparse Query Features
This paper introduces Prototype-Based Sparse Steering, a method that applies sparse autoencoders to attention query activations in LLMs, then uses gradient-based optimization during inference to steer generation toward target behaviors. The approach is validated in both a logical planning task and a stylistic educational domain, demonstrating interpretable and disentangled control.
Metric-Gradient Projection for Stable Multi-Agent Policy Learning
Introduces HPML, a method that projects the joint update field of multi-agent systems onto a metric-gradient component to stabilize and improve multi-agent reinforcement learning. It provides theoretical guarantees and shows improved stability and returns on CTDE benchmarks.
Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs
The paper introduces mmGRPO, a multi-module extension of Group Relative Policy Optimization (GRPO) that improves accuracy in modular AI systems by optimizing language model calls and prompts. It reports an average 11% accuracy improvement across various tasks and provides an open-source implementation in DSPy.