axbench

Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention

arXiv cs.CL · 2d ago

This paper introduces FLAS, a flow-based activation steering method that learns a concept-conditioned velocity field to steer language model activations at inference time. On the AxBench benchmark, FLAS is the first learned method to consistently outperform in-context prompting on held-out concepts without per-concept tuning.
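A concept-conditioned velocity field is easier to picture next to a fixed steering vector. The minimal NumPy sketch below shows that contrast. It is not FLAS's implementation, and several parts are hypothetical choices made only for illustration:

- the tiny MLP and its random, untrained weights
- the sizes `D`, `C` and `H`
- forward-Euler integration with `num_steps` steps
- the function names `velocity`, `flow_steer` and `vector_steer`

```python
import numpy as np

rng = np.random.default_rng(0)
D, C, H = 16, 8, 32  # hidden size, concept-embedding size, MLP width (hypothetical)

# Hypothetical, untrained velocity-field parameters; a real method would learn these.
W1 = rng.normal(scale=0.1, size=(D + C + 1, H))
b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(H, D))
b2 = np.zeros(D)


def velocity(h, c, t):
    """Velocity v(h, c, t) conditioned on activation h, concept embedding c, time t."""
    x = np.concatenate([h, c, [t]])
    return np.tanh(x @ W1 + b1) @ W2 + b2


def flow_steer(h, c, num_steps=8):
    """Move h along dh/dt = v(h, c, t) from t=0 to t=1 using forward Euler steps."""
    dt = 1.0 / num_steps
    for k in range(num_steps):
        h = h + dt * velocity(h, c, k * dt)
    return h


def vector_steer(h, steer_vec, alpha=1.0):
    """Steering-vector baseline: add the same shift to every activation."""
    return h + alpha * steer_vec


# Usage: steer one (random, stand-in) activation toward one (random, stand-in) concept.
h = rng.normal(size=D)
c = rng.normal(size=C)
steered = flow_steer(h, c)
```

The design point here is that `vector_steer` moves every activation by the same offset. The flow's displacement instead depends on both the current activation `h` and the concept embedding `c`. That is the kind of input-dependent intervention the title's "beyond steering vector" framing suggests.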
