axbench

Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention

arXiv cs.CL · 2d ago

This paper introduces FLAS, a flow-based activation steering method that learns a concept-conditioned velocity field to steer language model activations at inference time. On the AxBench benchmark, FLAS is the first learned method to consistently outperform in-context prompting on held-out concepts without per-concept tuning.
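The summary does not describe FLAS's network architecture, conditioning scheme, or training objective. The sketch below therefore only illustrates the general idea it names: a steering vector adds one fixed displacement, while a velocity field is integrated over time. It assumes forward-Euler integration of dh/dt = v(h, c, t) from t = 0 to t = 1. The names steer_with_vector, steer_with_flow and toy_velocity are hypothetical, and toy_velocity is a hand-written stand-in for a learned, concept-conditioned network. It is not the authors' method.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical velocity-field signature: (activation, concept embedding, time) -> velocity.
VelocityField = Callable[[Vector, Vector, float], Vector]


def steer_with_vector(h: Vector, direction: Vector, alpha: float) -> Vector:
    """Classic activation steering: add one fixed, scaled direction to the activation."""
    return [hi + alpha * di for hi, di in zip(h, direction)]


def steer_with_flow(h: Vector, concept: Vector, velocity: VelocityField,
                    num_steps: int = 8) -> Vector:
    """Flow-style steering (sketch): integrate dh/dt = v(h, c, t) over t in [0, 1]
    with forward Euler. The integrator choice is an assumption, not from the paper."""
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    dt = 1.0 / num_steps
    x = list(h)
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, concept, t)          # velocity depends on the current activation
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(h: Vector, concept: Vector, t: float) -> Vector:
    """Hypothetical, non-learned field: pull h toward the concept vector.
    The time argument t is accepted for the signature but unused here."""
    return [ci - hi for hi, ci in zip(h, concept)]


if __name__ == "__main__":
    h0 = [1.0, 0.0]
    concept = [0.0, 2.0]
    # Fixed displacement, identical for every input activation.
    print(steer_with_vector(h0, [0.0, 1.0], alpha=2.0))       # [1.0, 2.0]
    # Input-dependent displacement obtained by integrating the field.
    print(steer_with_flow(h0, concept, toy_velocity, num_steps=4))  # [0.31640625, 1.3671875]
```

The design point the sketch makes is that the flow's total displacement depends on where the activation starts, whereas a steering vector shifts every activation by the same amount. In a learned method the field would come from training rather than from a hand-written rule like toy_velocity.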
