non-linear-interventions

#non-linear-interventions

Non-linear Interventions on Large Language Models

arXiv cs.CL · 2026-05-15

This paper introduces a general formulation of non-linear interventions for large language models. It goes beyond the Linear Representation Hypothesis to manipulate features encoded along non-linear manifolds, and it validates the approach on refusal-bypass steering.
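To make the linear versus non-linear distinction concrete, here is a minimal sketch. It is not the paper's formulation, which this summary does not spell out. It assumes a toy 3-dimensional hidden state and a hypothetical feature encoded as an angle on a circle within a 2D subspace. The function names `linear_steer` and `circular_steer` and the basis vectors `u` and `w` are illustrative assumptions.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear_steer(h, v, alpha):
    """Linear intervention: shift h by alpha along a fixed direction v."""
    return [hi + alpha * vi for hi, vi in zip(h, v)]


def circular_steer(h, u, w, delta):
    """Toy non-linear intervention (an assumption, not the paper's method).

    Rotates the component of h that lies in the plane spanned by the
    orthonormal vectors u and w by delta radians, moving the state along
    a circle and leaving the rest of h unchanged.
    """
    a, b = dot(h, u), dot(h, w)            # coordinates of h in the plane
    c, s = math.cos(delta), math.sin(delta)
    a2, b2 = c * a - s * b, s * a + c * b  # rotated coordinates
    return [hi + (a2 - a) * ui + (b2 - b) * wi
            for hi, ui, wi in zip(h, u, w)]


h = [1.0, 0.0, 0.5]   # toy hidden state
u = [1.0, 0.0, 0.0]   # hypothetical orthonormal basis of the feature plane
w = [0.0, 1.0, 0.0]

shifted = linear_steer(h, w, 1.0)                # leaves the unit circle
rotated = circular_steer(h, u, w, math.pi / 2)   # stays on the unit circle

radius_shifted = math.hypot(dot(shifted, u), dot(shifted, w))
radius_rotated = math.hypot(dot(rotated, u), dot(rotated, w))
```

In this toy setup the linear shift changes the in-plane radius from 1 to about 1.41, pushing the state off the circle. The rotation keeps the radius at 1 and changes only the angle, which is the kind of movement a straight-line edit cannot express.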

