neuron-attribution


@NousResearch: To check that CNA isolates only the intended behavior, we evaluate steered models on MMLU across a range of steering st…

X AI KOLs Following · 2026-05-19 Cached

Nous Research released Contrastive Neuron Attribution (CNA), a method for steering LLM behavior by identifying and ablating sparse circuits of MLP neurons. It requires no sparse autoencoder training, is reported not to degrade general benchmarks, and was validated on multiple large language models.
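
The general idea of contrasting neuron activations and then ablating the top-scoring neurons can be illustrated on a toy MLP. The sketch below is not Nous Research's implementation: the scoring rule (mean activation difference between two input sets, multiplied by the neuron's output weight), the hand-picked weights, and every function name are assumptions made for illustration only.

```python
# Toy sketch of contrastive neuron scoring and ablation.
# ASSUMPTION: the scoring rule and all names here are hypothetical and are
# not taken from the CNA release.

def relu(x):
    return x if x > 0 else 0.0

def mlp_hidden(x, w_in):
    """Hidden activations of a one-layer toy MLP: relu(W_in @ x)."""
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_in]

def mlp_forward(x, w_in, w_out, ablate=frozenset()):
    """Scalar output; neurons whose index is in `ablate` are zeroed."""
    h = mlp_hidden(x, w_in)
    h = [0.0 if i in ablate else a for i, a in enumerate(h)]
    return sum(wo * a for wo, a in zip(w_out, h))

def contrastive_scores(pos, neg, w_in, w_out):
    """Per-neuron score: (mean act on pos - mean act on neg) * output weight."""
    def mean_acts(xs):
        acts = [mlp_hidden(x, w_in) for x in xs]
        return [sum(col) / len(xs) for col in zip(*acts)]
    mean_pos, mean_neg = mean_acts(pos), mean_acts(neg)
    return [(p - n) * wo for p, n, wo in zip(mean_pos, mean_neg, w_out)]

def top_k_neurons(scores, k):
    """Indices of the k neurons with the largest absolute score."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return order[:k]

# Hand-built example: 3 inputs, 4 hidden neurons.
W_IN = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
W_OUT = [1.0, 1.0, 2.0, 0.5]
POS = [[1, 0, 1], [0, 1, 1]]  # inputs showing the target behavior (feature 2 on)
NEG = [[1, 0, 0], [0, 1, 0]]  # matched inputs without it

SCORES = contrastive_scores(POS, NEG, W_IN, W_OUT)
CIRCUIT = frozenset(top_k_neurons(SCORES, k=1))

before = mlp_forward(POS[0], W_IN, W_OUT)
after = mlp_forward(POS[0], W_IN, W_OUT, ablate=CIRCUIT)
control = mlp_forward(NEG[0], W_IN, W_OUT, ablate=CIRCUIT)
print(sorted(CIRCUIT), before, after, control)  # [2] 3.5 1.5 1.5
```

In this toy setup only neuron 2 separates the two input sets, so zeroing it changes the output on the target inputs while leaving the contrast inputs untouched. That is a small-scale analogue of checking that an intervention does not disturb unrelated behavior.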
