representation-engineering


Decomposing and Steering Functional Metacognition in Large Language Models

arXiv cs.CL · yesterday

This paper investigates functional metacognition in large language models, showing that internal states such as evaluation awareness and self-assessed capability are linearly decodable from residual stream activations. The authors propose a mechanistic framework for steering these states and report causal control over reasoning behavior, verbosity, and safety responses.
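To make "linearly decodable" and "steering" concrete, here is a minimal sketch on synthetic data. It is not the authors' method: it assumes a difference-of-means probe, which is one common choice in representation engineering, and it stands in random vectors for real residual stream activations. All function names, dimensions, and the steering strength `alpha` are hypothetical.

```python
import numpy as np

def make_synthetic_activations(n_per_class=200, d_model=64, seed=0):
    """Fake 'residual stream' vectors: two classes separated along one hidden direction.

    Assumption: real activations would come from a model's hidden states instead.
    """
    rng = np.random.default_rng(seed)
    true_dir = rng.normal(size=d_model)
    true_dir /= np.linalg.norm(true_dir)
    labels = np.repeat([0, 1], n_per_class)
    noise = rng.normal(size=(2 * n_per_class, d_model))
    # Class 0 is shifted by -3 along true_dir, class 1 by +3.
    acts = noise + 3.0 * np.outer(2 * labels - 1, true_dir)
    return acts, labels

def fit_mean_difference_probe(acts, labels):
    """Linear probe: unit vector between class means, threshold at the midpoint."""
    mu0 = acts[labels == 0].mean(axis=0)
    mu1 = acts[labels == 1].mean(axis=0)
    direction = mu1 - mu0
    direction /= np.linalg.norm(direction)
    threshold = 0.5 * (mu0 + mu1) @ direction
    return direction, threshold

def probe_predict(acts, direction, threshold):
    """Decode the binary state from the projection onto the probe direction."""
    return (acts @ direction > threshold).astype(int)

def steer(acts, direction, alpha):
    """Steering: shift every activation by alpha along the probe direction."""
    return acts + alpha * direction

acts, labels = make_synthetic_activations()
direction, threshold = fit_mean_difference_probe(acts, labels)

# In-sample accuracy of the probe (no held-out split in this sketch).
accuracy = (probe_predict(acts, direction, threshold) == labels).mean()

# Push class-0 activations along the direction and re-run the probe on them.
steered = steer(acts[labels == 0], direction, alpha=12.0)
flipped_fraction = probe_predict(steered, direction, threshold).mean()

print(f"probe accuracy: {accuracy:.3f}")
print(f"class-0 examples decoded as class 1 after steering: {flipped_fraction:.3f}")
```

In this toy setting the same direction serves both to read the state and to change what the probe reads. Showing causal control over a model's actual behavior, as the summary describes, would additionally require intervening during a forward pass and measuring the generated output.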
