latent-shift

Coherent Context Can Silently Shift LLMs Into a Different Internal Regime — And Current Safety Systems Are Blind To It [D]

Reddit r/MachineLearning · 4d ago

An independent researcher presents evidence that coherent context can shift an LLM into a different internal regime before it produces any output, bypassing safety filters that inspect only surface-level text. If that holds, current alignment methods such as RLHF may not be robust defenses.
