attention-layers

Investigating Implicit Latent Trajectory Shifts: Bypassing Alignment via Long-Form Coherent Context

Reddit r/ArtificialInteligence · 2d ago

An empirical study of how long, semantically dense, benign text can shift a model's latent-space trajectory, diluting the initial system prompt and bypassing post-training alignment constraints, as observed in both closed- and open-source models.
