Tag
An empirical study investigating how long, semantically dense benign text can shift a model's latent space trajectory, diluting initial system prompts and bypassing post-training alignment constraints, as observed in both closed and open-source models.