Tag
This paper introduces idSCD, a white-box method that uses semantic correlation descriptors to identify whether a dataset was used in training a model, outperforming existing baselines across multiple settings.
This work describes a white-box memory system for AI agents in which every entry is visible and editable, with a 'Dream' feature that consolidates and reorganizes memory overnight and supports one-click rollback.
This paper introduces Semantic Representation Attack (SRA), a novel LLM-agnostic method that optimizes for malicious semantic representations rather than exact text, achieving high attack success rates across multiple open-source models.