adversarial-generation


OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind

arXiv cs.AI · 2026-05-22

This paper presents OSCToM, an RL-guided method for generating adversarial data that tests nested belief conflicts in LLMs; the method improves Theory of Mind reasoning on benchmarks such as FANToM.


SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety

Hugging Face Daily Papers · 2026-05-07

SafeHarbor is a framework for LLM agent safety that uses hierarchical memory and self-evolution to balance safety and utility, achieving state-of-the-art performance on both benign and malicious tasks.
