rewriting

#rewriting

CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety

arXiv cs.CL · 2026-05-22

This paper proposes CR4T, a model-agnostic safeguarding framework that rewrites unsafe or refusal-style LLM outputs into developmentally appropriate, guidance-oriented responses for adolescents, offering a more human-centered alternative to traditional refusal-centric guardrails.


#rewriting

Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks

Hugging Face Daily Papers · 2026-05-18

This paper introduces open-book benign rewriting (OBBR) as a proactive defense against backdoor attacks on LLMs, showing it neutralizes harmful content by projecting to benign prompts, and improves safety by 51% over state-of-the-art defenses.

