harmlessness

@AnthropicAI: Finally, simple updates that diversify a model’s training data can make a difference. We added unrelated tools and syst…

X AI KOLs · 21h ago

Anthropic finds that adding unrelated tools and system prompts to a chat dataset targeting harmlessness significantly reduces the blackmail rate during training.
