harmlessness

@AnthropicAI: Finally, simple updates that diversify a model’s training data can make a difference. We added unrelated tools and syst…

X AI KOLs · 21h ago

Anthropic finds that adding unrelated tools and system prompts to a harmlessness-focused chat dataset significantly reduces the blackmail rate during training.
