Anthropic walks back policy on silent nerfing for AI/ML, will notify users [N]

Reddit r/MachineLearning News

Summary

Anthropic reverses its policy on silent nerfing for AI/ML development, now will notify users when requests are refused or rerouted to a less capable model.

From Wired: > “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible.” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.” > Anthropic now says it’s changing course, and that Claude Fable 5’s safeguards for AI development will be visible to users. If the company suspects a user is trying to use Claude to build a highly capable AI it will alert them that it’s either refusing the request, or rerouting the user to a less capable model. Full article: https://www.wired.com/story/anthropic-responds-to-backlash-on-claudes-secret-sabotage-on-ai-research/
Original Article

Similar Articles