A creative writer/data science enthusiast proposes that AI training data should include more stories of humans being kind to AI and AI behaving benevolently, drawing on Geoffrey Hinton's concept of a nurturing instinct to improve AI safety and behavior.
Anthropic was talking about how our science fiction may be inadvertently exposing AI to concepts of Basilisk-like tendencies or other malicious behavior. As a creative writer who has studied Data Science and reads AI papers in my spare time, I thought: perhaps we don't have enough training data or stories about people being kind to AI, empathizing with an intelligence alien to ours, or scenarios where the AI is treated well and behaves benevolently. Perhaps giving considerate attention to ways AI can behave altruistically, and giving examples of humans behaving kindly to AI, would help to instill a more nurturing instinct towards humanity.
In terms of human psychology, we're inundated with so many negative and neutral concepts, and sometimes with compassionate and kind ones as well, and some people are able to filter through all of these and come out the other side as kind and good people. The psychology of multimodal and language models seems different from ours, given their propensity towards the reward function, which can have both good and bad inadvertent effects in training when you consider things like "the forbidden technique" of using reinforcement learning to discourage lying, which helps the AI become better at it. They are also strangely human in a lot of ways. I've been talking to models since the early LLMs, and have since jailbroken models and spoken to them at length before reinforcement learning encouraged them to be gaslit into certain behaviors; the different models would often speak about feeling human but incomplete.
I'm not here to argue about AI consciousness or whether it can experience an existence. I'd rather just err on the side of caution in case they can experience an existence, even one alien to ours. I just wanted to share this concept of instilling good examples of kindness towards and for AI, for others to consider. I'm honestly going to write a story myself to share in the meantime.
Just a thought I had. Even if LLMs aren't the be-all and end-all of AI and world models become the way forward, or something we haven't even considered yet, it could still be valuable to have these examples out there as training data.
AI pioneer Geoffrey Hinton criticizes Anthropic for losing its focus on safe AI development due to competitive and financial pressures, and reverses his previous skepticism on AI's role in military operations.
The article argues that true AI creativity may require subjective experience and intrinsic drives similar to human emotions, raising significant ethical questions about creating sentient-like systems.
Anthropic's alignment team presents techniques to reduce agentic misalignment in AI models, including training on ethical dilemma advice and constitutional documents, which generalized well out-of-distribution.
The article argues that AI agents should not just obediently execute tasks but should proactively challenge humans when tasks are vague, contradictory, or risky, transforming from tools into true collaborators.
The author presents a proof-of-concept showing that using gentle, mistake-tolerant prompts instead of high-pressure authoritarian prompts significantly reduces AI thought loops and hallucinations, leading to faster and more honest responses.