ai-interpretability

#ai-interpretability

We've Been Wrong About Consciousness Every Time We've Been Asked. The Evidence Says AI Is Next.

Reddit r/artificial · 3d ago

An opinion article argues that humanity has been wrong every time it has tried to define consciousness, and that evidence from plant behavior and AI interpretability (Anthropic's findings in Claude) suggests the assumption that AI isn't conscious may be wrong as well. The author invites discussion but rejects personal attacks.

Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior

Google DeepMind Blog · 2025-12-16

DeepMind releases Gemma Scope 2, an open suite of interpretability tools for the Gemma 3 model family, intended to help the AI safety community understand and debug complex language model behaviors such as hallucinations and jailbreaks.
