research

#research

@rohanpaul_ai: LLMs often cannot tell when an attack made them say something unsafe. Asking an LLM whether its own previous answer was…

X AI KOLs Timeline ↗ · 1h ago

This paper investigates whether LLMs can reliably self-report when their outputs have been compromised by adversarial prefills. It finds that models often cannot distinguish compromised outputs from intentional ones, and that the limited recognition they do show stems from normal refusal behavior rather than true self-awareness.

#research

@yoheinakajima: ActiveGraph: 1 month in: Paper #1: The Log is the Agent 3 LongMemEval Experiments Paper #2: Regimes, self-improvement l…

X AI KOLs Following ↗ · 2h ago

ActiveGraph announces two new papers on agent memory (LongMemEval) and self-improvement regimes, along with reference agents, pack templates, and upcoming meetups in Seattle and San Francisco.

#research

I Figured Out What Causes 'Super Weights'

Reddit r/ArtificialInteligence ↗ · 4h ago

The post explains that super weights in large language models arise from the SoftMax-Attention interaction, which creates a 'Nothing Dump' token that serves as a stable reference point; removing these weights cripples performance.

#research

Certainty Is All You Need

Reddit r/artificial ↗ · 6h ago

This paper introduces a new approach leveraging certainty in transformer models, building on the 'Attention Is All You Need' paradigm.

#research

@SpaceX: Today’s mission includes a demo of a new vehicle that will enable affordable, routine access to the microgravity enviro…

X AI KOLs Following ↗ · 14h ago

SpaceX's mission includes a demo of a new vehicle for affordable, routine access to microgravity for scientific research and in-space manufacturing, with a planned splashdown in the Pacific Ocean.

#research

Thermodynamic Measure Of Intelligence

Reddit r/singularity ↗ · yesterday

This paper proposes a thermodynamic measure of intelligence defined as 'rare-valid lift' and argues that recursive self-simulation is necessary and nearly sufficient for high thermodynamic intelligence, making intelligence measurable on a universal scale.

#research

Prompt Injection as Role Confusion

Simon Willison's Blog ↗ · yesterday

A research paper shows that LLMs suffer from 'role confusion': they prioritize the style of text over its actual role tags, which enables prompt injection attacks. Destyling text reduces attack success from 61% to 10%, indicating a fundamental challenge for LLM security.

#research

Prompt Injection as Role Confusion

Hacker News Top ↗ · yesterday

This paper presents a theory that prompt injection attacks on LLMs stem from a fundamental flaw in how models perceive roles, and it frames roles as a type system for language. It explains existing attacks, predicts new ones, and proposes a research agenda for a science of roles.

#research

Revised: Estimated share of newly written code exposed to AI generation and review

Reddit r/singularity ↗ · yesterday

This paper revises the estimated proportion of newly written code that is generated or reviewed by AI, analyzing its impact on software development.

#research