Tag
This paper investigates whether LLMs can reliably self-report when their outputs have been compromised by adversarial prefills. It finds that models often cannot distinguish compromised outputs from intentional ones, and that the limited recognition they do show stems from ordinary refusal behavior rather than genuine self-awareness.
ActiveGraph announces two new papers on agent memory (LongMemEval) and self-improvement regimes, along with reference agents, pack templates, and upcoming meetups in Seattle and San Francisco.
Explains that super weights in large language models arise from the interaction between softmax and attention, which creates a 'Nothing Dump' token that serves as a stable reference point; removing these weights cripples performance.
This paper introduces a new approach leveraging certainty in transformer models, building on the 'Attention Is All You Need' paradigm.
SpaceX's mission includes a demo of a new vehicle for affordable, routine access to microgravity for scientific research and in-space manufacturing, with a planned splashdown in the Pacific Ocean.
This paper proposes a thermodynamic measure of intelligence defined as 'rare-valid lift' and argues that recursive self-simulation is necessary and nearly sufficient for high thermodynamic intelligence, making intelligence measurable on a universal scale.
A research paper shows that LLMs suffer from 'role confusion': they give more weight to the style of a text than to its actual role tags, which enables prompt injection attacks. Destyling text reduces attack success from 61% to 10%, indicating a fundamental challenge for LLM security.
This paper presents a theory that prompt injection attacks on LLMs stem from a fundamental flaw in how models perceive roles, which it frames as a type system for language. The theory explains existing attacks, predicts new ones, and motivates a proposed research agenda for a science of roles.
This paper revises the estimated proportion of newly written code that is generated or reviewed by AI, analyzing its impact on software development.
This article explores how AI agents can automatically write and optimize their skill files using techniques like SkillOpt from Microsoft Research, which treats skill documents as trainable state and delivers significant performance improvements. It addresses the challenge of manual skill tuning and presents frameworks like GEPA and EvoSkill as evolutionary approaches.
Introduces five Codex Skills to improve research efficiency: paper framework construction, image-to-PPT conversion, scientific diagram editing, academic writing assistance, and learning high-level paper structures. The emphasis is on turning repetitive processes into reusable skills.
Compiles fund manager Zheng Xi's public quarterly reports, notes, and interviews into a structured corpus and packages it as a traceable skill usable across AI platforms, enabling investment-research Q&A and fund analysis grounded in real data.
Computational complexity theorists argue that the non-linear dynamics of semiclassical gravity would enable impossibly powerful computation, and conclude that gravity must be quantized. The paper uses the Schrödinger-Newton equation to show that coupling classical gravity to quantum matter leads to computational contradictions.
Alisa Liu (alisawuffles), a final-year UW NLP PhD student and recipient of the OpenAI SuperAlignment Fellowship, announced she will be joining OpenAI next week. In a blog post, she openly details her entire job search, including 46 recruiter screens and interviews with 11 top AI labs.
Anthropic demonstrates that AI systems can now perform world-modeling, as evidenced by the Fable standoff experiment.
autoarxiv lets you turn any arXiv paper into running code by changing the domain in its URL to autoarxiv.org. An AI agent from alphaXiv reads the paper, clones the repository, sets up dependencies, and runs a minimal reproduction to verify the paper's claims, logging every step live.
A new study reports early results suggesting that AI use is eroding human skills, raising concerns about cognitive decline.
According to speculation, Anthropic's new model Mythos, which finished training in February this year, has quietly changed the pace of R&D and driven a significant leap in AI capabilities over the past five months, with leading models now helping to train the next generation of models.
Researchers at Binghamton University used Shannon entropy to develop a mathematical method that solves Wordle puzzles with a 99% success rate, prioritizing informative guesses over likely answers.
This paper reports that slow breathing can modulate brain function and influence risk-taking behavior.
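One way to read the 'Nothing Dump' idea in the super-weights entry above: softmax attention weights must sum to 1, so a head with nothing relevant to attend to still has to put its attention somewhere. The toy below is a minimal sketch of that generic attention-sink effect, not a reproduction of the paper's super-weight analysis; the scores, value vectors, and the sink token itself are hypothetical numbers chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax: weights are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(scores, values):
    """Single-query attention: softmax over scores, then a weighted sum of value vectors."""
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


# Three content tokens, none relevant to the query, so their scores are equal and low.
content_scores = [0.0, 0.0, 0.0]
content_values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Without a sink, softmax still spreads all of its mass over the irrelevant tokens,
# so the head outputs an average of their value vectors.
plain_weights, plain_out = attend(content_scores, content_values)

# Hypothetical "dump" token: a large score and a zero value vector. It absorbs
# almost all of the attention, so the head's output stays close to zero (a no-op).
sink_score, sink_value = 8.0, [0.0, 0.0]
sink_weights, sink_out = attend([sink_score] + content_scores, [sink_value] + content_values)

if __name__ == "__main__":
    print([round(x, 3) for x in plain_out])  # [0.667, 0.667]
    print(round(sink_weights[0], 3))         # 0.999
    print([round(x, 4) for x in sink_out])
```

In this toy, the sink gives the head a default place to send attention; under that reading, disturbing whatever creates the sink would force heads to mix in irrelevant values everywhere.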
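The Binghamton Wordle entry above describes ranking guesses by Shannon entropy. The following is a minimal sketch of that general idea, not the researchers' actual method: for each possible guess, group the remaining candidate answers by the feedback pattern they would produce, and score the guess by the entropy H = -Σ p log2 p of that pattern distribution. The function names, the three-letter words, and the probe guess "cbh" are hypothetical.

```python
from collections import Counter
from math import log2


def feedback(guess: str, answer: str) -> str:
    """Wordle-style pattern: G = right letter, right spot; Y = letter elsewhere; B = absent.

    Repeated letters follow the usual rule: greens are matched first, then
    yellows consume only the answer letters that are still unmatched.
    """
    pattern = ["B"] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if pattern[i] == "B" and unmatched[g] > 0:
            pattern[i] = "Y"
            unmatched[g] -= 1
    return "".join(pattern)


def entropy(guess: str, candidates: list[str]) -> float:
    """Shannon entropy (bits) of the feedback-pattern distribution over the candidates."""
    counts = Counter(feedback(guess, answer) for answer in candidates)
    n = len(candidates)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def best_guess(allowed: list[str], candidates: list[str]) -> str:
    """Pick the allowed guess whose feedback is expected to be most informative."""
    return max(allowed, key=lambda g: entropy(g, candidates))


def filter_candidates(candidates: list[str], guess: str, observed: str) -> list[str]:
    """Keep only answers consistent with the feedback actually observed."""
    return [w for w in candidates if feedback(guess, w) == observed]


if __name__ == "__main__":
    words = ["cat", "bat", "hat", "mat"]  # hypothetical remaining answers
    allowed = ["bat", "cbh"]              # "cbh" can never be the answer
    print(entropy("bat", words))          # about 0.811 bits: splits the answers 1 vs 3
    print(entropy("cbh", words))          # 2.0 bits: every answer gives a distinct pattern
    print(best_guess(allowed, words))     # cbh
```

The example shows the trade-off the entry mentions: "bat" could win immediately, but "cbh" is guaranteed to identify the answer on the next turn, so an entropy-maximizing solver prefers the informative guess over the likely one.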