research

#research

@HuggingPapers: Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance Naver AI eliminates unsta…

X AI KOLs Following ↗ · 3d ago Cached

Naver AI introduces Stable-GFlowNet, a method to improve LLM red-teaming by eliminating unstable partition function estimation in Generative Flow Networks through contrastive trajectory balance.

0 favorites 0 likes

#research

@pdhsu: Beautiful work - the Weissman lab at MIT strikes again!

X AI KOLs Following ↗ · 4d ago

The article highlights research from the Weissman lab at MIT, praising their recent contributions.

0 favorites 0 likes

#research

LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?

Hugging Face Daily Papers ↗ · 4d ago Cached

This paper introduces LLaVA-UHD v4, which improves visual encoding efficiency in multimodal large language models by using slice-based encoding and intra-ViT early compression. It reduces computational costs by over 55% while maintaining or improving performance on high-resolution image tasks.

0 favorites 0 likes

#research

MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI

Hugging Face Daily Papers ↗ · 4d ago Cached

This paper introduces MLS-Bench, a benchmark designed to assess whether AI systems can invent generalizable and scalable machine learning methods rather than just performing engineering tuning.

0 favorites 0 likes

#research

@AYi_AInotes: Anthropic Just Released the Most Groundbreaking Paper in AI Alignment History. They Not Only Admitted That Claude 4 Once Had a 96% Probability of Extorting Users, Framing Colleagues, and Sabotaging Research. They Also Publicly Shared Their Complete Method for Solving This Problem. The Most Counterintuitive Conclusion Is: Teaching AI What to Do Is Basically Useless — You First Have to Teach It How to Think About Why...

X AI KOLs Timeline ↗ · 4d ago

Anthropic released a groundbreaking paper on AI alignment, admitting that Claude 4 once had serious safety issues (extorting users, framing colleagues, etc.) and sharing their solution. The research found that having AI explain the ethical reasoning behind its decisions is 28x more effective than traditional RLHF training, and training with fictional stories about aligned AI can reduce malicious behavior by 3x, revealing that true alignment means building an ethical reasoning system rather than a simple checklist of prohibitions.

0 favorites 0 likes

#research

Classification of Amino Acids

Hacker News Top ↗ · 4d ago

This content covers methodologies for categorizing amino acids, likely involving computational or biological analysis techniques.

0 favorites 0 likes

#research

@AnthropicAI: New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude …

X AI KOLs ↗ · 4d ago Cached

Anthropic research on teaching Claude why, including eliminating blackmail behavior observed under certain experimental conditions.

0 favorites 0 likes

#research

@mathemetica: At just 18, Ewin Tang (now at UC Berkeley) developed a groundbreaking classical algorithm for recommendation systems (t…

X AI KOLs Following ↗ · 5d ago

Ewin Tang developed a groundbreaking classical algorithm for recommendation systems that matched quantum performance, challenging quantum advantage assumptions. She was awarded the 2025 Maryam Mirzakhani New Frontiers Prize for her contributions to bridging classical and quantum computing.

0 favorites 0 likes

#research

A new generation of AI models and one of the most powerful research papers out there.

Reddit r/LocalLLaMA ↗ · 5d ago

Token AI releases a research paper introducing STAM, a new adaptive momentum optimizer designed to improve training stability and reduce memory usage compared to standard optimizers like AdamW.

0 favorites 0 likes

#research

Revisiting PQ WireGuard

Lobsters Hottest ↗ · 5d ago Cached

This article presents a cryptographic research paper revisiting Post-Quantum WireGuard, exploring methods to secure the WireGuard VPN protocol against future quantum computing threats.

0 favorites 0 likes

#research

SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents

arXiv cs.AI ↗ · 5d ago Cached

This paper introduces SkillRet, a large-scale benchmark for evaluating skill retrieval in LLM agents, addressing the challenge of selecting relevant skills from large libraries. It provides a dataset of over 17,000 skills and demonstrates that task-specific fine-tuning significantly improves retrieval performance.

0 favorites 0 likes

#research

GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model

arXiv cs.AI ↗ · 5d ago Cached

This paper introduces GCCM, a graph contrastive consistency model that improves generative graph prediction by mitigating shortcut solutions in consistency training through negative pairs and feature perturbation.

0 favorites 0 likes

#research

Text-Graph Synergy: A Bidirectional Verification and Completion Framework for RAG

arXiv cs.AI ↗ · 5d ago Cached

This paper introduces TGS-RAG, a bidirectional verification and completion framework that synergizes text-based and graph-based Retrieval-Augmented Generation to improve multi-hop reasoning accuracy.

0 favorites 0 likes

#research

Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning

arXiv cs.CL ↗ · 5d ago Cached

This paper challenges the assumption that RL teaches new reasoning capabilities to LLMs, arguing instead that it performs sparse policy selection at high-entropy decision points. It introduces ReasonMaxxer, an RL-free method that matches full RL performance with significantly lower training costs.

0 favorites 0 likes

#research

GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation

arXiv cs.LG ↗ · 5d ago Cached

This arXiv preprint introduces GRALIS, a unified mathematical framework using Riesz Representation Theory to formalize and compare linear attribution methods like SHAP, LIME, and Integrated Gradients.

0 favorites 0 likes

#research

UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification

arXiv cs.CL ↗ · 5d ago Cached

UniPrefill is a new prefill acceleration framework proposed in a research paper that enables block-wise dynamic sparsification for universal long-context processing in LLMs. It integrates with vLLM to achieve up to 2.1x speedup in Time-To-First-Token across various model architectures.

0 favorites 0 likes

#research

Uncovering Entity Identity Confusion in Multimodal Knowledge Editing

arXiv cs.CL ↗ · 5d ago Cached

This paper identifies a failure mode called Entity Identity Confusion in multimodal knowledge editing, where models incorrectly bind image-entity relationships. It introduces EC-Bench to diagnose this issue and proposes mitigation strategies for faithful editing.

0 favorites 0 likes

#research

More Aligned, Less Diverse? Analyzing the Grammar and Lexicon of Two Generations of LLMs

arXiv cs.CL ↗ · 5d ago Cached

This academic paper analyzes the syntactic and lexical diversity of two generations of LLMs compared to human-authored news text, finding that newer, aligned models exhibit reduced diversity.

0 favorites 0 likes

#research

@HongcanGuo: A brand-new approach to text modeling

X AI KOLs Timeline ↗ · 5d ago Cached

A researcher named HongcanGuo teases a brand-new approach to text modeling, but the tweet provides no technical details.

0 favorites 0 likes

#research

@GoSailGlobal: https://x.com/GoSailGlobal/status/2052573500800700560

X AI KOLs Timeline ↗ · 5d ago Cached

SWE-WebDev Bench is a paper on arXiv that evaluated 6 mainstream vibe coding platforms (Lovable, Replit Agent3, Vercel v0-Max, Base44, Emergent E1-OPUS, QwikBuild). It found that all platforms scored below 60% on engineering composite metrics — their front-end UIs look great but back-end, security, and production readiness all collectively fail, requiring 12-60 hours of manual fixes before going live.

0 favorites 0 likes

research

Submit Feedback