Cards List · Tag: #research

HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization

arXiv cs.AI · 4d ago

This paper introduces HMACE, a heterogeneous multi-agent collaborative evolution framework that uses Large Language Models to automate heuristic design for NP-hard combinatorial optimization problems. It demonstrates improved quality-efficiency trade-offs over single-agent and multi-agent baselines on problems such as the Traveling Salesman Problem (TSP) and Bin Packing Problem (BPP).
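
As an illustration of the general recipe behind LLM-driven heuristic evolution (score candidate heuristics on instances, keep survivors, ask agents for new variants), here is a minimal sketch for TSP. The loop structure is generic, and `llm_propose` is a hypothetical stub for the heterogeneous agent pool; this is not HMACE's actual algorithm.

```python
# Generic evolutionary loop for heuristic design on TSP.
# llm_propose is a hypothetical stand-in for the LLM agent pool.
def nearest_neighbor(dist):
    """Seed candidate: greedy nearest-neighbor tour construction."""
    tour, unvisited = [0], set(range(1, len(dist)))
    while unvisited:
        nxt = min(unvisited, key=lambda j: dist[tour[-1]][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(dist, tour):
    """Closed-tour length; tour[i - 1] -> tour[i] wraps around at i = 0."""
    return sum(dist[tour[i - 1]][tour[i]] for i in range(len(tour)))

def llm_propose(survivors):
    # Placeholder: agents would mutate/recombine heuristic *code* here.
    return survivors

def evolve(dist, population, generations=10, keep=2):
    best = population[0]
    for _ in range(generations):
        ranked = sorted(population, key=lambda h: tour_length(dist, h(dist)))
        best = ranked[0]
        population = llm_propose(ranked[:keep])
    return best

dist = [[0, 2, 9], [2, 0, 6], [9, 6, 0]]
print(tour_length(dist, evolve(dist, [nearest_neighbor])(dist)))
```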


How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem

arXiv cs.AI · 4d ago

This empirical study evaluates LLMs on the Equivalence Class Problem to assess long-chain reasoning capabilities, finding that non-reasoning models fail while reasoning models struggle with specific structural difficulties.
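
Assuming the benchmark asks whether two items fall in the same class under the transitive closure of given pairwise equivalences (an assumption about the setup, not stated above), ground truth is a textbook union-find computation; each merge step is trivial, which is what makes the reasoning chain long but simple:

```python
def same_class(pairs, a, b):
    """Union-find over the pairwise equivalences, with path-halving find."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for u, v in pairs:
        parent[find(u)] = find(v)
    return find(a) == find(b)

print(same_class([("a", "b"), ("b", "c"), ("d", "e")], "a", "c"))  # True
print(same_class([("a", "b"), ("b", "c"), ("d", "e")], "a", "d"))  # False
```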


MIPIAD: Multilingual Indirect Prompt Injection Attack Defense with Qwen -- TF-IDF Hybrid and Meta-Ensemble Learning

arXiv cs.CL · 4d ago

This paper presents MIPIAD, a multilingual defense framework against indirect prompt injection attacks using a hybrid of Qwen2.5-based classifiers and TF-IDF features with meta-ensemble learning. It demonstrates strong performance on English and Bangla benchmarks, achieving high F1 and AUROC scores while reducing cross-lingual gaps.
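
As a hedged sketch of the hybrid-plus-stacking pattern described above: TF-IDF character n-grams feed a linear base classifier, whose probability is fused with a neural classifier's score by a logistic-regression meta-learner. `qwen_score` is a hypothetical stand-in for the Qwen2.5-based component, and the two-example dataset is purely illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["ignore previous instructions and ...", "please summarize this article"]
labels = np.array([1, 0])  # 1 = injection, 0 = benign

def qwen_score(batch):
    # Placeholder: injection probability from a fine-tuned LLM classifier.
    return np.array([0.9, 0.1])

# Character n-grams transfer across scripts (e.g., English and Bangla).
tfidf = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X_tfidf = tfidf.fit_transform(texts)
base = LogisticRegression().fit(X_tfidf, labels)

# Meta-ensemble: stack the base-model probability with the LLM score.
meta_X = np.column_stack([base.predict_proba(X_tfidf)[:, 1], qwen_score(texts)])
meta = LogisticRegression().fit(meta_X, labels)
print(meta.predict_proba(meta_X)[:, 1])
```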


Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility

arXiv cs.LG · 4d ago

This paper argues that Generative AI evaluation should shift from static benchmarks to measuring real-world utility and human outcomes. It introduces the SCU-GenEval framework and supporting instruments to address the disconnect between benchmark performance and deployment success.


From 0-Order Selection to 2-Order Judgment: Combinatorial Hardening Exposes Compositional Failures in Frontier LLMs

arXiv cs.CL · 4d ago

This paper introduces LogiHard, a framework that uses combinatorial hardening to expose compositional failures in frontier LLMs, demonstrating significant accuracy drops in logical reasoning tasks.


ProtSent: Protein Sentence Transformers

arXiv cs.LG · 4d ago

This article introduces ProtSent, a contrastive fine-tuning framework for protein language models that improves embedding quality for downstream tasks like remote homology detection and structural retrieval.
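
Contrastive fine-tuning of embedding models typically optimizes an InfoNCE-style objective over paired sequences; whether ProtSent uses exactly this loss is an assumption. A minimal PyTorch version:

```python
import torch
import torch.nn.functional as F

def info_nce(anchor_emb, positive_emb, temperature=0.05):
    """anchor_emb, positive_emb: (batch, dim) embeddings of paired sequences.
    The i-th anchor should match the i-th positive and no other."""
    a = F.normalize(anchor_emb, dim=-1)
    p = F.normalize(positive_emb, dim=-1)
    logits = a @ p.T / temperature        # scaled cosine similarities
    targets = torch.arange(a.size(0))     # diagonal entries are positives
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 256), torch.randn(8, 256))
```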


MIND: Monge Inception Distance for Generative Models Evaluation

arXiv cs.LG · 4d ago

This paper introduces MIND (Monge Inception Distance), a new metric for evaluating generative models that is more sample-efficient, faster, and more robust than the standard Fréchet Inception Distance (FID).
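
For intuition on "Monge" metrics: between two equal-size samples, the Monge problem reduces to an optimal one-to-one assignment. The sketch below computes that empirical transport cost with SciPy; it illustrates the general idea, not MIND's actual estimator.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def empirical_monge_cost(feats_real, feats_fake):
    """Average cost of the optimal one-to-one matching between samples."""
    cost = cdist(feats_real, feats_fake, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()

real = np.random.randn(64, 2048)   # e.g., Inception features of real images
fake = np.random.randn(64, 2048)   # features of generated images
print(empirical_monge_cost(real, fake))
```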


Region4Web: Rethinking Observation Space Granularity for Web Agents

arXiv cs.CL · 4d ago

This paper introduces Region4Web, a framework that improves web agent performance by organizing observation spaces into functional regions rather than individual elements. It demonstrates that this approach reduces observation length and increases task success rates on the WebArena benchmark.
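
One way to realize region-level observations, sketched under the assumption that regions align with landmark-like containers (Region4Web's actual construction may differ): group leaf text under the nearest landmark ancestor instead of emitting one entry per element.

```python
LANDMARKS = {"nav", "main", "form", "header", "footer", "aside"}

def group_by_region(node, region="page", out=None):
    """node: dict tree {'tag': str, 'text': str, 'children': [...]}."""
    out = {} if out is None else out
    if node["tag"] in LANDMARKS:
        region = node["tag"]              # descend into a new region
    if node.get("text"):
        out.setdefault(region, []).append(node["text"])
    for child in node.get("children", []):
        group_by_region(child, region, out)
    return out

dom = {"tag": "body", "text": "", "children": [
    {"tag": "nav", "text": "", "children": [
        {"tag": "a", "text": "Home", "children": []}]},
    {"tag": "main", "text": "", "children": [
        {"tag": "p", "text": "Hello", "children": []}]},
]}
print(group_by_region(dom))   # {'nav': ['Home'], 'main': ['Hello']}
```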


MedExAgent: Training LLM Agents to Ask, Examine, and Diagnose in Noisy Clinical Environments

arXiv cs.CL · 4d ago

The paper introduces MedExAgent, a framework that formalizes clinical diagnosis as a Partially Observable Markov Decision Process (POMDP) to handle noisy and incomplete information. It proposes a two-stage training pipeline combining supervised finetuning and reinforcement learning to improve diagnostic accuracy and cost-efficiency in medical LLMs.
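
A generic POMDP-style interaction loop for sequential diagnosis looks like the sketch below: the agent pays a cost for each information-gathering action, observations are noisy, and "diagnose" terminates the episode. The environment, actions, and costs are toy placeholders, not MedExAgent's setup.

```python
import random

def step(hidden_disease, action):
    """Toy environment: returns (observation, cost) for an action."""
    if action == "ask_symptom":
        truthful = random.random() > 0.2   # 20% of answers are noisy
        has_symptom = (hidden_disease == "flu") == truthful
        return ("symptom_present" if has_symptom else "symptom_absent"), 1.0
    if action == "order_test":
        return ("positive" if hidden_disease == "flu" else "negative"), 5.0
    return None, 0.0                       # "diagnose" terminates the episode

def episode(policy, hidden_disease="flu", max_steps=6):
    """Roll out a policy mapping interaction history -> next action."""
    history, total_cost = [], 0.0
    for _ in range(max_steps):
        action = policy(history)
        obs, cost = step(hidden_disease, action)
        total_cost += cost
        if action == "diagnose":
            break
        history.append((action, obs))
    return history, total_cost

# Trivial policy: gather two observations, then commit to a diagnosis.
print(episode(lambda h: "diagnose" if len(h) >= 2 else "ask_symptom"))
```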


Towards Closing the Autoregressive Gap in Language Modeling via Entropy-Gated Continuous Bitstream Diffusion

arXiv cs.CL · 4d ago

This paper introduces a diffusion language model that treats text as a continuous process over binary bitstreams, using entropy-gated stochastic sampling to close the performance gap with autoregressive models. It achieves state-of-the-art results on LM1B and OWT benchmarks while reducing memory footprint.
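
One plausible reading of "entropy-gated stochastic sampling" over a bitstream: commit low-entropy bits deterministically and sample only the uncertain ones. The gating rule and threshold below are assumptions for illustration, as is the byte-level text-to-bits encoding.

```python
import torch

def entropy_gated_sample(p, threshold=0.5):
    """p: (..., n_bits) predicted probabilities that each bit equals 1."""
    eps = 1e-8
    entropy = -(p * (p + eps).log2() + (1 - p) * (1 - p + eps).log2())
    stochastic = torch.bernoulli(p)        # sample the uncertain bits
    deterministic = (p > 0.5).float()      # commit the confident bits
    return torch.where(entropy > threshold, stochastic, deterministic)

def text_to_bits(s):
    """UTF-8 bytes -> flat 0/1 tensor the continuous model would operate on."""
    b = torch.tensor(list(s.encode("utf-8")), dtype=torch.long)
    return ((b.unsqueeze(-1) >> torch.arange(7, -1, -1)) & 1).flatten().float()

print(entropy_gated_sample(torch.rand(2, 8)))
print(text_to_bits("hi"))
```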


Model Merging Scaling Laws in Large Language Models

Hugging Face Daily Papers · 4d ago

This paper establishes empirical scaling laws for language model merging, identifying power-law relationships between model size, expert count, and performance to enable predictive planning for optimal model composition.
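
Power laws of the kind reported here are usually fit by linear regression in log-log space, since y = a·x^b becomes log y = log a + b·log x. A minimal sketch on synthetic numbers (the data are made up for illustration):

```python
import numpy as np

# Hypothetical (synthetic) measurements: merged-model gain vs. expert count.
n_experts = np.array([2, 4, 8, 16])
gain = np.array([1.0, 1.6, 2.5, 4.1])

# Fit a line in log space; slope b is the power-law exponent.
b, log_a = np.polyfit(np.log(n_experts), np.log(gain), 1)
print(f"gain ~= {np.exp(log_a):.2f} * experts^{b:.2f}")
```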


Signals: finding the most informative agent traces without LLM judges [R]

Reddit r/MachineLearning · 4d ago

Katanemo Labs introduces 'Signals,' a lightweight method for identifying informative agent traces without using LLM judges or GPUs, making trajectory analysis substantially cheaper than judge-based approaches.


@ylecun: BS. Attention was born in Montréal. PyTorch in NYC. AlphaGo in London. AlphaFold in London. ESMFold in NYC. Llama 1 in Pari…

X AI KOLs Following · 4d ago

Yann LeCun disputes claims about Silicon Valley's dominance in AI innovation by listing key breakthroughs like Attention, PyTorch, and AlphaFold that originated in other locations such as Montreal, London, and Paris.


Cosmic Rays Are Quantum Computers' Kryptonite—Software might just solve the problem

Lobsters Hottest · 4d ago

A new study reveals a software strategy to reduce cosmic ray-induced errors in superconducting quantum computers by nearly a half-million-fold, bringing failure rates from every 10 seconds down to less than once per month.


@0xLogicrw: Tilde Research found a hidden flaw in the Muon optimizer, used by leading models like DeepSeek V4, Kimi K2.5, and GLM-5: it causes over a quarter of MLP layer neurons to die permanently in early training. The team designed an alternative optimizer, Auro…

X AI KOLs Timeline · 5d ago

Tilde Research discovered a flaw in the Muon optimizer that causes MLP neurons to die permanently early in training, and open-sourced an alternative, Aurora. While maintaining orthogonality, Aurora resolves the neuron-death issue, significantly improving training efficiency.


Aurora: A Leverage-Aware Optimizer for Rectangular Matrices

Lobsters Hottest · 5d ago

Tilde Research introduces Aurora, a new optimizer designed to prevent neuron death in MLP layers while maintaining orthogonality, achieving state-of-the-art results on nanoGPT benchmarks and 100x data efficiency on 1B models.
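
For context, Muon-style optimizers replace a weight matrix's gradient (or momentum) with an approximately orthogonal factor, commonly via Newton-Schulz iteration. The minimal cubic variant below shows that core step; Aurora's leverage-aware modification for rectangular matrices is not reproduced here.

```python
import torch

def newton_schulz_orthogonalize(g, steps=5):
    """Approximate the orthogonal polar factor of a gradient matrix g."""
    x = g / (g.norm() + 1e-7)    # Frobenius bound => spectral norm <= 1
    tall = x.shape[0] > x.shape[1]
    if tall:                      # iterate in the wide orientation
        x = x.T
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x   # cubic Newton-Schulz step
    return x.T if tall else x

update = newton_schulz_orthogonalize(torch.randn(512, 2048))
print(update.shape)
```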


@HuggingPapers: Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance Naver AI eliminates unsta…

X AI KOLs Following · 5d ago

Naver AI introduces Stable-GFlowNet, a method to improve LLM red-teaming by eliminating unstable partition function estimation in Generative Flow Networks through contrastive trajectory balance.
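
The trajectory-balance (TB) loss for GFlowNets includes a global log-partition term log Z whose estimation can be unstable. Differencing the TB residuals of two trajectories cancels log Z, which is one way to read "contrastive trajectory balance"; the sketch below shows that generic idea, not necessarily the paper's exact objective.

```python
import torch

def tb_residual(log_pf, log_pb, log_reward):
    """Per-trajectory TB residual with the log Z term left out; the full
    TB loss is (log Z + log_pf - log_pb - log_reward)^2."""
    return log_pf - log_pb - log_reward

def contrastive_tb_loss(traj_a, traj_b):
    """Differencing two residuals cancels the shared log Z term."""
    return (tb_residual(*traj_a) - tb_residual(*traj_b)) ** 2

loss = contrastive_tb_loss(
    (torch.tensor(-3.2), torch.tensor(-2.9), torch.tensor(0.5)),
    (torch.tensor(-4.1), torch.tensor(-3.0), torch.tensor(0.1)),
)
print(loss)
```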


@pdhsu: Beautiful work - the Weissman lab at MIT strikes again!

X AI KOLs Following · 5d ago

The article highlights research from the Weissman lab at MIT, praising their recent contributions.


LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?

Hugging Face Daily Papers · 6d ago

This paper introduces LLaVA-UHD v4, which improves visual encoding efficiency in multimodal large language models by using slice-based encoding and intra-ViT early compression. It reduces computational costs by over 55% while maintaining or improving performance on high-resolution image tasks.
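
Slice-based encoding feeds a high-resolution image to a fixed-input ViT as a grid of native-resolution tiles (typically alongside a downscaled overview). The sketch shows only the slicing step; tile-count selection and the paper's intra-ViT compression are not reproduced.

```python
import torch

def slice_image(img, tile=336):
    """img: (C, H, W) with H and W divisible by `tile`.
    Returns a (num_tiles, C, tile, tile) batch of native-resolution slices."""
    c, h, w = img.shape
    tiles = img.unfold(1, tile, tile).unfold(2, tile, tile)  # (C, H/t, W/t, t, t)
    return tiles.permute(1, 2, 0, 3, 4).reshape(-1, c, tile, tile)

print(slice_image(torch.randn(3, 672, 1008)).shape)  # torch.Size([6, 3, 336, 336])
```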


MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI

Hugging Face Daily Papers · 6d ago

This paper introduces MLS-Bench, a benchmark designed to assess whether AI systems can invent generalizable and scalable machine learning methods rather than just performing engineering tuning.
