Modal announces a partnership with OpenAI Devs and Antler Global to host an Autoresearch Systems Hackathon on May 30th targeting data- and compute-intensive challenges.
This paper introduces INSET, a unified multimodal model that embeds images as native vocabulary within textual instructions to improve handling of complex interleaved inputs for image generation and editing.
The article introduces A²RD, a novel architecture for generating consistent long videos using agentic autoregressive diffusion. It proposes a Retrieve–Synthesize–Refine–Update cycle and a new benchmark, LVBench-C, to address semantic drift in long-horizon video synthesis.
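The Retrieve–Synthesize–Refine–Update cycle can be pictured as a simple agentic generation loop. The sketch below is purely illustrative: all function names and data structures are assumptions, stand-ins for the retrieval, diffusion, refinement, and memory-update components the article describes.

```python
# Illustrative sketch of a Retrieve-Synthesize-Refine-Update cycle for long-video
# generation, in the spirit of A^2RD. Every function here is a hypothetical
# stand-in, not the paper's actual implementation.

def retrieve(memory, k=2):
    # Pull the k most recent context entries to condition the next clip.
    return memory[-k:]

def synthesize(context):
    # Stand-in for the autoregressive diffusion step producing the next clip.
    return f"clip_from_{len(context)}_context_entries"

def refine(clip):
    # Stand-in for a consistency-refinement pass over the generated clip.
    return clip + "_refined"

def update(memory, clip):
    # Write the refined clip back into memory to limit semantic drift.
    memory.append(clip)
    return memory

def generate_long_video(num_clips):
    memory = ["seed_frame"]
    for _ in range(num_clips):
        context = retrieve(memory)
        clip = refine(synthesize(context))
        memory = update(memory, clip)
    return memory[1:]  # generated clips, excluding the seed
```

The key design idea this loop captures is that each new clip is conditioned on refined memory rather than raw prior outputs, which is how the cycle is meant to counter long-horizon drift.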
This paper introduces MemoRepair, a barrier-first cascade repair contract for agentic memory that addresses the problem of stale derived artifacts when source data changes. Experiments demonstrate that MemoRepair significantly reduces invalidated memory exposure and repair costs compared to exhaustive repair methods.
This paper introduces HMACE, a heterogeneous multi-agent collaborative evolution framework that uses Large Language Models to automate heuristic design for NP-hard combinatorial optimization problems. It demonstrates improved quality-efficiency trade-offs over single-agent and multi-agent baselines on problems like TSP and BPP.
This empirical study evaluates LLMs on the Equivalence Class Problem to assess long-chain reasoning capabilities, finding that non-reasoning models fail while reasoning models struggle with specific structural difficulties.
This paper presents MIPIAD, a multilingual defense framework against indirect prompt injection attacks using a hybrid of Qwen2.5-based classifiers and TF-IDF features with meta-ensemble learning. It demonstrates strong performance on English and Bangla benchmarks, achieving high F1 and AUROC scores while reducing cross-lingual gaps.
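At a high level, meta-ensembling here means combining scores from heterogeneous detectors into one verdict. The sketch below shows one minimal way to do that; the weighted-average combination rule, the weights, and the threshold are all assumptions for illustration, not the paper's actual meta-learner.

```python
# Hypothetical sketch of combining two injection-detector scores (e.g. a
# Qwen2.5-based classifier and a TF-IDF model) into a single verdict.
# Weights and threshold are illustrative assumptions.

def meta_ensemble(score_llm, score_tfidf, w_llm=0.7, w_tfidf=0.3, threshold=0.5):
    """Blend two [0, 1] attack scores; return (combined score, is_attack flag)."""
    combined = w_llm * score_llm + w_tfidf * score_tfidf
    return combined, combined >= threshold

score, is_attack = meta_ensemble(0.9, 0.4)  # combined = 0.75 -> flagged
```

A learned meta-model (rather than fixed weights) would be trained on held-out detector outputs, which is what "meta-ensemble learning" typically implies.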
This paper argues that Generative AI evaluation should shift from static benchmarks to measuring real-world utility and human outcomes. It introduces the SCU-GenEval framework and supporting instruments to address the disconnect between benchmark performance and deployment success.
This paper introduces LogiHard, a framework that uses combinatorial hardening to expose compositional failures in frontier LLMs, demonstrating significant accuracy drops in logical reasoning tasks.
This article introduces ProtSent, a contrastive fine-tuning framework for protein language models that improves embedding quality for downstream tasks like remote homology detection and structural retrieval.
This paper introduces MIND (Monge Inception Distance), a new metric for evaluating generative models that is more sample-efficient, faster, and robust than the standard Fréchet Inception Distance (FID).
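For context, the FID baseline that MIND is positioned against compares real and generated samples via Gaussian statistics of Inception features:

```latex
\mathrm{FID}(x, g) = \lVert \mu_x - \mu_g \rVert_2^2
  + \mathrm{Tr}\!\left( \Sigma_x + \Sigma_g - 2\,(\Sigma_x \Sigma_g)^{1/2} \right)
```

where $(\mu_x, \Sigma_x)$ and $(\mu_g, \Sigma_g)$ are the feature means and covariances of the real and generated distributions. FID's reliance on covariance estimates is a known source of sample inefficiency, which is the axis on which MIND claims improvement.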
This paper introduces Region4Web, a framework that improves web agent performance by organizing observation spaces into functional regions rather than individual elements. It demonstrates that this approach reduces observation length and increases task success rates on the WebArena benchmark.
The paper introduces MedExAgent, a framework that formalizes clinical diagnosis as a Partially Observable Markov Decision Process (POMDP) to handle noisy and incomplete information. It proposes a two-stage training pipeline combining supervised finetuning and reinforcement learning to improve diagnostic accuracy and cost-efficiency in medical LLMs.
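The POMDP framing means diagnosis is modeled with the standard tuple

```latex
\langle S, A, T, R, \Omega, O, \gamma \rangle
```

where $S$ is the set of latent patient states (true conditions), $A$ the actions (tests, questions, diagnoses), $T(s' \mid s, a)$ the transition dynamics, $R(s, a)$ the reward trading off accuracy against cost, $\Omega$ the observations (noisy or incomplete findings), $O(o \mid s', a)$ the observation model, and $\gamma$ the discount factor. The agent must act on a belief over $S$ rather than the true state, which is what captures noisy and incomplete clinical information.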
This paper introduces a diffusion language model that treats text as a continuous process over binary bitstreams, using entropy-gated stochastic sampling to close the performance gap with autoregressive models. It achieves state-of-the-art results on LM1B and OWT benchmarks while reducing memory footprint.
This paper establishes empirical scaling laws for language model merging, identifying power-law relationships between model size, expert count, and performance to enable predictive planning for optimal model composition.
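A power-law relationship of the kind the paper fits, performance $\approx c \cdot N^{\alpha}$, is typically recovered by linear regression in log-log space. The sketch below demonstrates the procedure on synthetic data; the data points and function names are illustrative, not the paper's measurements.

```python
import math

# Illustrative power-law fit: y = c * x^alpha, estimated by least squares on
# log(y) = alpha * log(x) + log(c). Synthetic data only, not the paper's results.

def fit_power_law(xs, ys):
    """Return (alpha, c) from a log-log least-squares fit."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    alpha = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    c = math.exp(my - alpha * mx)
    return alpha, c

# Data generated from y = 2 * x^0.5 recovers the exponent exactly.
xs = [1, 4, 16, 64]
ys = [2 * x ** 0.5 for x in xs]
alpha, c = fit_power_law(xs, ys)  # alpha = 0.5, c = 2.0
```

Once such exponents are fit for model size and expert count, extrapolating the curve is what enables the predictive planning the paper describes.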
Katanemo Labs introduces 'Signals,' a lightweight method for identifying informative agent traces without using LLM judges or GPUs, achieving higher efficiency in trajectory analysis.
Yann LeCun disputes claims about Silicon Valley's dominance in AI innovation by listing key breakthroughs like Attention, PyTorch, and AlphaFold that originated in other locations such as Montreal, London, and Paris.
A new study presents a software strategy that reduces cosmic-ray-induced errors in superconducting quantum computers by nearly a half-million-fold, cutting failures from roughly one every 10 seconds to fewer than one per month.
Tilde Research identified a flaw in the Muon optimizer that causes early death of MLP neurons and open-sourced Aurora, an alternative that maintains orthogonality while preventing neuron death, reporting state-of-the-art results on nanoGPT benchmarks and 100x data efficiency on 1B models.