Trending

Trending stories ranked by heat, importance and recency.

#81

Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation

arXiv cs.AI · 6h ago Cached

This paper proposes a reinforcement learning framework for computer-use agents that uses autonomous vision-language evaluation as a scalable reward signal, modeling evaluator noise to improve task success rates across desktop environments.

#82

Cross-Lingual Exploration for Parametric Knowledge

arXiv cs.CL · 6h ago Cached

This paper explores cross-lingual prompting strategies to improve access to parametric knowledge in large language models, demonstrating significant gains in knowledge transfer and factual recall across 17 languages on multilingual benchmarks.

#83

On the Smallness of the Large Language Models Scaling Exponents

arXiv cs.AI · 6h ago Cached

The paper discusses the small scaling exponents of large language models, arguing that they indicate an unsustainable regime in terms of energy resources. It also examines the 'pedestal effect' and draws analogies with fluid turbulence to comment on data smoothness.

#84

The Latent Bridge: A Continuous Slow-Fast Channel for Real-Time Game Agents

arXiv cs.AI · 6h ago Cached

The paper introduces the Latent Bridge, a trainable continuous channel that couples a slow reasoning VLM (Qwen3-VL-8B-Thinking) and a fast reactive VLM (MiniCPM-o 4.5) for real-time game agents. Experiments on Atari games and MetaDrive show it matches or outperforms the text-based bridge while avoiding destructive interference when used alone.

#85

AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning

arXiv cs.CL · 6h ago Cached

AGORA is a new benchmark for evaluating large language models on archive-grounded reasoning tasks across workplace documents, comprising 362 questions over 9,664 real documents. The strongest model achieves only 59.4% accuracy, highlighting substantial room for improvement.

#86

CompressKV: Semantic-Retrieval-Guided KV-Cache Compression for Resource-Efficient Long-Context LLM Inference

arXiv cs.AI · 6h ago Cached

CompressKV is a semantic-retrieval-guided KV-cache compression method for GQA-based LLMs that identifies Semantic Retrieval Heads and uses them to retain critical tokens. It retains over 97% of full-cache performance on LongBench tasks while using only 3% of the KV cache.

#87

Bayesian control for coding agents

arXiv cs.AI · 6h ago Cached

This paper formulates orchestration of coding agents as cost-sensitive sequential hypothesis testing using a Bayesian controller that dynamically decides when to gather evidence, refine, verify, or stop. Experiments across six generators and nine benchmarks show Bayesian control is most valuable when verification is costly and critics are informative but imperfect.

#88

The African Language Tax: Quantifying the Cost, Latency, and Context Penalty of Tokenizing African Languages in Frontier LLMs

arXiv cs.CL · 6h ago Cached

This paper systematically quantifies the tokenization penalty for 20 African languages across 11 frontier and open tokenizers, finding up to 8.9× inference cost and latency multipliers and as little as 11% effective context window compared to English, highlighting a structural digital divide encoded in subword vocabularies.

#89

Can Aggregate Invariants Accelerate Continuous Subgraph Matching? Limits, Laws, and a Dynamic Spectral Index

arXiv cs.AI · 6h ago Cached

This paper investigates whether aggregate structural invariants, specifically spectral bounds, can accelerate continuous subgraph matching (CSM) over dynamic graphs. It characterizes limitations of lazy spectral maintenance, shows exact maintenance is affordable when selective, and demonstrates pruning power of up to 51% in benchmarks.

#90

Beyond Logprobs: A Multi-Signal Confidence Engine for LLM-Based Document Field Extraction

arXiv cs.CL · 6h ago Cached

ExtractConf is a confidence estimation method for LLM-based document field extraction that uses two structurally different calls (field-guided and document-guided) to derive disagreement signals, achieving 0.928 ROC AUC on DocILE invoices and enabling reliable selective prediction for high-stakes automation.

#91

Cycle-Consistent Neural Explanation of Formal Verification Certificates

arXiv cs.AI · 6h ago Cached

This paper proposes a cycle-consistent neural architecture that generates faithful natural language explanations of formal verification certificates, achieving 90% soundness and 860x faster inference than LLM baselines.

#92

PHANTOM: A Large-Scale Dataset of Multimodal Adversarial Attacks for Vision-Language Models

arXiv cs.AI · 6h ago Cached

This paper introduces PHANTOM, a large-scale open-source dataset of pre-generated adversarial attacks for vision-language models, covering 1010 high-level categories and 55 subcategories of harmful intents with 47,524 adversarial samples. The dataset aims to lower the barrier for adversarial research and enable systematic evaluation of VLM robustness and safety.

#93

Accelerating Disaggregated RL for Visual Generative LLMs with Diffusion-Based Parallelism and Trainer-Assisted Generation

arXiv cs.AI · 6h ago Cached

This paper introduces DigenRL, a disaggregated RL framework for diffusion-based generative LLMs that uses generation-axis pipeline parallelism and trainer-assisted generation to improve throughput by 1.56-2.10x over existing systems.

#94

Transformer-Based Language Models Across Domain Verticals: Architectures, Applications and Critical Assessment

arXiv cs.CL · 6h ago Cached

A comprehensive survey of transformer-based language models covering architectures, applications across domain verticals (healthcare, finance, legal, etc.), and critical assessment of trade-offs including compute cost, alignment, and data provenance.

#95

Prob-BBDM: a Probabilistic Brownian Bridge Diffusion Model for MRI sequence image-to-image translation

arXiv cs.AI · 6h ago Cached

This paper introduces Prob-BBDM, a probabilistic Brownian Bridge Diffusion Model for efficient and high-quality MRI sequence synthesis from 2D axial slices, achieving up to 88.46% SSIM and 26.09 dB PSNR with only 4 diffusion steps, and demonstrating clinical utility in tumor segmentation.

#96

AVOC: Enhancing Hour-Level Audio-Video Understanding in Omni-Modal LLMs via Retrieval-Inspired Token Compression

arXiv cs.CL · 6h ago Cached

AVOC introduces a retrieval-inspired token compression method for omni-modal LLMs that effectively handles hour-long audio-video inputs by selecting informative tokens based on relevance, importance, and diversity. The framework achieves state-of-the-art results on long-form audio-video understanding benchmarks, surpassing prior methods by significant margins.

#97

LemonHarness Technical Report

arXiv cs.AI · 6h ago Cached

This report presents LemonHarness, an integrated execution framework for long-horizon LLM agents that constrains state-changing operations within a clearly defined workspace, introduces a reusable rule knowledge base, and adds time-aware execution. It achieves 84-86% accuracy on Terminal-Bench 2.0.

#98

SURGELLM: Rethinking Multi-Task Evaluation through Task-Aware Feature Gating with Class-Balanced Normalization

arXiv cs.CL · 6h ago Cached

SURGeLLM introduces a unified transformer framework with surgical feature gates, task-conditioned prefix tokens, and instance-weighted normalization to address mismatched inductive biases, class imbalance, and lexical knowledge injection in multi-task learning, achieving significant gains across four diverse NLP tasks.

#99

Decoherence as Defence and the Magnitude of Noise Regularisation: A Rigorous N-Qubit Theory of Stochastic Quantum Neural Networks for Adversarially Robust Network Intrusion Detection

arXiv cs.CL · 6h ago Cached

This paper presents a rigorous N-qubit theory of stochastic quantum neural networks (SQNNs) for adversarially robust network intrusion detection, proving a decoherence-contraction theorem and showing that depolarising noise provides robustness against adversarial attacks, with experiments on the NSL-KDD dataset.

#100

Towards Federated Long-Tailed Graph Learning: An Energy-Guided Dual Decoupling Approach

arXiv cs.AI · 6h ago Cached

This paper introduces FedEPD, a framework for federated graph learning under long-tailed data distributions. It uses an energy-guided dual decoupling approach to separate topological purification from semantic recalibration, achieving state-of-the-art performance on benchmarks with up to 4.97% accuracy improvement.
