arxiv

#arxiv

Optimistic Dual Averaging Unifies Modern Optimizers

arXiv cs.LG · 17h ago

This paper introduces SODA, a generalization of Optimistic Dual Averaging that unifies modern optimizers such as Muon and Lion. It also proposes a practical wrapper that improves performance across scales without additional weight-decay tuning.


CORE: Cyclic Orthotope Relation Embedding for Knowledge Graph Completion

arXiv cs.LG · 17h ago

This paper introduces CORE, a knowledge graph completion model that uses cyclic orthotope relation embeddings on a torus manifold to remove the boundary constraints of region-based models. Experiments show competitive performance on link prediction.


Rank Is Not Capacity: Spectral Occupancy for Latent Graph Models

arXiv cs.LG · 17h ago

This paper proposes Spectra, a method using spectral occupancy to analyze and control the realized capacity of latent graph models, arguing that rank is not equivalent to model capacity.


HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series

arXiv cs.LG · 17h ago

This paper introduces HEPA, a self-supervised architecture for predicting rare critical events in time series, pretrained with a Joint-Embedding Predictive Architecture (JEPA) strategy. It outperforms leading models across multiple domains while using substantially less labeled data and fewer tuned parameters.


Newton's Lantern: A Reinforcement Learning Framework for Finetuning AC Power Flow Warm Start Models

arXiv cs.LG · 17h ago

This paper introduces Newton's Lantern, a reinforcement learning framework for finetuning warm start models so that the AC power flow problem is solved more efficiently, particularly near voltage collapse.


Trust Region Inverse Reinforcement Learning: Explicit Dual Ascent using Local Policy Updates

arXiv cs.LG · 17h ago

This paper introduces Trust Region Inverse Reinforcement Learning (TRIRL), a method that combines monotonic dual improvement with efficient local policy updates and outperforms state-of-the-art imitation learning methods. It uses trust-region constraints to address the trade-off between stability and computational cost in IRL.


ACSAC: Adaptive Chunk Size Actor-Critic with Causal Transformer Q-Network

arXiv cs.LG · 17h ago

This paper introduces ACSAC, an actor-critic reinforcement learning method with adaptive action chunk sizes and a causal Transformer Q-network, aimed at long-horizon, sparse-reward tasks. By adjusting the action chunk size to the current state, it achieves state-of-the-art performance on manipulation tasks.


SkillGen: Verified Inference-Time Agent Skill Synthesis

arXiv cs.LG · 17h ago

This paper introduces SkillGen, a multi-agent framework that synthesizes and verifies reusable inference-time skills for LLM agents by contrasting successful and failed trajectories. Each skill is auditable and is empirically verified to have a net positive effect on agent performance.


TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment

arXiv cs.LG · 17h ago

This paper introduces Trajectory Matching Policy Optimization (TMPO), a method for aligning diffusion models that addresses reward hacking and visual mode collapse by matching trajectory-level reward distributions rather than maximizing scalar rewards.


ξ-DPO: Direct Preference Optimization via Ratio Reward Margin

arXiv cs.LG · 17h ago

This paper introduces ξ-DPO, a preference optimization method that reformulates the objective to minimize the distance to an optimal ratio reward margin, addressing the difficulty of tuning SimPO's hyperparameters. Experiments show that ξ-DPO outperforms existing methods on open benchmarks.


LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection

arXiv cs.LG · 17h ago

This paper introduces LEAP, a training-free method that accelerates inference in Diffusion Language Models (dLLMs) by detecting tokens that converge early, reducing the number of denoising steps by 30% without loss of accuracy.


Hierarchical Multi-Scale Graph Neural Networks: Scalable Heterophilous Learning with Oversmoothing and Oversquashing Mitigation

arXiv cs.LG · 17h ago

This paper introduces HMH, a hierarchical multi-scale Graph Neural Network framework designed to mitigate oversmoothing and oversquashing in heterophilous graphs. It uses spectral filters with Haar bases to achieve scalable learning and improved performance on node and graph classification.


Probabilistic Calibration Is a Trainable Capability in Language Models

arXiv cs.CL · 17h ago

This paper investigates whether fine-tuning can improve the probabilistic calibration of language models, comparing soft-target and hard-target methods across 12 models. The results show that calibration is a trainable capability, although the gains sometimes come at the cost of downstream arithmetic reasoning.


DiffScore: Text Evaluation Beyond Autoregressive Likelihood

arXiv cs.CL · 17h ago

This paper introduces DiffScore, a text evaluation framework built on masked large diffusion language models that uses masked reconstruction to avoid the positional bias of autoregressive scoring.


Efficient LLM-based Advertising via Model Compression and Parallel Verification

arXiv cs.CL · 17h ago

This paper presents an efficient LLM-based advertising framework that uses model compression and parallel verification, achieving a speedup of over 1.8x in real-world deployment at Baidu.


BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion

arXiv cs.CL · 17h ago

This paper introduces BitLM, a language model that uses bitwise continuous diffusion to generate multiple tokens in parallel, aiming to overcome the sequential bottleneck of traditional autoregressive generation while preserving causal structure.


Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting

arXiv cs.CL · 17h ago

This paper proposes a covariance-aware variant of Group Relative Policy Optimization (GRPO) that uses Gaussian-kernel advantage reweighting to stabilize training entropy and improve reasoning performance in large language models.


An Empirical Study of Automating Agent Evaluation

arXiv cs.CL · 17h ago

This paper introduces EvalAgent, a system that automates the evaluation of AI agents by encoding domain-specific expertise, addressing the limitations of standard coding assistants in this task. It also presents AgentEvalBench, a benchmark for testing evaluation pipelines, and demonstrates significant improvements in evaluation reliability.


SOMA: Efficient Multi-turn LLM Serving via Small Language Model

arXiv cs.CL · 17h ago

This paper introduces SOMA, a framework for efficient multi-turn LLM serving that uses small language models adapted via soft prompts and LoRA fine-tuning to reduce latency and cost.


The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models

arXiv cs.CL · 17h ago

This paper introduces the Bicameral Model, which couples two frozen language models through a trainable neural interface on their intermediate hidden states, enabling continuous, concurrent coordination without serialized text exchanges. By letting an auxiliary model operate tools in parallel with the primary model, the approach yields significant improvements on arithmetic and logic tasks.
