Tag
This paper introduces SODA, a generalization of Optimistic Dual Averaging that unifies various modern optimizers like Muon and Lion. It proposes a practical wrapper that improves performance across different scales without requiring additional hyperparameter tuning for weight decay.
This paper introduces CORE, a new knowledge graph completion model that uses cyclic orthotope relation embeddings on a torus manifold to address boundary constraints in region-based models. Experiments show competitive performance in link prediction tasks.
This paper proposes Spectra, a method using spectral occupancy to analyze and control the realized capacity of latent graph models, arguing that rank is not equivalent to model capacity.
This paper introduces HEPA, a self-supervised architecture for predicting rare critical events in time series using a Joint-Embedding Predictive Architecture (JEPA) pretraining strategy. It demonstrates superior performance across multiple domains with significantly less labeled data and fewer tuned parameters than leading models.
The article introduces Newton's Lantern, a reinforcement learning framework for fine-tuning warm-start models to solve the AC power flow problem more efficiently, particularly near voltage collapse.
This paper introduces Trust Region Inverse Reinforcement Learning (TRIRL), a method that combines monotonic dual improvement with efficient local policy updates to outperform state-of-the-art imitation learning methods. It addresses the trade-off between stability and computational cost in IRL by using trust-region constraints.
This paper introduces ACSAC, a reinforcement learning method that uses an adaptive chunk size actor-critic algorithm with a causal Transformer Q-network to handle long-horizon, sparse-reward tasks. It demonstrates state-of-the-art performance on manipulation tasks by dynamically adjusting action chunk sizes based on state-dependent needs.
This article introduces SkillGen, a multi-agent framework that synthesizes and verifies reusable inference-time skills for LLM agents by contrasting successful and failed trajectories. The method ensures skills are auditable and empirically verified for their net positive impact on agent performance.
This paper introduces Trajectory Matching Policy Optimization (TMPO), a method for aligning diffusion models that addresses reward hacking and visual mode collapse by matching trajectory-level reward distributions rather than maximizing scalar rewards.
This paper introduces xi-DPO, a novel preference optimization method that reformulates the objective to minimize distance to optimal ratio reward margins, addressing hyperparameter tuning challenges in SimPO. Experimental results show that xi-DPO outperforms existing methods on open benchmarks.
This paper introduces LEAP, a training-free method to accelerate inference in Diffusion Language Models (dLLMs) by detecting early-converging tokens, reducing denoising steps by 30% without losing accuracy.
This paper introduces HMH, a hierarchical multi-scale Graph Neural Network framework designed to address oversmoothing and oversquashing in heterophilous graphs. It utilizes spectral filters with Haar bases to achieve scalable learning and improved performance on node and graph classification tasks.
This paper investigates whether probabilistic calibration in language models can be improved through fine-tuning, comparing soft-target and hard-target methods across 12 models. The results show that calibration is a trainable capability, though calibration gains sometimes come at the cost of downstream arithmetic reasoning performance.
This paper introduces DiffScore, a text evaluation framework based on Masked Large Diffusion Language Models that addresses positional bias in autoregressive scoring by using masked reconstruction.
This paper presents an efficient LLM-based advertising framework using model compression and parallel verification, achieving over 1.8x speedup in real-world deployment at Baidu.
This paper introduces BitLM, a language model that uses bitwise continuous diffusion to generate multiple tokens in parallel, aiming to overcome the sequential bottleneck of traditional autoregressive generation while preserving causal structure.
This paper proposes a covariance-aware variant of Group Relative Policy Optimization (GRPO) that uses Gaussian-kernel advantage reweighting to stabilize training entropy and improve reasoning performance in large language models.
This paper introduces EvalAgent, a system that automates the evaluation of AI agents by encoding domain-specific expertise, addressing the limitations of standard coding assistants in this task. It also presents AgentEvalBench, a benchmark for testing evaluation pipelines, and demonstrates significant improvements in evaluation reliability.
This paper introduces SOMA, a framework for efficient multi-turn LLM serving that uses small language models adapted via soft prompts and LoRA fine-tuning to reduce latency and cost.
This paper introduces the Bicameral Model, which couples two frozen language models through a trainable neural interface on their intermediate hidden states to enable continuous, concurrent coordination without serialized text exchanges. The approach demonstrates significant improvements in arithmetic and logic tasks by allowing an auxiliary model to operate tools in parallel with a primary model.