Tag
This paper proposes Q-align DT, a framework that aligns return-to-go with Q-values to improve controllability and performance in offline reinforcement learning, and reports superior results on D4RL benchmarks.
This paper introduces SeDT, a training-free, inference-time method that improves LLM reliability in multi-turn conversations by annotating the conversation history with cumulative relevance scores derived from three signals, achieving performance gains of up to 37.7% on the Lost-in-Conversation benchmark.
This paper introduces Guide, a framework that combines a Decision Transformer with Q-value guidance and an inverse dynamics module to balance exploration and safety in automated bidding for digital advertising, demonstrating effectiveness on public datasets and simulated auctions.