InfoMem introduces a reward mechanism for training chunk-wise memory agents that evaluates final-memory utility using answer-conditioned information gain, improving long-context memory-agent performance under the same RL framework.
CorVer is a lightweight, corpus-grounded reward mechanism that uses Wikipedia co-occurrence statistics to provide efficient sentence-level feedback for reinforcement learning in factual question answering, outperforming neural verifiers while training 4.8x to 8.4x faster.
Geo-Align presents a reinforcement learning framework for camera-controlled video re-rendering that improves generalization through scale-aware perceptual rewards and metric 3D estimation for camera trajectory extraction.