Tag
This paper investigates memory-efficient meta-reinforcement learning architectures for adaptive safety-critical control in adversarial spacecraft proximity operations. It finds that state space models such as Mamba, trained with PPO, achieve better task completion, safety, and fuel savings than LSTM and GRU.
The paper introduces Self-Consolidating Language Models (SCoL), a framework that uses meta-reinforcement learning to write current context into model weights for continual knowledge incorporation. It demonstrates improved acquisition and retention over baselines in both QA and long-context consolidation tasks.
OpenAI researchers introduce E-MAML and E-RL², two meta-reinforcement learning algorithms designed to improve exploration in tasks where discovering optimal policies requires significant exploration. The work demonstrates the algorithms' effectiveness on novel environments, including Krazy World and maze tasks.