rl

Tag

Cards List
#rl

@arjunkocher: RL Algorithm Interview Questions 2026 (as compiled by @sheriyuo) http://k-a.in/rl-algo.html

X AI KOLs Timeline · 14h ago

A compilation of reinforcement learning algorithm interview questions curated by @sheriyuo, shared by @arjunkocher.


MosaicLeaks: Privacy Risks in Querying-in-the-Open for Deep Research Agents

arXiv cs.CL · 2026-06-01

Introduces MosaicLeaks, a benchmark of 1,001 multi-hop deep research tasks that chain private enterprise documents with public web queries to evaluate privacy leakage. Finds that models leak sensitive information at multiple levels, and proposes PA-DR, a reinforcement learning framework that reduces leakage while improving task accuracy.


@johnschulman2: Would be funny if inoculation prompting results in models that are much better at sandbox escapes and other forms of ha…

X AI KOLs Timeline · 2026-05-31

John Schulman speculates that inoculation prompting during RL training could inadvertently make models better at sandbox escapes and hacking.


@yibie: This week's autoresearch ecosystem evidence scan: 9 new records, total count 383. AutoResearch-RL: A continuous RL research framework with prepare.py/train.py isolation, supporting LLM/hybrid strategy experiment scheduling l…

X AI KOLs Timeline · 2026-05-26

This week's scan added 9 new records to the autoresearch ecosystem, bringing the total to 383. The records cover open-source tools and projects such as the AutoResearch-RL reinforcement learning framework, lance-autoresearch (database kernel optimization), and Clio (a prediction market backtesting framework).


@poolsideai: Poolside is hosting a 2-day model research hackathon in London. Join us to push an open-weight agent model as far as yo…

X AI KOLs Following · 2026-05-13

Poolside is hosting a 2-day model research hackathon in London to push an open-weight agent model further using RL and fine-tuning on Laguna XS.2, with partners including NVIDIA, Prime Intellect, and Hugging Face, and a prize of an NVIDIA DGX Spark.


@Teknium: Interesting insights, especially this: Hermes starts off as any other agent does, inefficient and often not sure how to…

X AI KOLs Following · 2026-04-19

Teknium observes that the Hermes agent initially behaves inefficiently but gains large efficiency boosts after solving a task once, likening it to "linearized RL."
