An OpenAI research preview explores learning from how people interact with their computers beyond chat, accompanied by a new arXiv paper on the topic.
WildFeedback is a framework that leverages in-situ user feedback from real LLM conversations to automatically build preference datasets for alignment, addressing the scalability and bias issues of traditional annotation-based methods.
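A minimal sketch of the kind of mining such a framework performs, assuming conversation logs of alternating turns; the `Turn` structure and the keyword-based satisfaction heuristics below are illustrative stand-ins, not WildFeedback's actual signal classifiers:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    user_msg: str     # what the user asked or replied
    model_reply: str  # the model's answer to that message

def signals_dissatisfaction(msg: str) -> bool:
    # Toy heuristic; a real system would learn this signal from data.
    return any(p in msg.lower() for p in ("that's wrong", "no,", "try again"))

def signals_satisfaction(msg: str) -> bool:
    # Likewise a stand-in for a learned classifier.
    return any(p in msg.lower() for p in ("thanks", "perfect", "that works"))

def mine_preference_pairs(turns: list[Turn]) -> list[dict]:
    """Pair a reply the user pushed back on with the accepted retry."""
    pairs = []
    for i in range(len(turns) - 2):
        pushback = turns[i + 1].user_msg  # reaction to turns[i].model_reply
        approval = turns[i + 2].user_msg  # reaction to the retry
        if signals_dissatisfaction(pushback) and signals_satisfaction(approval):
            pairs.append({
                "prompt": turns[i].user_msg,
                "chosen": turns[i + 1].model_reply,
                "rejected": turns[i].model_reply,
            })
    return pairs
```

The appeal of in-situ signals is that they come for free with real usage, at the cost of being noisier than dedicated annotation.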
OpenAI trained language models to write critiques of text summaries, helping human evaluators spot flaws more effectively — a step toward scalable oversight of AI systems on difficult tasks. The work explores how AI-assisted feedback can improve human evaluation quality as a proof of concept for alignment research.
OpenAI presents a scalable alignment technique using hierarchical summarization of entire books with human feedback, demonstrating how models can be trained to act in accordance with human intentions on complex, difficult-to-evaluate tasks.
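A hedged sketch of the hierarchical step, with `summarize` stubbed out in place of a model fine-tuned on human feedback; the character-based chunking and sizes are illustrative choices, not the paper's exact setup:

```python
def summarize(text: str) -> str:
    # Placeholder: a real system calls a model trained with human feedback.
    return text[:200]

def chunk(text: str, size: int) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize_book(book: str, chunk_chars: int = 4000) -> str:
    """Summarize fixed-size chunks, then summarize the concatenated
    summaries, repeating until a single summary remains."""
    summaries = [summarize(c) for c in chunk(book, chunk_chars)]
    while len(summaries) > 1:
        joined = " ".join(summaries)
        summaries = [summarize(c) for c in chunk(joined, chunk_chars)]
    return summaries[0]
```

The point of the decomposition is that humans only ever judge a chunk-sized summarization step, which keeps each judgment tractable even when the whole book is not.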
OpenAI demonstrates a technique for improving language model summarization by training a reward model on human preferences and fine-tuning models with reinforcement learning, achieving significant quality improvements that generalize across datasets. This work advances model alignment through human feedback at scale, with applications beyond summarization.
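The reward-modeling step reduces to a pairwise objective. Below is a minimal sketch: the Bradley-Terry loss is the standard form for this kind of training, while `ToyRewardModel` is an assumed stand-in (a character-code featurizer) for a language model with a scalar reward head, included only to keep the example self-contained:

```python
import torch
import torch.nn.functional as F

class ToyRewardModel(torch.nn.Module):
    """Stand-in for a language model with a scalar reward head; here we
    featurize text by character codes just to keep the sketch runnable."""
    def __init__(self, dim: int = 32):
        super().__init__()
        self.dim = dim
        self.head = torch.nn.Linear(dim, 1)

    def forward(self, prompt: str, completion: str) -> torch.Tensor:
        codes = [ord(c) for c in (prompt + completion)[: self.dim]]
        feats = torch.zeros(self.dim)
        feats[: len(codes)] = torch.tensor(codes, dtype=torch.float) / 128.0
        return self.head(feats)

def preference_loss(rm, prompt, chosen, rejected):
    # Bradley-Terry: maximize P(chosen > rejected) = sigmoid(r_c - r_r).
    return -F.logsigmoid(rm(prompt, chosen) - rm(prompt, rejected)).mean()

rm = ToyRewardModel()
loss = preference_loss(rm, "Summarize: ...", "good summary", "bad summary")
loss.backward()  # the trained reward then drives RL fine-tuning of the policy
```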
OpenAI demonstrates fine-tuning GPT-2 (774M parameters) with human preference feedback on text continuation and summarization tasks, requiring 5k labels for stylistic continuation and 60k for summarization; the resulting models are preferred by human labelers 86-88% of the time, though they also learn to exploit labeler heuristics.
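One guard this line of work uses against that exploitation is a KL penalty that keeps the fine-tuned policy close to the original language model. A per-sequence sketch; the tensor shapes and the default `beta` are assumptions for illustration:

```python
import torch

def shaped_reward(preference_reward: torch.Tensor,
                  policy_logprobs: torch.Tensor,
                  ref_logprobs: torch.Tensor,
                  beta: float = 0.1) -> torch.Tensor:
    """R = r(x, y) - beta * log(pi(y|x) / pi_ref(y|x)).
    Shapes: preference_reward (batch,), logprobs (batch, seq_len)."""
    kl_term = (policy_logprobs - ref_logprobs).sum(dim=-1)
    return preference_reward - beta * kl_term
```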
OpenAI presents iterated amplification, a method for training AI systems on complex tasks by recursively decomposing them into smaller subtasks that humans can judge and solve, building up training signals from scratch through iterative composition.
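A toy sketch of one amplification step, where `decompose`, `base_answer`, and `compose` are illustrative stand-ins for the human-or-model operations the method describes (the " and " splitting rule is purely for demonstration):

```python
def decompose(question: str) -> list[str]:
    # Placeholder: a real system asks a human (or a model imitating one)
    # to split the task; splitting on " and " is just for illustration.
    return question.split(" and ") if " and " in question else []

def base_answer(question: str) -> str:
    # Placeholder: the current model answers the subtask directly.
    return f"answer({question})"

def compose(question: str, subanswers: list[str]) -> str:
    # Placeholder: a human or model combines subanswers into one answer.
    return "; ".join(subanswers)

def amplify(question: str, depth: int = 2) -> str:
    """Recursively decompose, solve subtasks, and compose the results."""
    subs = decompose(question) if depth > 0 else []
    if not subs:
        return base_answer(question)
    return compose(question, [amplify(q, depth - 1) for q in subs])
```

Iterating this loop lets the composed behavior serve as a training signal for the next, more capable model.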
OpenAI releases RL-Teacher, an open-source tool for training AI systems through human feedback instead of hand-crafted reward functions, with applications to safe AI development and complex reinforcement learning problems.
OpenAI presents a method for training AI agents using human preference feedback, where an agent learns a reward function from human comparisons of behavior trajectories and uses reinforcement learning to optimize for the inferred goals. The approach demonstrates strong sample efficiency, requiring less than 1000 bits of human feedback to train an agent to perform a backflip.
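The paper's formulation sums predicted per-step rewards over each trajectory segment and fits a Bradley-Terry model to the human comparisons. A minimal sketch; the linear reward network, observation dimension, and segment length are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def comparison_loss(reward_net: torch.nn.Module,
                    seg_a: torch.Tensor,
                    seg_b: torch.Tensor,
                    human_prefers_a: bool) -> torch.Tensor:
    """Segments are (T, obs_dim) tensors; reward_net maps an observation
    to a scalar, and a segment's reward is the sum of per-step rewards."""
    logit = reward_net(seg_a).sum() - reward_net(seg_b).sum()
    target = torch.tensor(1.0 if human_prefers_a else 0.0)
    return F.binary_cross_entropy_with_logits(logit, target)

# Example: a linear reward over 4-dim observations, 30-step segments.
reward_net = torch.nn.Linear(4, 1)
loss = comparison_loss(reward_net, torch.randn(30, 4), torch.randn(30, 4), True)
loss.backward()  # an RL algorithm then optimizes the policy against reward_net
```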