Tag
This paper proposes a unified geometric framework for understanding concept learning and neuron interpretation in sparse autoencoders, formalizing concepts as sets and defining detection, separation, and approximation. It provides error bounds, capacity constraints, and links to formal concept analysis, with experiments on synthetic data.
OpenAI presents a technique that uses energy functions to let agents learn and extract abstract concepts (visual, spatial, temporal, social) from tasks, then transfer those concepts to related tasks in different domains without retraining. The approach uses energy-based models, implemented with neural networks, for both generating and recognizing concepts.
OpenAI presents a machine teaching approach where a teacher neural network learns to select the most illustrative examples to teach a student network to recognize concepts, producing interpretable results by grounding examples in human-understandable properties rather than arbitrary feature encodings.