This research paper investigates whether preference optimization methods (ORPO and AlphaPO) can align small language models, such as Llama-3.2-3B and Qwen-3-4B, with Stoic philosophy when trained on micro-datasets. It finds that 300 examples can effectively encode Stoic virtues, but that small models still struggle with outward-facing cosmopolitan duties.
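For readers unfamiliar with ORPO, here is a minimal sketch of its objective. It combines a standard negative log-likelihood term on the preferred ("chosen") response with an odds-ratio penalty that pushes the odds of the chosen response above those of the rejected one, and it needs no reference model. This is not the paper's training code. The helper names (`log_odds`, `orpo_loss`), the use of length-normalized per-sequence log-probabilities as inputs, and the weight `lam=0.1` are illustrative assumptions.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) where p = exp(avg_logp) is a length-normalized
    sequence likelihood. Requires avg_logp < 0 so that p < 1."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO loss for one preference pair (assumed inputs:
    mean per-token log-probabilities of the chosen and rejected responses)."""
    # Supervised term: negative log-likelihood of the chosen response.
    l_sft = -chosen_avg_logp
    # Odds-ratio term: -log sigmoid(log odds_chosen - log odds_rejected),
    # written as softplus(-z) for numerical stability.
    z = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    l_or = softplus(-z)
    return l_sft + lam * l_or


# Hypothetical pair where the model already prefers the chosen response.
print(orpo_loss(-0.5, -2.0))
```

When the chosen and rejected responses are equally likely, the odds-ratio term equals log 2; it shrinks toward zero as the model assigns the chosen response higher odds than the rejected one.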