Tag
This paper argues that the aggregation of moral evaluations for AI value alignment must account for contextual factors, showing that ignoring context can produce violations of the weak Pareto principle in a manner analogous to Simpson's paradox.
This paper develops a benchmark for evaluating environmental attitudes in 31 LLMs, finding that the models often express progressive environmental views and are sensitive to context, which raises issues of steerability and normative reliability in sustainability applications.
Philosophers are increasingly being hired by top AI labs such as DeepMind and Anthropic to address ethical and alignment issues, while AI is reshaping philosophy curricula at universities.
This paper introduces the AllFaith Religious Representation Benchmark to measure how often LLMs omit religious perspectives when answering everyday ethical questions, finding that models underrepresent religion compared to human expectations, especially in practical personal situations.
This paper introduces DVMap, a framework for fine-grained pluralistic value alignment in LLMs that uses high-consensus demographic-value mapping instead of coarse national labels, achieving strong generalization across demographics, countries, and values.
This paper proposes SoVA, a framework using GraphRAG to align LLM-based agents with human social values by converting psychological theories into prescriptive instructions. Experiments on the DAILYDILEMMAS benchmark show significant improvements over prompt-based baselines.