value-alignment

Accounting for Context: Shaping Moral Credences for Value Alignment

arXiv cs.AI · 2026-06-08

This paper argues that aggregating moral evaluations for AI value alignment must account for contextual factors, showing that ignoring context can lead to violations of the weak Pareto principle, analogous to Simpson's paradox.

Greener Than Humans? Environmental Attitudes in Large Language Models

arXiv cs.CL · 2026-06-03

This paper develops a benchmark for evaluating environmental attitudes in 31 LLMs, finding they often exhibit progressive environmental views and contextual sensitivity, highlighting issues of steerability and normative reliability in sustainability applications.

To Land a Job in AI, Try Reading Kant

Wired · 2026-05-26

Top AI labs such as DeepMind and Anthropic are increasingly hiring philosophers to work on ethical and alignment issues, while AI is also reshaping philosophy curricula at universities.

Omissive Bias in Religious Representation: Benchmarking LLM Answers to Everyday Ethical Decision-making

arXiv cs.LG · 2026-05-26

This paper introduces the AllFaith Religious Representation Benchmark to measure how often LLMs omit religious perspectives when answering everyday ethical questions, finding that models underrepresent religion compared to human expectations, especially in practical personal situations.

DVMap: Fine-Grained Pluralistic Value Alignment via High-Consensus Demographic-Value Mapping

arXiv cs.AI · 2026-05-15

This paper introduces DVMap, a framework for fine-grained pluralistic value alignment in LLMs that uses high-consensus demographic-value mapping instead of coarse national labels, achieving strong generalization across demographics, countries, and values.

From Descriptive to Prescriptive: Uncover the Social Value Alignment of LLM-based Agents

arXiv cs.AI · 2026-05-15

This paper proposes SoVA, a framework using GraphRAG to align LLM-based agents with human social values by converting psychological theories into prescriptive instructions. Experiments on the DAILYDILEMMAS benchmark show significant improvements over prompt-based baselines.
