social-science

#social-science

Validating LLMs in social science: Epistemic threats and emerging norms

arXiv cs.CL · 2026-07-10

This paper analyzes validation practices for using LLMs as measurement instruments in social science, identifying epistemic threats and proposing emerging norms for robust validation.

From Blueprint to Reality: Modeling and Applying Putnam's Social Capital Theory with LLM-based Multi-agent Simulations

arXiv cs.CL · 2026-07-08

This paper introduces SocaSim, an LLM-based multi-agent simulation framework that models and applies Putnam's Social Capital Theory, enabling micro-level causal pathway analysis and human-agent alignment in collective-action scenarios.

Silicon Sampling via Cross-Survey Transfer

arXiv cs.AI · 2026-07-07

Proposes cross-survey transfer as a rigorous evaluation framework for LLM-based human survey simulation, finding that zero-shot LLMs achieve 52% accuracy on unseen items.

@Phoenixyin13: Top 10 Skills & Tools for Social Science Research! 1. Auto-Empirical-Research-Skills - Stanford team's self-developed 23k+ empirical research Agent Skills all-in-one package https://github.com/brycewan…

X AI KOLs Timeline · 2026-06-30

This post recommends ten skills and tools for social science research, including Auto-Empirical-Research-Skills, a package developed by a Stanford team for using AI agents to conduct empirical research and write papers.

Beyond the Mean: Three-Axis Fidelity for Aligning LLM-Based Survey Simulators from Small Pilot Data

arXiv cs.CL · 2026-06-30

This paper introduces a three-axis fidelity framework (structural, marginal, individual) to evaluate how well LLMs can simulate survey responses from small pilot data. Using a COVID-19 misinformation survey, it compares prompting, rectification, and fine-tuning approaches, finding that fine-tuning offers balanced fidelity but with variation across subsamples.

Correct codes for the wrong reasons? validating LLMs as measurement instruments for theoretical constructs

arXiv cs.CL · 2026-06-30

This paper examines the gap between reliability and construct validity when using LLMs as coding instruments for theoretical constructs, and proposes grain calibration as a method to decompose constructs into clause-level components for more valid measurement.

@XAMTO_AI: Cranking out a top-tier journal paper in 20 minutes — this is no longer just talk.

X AI KOLs Timeline · 2026-06-19

Stanford REAP and CoPaper.AI have released Auto-Empirical Research Skills (AERS), an open-source toolkit with over 23,000 agent skills that automates the entire empirical research pipeline for social sciences, from topic selection to journal submission.

(Human) Attention Is (Still) All You Need: Human oversight makes AI-assisted social science reliable

arXiv cs.AI · 2026-06-12

This paper proposes that reliability in AI-assisted social science research depends on decision architecture—how cognitive labor is divided between humans and machines. Through a pre-specified factorial experiment, the authors show that an unconstrained multi-agent baseline fails in 72% of runs, while one organized with three architectural commitments (LLMs restricted to reasoning, deterministic data/estimation, and three human decision gates) fails in only 16%.

AI Coding Agents in Social Science: Methodologically Diverse, Empirically Consistent, Interpretively Vulnerable

arXiv cs.CL · 2026-06-11

This paper evaluates LLM-based coding agents (Claude Code and Codex) in social science analysis, finding they match or exceed human methodological diversity while remaining vulnerable to interpretation bias through verdict-layer manipulation.

AI Coding Agents Can Reproduce Social Science Findings

arXiv cs.CL · 2026-06-11

This paper introduces SocSci-Repro-Bench, a benchmark of 221 tasks to evaluate AI coding agents' ability to reproduce social science findings from original data and code. It finds that frontier agents like Claude Code and Codex can reproduce a large share of results, with Claude substantially outperforming Codex, and that results are not primarily driven by memorization.

LifeSentence: Language models can encode human life course trajectories from longitudinal panel data

arXiv cs.CL · 2026-06-11

LifeSentence fine-tunes a 24B-parameter language model on structured natural-language records from a longitudinal panel study (SOEP), achieving superior prediction of life outcomes and enabling counterfactual queries about human biographies.

When Better Codebooks Are Not Enough: Predictive Performance and Behavioral Reliability in LLM Political Event Coding

arXiv cs.CL · 2026-06-08

This paper studies whether expert codebooks for political event coding become more effective when operationalized into LLM-friendly forms. It finds that while predictive performance improves, the gains do not fully carry over to behavioral reliability under controlled perturbations.

A Comparative Evaluation of Structural Topic Models and BERTopic for Short, Open-Ended Survey Responses

arXiv cs.CL · 2026-05-25

This paper compares Structural Topic Models (STM) and BERTopic for analyzing short, open-ended survey responses, finding that BERTopic with contextual augmentation yields better topic coherence and interpretability, while STM offers stronger support for inferential covariate analysis.

Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work

arXiv cs.AI · 2026-05-22

This paper presents QuestBench, a benchmark built by students to evaluate deep research systems across humanities and social science domains. Results show that even advanced systems like GPT-5.5 pass only 57.58% of questions, highlighting failures in trustworthiness.

Personality Engineering with AI Agents: A New Methodology for Negotiation Research

arXiv cs.AI · 2026-05-22

Introduces 'personality engineering,' a methodology using AI agents to parameterize, manipulate, and evaluate negotiator personality based on the interpersonal circumplex, enabling controlled experiments in negotiation theory.

Can Large Language Models Revolutionize Survey Research? Experiments with Disaster Preparedness Responses

arXiv cs.AI · 2026-05-20

This paper presents a five-stage framework integrating large language models into survey research, addressing declining response rates, sample bias, and fraudulent completions. Using 2024 Hurricane Milton survey data, the authors propose a theory-informed LLM (A-TLM) that outperforms classical imputation methods in missing-data scenarios and demonstrates manageable hallucination risk through grounded refusal.

Change My View? The Dynamics of Persuasion and Polarization in Online Discourse

arXiv cs.CL · 2026-05-12

This paper uses large language models to analyze persuasion dynamics and polarization in Reddit's r/ChangeMyView, finding that empathetic alignment increases belief change while frontal refutation diminishes it.

Designing Synthetic Discussion Generation Systems: A Case Study for Online Facilitation

arXiv cs.CL · 2026-04-20

This paper introduces Synthetic Discussion Generation (SDG), a novel NLP framework for creating simulated discussions to enable cost-effective pilot experiments in social science research. The authors demonstrate that smaller quantized models (7B-8B parameters) can produce effective simulations at 44x lower cost than proprietary models like GPT, and apply this framework to evaluate LLM facilitators in online discussions.

Scaling social science research

OpenAI Blog · 2026-02-13

OpenAI releases GABRIEL, an open-source toolkit that uses GPT to convert unstructured qualitative data (text, images) into quantitative measurements for social scientists and economists. The tool enables researchers to analyze large-scale qualitative datasets more efficiently by automating repetitive labeling tasks while preserving the richness of human data.

AI safety needs social scientists

OpenAI Blog · 2019-02-19

OpenAI argues that AI safety research on value alignment requires social scientists to help address how human cognitive biases and inconsistencies affect the data used to train AI systems. The organization proposes human-only experiments as a method to uncover alignment problems before deploying machine learning solutions.
