This paper introduces the Explanation Fairness Taxonomy (EFT) to analyze disparities in how LLMs justify decisions across demographic groups, finding significant biases in explanation quality and tone despite balanced decisions.
This empirical study validates theoretical findings on feature repulsion and spectral lock-in during the grokking phenomenon in two-layer neural networks, demonstrating how activation functions influence the transition from memorization to generalization.
This paper presents a comprehensive empirical study on on-policy distillation for large language models, identifying failure mechanisms like distribution mismatch and optimization instability, and proposing fixes such as stop-gradient objectives and RLVR-adapted teachers.
This paper challenges the assumption that adding more scaffolding components to LLM agents always improves performance, demonstrating through systematic experiments that cross-component interference often leads to degradation. The study finds that simpler, task-specific subsets of components frequently outperform fully equipped 'all-in' agents across various model scales.
SWE-chat introduces a 6,000-session dataset of real-world coding agent interactions, revealing that only 44% of agent-generated code survives into committed changes and highlighting inefficiencies and security issues in current AI-assisted development.
This paper presents the first large-scale empirical study of agent context files (READMEs) used in agentic coding tools, analyzing their structure, maintenance patterns, and content. It highlights that while functional context is well-covered, non-functional requirements like security and performance are rarely specified.
This foundational empirical study demonstrates power-law scaling relationships between language model performance and model size, dataset size, and compute budget, with implications for optimal training allocation and sample efficiency.