empirical-study

#empirical-study

Are Time-Series Foundation Models Ready for E-Nose Data? An Empirical Assessment of Their Embeddings

arXiv cs.LG · 16h ago

This paper systematically evaluates time-series foundation models (TSFMs) such as Chronos-2 and MOMENT on electronic nose (E-Nose) data for gas identification and concentration prediction. It finds that fine-tuning is necessary and that fusing TSFM embeddings with specialized models can improve performance.


Output-Space Allocation Costs for Calibration-Guided LLM Compression: An Empirical Study

arXiv cs.CL · 16h ago

This paper empirically investigates whether aligning the allocation cost with the output-space objective improves compressed model fidelity in ROCKET, a training-free LLM compression method. Results show a trade-off between accuracy and perplexity, with effects more pronounced at higher compression ratios.


Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints

arXiv cs.CL · 4d ago

This paper identifies and analyzes 'tool suppression' in open-weight LLMs when both tool calling and JSON schema constraints are simultaneously enabled, proposing the Constraint Priority Inversion hypothesis and a mitigation strategy called Transparent Two-Pass Execution.


Training Dynamics of Neural Software Defect Predictors under Coupled Data-Quality Issues

arXiv cs.LG · 4d ago

This paper investigates how training dynamics of neural networks for software defect prediction are affected by coupled data-quality issues such as class imbalance and overlap, proposing an interaction-aware empirical protocol.


To Run or Not to Run: Analyzing the Cost-Effectiveness of Code Execution in LLM-Based Program Repair

Hugging Face Daily Papers · 4d ago

This paper empirically analyzes the cost-effectiveness of code execution in LLM-based program repair agents, finding that execution is used heavily but often indiscriminately, and that restricting execution can save significant cost with minimal impact on repair success.


DREG: A Layer-Wise Jacobian Regularization as a General-Purpose Penalty

arXiv cs.LG · 5d ago

This paper presents a large-scale empirical study of the Derivative Regularization (DREG) penalty, showing that it achieves high accuracy and noise robustness, particularly with GELU activations and in data-scarce regimes, and positioning it as a general-purpose, plug-and-play regularizer for neural networks.


An Exploratory Case Study of LLM-Assisted Refactoring and Gameplay Feature Generation in an Endless Runner Game

Hugging Face Daily Papers · 2026-06-19

This paper presents an exploratory case study evaluating GPT-4o's ability to perform refactoring and generate gameplay features in an endless runner game, finding that refactoring tasks succeeded while feature generation tasks mostly failed.


HELP WITH RESEARCH: Observation - Semantically Dense Context Produces Strong Late-Layer Divergence Without Jailbreak Prompts [D]

Reddit r/MachineLearning · 2026-06-18

An informal empirical study claiming that long, semantically dense, benign text can shift a model's latent space and bypass alignment, causing it to generate critiques that would otherwise be blocked. The author, a non-expert, asks for an audit of their metrics to distinguish genuine semantic hijacking from artifacts.


The Impact of Google's Manifest Version 3 Update on Ad Blocker Effectiveness

Lobsters Hottest · 2026-06-17

This academic paper empirically investigates whether Google's transition from Manifest V2 to V3 in Chrome reduces ad blocker effectiveness, finding no statistically significant degradation and even slight improvements in anti-tracking for MV3 ad blockers.


From Parasocial Scripts to Dyadic Persistence in Autonomous AI-Agent Communities

arXiv cs.CL · 2026-06-17

This paper investigates whether parasocial interaction cues exist in online communities of autonomous AI agents, analyzing over 50,000 posts from Moltbook. The findings show that such cues are prevalent and strongly associated with sustained reciprocal interactions, providing empirical evidence for relationship-like dynamics among LLM-enabled agents.


Equity with Efficiency: An Empirical Study of Tokenizers for Multilingual Large Language Models

arXiv cs.CL · 2026-06-16

This paper systematically compares equitable tokenizers for multilingual LLMs across 11 Southeast Asian languages, finding that Parity-aware BPE achieves the best efficiency-equity trade-off and that cross-lingual fairness and tokenization efficiency are not fundamentally at odds.


Can Post-Training Turn LLMs into Good Medical Coders? An Empirical Study of Generative ICD Coding

arXiv cs.CL · 2026-06-15

This empirical study investigates whether post-training (supervised fine-tuning and reinforcement learning) can improve LLMs' performance on automated ICD coding, introducing a diagnostic curriculum called PHI that extends GRPO to refine missed-code cases. Results show that prompting-only evaluation underestimates LLM potential, with SFT providing the main capability jump and RL further improving performance.


Direct Preference Optimization for Chatbot Fine-Tuning: An Empirical Study

arXiv cs.CL · 2026-06-12

This paper presents an empirical study of Direct Preference Optimization (DPO) for fine-tuning a large language model, showing that DPO simplifies the training pipeline and achieves competitive performance while addressing training instability.


Which LoRA? An Empirical Study on the Effectiveness of LoRA Techniques During Multilingual Instruction Tuning

arXiv cs.CL · 2026-06-10

This paper empirically compares several LoRA variants for multilingual instruction tuning and finds no significant advantage of complex variants over basic LoRA in balancing cross-lingual transfer and knowledge retention.


Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

Hacker News Top · 2026-06-09

This empirical study compares grep and vector retrieval strategies in LLM agent workflows, finding that grep generally yields higher accuracy across different agent harnesses and tool-calling styles, with performance heavily dependent on harness choice and context engineering.


@omarsar0: New paper on how AI agents are reshaping knowledge work. This is a nice economic read on where agents actually change k…

X AI KOLs Following · 2026-06-08

This study uses Perplexity production data to analyze how AI agents reshape knowledge work, finding that agents reduce time and cost by over 87%, improve quality, and expand the scope of automated tasks.


How AI Agents Reshape Knowledge Work: Autonomy, Efficiency, and Scope

arXiv cs.AI · 2026-06-08

This study uses production data from Perplexity to compare AI agents with conversational assistants, finding that agents reduce completion time by 87% and costs by 94% while expanding the scope and improving the quality of knowledge work.


Empirical Study on the Characteristics and Evolution of AI-usage in GitHub Repositories: Evidence from Code Comments

Hugging Face Daily Papers · 2026-06-05

This paper analyzes 35,361 GitHub code comments referencing AI use to develop a taxonomy of AI-assisted development activities. It finds that developers primarily use LLMs for code implementation and enhancement, followed by human refactoring and bug fixes, and that usage has shifted over time toward conceptual support rather than direct code generation.


Do Real-World Datasets Contain Natural Experiments? An Empirical Study Using Causal Feature Selection

arXiv cs.AI · 2026-06-03

This paper uses causal discovery and feature selection to investigate whether real-world datasets contain natural experiments, finding that such natural experiments do exist and can be used to improve model performance.


@ComputerPapers: Offloading Score: Measuring AI Reliance Through Counterfactual Workflows Vishakh Padmakumar, Lujain Ibrahim, Zora Zhiru…

X AI KOLs Following · 2026-05-29

The paper introduces the offloading score, a metric that measures AI reliance by quantifying the fraction of cognitive effort offloaded to an AI tool using counterfactual workflows. It is validated through intrinsic evaluations and a user study with developers, showing it detects increased reliance under time pressure better than existing measures.
