Dimension-Level Intent Fidelity Evaluation for Large Language Models: Evidence from Structured Prompt Ablation

arXiv cs.CL Papers

Summary

This paper introduces a dimension-level evaluation method for measuring intent fidelity in large language models using structured prompt ablation.

arXiv:2605.14517v1

Abstract: Holistic evaluation scores capture overall output quality but do not distinguish a model's reproduction of the structural form of a user's request from its preservation of the user's specific intent. We propose a dimension-level intent fidelity evaluation framework, applied here through a structured prompt ablation study across 2,880 outputs spanning three languages, three task domains, and six LLMs, that separately measures structural recovery and intent fidelity for each semantic dimension. This framework reveals a systematic structural-fidelity split: among Chinese-language outputs with complete paired scores, 25.7% received perfect holistic alignment scores (GA=5) while exhibiting measurable dimensional intent deficits; among English-language outputs, this proportion rose to 58.6%. Human evaluation confirmed that these split-zone outputs represent genuine quality deficits and that dimensional fidelity scores track human judgements more reliably than holistic scores do. A public-private decomposition of 2,520 ablation cells characterises when models successfully compensate for missing intent and when they fail, while proxy annotation distinguishes prior inferability from default recoverability. A weight-perturbation experiment shows that moderate misalignment is typically absorbed, whereas severe dimensional inversion is consistently harmful. These findings demonstrate that dimension-level intent fidelity evaluation is a necessary complement to holistic assessment when evaluating LLM outputs for user-specific tasks.
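To make the structural-fidelity split concrete, here is a minimal sketch (not the paper's code) of how split-zone outputs could be flagged: each output carries a holistic alignment score GA on a 1-5 scale, where GA=5 is a perfect score as in the abstract, plus per-dimension intent-fidelity scores, and an output lands in the split zone when GA is perfect but at least one semantic dimension shows a deficit. The `ScoredOutput` structure, the dimension names, the 1-5 per-dimension scale, and the deficit threshold are illustrative assumptions, not the paper's actual rubric.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical record for one scored model output; field names and the
# 1-5 per-dimension fidelity scale are assumptions for illustration.
@dataclass
class ScoredOutput:
    output_id: str
    holistic_ga: int                     # holistic alignment score, 1-5 (GA=5 = perfect)
    dimension_fidelity: Dict[str, int]   # e.g. {"audience": 5, "tone": 3, ...}

def split_zone_outputs(outputs: List[ScoredOutput],
                       deficit_threshold: int = 4) -> List[ScoredOutput]:
    """Return outputs that look perfect holistically (GA=5) but show a
    measurable intent deficit on at least one semantic dimension
    (any dimension score below `deficit_threshold`)."""
    flagged = []
    for o in outputs:
        has_deficit = any(score < deficit_threshold
                          for score in o.dimension_fidelity.values())
        if o.holistic_ga == 5 and has_deficit:
            flagged.append(o)
    return flagged

# Toy example: the split-zone rate within a subset of outputs, analogous
# to the 25.7% (Chinese) and 58.6% (English) proportions reported.
outputs = [
    ScoredOutput("zh-001", 5, {"audience": 5, "tone": 3, "constraint": 5}),
    ScoredOutput("zh-002", 5, {"audience": 5, "tone": 5, "constraint": 5}),
    ScoredOutput("zh-003", 4, {"audience": 2, "tone": 4, "constraint": 5}),
]
rate = len(split_zone_outputs(outputs)) / len(outputs)
print(f"split-zone rate: {rate:.1%}")  # 33.3% in this toy sample
```

The point of separating the two scores is visible even in this toy data: a holistic judge would accept "zh-001" outright, while the dimensional view exposes the tone deficit that the paper argues tracks human quality judgements more reliably.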

Source: [https://arxiv.org/abs/2605.14517](https://arxiv.org/abs/2605.14517)

Similar Articles

IntentGrasp: A Comprehensive Benchmark for Intent Understanding

Hugging Face Daily Papers

This paper introduces IntentGrasp, a comprehensive benchmark for evaluating large language models' intent understanding capabilities, revealing poor performance across 20 tested models. It proposes Intentional Fine-Tuning (IFT) as a solution, which significantly improves model performance and demonstrates strong cross-domain generalizability.

Non-linear Interventions on Large Language Models

arXiv cs.CL

This paper introduces a general formulation of non-linear intervention for large language models, extending beyond the Linear Representation Hypothesis to manipulate features encoded along non-linear manifolds, and validates the approach on refusal bypass steering.

Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning

arXiv cs.CL

This paper investigates whether assigning personas to large language models induces human-like motivated reasoning, finding that persona-assigned LLMs show up to 9% reduced veracity discernment and are up to 90% more likely to evaluate scientific evidence in ways congruent with their induced political identity, with prompt-based debiasing largely ineffective.

Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures

arXiv cs.CL

A comprehensive survey reviewing recent advances in intrinsic interpretability for Large Language Models, categorizing approaches into five design paradigms: functional transparency, concept alignment, representational decomposability, explicit modularization, and latent sparsity induction. The paper addresses the challenge of building transparency directly into model architectures rather than relying on post-hoc explanation methods.