Dimension-Level Intent Fidelity Evaluation for Large Language Models: Evidence from Structured Prompt Ablation
Summary
This paper introduces a dimension-level evaluation method for measuring intent fidelity in large language models using structured prompt ablation.
Source: [https://arxiv.org/abs/2605.14517](https://arxiv.org/abs/2605.14517)
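The summary above names structured prompt ablation but not its protocol. The sketch below is a minimal, hypothetical illustration of dimension-level ablation: the dimension names, the `generate` and `complies` stubs, and the scoring rule are assumptions for illustration, not the paper's actual procedure or metrics.

```python
# Hypothetical illustration of dimension-level prompt ablation.
# All dimension names, stubs, and scoring choices here are placeholders.

DIMENSIONS = {
    "format": "Answer as a numbered list.",
    "length": "Keep the answer under 50 words.",
    "tone": "Use a neutral, factual tone.",
}

TASK = "Explain why the sky is blue."

def build_prompt(active_dims):
    """Compose the task with only the selected intent dimensions stated."""
    instructions = [DIMENSIONS[d] for d in sorted(active_dims)]
    return " ".join(instructions + [TASK])

def generate(prompt: str) -> str:
    """Stand-in for a call to the model under evaluation."""
    raise NotImplementedError("plug in your LLM call here")

def complies(dim: str, output: str) -> float:
    """Stand-in for a per-dimension fidelity score in [0, 1]
    (a rubric, a regex for format, a length check, or a judge model)."""
    raise NotImplementedError

def ablation_scores() -> dict:
    """For each dimension, compare fidelity with and without that dimension
    stated in the prompt, holding everything else fixed. The gap estimates
    how much the explicit instruction contributes to intent fidelity."""
    full = set(DIMENSIONS)
    scores = {}
    for dim in DIMENSIONS:
        with_dim = complies(dim, generate(build_prompt(full)))
        without_dim = complies(dim, generate(build_prompt(full - {dim})))
        scores[dim] = with_dim - without_dim
    return scores
```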
Similar Articles
IntentGrasp: A Comprehensive Benchmark for Intent Understanding
This paper introduces IntentGrasp, a comprehensive benchmark for evaluating large language models' intent understanding capabilities, revealing poor performance across 20 tested models. It proposes Intentional Fine-Tuning (IFT) as a solution, which significantly improves model performance and demonstrates strong cross-domain generalizability.
Non-linear Interventions on Large Language Models
This paper introduces a general formulation of non-linear intervention for large language models, extending beyond the Linear Representation Hypothesis to manipulate features encoded along non-linear manifolds, and validates the approach on refusal bypass steering.
Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning
This paper investigates whether assigning personas to large language models induces human-like motivated reasoning. It finds that persona-assigned LLMs show up to 9% lower veracity discernment and are up to 90% more likely to evaluate scientific evidence in ways congruent with their induced political identity, and that prompt-based debiasing is largely ineffective.
Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures
A comprehensive survey reviewing recent advances in intrinsic interpretability for Large Language Models, categorizing approaches into five design paradigms: functional transparency, concept alignment, representational decomposability, explicit modularization, and latent sparsity induction. The paper addresses the challenge of building transparency directly into model architectures rather than relying on post-hoc explanation methods.
Decomposing the Basic Abilities of Large Language Models: Mitigating Cross-Task Interference in Multi-Task Instruct-Tuning
This paper proposes Badit, a method that decomposes large language model parameters into orthogonal high-singular-value LoRA experts to mitigate cross-task interference during multi-task instruction tuning.
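The one-line description above mentions decomposing model weights into orthogonal high-singular-value LoRA experts. As a rough, assumed illustration (not Badit's actual construction, routing, or training recipe), the sketch below carves disjoint blocks of the top singular directions of a weight matrix into LoRA-style factor pairs; the function name and rank choices are hypothetical.

```python
import torch

def svd_lora_experts(weight: torch.Tensor, rank: int, num_experts: int):
    """Split the top singular directions of `weight` into `num_experts`
    disjoint (hence mutually orthogonal) rank-`rank` LoRA-style factor
    pairs (A, B), where B @ A reconstructs each expert's subspace."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    experts = []
    for i in range(num_experts):
        lo, hi = i * rank, (i + 1) * rank
        B = U[:, lo:hi] * S[lo:hi].sqrt()             # (out_dim, rank)
        A = S[lo:hi].sqrt().unsqueeze(1) * Vh[lo:hi]  # (rank, in_dim)
        experts.append((A, B))
    return experts

# Example: two rank-8 experts drawn from disjoint singular directions
# of a 768x768 projection; B @ A gives each expert's contribution.
experts = svd_lora_experts(torch.randn(768, 768), rank=8, num_experts=2)
```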