llm-robustness

#llm-robustness

Truth or Sophistry? LoFa: A Benchmark for LLM Robustness Against Logical Fallacies

arXiv cs.CL · yesterday

Introduces LoFa, a comprehensive benchmark to evaluate LLM robustness against logical fallacies in persuasive contexts, featuring a multi-agent pipeline and a multi-round debate framework.

LPDS: Evaluating LLM Robustness Through Logic-Preserving Difficulty Scaling

arXiv cs.LG · 2026-05-18

Introduces LPDS, a framework that systematically evaluates LLM robustness by scaling the difficulty of logic-preserving variations. It finds performance drops up to 5x larger than with randomly sampled variations, and that training on harder variations improves robustness.

The Text Uncanny Valley: Non-Monotonic Performance Degradation in LLM Information Retrieval

arXiv cs.CL · 2026-05-11

This research paper investigates the 'Text Uncanny Valley,' a phenomenon where LLM performance in information retrieval tasks degrades non-monotonically as word-boundary corruption increases. The authors propose a mode transition hypothesis to explain this U-shaped performance curve and demonstrate its relevance to real-world noisy text inputs.
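The summary does not define "word-boundary corruption." As a rough sketch, assuming it means deleting spaces between words and inserting spaces inside words, a sweep over increasing corruption rates could be generated as below. The function name and the corruption scheme are hypothetical, not the authors' procedure.

```python
import random


def corrupt_word_boundaries(text: str, rate: float, seed: int = 0) -> str:
    """Perturb word boundaries in `text` at corruption level `rate`.

    Hypothetical scheme for illustration only: each existing space is
    dropped with probability `rate` (merging neighbouring words), and a
    space is inserted between two adjacent letters with probability
    `rate` (splitting a word). The characters themselves never change.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so a sweep over rates is reproducible
    out = []
    for i, ch in enumerate(text):
        if ch == " ":
            # Keep the existing boundary with probability 1 - rate.
            if rng.random() >= rate:
                out.append(ch)
            continue
        out.append(ch)
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # Possibly split the word between two adjacent letters.
        if ch.isalpha() and nxt.isalpha() and rng.random() < rate:
            out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    sentence = "the capital of france is paris"
    for rate in (0.0, 0.1, 0.3, 0.6, 1.0):
        print(f"{rate:.1f}  {corrupt_word_boundaries(sentence, rate, seed=1)}")
```

Scoring a retrieval prompt on each corrupted variant and plotting accuracy against `rate` is one way a non-monotonic, U-shaped curve like the one described would become visible.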

Information Theoretic Adversarial Training of Large Language Models

arXiv cs.LG · 2026-05-08

This paper introduces WARDEN, a distributionally robust adversarial training framework for large language models that uses f-divergence to dynamically reweight adversarial examples, significantly reducing attack success rates while maintaining computational efficiency.
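WARDEN's exact formulation is not given here. For intuition, the simplest member of the f-divergence family is the KL divergence, and for it the worst-case reweighting of a batch has a well-known closed form: a softmax over per-example losses at some temperature. The sketch below implements that generic KL case in plain Python. The function names and the `temperature` parameter are illustrative assumptions, not WARDEN's API or its specific divergence.

```python
import math
from typing import List


def kl_dro_weights(losses: List[float], temperature: float) -> List[float]:
    """Worst-case example weights under a KL-penalised DRO objective.

    Solves  max_q  sum_i q_i * losses[i] - temperature * KL(q || uniform)
    over probability vectors q. The closed-form maximiser is
    q_i proportional to exp(losses[i] / temperature), a softmax over losses.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not losses:
        raise ValueError("need at least one loss")
    scaled = [loss / temperature for loss in losses]
    m = max(scaled)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def robust_loss(losses: List[float], temperature: float) -> float:
    """Reweighted batch loss. In a real training loop the weights would be
    treated as constants (no gradient through them); the gradient of this
    sum then equals the gradient of the DRO objective
    temperature * log(mean(exp(loss / temperature)))."""
    weights = kl_dro_weights(losses, temperature)
    return sum(w * loss for w, loss in zip(weights, losses))


if __name__ == "__main__":
    # Per-example losses on a batch of adversarial prompts (made-up numbers).
    batch_losses = [0.2, 0.4, 1.5, 3.0]
    for t in (0.5, 2.0, 100.0):
        w = kl_dro_weights(batch_losses, t)
        print(t, [round(x, 3) for x in w], round(robust_loss(batch_losses, t), 3))
```

A small temperature concentrates weight on the hardest adversarial examples; a large one recovers ordinary averaging over the batch.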

Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations

arXiv cs.CL · 2026-04-20

This paper presents a comprehensive empirical evaluation of how large language models handle corruptions in chain-of-thought reasoning steps, testing 13 models across 5 perturbation types (MathError, UnitConversion, Sycophancy, SkippedSteps, ExtraSteps) on mathematical reasoning tasks. The findings reveal heterogeneous vulnerability patterns with implications for deploying LLMs in multi-stage reasoning pipelines.
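Two of the five perturbation types can be illustrated with simple list operations on a reasoning chain. The sketch below assumes a chain is a list of step strings; the operator names and behaviour are hypothetical approximations of "SkippedSteps" and "ExtraSteps," not the paper's generation code.

```python
import random
from typing import List


def skip_steps(steps: List[str], k: int, seed: int = 0) -> List[str]:
    """Hypothetical 'SkippedSteps'-style operator: drop k intermediate
    steps at random, always keeping the first and the final step."""
    if len(steps) <= 2 or k <= 0:
        return list(steps)
    rng = random.Random(seed)
    middle = list(range(1, len(steps) - 1))
    dropped = set(rng.sample(middle, min(k, len(middle))))
    return [s for i, s in enumerate(steps) if i not in dropped]


def add_extra_steps(steps: List[str], filler: List[str], seed: int = 0) -> List[str]:
    """Hypothetical 'ExtraSteps'-style operator: insert redundant steps at
    random positions, always before the final step."""
    if not steps:
        raise ValueError("need at least one step")
    rng = random.Random(seed)
    out = list(steps)
    for extra in filler:
        # randint is inclusive, so pos <= len(out) - 1 keeps the last step last.
        pos = rng.randint(0, len(out) - 1)
        out.insert(pos, extra)
    return out


if __name__ == "__main__":
    chain = ["a = 2", "b = 3", "c = a + b = 5", "c * 2 = 10", "Answer: 10"]
    print(skip_steps(chain, k=2, seed=1))
    print(add_extra_steps(chain, ["Note that 2 is even."], seed=1))
```

In an evaluation of this kind, the perturbed chain would presumably be handed to a model as a reasoning prefix to check whether it still reaches the correct answer.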

Why Fine-Tuning Encourages Hallucinations and How to Fix It

arXiv cs.CL · 2026-04-20

This paper investigates how supervised fine-tuning (SFT) increases hallucinations in LLMs by degrading pre-existing knowledge. The authors identify semantic interference among overlapping representations as the primary mechanism behind SFT-induced hallucinations, and demonstrate mitigations, including parameter freezing and a self-distillation-based method, that preserve pre-existing factual knowledge.
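The summary does not spell out the self-distillation objective. One common way to realise the idea is to add a KL term that keeps the fine-tuned model's token distribution close to that of the frozen pre-fine-tuning model while it learns the new targets. The sketch below shows that generic regulariser on a single next-token distribution; the function names and the weighting parameter `lam` are illustrative assumptions, and the paper's actual method may differ.

```python
import math
from typing import List


def cross_entropy(probs: List[float], target: int) -> float:
    """Negative log-likelihood of the SFT target token."""
    return -math.log(probs[target])


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sft_loss_with_distillation(
    student: List[float], teacher: List[float], target: int, lam: float
) -> float:
    """Hypothetical combined objective: fit the new SFT target while a KL
    term pulls the fine-tuned model (student) toward the frozen
    pre-fine-tuning model (teacher) on the same input."""
    return cross_entropy(student, target) + lam * kl_divergence(teacher, student)


if __name__ == "__main__":
    teacher = [0.7, 0.2, 0.1]       # frozen base model's next-token distribution
    conservative = [0.5, 0.4, 0.1]  # moves toward the new target, stays close
    drifted = [0.05, 0.9, 0.05]     # fits the new target, forgets the base model
    for lam in (0.0, 1.0):
        print(lam,
              round(sft_loss_with_distillation(conservative, teacher, 1, lam), 3),
              round(sft_loss_with_distillation(drifted, teacher, 1, lam), 3))
```

With `lam = 0` the objective is plain SFT and favours the drifted model; raising `lam` penalises drifting away from the base model's predictions.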
