natural-language-inference

#natural-language-inference

BioDivergence: A Benchmark and Evaluation Framework for Hidden Contextual Contradictions in Biomedical Abstracts

arXiv cs.CL · 4d ago

Introduces BioDivergence, a benchmark and evaluation framework for detecting context-conditioned contradictions in biomedical abstracts, featuring a six-class conflict taxonomy and a silver dataset of 11,865 claim pairs.

SHALA-LLM: Smartly Handling Ambiguous Labels in Aligning LLMs

arXiv cs.LG · 2026-06-05

SHALA-LLM is a reinforcement learning framework that enables LLMs to learn directly from annotator distributions and dynamically prioritize highly ambiguous samples during alignment, improving agreement with human label distributions and classification performance.

Multi-Granularity Reasoning for Natural Language Inference

arXiv cs.CL · 2026-06-05

Proposes a Multi-Granularity Reasoning Network (MGRN) that explicitly leverages hierarchical semantic features for natural language inference, outperforming strong baselines on multiple benchmarks.

SEA-NLI: Natural Language Inference as a Lens into Southeast Asian Cultural Understanding

arXiv cs.CL · 2026-06-03

Introduces SEA-NLI, a culturally grounded NLI benchmark covering eight Southeast Asian countries, revealing low performance of LLMs on culturally specific knowledge, especially in languages and science/technology. Shows that culture-aware prompting helps but chain-of-thought offers limited gains.

Sample-Size Scaling of the African Languages NLI Evaluation

arXiv cs.CL · 2026-06-03

This paper examines the effect of labeled data size on natural language inference performance for 16 African languages using the AfriXNLI benchmark. The results show that scaling behavior is language-sensitive and often non-monotonic, which challenges the common assumption of monotonic improvement and underscores the need for language-specific dataset creation and stronger multilingual strategies.

Same Patient, Different Words, Different Diagnosis? Evaluating Semantic Stability in Clinical LLMs

arXiv cs.CL · 2026-06-01

This paper proposes a semantic verification framework using Natural Language Inference (NLI) to evaluate the sensitivity of clinical LLMs to meaning-preserving prompt variations, introducing metrics such as MVS, ΔC, and WCI. Results show that domain specialization does not consistently improve robustness, with both domain-specific and general-purpose models showing mixed performance.

LLMBridge: An LLM Pipeline for End-to-end Referential Bridging Resolution in English

arXiv cs.CL · 2026-05-29

LLMBridge introduces an LLM-based pipeline for end-to-end referential bridging resolution, achieving state-of-the-art performance on three English datasets. The system combines heuristic pre/post-processing with LLM natural language inference.

Product-of-Experts Training Reduces Dataset Artifacts in Natural Language Inference

arXiv cs.CL · 2026-04-22

This paper proposes Product-of-Experts (PoE) training to reduce dataset artifacts in Natural Language Inference, downweighting examples where biased models are overconfident. PoE nearly preserves accuracy on SNLI (89.10% vs. 89.30%) while reducing bias reliance by ~4.85 percentage points.
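The downweighting idea can be illustrated with a minimal sketch of one common PoE formulation, in which the log-probabilities of the main model and a frozen biased model are summed before the cross-entropy is taken. This is an assumption for illustration: the summary does not state the paper's exact loss, and the function names `log_softmax`, `ce_loss`, and `poe_loss` are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def ce_loss(logits, label):
    """Standard cross-entropy for a single example."""
    return -log_softmax(logits)[label]

def poe_loss(main_logits, bias_logits, label):
    """Cross-entropy on the product of the two models' distributions.

    Assumed formulation: the log-probabilities are added, then renormalized.
    The biased model is treated as fixed; only the main model would be trained.
    """
    combined = [
        m + b
        for m, b in zip(log_softmax(main_logits), log_softmax(bias_logits))
    ]
    return -log_softmax(combined)[label]

# Three NLI classes: entailment, neutral, contradiction (label 0 is gold here).
uninformed_main = [0.0, 0.0, 0.0]
confident_bias = [5.0, 0.0, 0.0]   # biased model already predicts the gold label
uniform_bias = [0.0, 0.0, 0.0]     # biased model has no opinion

loss_when_bias_confident = poe_loss(uninformed_main, confident_bias, 0)
loss_plain = ce_loss(uninformed_main, 0)
```

When the biased model is already confident and correct, the combined loss is small, so the example contributes little training signal to the main model; when the biased model is uniform, the loss reduces to ordinary cross-entropy.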

When Informal Text Breaks NLI: Tokenization Failure, Distribution Shift, and Targeted Mitigations

arXiv cs.CL · 2026-04-21

This paper investigates how informal text (slang, emoji, Gen-Z filler tokens) degrades NLI accuracy in ELECTRA-small and RoBERTa-large models. It identifies two distinct failure mechanisms, tokenization failure (emoji mapped to [UNK]) and distribution shift (out-of-domain noise tokens), and proposes targeted mitigations that recover accuracy without harming clean-text performance.
