Tag
This paper presents an overview of the QIAS 2026 shared task on Islamic inheritance reasoning, evaluating LLMs on multi-step legal and numerical reasoning using the MAWARITH benchmark.
This paper presents the participation of team PSL in the QIAS 2026 Shared Task on Arabic Islamic inheritance reasoning, comparing commercial and open-source large language models. The results show that commercial models (e.g., Gemini 2.5 Flash) significantly outperform open-source models in structured legal reasoning with multi-step dependencies.
This paper introduces EP-HUBO, a quantum-inspired method that treats evidence selection in chain-of-thought reasoning as a combinatorial optimization problem. By allowing minority-but-correct hypotheses to override noisy majorities, it significantly improves performance on legal reasoning benchmarks such as MMLU-Pro law and LEXam.
This paper introduces a relevance-sensitive evaluation suite for legal AI, demonstrating that LLMs are overly sensitive to legally irrelevant perturbations, and proposes LexGuard, an adversarial multi-agent framework using formal reasoning to improve legal reasoning reliability.
This paper empirically studies LLMs' legal reasoning in tax law, showing that data contamination inflates performance and that neuro-symbolic hybrid systems offer more reliable and robust generalization than monolithic LLMs.
This paper identifies a systematic gap between legal interpretation and formal logic in AI legal reasoning, proposes a neuro-symbolic approach to bridge it, and demonstrates substantial label shifts when re-annotating legal NLI data under strict formal entailment.
This paper presents Qatar University's multi-stage QLoRA fine-tuning approach on Qwen3-4B for Arabic Islamic inheritance reasoning. Domain adaptation on Islamic fatwa records is followed by task-specific training on 12,000 structured inheritance cases, achieving a 90% MIR-E score and matching commercial systems such as Gemini-2.5-flash with minimal computational resources.
VLegal-Bench is a cognitively grounded benchmark for evaluating large language models on Vietnamese legal reasoning tasks, containing 10,450 expert-annotated samples designed to address the gap in legal benchmarks for civil law systems. The benchmark assesses multiple levels of legal understanding through question answering, multi-step reasoning, and scenario-based problem solving, providing a replicable framework for evaluating LLMs in non-English, codified legal contexts.