QIAS 2026: Overview of the Shared Task on Islamic Inheritance Reasoning

arXiv cs.CL 06/15/26, 04:00 AM Papers
islamic-inheritance legal-reasoning large-language-models benchmark arabic-nlp reasoning shared-task
Summary
This paper presents an overview of the QIAS 2026 shared task on Islamic inheritance reasoning, evaluating LLMs on multi-step legal and numerical reasoning using the MAWARITH benchmark.
arXiv:2606.13756v1 Announce Type: new Abstract: This paper presents a comprehensive overview of the QIAS 2026 shared task, organized as part of the OSACT7 Workshop and co-located with LREC 2026. The shared task was designed to evaluate the ability of large language models to perform complex reasoning in the religious and legal domain of Islamic inheritance. Unlike conventional question-answering benchmarks, QIAS 2026 focuses on end-to-end reasoning from natural language cases, requiring systems to perform the full inheritance calculation process, from identifying the eligible heirs to assigning the correct share to each beneficiary. To support this evaluation, the task was based on the MAWARITH benchmark, a dataset of $12{,}500$ Arabic inheritance cases annotated with intermediate reasoning steps and final answers. System submissions were evaluated using MIR-E, a multi-step metric that measures performance across the main stages of inheritance reasoning. A total of $16$ teams participated in the shared task, investigating a range of approaches, including prompting-based methods, retrieval-augmented generation, and fine-tuning strategies. The results show that Islamic inheritance remains a highly challenging benchmark for current language models, especially in stages that require precise legal interpretation and structured numerical reasoning. This overview summarizes the task design, dataset, evaluation framework, participating systems, and main results.
# QIAS 2026: Overview of the Shared Task on Islamic Inheritance Reasoning
Source: [https://arxiv.org/html/2606.13756](https://arxiv.org/html/2606.13756)
###### Abstract

This paper presents a comprehensive overview of the QIAS 2026 shared task, organized as part of the OSACT7 Workshop and co\-located with LREC 2026\. The shared task was designed to evaluate the ability of large language models to perform complex reasoning in the religious and legal domain of Islamic inheritance\. Unlike conventional question\-answering benchmarks, QIAS 2026 focuses on end\-to\-end reasoning from natural language cases, requiring systems to perform the full inheritance calculation process, from identifying the eligible heirs to assigning the correct share to each beneficiary\. To support this evaluation, the task was based on the MAWARITH benchmark, a dataset of 12,500 Arabic inheritance cases annotated with intermediate reasoning steps and final answers\. System submissions were evaluated using MIR\-E, a multi\-step metric that measures performance across the main stages of inheritance reasoning\. A total of 16 teams participated in the shared task, investigating a range of approaches, including prompting\-based methods, retrieval\-augmented generation, and fine\-tuning strategies\. The results show that Islamic inheritance remains a highly challenging benchmark for current language models, especially in stages that require precise legal interpretation and structured numerical reasoning\. This overview summarizes the task design, dataset, evaluation framework, participating systems, and main results\.

Keywords: multi\-step reasoning, Islamic inheritance reasoning, Arabic language processing


Abdessalam BOUCHEKIF¹, Somaya ELTANBOULY¹, Samer RASHWANI¹, Shahd GABEN¹, Mutaz AL\-KHATIB¹, Heba SBAHI¹, Emad MOHAMED², Mohammed GHALY¹

¹Hamad bin Khalifa University, Qatar; ²Nazarbayev University, Kazakhstan

\{abouchekif, seltanbouly, srashwani, sgaben, malkhatib, hsbahi, mghaly\}@hbku\.edu\.qa, emad\.mohamed@nu\.edu\.kz

## 1\. Introduction

Large language models \(LLMs\) have recently achieved strong performance across a wide range of natural language processing tasks, including question answering, summarization, and complex text generation\. Their success has been particularly evident in tasks that benefit from broad linguistic coverage and large\-scale pretraining\. However, LLMs still face important challenges in specialized domains that require precise reasoning, structured decision\-making, and strict adherence to domain\-specific rules\. These limitations become more apparent in tasks that require a sequence of dependent reasoning steps, where an error at an early stage can propagate and compromise the final answer\. This issue is especially important in religious and legal domains, where reasoning is not only knowledge\-intensive but also constrained by formal principles and interpretive traditions\.

In Islamic studies, and particularly in Islamic law, systems must reason over authoritative and highly structured sources such as the Qur’an, Hadith, and juristic writings\. They must also operate within a framework based on clear legal principles and, in some cases, different interpretations across schools of thought\. As a result, evaluating LLMs in such contexts requires benchmarks that go beyond surface\-level question answering and instead test the ability of models to produce precise and faithful reasoning\. Islamic inheritance law \(‘ilm al\-mawārīth\) serves as an effective testbed for evaluating reasoning capabilities\. This domain of Islamic jurisprudence requires multi\-step legal and numerical reasoning to resolve cases\. A valid solution must identify eligible heirs, determine which heirs are excluded or blocked, assign appropriate shares, assess the need for adjustments, and calculate the final distribution\. The process is governed by strict jurisprudential rules, and certain cases present additional complexities such as ‘awl and radd\. Due to its combination of legal interpretation, structured reasoning, and precise calculation, Islamic inheritance law offers a highly informative benchmark for assessing the reasoning abilities of modern language models\.

To support research in this direction, recent work introduced MAWARITH Bouchekif et al\. \([2026](https://arxiv.org/html/2606.13756#biba.bib1)\), a large\-scale benchmark of 12,500 Arabic inheritance cases designed for end\-to\-end reasoning from natural language descriptions\. The benchmark includes detailed intermediate reasoning steps and final answers, making it possible to evaluate not only the correctness of the final output but also the validity of the reasoning process itself\. This resource created an opportunity to move beyond standard multiple\-choice evaluation and toward a more realistic setting in which systems must solve inheritance cases as humans would read and understand them\.

The QIAS 2026 Shared Task was designed to evaluate whether participating systems can perform end\-to\-end Islamic inheritance reasoning in Arabic\. It focuses on solving full cases from natural language, covering the complete reasoning process from heir identification to final share allocation\. It also examines whether recent reasoning\-oriented LLMs, such as Gemini, GPT, DeepSeek, Fanar, and Qwen, can transfer their strong performance on mathematical and synthetic benchmarks to the more complex domain of structured legal and religious reasoning\.

In this paper, we present an overview of the QIAS 2026 Shared Task\. We describe the task, the MAWARITH benchmark used in the evaluation, the MIR\-E multi\-step evaluation metric, the participating systems, and the main results obtained by the submitted approaches\. We also discuss key lessons learned from the shared task and highlight the main challenges that current systems still face in Islamic inheritance reasoning\.

## 2\. Task Description

The QIAS 2026 Shared Task focuses on the end\-to\-end automation of Islamic inheritance reasoning \(‘ilm al\-mawārīth\) from natural language\. The task requires systems to process Arabic inheritance cases and generate a complete, structured solution\. For each input question, systems must produce a detailed step\-by\-step reasoning trace \(<think\>\), followed by a concise final answer \(<answer\>\)\.
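As a rough illustration of this two-part output, the sketch below splits a response into its reasoning trace and final answer. The closing tags `</think>` and `</answer>` and the function name `split_response` are assumptions made for the example; the official format is defined by the task organizers.

```python
import re


def split_response(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a response that uses <think> and <answer> blocks.

    Assumption: each block is closed with </think> or </answer> respectively.
    """
    think = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, flags=re.DOTALL)
    if think is None or answer is None:
        raise ValueError("response must contain both a <think> and an <answer> block")
    return think.group(1).strip(), answer.group(1).strip()
```

Raising on a missing block, rather than returning an empty string, makes malformed generations visible before any scoring takes place.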

The task is formulated as a sequence of dependent reasoning stages\. Systems are expected to identify and explicitly report the following components:

1. Identification of all mentioned heirs, including determining which are eligible or excluded based on applicable blocking rules \(hajb\)\.
2. Assignment of the correct legal shares \(furūḍ\) to entitled heirs according to classical Islamic inheritance jurisprudence, following the majority opinion \(al\-jumhūr\)\.
3. Determination of whether a global adjustment to the distribution is required\.
4. Computation of the final estate distribution, including cases where adjustments such as ‘awl \(proportional reduction\) or radd \(return of the residue\) apply\.
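To make the last two stages concrete, here is a minimal arithmetic sketch of the two adjustments using exact fractions. It is a simplification, not a full solver: it assumes every heir holds a fixed share, that no residuary heir is present, and, for radd, that no spouse is present (spouses are commonly excluded from the return under the majority view). The function name `adjust` and the heir labels are illustrative.

```python
from fractions import Fraction


def adjust(shares: dict[str, Fraction]) -> tuple[str, dict[str, Fraction]]:
    """Classify the adjustment and rescale fixed shares to cover the whole estate.

    Simplifying assumptions: fixed-share heirs only, no residuary heir,
    and no spouse in the radd case.
    """
    total = sum(shares.values(), Fraction(0))
    if total == 1:
        return "none", dict(shares)
    # Shares above the estate are reduced proportionally ('awl);
    # shares below it are increased proportionally (radd).
    kind = "awl" if total > 1 else "radd"
    return kind, {heir: share / total for heir, share in shares.items()}


# 'awl: husband 1/2 plus two full sisters 2/3 gives 7/6, so the base rises from 6 to 7.
awl_case = adjust({"husband": Fraction(1, 2), "two_full_sisters": Fraction(2, 3)})

# radd: mother 1/6 plus daughter 1/2 gives 2/3, so the residue returns to both.
radd_case = adjust({"mother": Fraction(1, 6), "daughter": Fraction(1, 2)})
```

In the first case the husband ends with 3/7 and the sisters with 4/7; in the second the mother ends with 1/4 and the daughter with 3/4.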

Participants must submit outputs in a structured format \(e\.g\., JSON\) that captures these reasoning steps\. This representation enables fine\-grained evaluation and allows analysis of different error types, such as legal reasoning errors versus numerical computation errors\.
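The following sketch shows what such a structured record might look like for a single case. All field names (`heirs`, `blocked`, `shares`, `adjustment`, `final_distribution`) are hypothetical and chosen only to mirror the four reasoning stages; the official schema is the one distributed with the task data.

```python
import json

# Hypothetical field names; fractions for "full_sister" are held jointly by both sisters.
solution = {
    "heirs": [{"heir": "husband", "count": 1}, {"heir": "full_sister", "count": 2}],
    "blocked": [],
    "shares": [
        {"heir": "husband", "fraction": "1/2"},
        {"heir": "full_sister", "fraction": "2/3"},
    ],
    "adjustment": "awl",
    "final_distribution": [
        {"heir": "husband", "fraction": "3/7"},
        {"heir": "full_sister", "fraction": "4/7"},
    ],
}

REQUIRED_KEYS = {"heirs", "blocked", "shares", "adjustment", "final_distribution"}


def to_json(record: dict) -> str:
    """Serialize a solution record after checking that every stage is present."""
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # ensure_ascii=False keeps Arabic heir names readable in the output.
    return json.dumps(record, ensure_ascii=False, indent=2)
```

Keeping one key per stage is what lets an evaluator attribute an error to heir identification, share assignment, adjustment, or the final computation.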

## 3\. Data

The QIAS 2026 data come from the MAWARITH benchmark introduced by Bouchekif et al\. \([2026](https://arxiv.org/html/2606.13756#biba.bib1)\)\. It contains 12,500 cases written in Arabic and follows the majority opinion in Islamic inheritance law \(al\-jumhūr\)\. Table [1](https://arxiv.org/html/2606.13756#S3.T1) shows the distribution of cases across the training and test splits, as well as the complexity categories covered in the benchmark\.

Table 1: Distribution of inheritance cases by legal complexity\.

Each case describes a complete inheritance situation in natural language\. The system must identify the heirs mentioned in the case, determine which heirs are excluded by blocking rules, assign the correct legal shares, decide whether an adjustment is needed, and compute the final distribution of the estate\. In this way, the benchmark does not test only final answers, but the full reasoning process required to solve an inheritance case correctly\. The dataset covers a wide range of family relations found in classical Islamic inheritance law, including parents, children, spouses, siblings, grandparents, uncles, nephews, and other extended relatives\.

The MAWARITH dataset was built in several stages\. First, inheritance cases were generated using the Almawarith calculator, which allows users to define heirs and their numbers through a structured interface and then produces the corresponding shares\. This step provided a reliable foundation and helped ensure the correctness of the legal and numerical outcomes\. Since the goal of MAWARITH is to evaluate reasoning from natural language, these structured cases were then rewritten as fluent Arabic inheritance questions, closer to real user queries\. The outputs were then reviewed and enriched by an expert in Islamic studies, who added detailed legal and numerical explanations for each case\. These explanations cover the main stages of inheritance reasoning, including heir identification, blocking rules, share assignment, and adjustment cases such as ‘awl and radd when needed\. To improve consistency, the expert\-written explanations were standardized with the support of Gemini\-2\.5\-Flash, while keeping the legal reasoning unchanged\. Finally, the dataset was carefully validated to ensure consistency between the question, the reasoning steps, and the final inheritance shares\.

### 3\.1\. MIR\-E: Mawarith Inheritance Reasoning Evaluation

Table 2: Affiliations of teams that participated in the test phase and submitted a paper to QIAS 2026\.

The QIAS 2026 shared task uses MIR\-E \(Mawarith Inheritance Reasoning Evaluation\) Bouchekif et al\. \([2026](https://arxiv.org/html/2606.13756#biba.bib1)\), a weighted multi\-stage metric for evaluating both intermediate reasoning steps and final outputs in Islamic inheritance problems\. Unlike standard evaluation based only on the final answer, MIR\-E provides a more fine\-grained assessment by scoring the main stages of the reasoning process\. It includes four components:

1. Heirs and Blocking \($S_h$\): evaluates whether the model correctly identifies the effective heirs, the blocked heirs, and their counts\.
2. Share Assignment \($S_s$\): measures whether the assigned shares for the eligible heirs are correct\.
3. Adjustment \($S_a$\): checks whether the model predicts the correct adjustment type \(none, ‘awl, or radd\), and is scored only if the previous two stages are fully correct\.
4. Final Distribution \($S_f$\): evaluates whether the model produces the correct final distribution after the full inheritance process\.
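The sketch below shows one way a weighted multi-stage score of this kind could be combined. The weights are placeholders, not the official MIR-E weights, which this overview does not state, and treating an ungated adjustment stage as a score of zero is likewise an assumption; the MAWARITH paper is the authoritative definition.

```python
from dataclasses import dataclass


@dataclass
class StageScores:
    heirs: float       # S_h, in [0, 1]
    shares: float      # S_s, in [0, 1]
    adjustment: float  # S_a, in [0, 1]
    final: float       # S_f, in [0, 1]


# Placeholder weights for illustration only (NOT the official MIR-E weights).
WEIGHTS = {"heirs": 0.25, "shares": 0.25, "adjustment": 0.25, "final": 0.25}


def mir_e_sketch(s: StageScores, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted combination of the four stage scores with a gated adjustment stage."""
    # Gate: the adjustment stage counts only when the two earlier stages are fully correct.
    # Assumption: otherwise it contributes zero.
    gated = s.adjustment if (s.heirs == 1.0 and s.shares == 1.0) else 0.0
    weighted = (
        weights["heirs"] * s.heirs
        + weights["shares"] * s.shares
        + weights["adjustment"] * gated
        + weights["final"] * s.final
    )
    return weighted / sum(weights.values())
```

The gate reflects the dependency between stages: a correct adjustment label obtained from wrong heirs or wrong shares earns no credit.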

## 4\. Results and Discussion

Table 3: Official leaderboard results for QIAS 2026 for teams that participated in the test phase and submitted a paper\.

A total of 16 teams participated in the final phase\. Table 2 summarizes the affiliations of the teams that participated in the test phase and submitted a paper\. We provide a baseline implementation using Fanar\-Sadiq, a modern Arabic large language model accessible via API\. This baseline relies exclusively on prompting techniques, without any fine\-tuning\. The goal is to provide a simple yet effective reference point for evaluating model performance\. The dataset and baseline code are publicly available online \([https://gitlab\.com/islamgpt1/qias\_shared\_task\_2026](https://gitlab.com/islamgpt1/qias_shared_task_2026)\)\.

Overall, the submitted systems illustrate three main methodological directions: \(i\) end\-to\-end reasoning with prompting alone, \(ii\) fine\-tuned models specialized for the task, and \(iii\) hybrid pipelines that combine LLM\-based language understanding with deterministic symbolic reasoning\. This diversity of approaches makes the shared task especially useful for comparing different strategies for Arabic legal reasoning\.

The most common approach explored by participants was prompting\-based end\-to\-end reasoning with large language models\. Team PSL Mouhoub and Bouchekif \([2026](https://arxiv.org/html/2606.13756#biba.bib44)\) followed this setting and evaluated several models, including Gemini 2\.5 Flash, Qwen3\-32B, GPT\-oss\-120B, Llama\-3\.3\-70B, Fanar\-Sadiq, and Fanar\-C\-2\-27B\. Team KMS Alkhamis \([2026](https://arxiv.org/html/2606.13756#biba.bib43)\) explored a similar direction and also evaluated Gemini 2\.5 Pro and Mistral\. Overall, the two teams reported similar performance trends: commercial models were generally more reliable, while open\-weight models showed weaker results on this task\.

Team CVPD Swaileh et al\. \([2026](https://arxiv.org/html/2606.13756#biba.bib40)\), which achieved the best result in the shared task, proposed a RAG\-based pipeline designed to generate outputs that match the MIR\-E evaluation format\. Their system built a knowledge base from synthetic legal question\-answer pairs, with the option to include books, then retrieved relevant context for each Arabic inheritance question before generating a single JSON output\. This output followed the required schema and included eligible heirs, blocked heirs, legal shares, the adjustment type when applicable, and the final taṣīl distribution\. The system also included parsing and validation steps to ensure compatibility with the official evaluator\. This approach showed that combining retrieval, controlled generation, and structured output constraints can be highly effective for this task\.
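A parsing and validation step of this general kind can be sketched as follows. This is not Team CVPD's implementation: the field names (`heirs`, `blocked`, `final_distribution`, `fraction`) and the two checks are assumptions chosen to illustrate the idea of rejecting malformed outputs before they reach an evaluator.

```python
import json
from fractions import Fraction


def validate_output(raw: str) -> dict:
    """Parse a generated JSON answer and apply two basic consistency checks.

    Hypothetical schema: 'heirs' and 'blocked' are lists of {"heir": ...} objects,
    and 'final_distribution' is a list of {"heir": ..., "fraction": "n/d"} objects.
    """
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output

    # Check 1: an heir cannot be both eligible and blocked.
    eligible = {item["heir"] for item in data["heirs"]}
    blocked = {item["heir"] for item in data["blocked"]}
    if eligible & blocked:
        raise ValueError(f"heirs listed as both eligible and blocked: {sorted(eligible & blocked)}")

    # Check 2: the final fractions must cover the whole estate exactly.
    total = sum(Fraction(item["fraction"]) for item in data["final_distribution"])
    if total != 1:
        raise ValueError(f"final distribution sums to {total}, expected 1")
    return data
```

Exact fractions avoid the rounding problems that floating-point sums such as 1/3 + 1/3 + 1/3 would introduce into the second check.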

Team Silah Kurdi et al\. \([2026](https://arxiv.org/html/2606.13756#biba.bib45)\) investigated three strategies: retrieval\-augmented generation based on a curated rule base, supervised fine\-tuning of large language models, and a combination of fine\-tuning and retrieval\. Their experiments showed that fine\-tuning alone performed better than retrieval\-based approaches, with the best results obtained by a fine\-tuned Fanar model\. This suggests that task\-specific fine\-tuning can be an effective approach for Islamic inheritance reasoning\.

Team QU\-NLP Alsmadi \([2026](https://arxiv.org/html/2606.13756#biba.bib41)\) proposed a multi\-stage Quantized Low\-Rank Adaptation \(QLoRA\) strategy\. Their approach involved initial domain adaptation on a corpus of Islamic fatwas to capture jurisprudential reasoning patterns, followed by task\-specific fine\-tuning on structured inheritance cases to optimize JSON\-formatted output\. This methodology allowed a relatively small 4B\-parameter model to achieve competitive performance, highlighting the effectiveness of specialized training strategies for complex legal reasoning tasks\.

Team AGS\-KSU Sidaoui \([2026](https://arxiv.org/html/2606.13756#biba.bib52)\) also explored a fine\-tuning approach using Qwen2\.5\-3B, similar to the strategy adopted by Team QU\-NLP\. However, this fine\-tuned model achieved a relatively low MIR\-E score of 0\.30\. By comparison, their prompting\-based configuration using GPT\-5\.4 Thinking reached 0\.84, a substantially stronger result\.

Finally, some teams explicitly separated natural\-language understanding from legal computation\. Team Simplicity Almansour \([2026](https://arxiv.org/html/2606.13756#biba.bib42)\) proposed a two\-stage neuro\-symbolic pipeline in which a commercial LLM was used only for Arabic information extraction\. The extracted heirs were mapped to a standardized set of legal heir categories and then passed to a symbolic rule\-based component that carried out blocking, share assignment, and final calculation according to the rules of ‘ilm al\-mawārīth\. This design reflects a clear division of labor between the LLM and the symbolic module\.
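The division of labor can be illustrated with a toy symbolic stage. This is not Team Simplicity's system: the category names, the Arabic surface forms, and the rule table are assumptions, and the table holds only three well-known blocking relations (a son excludes full brothers; a father excludes the paternal grandfather and full brothers) rather than the full rule set.

```python
# Hypothetical mapping from extracted Arabic mentions to canonical heir categories.
CANONICAL = {
    "ابن": "son",
    "أب": "father",
    "جد": "paternal_grandfather",
    "أخ شقيق": "full_brother",
    "زوجة": "wife",
}

# Tiny illustrative subset of blocking (hajb) rules: blocker -> categories it excludes.
BLOCKS = {
    "son": {"full_brother"},
    "father": {"paternal_grandfather", "full_brother"},
}


def apply_blocking(mentions: list[str]) -> tuple[set[str], set[str]]:
    """Map extracted mentions to categories, then split them into (eligible, blocked)."""
    present = {CANONICAL[m] for m in mentions}
    blocked: set[str] = set()
    for heir in present:
        # Only heirs that actually appear in the case can be blocked.
        blocked |= BLOCKS.get(heir, set()) & present
    return present - blocked, blocked


eligible, blocked = apply_blocking(["أب", "جد", "زوجة"])
```

For this case the father and wife remain eligible and the paternal grandfather is blocked. Because the rules are plain data, the symbolic stage is deterministic and can be audited independently of the LLM that produced the mentions.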

The final performance of the submitted systems is summarized in Table [3](https://arxiv.org/html/2606.13756#S4.T3), where Team CVPD achieved the top ranking \(0\.935\) using a RAG pipeline based on the Qwen\-9B model\. Notably, the QU\-NLP system’s multi\-stage QLoRA approach achieved a competitive MIR\-E score of 0\.907\. This performance is particularly significant, as it closely matches the 0\.901 score of the commercial Gemini\-2\.5\-Flash model \(as reported in Bouchekif et al\. \([2026](https://arxiv.org/html/2606.13756#biba.bib1)\)\)\. These results demonstrate that, while RAG remains a powerful tool for precision, specialized domain adaptation can enable smaller open\-weight models, such as the 4B\-parameter Qwen3, to achieve performance levels close to those of commercial models on this task\.

## 5\. Conclusions and Future Work

In this paper, we presented the QIAS 2026 Shared Task, designed to evaluate the ability of large language models to solve Islamic inheritance cases end\-to\-end, including case understanding, heir identification, rule application, and final share distribution\. The results of the participating systems revealed a clear gap between commercial and open\-weight models, with commercial systems generally achieving stronger performance on this challenging reasoning task\. At the same time, the findings suggest that fine\-tuning LLMs on MAWARITH data can substantially improve performance, while RAG also proved beneficial for some systems\.

As a future direction, we plan to organize a follow\-up edition of the shared task focusing on more complex inheritance cases, including pregnancy\-related cases, multiple deaths, the missing person \(mafqūd\), and the intersex heir \(khunthā\)\. We also plan to focus on fine\-tuning approaches for small Arabic LLMs\. To ensure a fairer comparison across participants, future editions may restrict the use of commercial LLMs and focus on open\-weight models that can be reproduced and compared under the same conditions\.

## 6\. Limitations

This work has several limitations\. First, QIAS 2026 focuses only on inheritance cases based on the majority opinion \(al\-jumhūr\)\. As a result, it does not represent the full diversity of juristic opinions in Islamic law\. Second, although the benchmark covers many cases, it still does not include all the complexity of real inheritance situations\. Some difficult or rare cases remain outside the current scope\.

Third, the MIR\-E metric evaluates structured reasoning outputs in a detailed and reproducible way, but it does not fully measure explanation quality, clarity, or usefulness for real users\.

Finally, some submitted systems use commercial models, which makes full reproducibility more difficult\. Future work should expand the legal coverage of the benchmark and give more attention to open and reproducible systems\.

## 7\. References

- U\. Abbas, M\. S\. Ahmad, M\. Ahmad, A\. Al\-Homaid, A\. Al\-Nuaimi, E\. Altinisik, E\. Asgari, S\. Chawla, S\. Chowdhury, F\. Dalvi, et al\. \(2026\) Fanar 2\.0: Arabic generative AI stack\. arXiv preprint arXiv:2603\.16397\.
- IslamicMMLU: a benchmark for evaluating LLMs on Islamic knowledge\. arXiv preprint 2603\.23750\. External Links: [Link](https://arxiv.org/abs/2603.23750)\.
- N\. AlDahoul and Y\. Zaki \(2025\) NYUAD at QIAS shared task: benchmarking the legal reasoning of LLMs in Arabic Islamic inheritance cases\. In Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, pp\. 861–866\.
- O\. Alkhamis \(2026\) KMS at QIAS 2026: evaluating LLM reasoning for Islamic inheritance division\. In Proceedings of the 7th Workshop on Open\-Source Arabic Corpora and Processing Tools \(OSACT7\), co\-located with LREC\-COLING 2026, Palma de Mallorca, Spain\.
- M\. Almansour \(2026\) Simplicity at QIAS 2026: decoupling language extraction from mathematical logic in Islamic inheritance law\. In Proceedings of the 7th Workshop on Open\-Source Arabic Corpora and Processing Tools \(OSACT7\), co\-located with LREC\-COLING 2026, Palma de Mallorca, Spain\.
- A\. Almasoud, S\. Al\-Ghamdi, R\. Alqifari, N\. Alfear, and H\. Al\-Khalifa \(2026\) MirathQA: a dataset for evaluating large language models on Hanbali Islamic inheritance reasoning tasks\. Data in Brief, pp\. 112589\. External Links: ISSN 2352\-3409, [Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.dib.2026.112589), [Link](https://www.sciencedirect.com/science/article/pii/S2352340926001423)\.
- S\. Alowaidi \(2025\) SEA\-Team at QIAS 2025: enhancing LLMs for question answering in Islamic texts\. In Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, K\. Darwish, A\. Ali, I\. Abu Farha, S\. Touileb, I\. Zitouni, A\. Abdelali, S\. Al\-Ghamdi, S\. Alkhereyf, W\. Zaghouani, S\. Khalifa, B\. AlKhamissi, R\. Almatham, I\. Hamed, Z\. Alyafeai, A\. Alowisheq, G\. Inoue, K\. Mrini, and W\. Alshammari \(Eds\.\), Suzhou, China, pp\. 940–946\. External Links: [Link](https://aclanthology.org/2025.arabicnlp-sharedtasks.130/), [Document](https://dx.doi.org/10.18653/v1/2025.arabicnlp-sharedtasks.130), ISBN 979\-8\-89176\-356\-2\.
- M\. Alsmadi \(2026\) QU\-NLP at QIAS 2026: multi\-stage QLoRA fine\-tuning for Arabic Islamic inheritance reasoning\. In Proceedings of the 7th Workshop on Open\-Source Arabic Corpora and Processing Tools \(OSACT7\), co\-located with LREC\-COLING 2026, Palma de Mallorca, Spain\.
- R\. Anil, S\. Borgeaud, J\. Alayrac, J\. Yu, R\. Soricut, J\. Schalkwyk, A\. M\. Dai, A\. Hauth, K\. Millican, et al\. \(2023\) Gemini: a family of highly capable multimodal models\. arXiv preprint arXiv:2312\.11805\.
- S\. E\. Bekhouche, A\. Z\. Sellam, H\. Telli, C\. Distante, and A\. Hadid \(2025\) CVPD at QIAS 2025 shared task: an efficient encoder\-based approach for Islamic inheritance reasoning\. In Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, pp\. 929–934\. External Links: [Link](https://arxiv.org/abs/2509.00457)\.
- G\. Bhatia, H\. Mubarak, M\. Jarrar, G\. Mikros, F\. Zaraket, M\. Alhirthani, M\. Al\-Khatib, L\. Cochrane, K\. Darwish, R\. Yahiaoui, et al\. \(2026\) From RAG to agentic RAG for faithful Islamic question answering\. arXiv preprint arXiv:2601\.07528\.
- A\. Bouchekif, S\. Gaben, S\. Rashwani, S\. Eltanbouly, M\. Al\-Khatib, H\. Sbahi, M\. Ghaly, and E\. Mohamed \(2026\) MAWARITH: a dataset and benchmark for legal inheritance reasoning with LLMs\. arXiv preprint arXiv:2603\.07539\. External Links: [Link](https://arxiv.org/abs/2603.07539)\.
- A\. Bouchekif, S\. Rashwani, E\. S\. A\. Mohamed, M\. Alkhatib, H\. Sbahi, S\. Gaben, W\. Zaghouani, A\. Erbad, and M\. Ghaly \(2025a\) QIAS 2025: overview of the shared task on Islamic inheritance reasoning and knowledge assessment\. In Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, pp\. 851–860\. External Links: [Link](https://aclanthology.org/2025.arabicnlp-sharedtasks.117/)\.
- A\. Bouchekif, S\. Rashwani, H\. Sbahi, S\. Gaben, M\. Al\-Khatib, and M\. Ghaly \(2025b\) Assessing large language models on Islamic legal reasoning: evidence from inheritance law evaluation\. In Proceedings of The Third Arabic Natural Language Processing Conference, pp\. 246–257\. External Links: [Link](https://aclanthology.org/2025.arabicnlp-main.20/)\.
- I\. Chaabane, P\. Khanna, S\. Mohmad, S\. Frikha, S\. Hu, A\. Abubaker, R\. Alami, M\. Lubinets, M\. E\. A\. Seddik, H\. Hacid, et al\. \(2026\) Falcon\-H1R: pushing the reasoning frontiers with a hybrid model for efficient test\-time scaling\. arXiv preprint arXiv:2601\.02346\.
- K\. Cobbe, V\. Kosaraju, M\. Bavarian, M\. Chen, H\. Jun, L\. Kaiser, M\. Plappert, J\. Tworek, J\. Hilton, R\. Nakano, C\. Hesse, and J\. Schulman \(2021\) Training verifiers to solve math word problems\. CoRR abs/2110\.14168\. External Links: [Link](https://arxiv.org/abs/2110.14168), 2110\.14168\.
- DeepSeek AI \(2024\) DeepSeek\-R1: incentivizing reasoning capability in large language models\. Technical report, DeepSeek AI\.
- E\. Elrefai, M\. Lotfy Elrefai, and A\. Hassan Esmail \(2025\) Gumball at QIAS 2025: Arabic LLM automated reasoning in Islamic inheritance\. In Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, K\. Darwish, A\. Ali, I\. Abu Farha, S\. Touileb, I\. Zitouni, A\. Abdelali, S\. Al\-Ghamdi, S\. Alkhereyf, W\. Zaghouani, S\. Khalifa, B\. AlKhamissi, R\. Almatham, I\. Hamed, Z\. Alyafeai, A\. Alowisheq, G\. Inoue, K\. Mrini, and W\. Alshammari \(Eds\.\), Suzhou, China, pp\. 953–959\. External Links: [Link](https://aclanthology.org/2025.arabicnlp-sharedtasks.132/), [Document](https://dx.doi.org/10.18653/v1/2025.arabicnlp-sharedtasks.132), ISBN 979\-8\-89176\-356\-2\.
- D\. Hendrycks, C\. Burns, S\. Kadavath, A\. Arora, S\. Basart, E\. Tang, D\. Song, and J\. Steinhardt \(2021\) Measuring mathematical problem solving with the MATH dataset\. External Links: 2103\.03874, [Link](https://arxiv.org/abs/2103.03874)\.
- S\. Hossain and H\. Afli \(2025\) ADAPT–MTU HAI at QIAS2025: dual\-expert LLM fine\-tuning and constrained decoding for Arabic Islamic inheritance reasoning\. In Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, K\. Darwish, A\. Ali, I\. Abu Farha, S\. Touileb, I\. Zitouni, A\. Abdelali, S\. Al\-Ghamdi, S\. Alkhereyf, W\. Zaghouani, S\. Khalifa, B\. AlKhamissi, R\. Almatham, I\. Hamed, Z\. Alyafeai, A\. Alowisheq, G\. Inoue, K\. Mrini, and W\. Alshammari \(Eds\.\), Suzhou, China, pp\. 923–928\. External Links: [Link](https://aclanthology.org/2025.arabicnlp-sharedtasks.127/), [Document](https://dx.doi.org/10.18653/v1/2025.arabicnlp-sharedtasks.127), ISBN 979\-8\-89176\-356\-2\.
- G\. Kurdi, H\. Justanieah, and H\. Justanieah \(2026\) Silah at QIAS 2026: fine\-tuning vs\. retrieval\-augmented generation for Islamic inheritance reasoning\. In Proceedings of the 7th Workshop on Open\-Source Arabic Corpora and Processing Tools \(OSACT7\), co\-located with LREC\-COLING 2026, Palma de Mallorca, Spain\.
- A\. Mohammad \(2025\) QU\-NLP at QIAS 2025 shared task: a two\-phase LLM fine\-tuning and retrieval\-augmented generation approach for Islamic inheritance reasoning\. In Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, pp\. 892–898\.
- M\. Motasim Hamed, N\. Ghneim, and R\. Sonbol \(2025\) HIAST at QIAS 2025: retrieval\-augmented LLMs with top\-hit web evidence for Arabic Islamic reasoning QA\. In Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, Suzhou, China, pp\. 883–891\. External Links: [Link](https://aclanthology.org/2025.arabicnlp-sharedtasks.122/), [Document](https://dx.doi.org/10.18653/v1/2025.arabicnlp-sharedtasks.122), ISBN 979\-8\-89176\-356\-2\.
- M\. L\. Mouhoub and C\. Bouchekif \(2026\) PSL at QIAS 2026: which models perform better in Arabic inheritance reasoning?\. In Proceedings of the 7th Workshop on Open\-Source Arabic Corpora and Processing Tools \(OSACT7\), co\-located with LREC\-COLING 2026, Palma de Mallorca, Spain\.
- H\. Mubarak, R\. Malhas, W\. Mansour, A\. Mohamed, M\. Fawzi, M\. Hawasly, T\. Elsayed, K\. M\. Darwish, and W\. Magdy \(2025\) IslamicEval 2025: the first shared task of capturing LLMs hallucination in Islamic content\. In Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, pp\. 480–493\.
- Y\. Noureldien, H\. Suliman, F\. Attallah, A\. Mohamed, and S\. Abdalla \(2025\) Athar at QIAS2025: LLM\-based question answering systems for Islamic inheritance and classical Islamic knowledge\. In Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, K\. Darwish, A\. Ali, I\. Abu Farha, S\. Touileb, I\. Zitouni, A\. Abdelali, S\. Al\-Ghamdi, S\. Alkhereyf, W\. Zaghouani, S\. Khalifa, B\. AlKhamissi, R\. Almatham, I\. Hamed, Z\. Alyafeai, A\. Alowisheq, G\. Inoue, K\. Mrini, and W\. Alshammari \(Eds\.\), Suzhou, China, pp\. 914–922\. External Links: [Link](https://aclanthology.org/2025.arabicnlp-sharedtasks.126/), [Document](https://dx.doi.org/10.18653/v1/2025.arabicnlp-sharedtasks.126), ISBN 979\-8\-89176\-356\-2\.
- N\. X\. Phuc and T\. Đ\. Văn \(2025\) PuxAI at QIAS 2025: multi\-agent retrieval\-augmented generation for Islamic inheritance and knowledge reasoning\. In Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, pp\. 905–913\.
- J\. R’baiti, C\. El Hachimi, Y\. Hmamouche, and A\. El Fallah Seghrouchni \(2025\) MorAI at QIAS 2025: collaborative LLM via voting and retrieval\-augmented generation for solving complex inheritance problems\. In Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, K\. Darwish, A\. Ali, I\. Abu Farha, S\. Touileb, I\. Zitouni, A\. Abdelali, S\. Al\-Ghamdi, S\. Alkhereyf, W\. Zaghouani, S\. Khalifa, B\. AlKhamissi, R\. Almatham, I\. Hamed, Z\. Alyafeai, A\. Alowisheq, G\. Inoue, K\. Mrini, and W\. Alshammari \(Eds\.\), Suzhou, China, pp\. 947–952\. External Links: [Link](https://aclanthology.org/2025.arabicnlp-sharedtasks.131/), [Document](https://dx.doi.org/10.18653/v1/2025.arabicnlp-sharedtasks.131), ISBN 979\-8\-89176\-356\-2\.
- W\. Shen, Z\. Yang, C\. Li, Z\. Lu, M\. Peng, H\. Sun, Y\. Shi, S\. Liao, S\. Lai, B\. Zhang, et al\. \(2025\) QwenLong\-L1\.5: post\-training recipe for long\-context reasoning and memory management\.
- H\. G\. Sidaoui \(2026\) AGS\-KSU at QIAS 2026: a comparative study of prompting and LLM approaches for structured Islamic inheritance reasoning\. In Proceedings of the 7th Workshop on Open\-Source Arabic Corpora and Processing Tools \(OSACT7\), co\-located with LREC\-COLING 2026, Palma de Mallorca, Spain\.
- A\. Singh, A\. Fry, A\. Perelman, A\. Tart, A\. Ganesh, A\. El\-Kishky, A\. McLaughlin, A\. Low, A\. Ostrow, A\. Ananthram, et al\. \(2025\) OpenAI GPT\-5 system card\. arXiv preprint arXiv:2601\.03267\.
- W\. Swaileh, M\. Zighem, H\. Telli, S\. E\. Bekhouche, A\. Z\. Sellam, and F\. Dornaika \(2026\)CVPD at QIAS 2026: RAG\-guided LLM reasoning for al\-mawarith share calculation and heir allocation\.InProceedings of the 7th Workshop on Open\-Source Arabic Corpora and Processing Tools \(OSACT7\), co\-located with LREC\-COLING 2026,Palma de Mallorca, Spain\.Cited by:[Table 2](https://arxiv.org/html/2606.13756#S3.T2.1.2.1.1.1.1),[§4](https://arxiv.org/html/2606.13756#S4.p2.1)\.
- J\. Wei, X\. Wang, D\. Schuurmans, M\. Bosma, F\. Xia, E\. Chi, Q\. V\. Le, D\. Zhou,et al\.\(2022\)Chain\-of\-thought prompting elicits reasoning in large language models\.Advances in Neural Information Processing Systems35,pp\. 24824–24837\.Cited by:[Appendix A](https://arxiv.org/html/2606.13756#A1.p3.2)\.
- J\. Woo, F\. H\. Chaleshtori, A\. Marasović, and K\. Marino \(2025\)BriefMe: a legal NLP benchmark for assisting with legal briefs\.arXiv preprint arXiv:2506\.06619\.Cited by:[Appendix A](https://arxiv.org/html/2606.13756#A1.p3.2)\.
- O\. F\. Zaki \(2025\)CIS\-RG at QIAS 2025 shared task: approaches for enhancing performance of LLM on islamic legal reasoning and its mathematical calculations\.InProceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks,pp\. 935–939\.Cited by:[Appendix A](https://arxiv.org/html/2606.13756#A1.p3.2)\.

## Appendix A Related Work

Large language models have recently been applied to a wide range of Islamic knowledge tasks, including Quranic question answering Bhatia et al\. \([2026](https://arxiv.org/html/2606.13756#biba.bib12)\), knowledge retrieval Mubarak et al\. \([2025](https://arxiv.org/html/2606.13756#biba.bib11)\); Phuc and Văn \([2025](https://arxiv.org/html/2606.13756#biba.bib9)\), and the analysis of hallucinations in Islamic content Mubarak et al\. \([2025](https://arxiv.org/html/2606.13756#biba.bib11)\)\. A recent related benchmark is IslamicMMLU Abdelaal et al\. \([2026](https://arxiv.org/html/2606.13756#biba.bib5)\), which evaluates LLMs on broad Islamic knowledge across the Quran, Hadith, and Fiqh using a large set of multiple\-choice questions\. These studies show that LLMs perform well on knowledge retrieval and basic understanding when answers rely on direct textual matching\. However, they often hallucinate and show clear limitations on tasks that require structured reasoning or deep domain knowledge\. Bouchekif et al\. \([2025b](https://arxiv.org/html/2606.13756#biba.bib16)\) report that several models, such as LLaMA and ALLaM, frequently cite non\-existent Quranic verses or fabricate Hadith references\. As a result, the generated conclusions are not only incorrect but also supported by false religious evidence\. This behavior raises serious concerns for religious and legal applications, where correctness depends not only on the final answer but also on the authenticity and reliability of the cited sources\.

To mitigate hallucinations, recent work has explored Retrieval\-Augmented Generation \(RAG\)\. While RAG improves access to relevant information and enhances factual faithfulness and citation accuracy Bhatia et al\. \([2026](https://arxiv.org/html/2606.13756#biba.bib12)\); Noureldien et al\. \([2025](https://arxiv.org/html/2606.13756#biba.bib18)\); Alowaidi \([2025](https://arxiv.org/html/2606.13756#biba.bib19)\), it remains insufficient for questions that require multi\-step reasoning\. This limitation has motivated the development of reasoning\-oriented models that explicitly support multi\-step inference and aim to move beyond surface\-level text generation toward more reliable reasoning\.

Models such as o3, GPT\-5 Singh et al\. \([2025](https://arxiv.org/html/2606.13756#biba.bib2)\), Gemini\-2\.5 \(Anil et al\., [2023](https://arxiv.org/html/2606.13756#biba.bib4)\), Gemini3, and DeepSeek\-R1 \(DeepSeek AI, [2024](https://arxiv.org/html/2606.13756#biba.bib8)\), along with open models such as Fanar\-C\-2\-27B Abbas et al\. \([2026](https://arxiv.org/html/2606.13756#biba.bib50)\), Falcon\-H1R Chaabane et al\. \([2026](https://arxiv.org/html/2606.13756#biba.bib51)\), Fanar\-Sadiq Abbas et al\. \([2026](https://arxiv.org/html/2606.13756#biba.bib50)\), and Qwen3 \(Shen et al\., [2025](https://arxiv.org/html/2606.13756#biba.bib7)\), illustrate this trend by promoting more consistent multi\-step inference through instruction tuning and large\-scale pretraining\. Evaluations of reasoning\-oriented language models have largely focused on mathematical and logical benchmarks, on which these models have achieved strong results, particularly in arithmetic reasoning, symbolic manipulation, and competition\-style mathematics Cobbe et al\. \([2021](https://arxiv.org/html/2606.13756#biba.bib20)\); Hendrycks et al\. \([2021](https://arxiv.org/html/2606.13756#biba.bib21)\); Wei et al\. \([2022](https://arxiv.org/html/2606.13756#biba.bib22)\)\. Beyond these benchmarks, recent work has begun to investigate LLM reasoning in legally grounded settings by evaluating models on legal benchmarks such as BRIEFME Woo et al\. \([2025](https://arxiv.org/html/2606.13756#biba.bib24)\), which require structured argumentation and rule\-based reasoning\. The authors show that GPT\-4o can outperform human annotators on argument summarization by producing clear and coherent summaries\. Within the Islamic domain, inheritance law has received growing attention as a challenging testbed for LLM reasoning AlDahoul and Zaki \([2025](https://arxiv.org/html/2606.13756#biba.bib13)\); R’baiti et al\. \([2025](https://arxiv.org/html/2606.13756#biba.bib29)\); Zaki \([2025](https://arxiv.org/html/2606.13756#biba.bib28)\)\.
In particular, QIAS 2025 \(https://sites\.google\.com/view/qias2025/\) Bouchekif et al\. \([2025a](https://arxiv.org/html/2606.13756#biba.bib15)\) was a shared task dedicated to Islamic inheritance law \(‘ilm al\-mawārīth\), focusing on the evaluation of large language models under strict, rule\-based legal and numerical constraints, using a benchmark of 2,200 MCQs\. A similar MCQ benchmark is MirathQA Almasoud et al\. \([2026](https://arxiv.org/html/2606.13756#biba.bib30)\), built from 1,394 inheritance cases\. Studies report that commercially deployed reasoning\-oriented models \(e\.g\., Gemini and ChatGPT\) consistently outperform non\-reasoning or general\-purpose models on benchmarks requiring multi\-step inference and structured reasoning Bouchekif et al\. \([2025b](https://arxiv.org/html/2606.13756#biba.bib16)\); Mohammad \([2025](https://arxiv.org/html/2606.13756#biba.bib14)\); Bekhouche et al\. \([2025](https://arxiv.org/html/2606.13756#biba.bib25)\); Motasim Hamed et al\. \([2025](https://arxiv.org/html/2606.13756#biba.bib17)\); Hossain and Afli \([2025](https://arxiv.org/html/2606.13756#biba.bib26)\)\. Additionally, Elrefai et al\. \([2025](https://arxiv.org/html/2606.13756#biba.bib27)\) show that a fine\-tuned Qwen3 model achieved top\-ranked performance on the QIAS 2025 shared task\. However, this multiple\-choice setup cannot assess whether models truly reason correctly\. Models were required to select a single correct answer among six options, without any evaluation of the validity of their intermediate reasoning steps or the correctness of the legal justifications leading to that choice\. Moreover, Bouchekif et al\. \([2025b](https://arxiv.org/html/2606.13756#biba.bib16)\) show that even when a model selects the correct answer, the underlying reasoning can still be incorrect or legally invalid\.
In contrast, the present shared task requires models to perform end\-to\-end inheritance reasoning, explicitly generating intermediate reasoning steps, applying jurisprudential rules, and computing the final inheritance shares\.
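
The gap between answer\-only and step\-wise evaluation can be illustrated with a minimal Python sketch\. It is not the MIR\-E metric and not the organizers' code: the function names, the three checks \(`heirs`, `shares`, `consistent`\), and the record layout are hypothetical, chosen only to show how a prediction can pick the right MCQ option while its stated shares are wrong\. The toy case assumes a deceased woman survived by her husband and one son, where the husband takes 1/4 and the son takes the remaining 3/4\.

```python
from fractions import Fraction

# Hypothetical gold record: MCQ option letter plus the full share allocation.
gold = {
    "choice": "B",
    "shares": {"husband": Fraction(1, 4), "son": Fraction(3, 4)},
}

# Hypothetical model output: correct option letter, wrong intermediate shares.
pred = {
    "choice": "B",
    "shares": {"husband": Fraction(1, 2), "son": Fraction(1, 2)},
}


def final_answer_score(pred, gold):
    """Answer-only scoring: credit depends solely on the selected option."""
    return 1.0 if pred["choice"] == gold["choice"] else 0.0


def stage_scores(pred, gold):
    """Illustrative step-wise scoring (assumed stages, not MIR-E)."""
    # Stage 1: did the model identify exactly the eligible heirs?
    heirs_ok = set(pred["shares"]) == set(gold["shares"])
    # Stage 2: fraction of gold heirs whose share is exactly right.
    matched = sum(
        1 for heir, share in gold["shares"].items()
        if pred["shares"].get(heir) == share
    )
    share_acc = matched / len(gold["shares"])
    # Sanity check: the predicted shares should distribute the whole estate.
    consistent = sum(pred["shares"].values(), Fraction(0)) == 1
    return {
        "heirs": float(heirs_ok),
        "shares": share_acc,
        "consistent": float(consistent),
    }


print(final_answer_score(pred, gold))
print(stage_scores(pred, gold))
```

In this toy case the answer\-only score gives full credit, while the step\-wise view shows that the heirs were identified correctly but neither share is right, even though the predicted shares still sum to the whole estate\.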