ReaORE: Reasoning-Guided Progressive Open Relation Extraction Empowered by Large Reasoning Models
Summary
Proposes ReaORE, a reasoning-guided framework for open relation extraction that progressively filters and predicts relations via coarse-to-fine reasoning, outperforming existing baselines on two datasets.
# ReaORE: Reasoning-Guided Progressive Open Relation Extraction Empowered by Large Reasoning Models
Source: [https://arxiv.org/html/2606.26986](https://arxiv.org/html/2606.26986)
Xin Lin (1,2)\*, Liang Zhang (2)\*, Guoqi Ma (2), Hongyao Tu (2), Jinsong Su (1,2)†

(1) National Institute for Data Science in Health and Medicine, Xiamen University, China; (2) School of Informatics, Xiamen University

\* Equal contribution. † Corresponding author.

{linxin1,lzhang}@stu.xmu.edu.cn, jssu@xmu.edu.cn
###### Abstract
Open Relation Extraction (OpenRE) requires a model to extract unseen relations between head and tail entities from unstructured text for real-world applications. The core challenge of OpenRE lies in achieving reliable generalization to unseen relation types. Current OpenRE approaches either employ clustering techniques, which cannot generate relation labels and suffer from poor generalization, or rely on direct relation label generation via Large Language Models (LLMs), which lack sufficient discriminative capacity to distinguish easily confused relations. To address these limitations, we propose Reasoning-guided progressive OpenRE (ReaORE), a framework that performs relation extraction through coarse-to-fine relation reasoning. Specifically, ReaORE consists of two key stages: (i) relation filtering, which reasons over multiple aspects to understand relations and instances, yielding an initial relation set, and further supplements and filters relations via embedding-based similarity to ensure the target relation is included; (ii) relation prediction, which predicts the target relation from this set via fine-grained comparative reasoning to better distinguish easily confused relations. Extensive experiments on two widely used OpenRE datasets demonstrate that ReaORE outperforms existing baselines. Our code will be released upon acceptance.
## 1 Introduction

Figure 1: Comparison of open relation extraction methods.

Relation Extraction (RE) aims to identify potential relations between head and tail entities within unstructured text Diaz-Garcia and Lopez ([2025](https://arxiv.org/html/2606.26986#bib.bib3)). As a fundamental information extraction task, it provides essential relational facts that support various downstream applications, such as knowledge base construction Ji and Grishman ([2011](https://arxiv.org/html/2606.26986#bib.bib12)) and question answering systems Yu et al. ([2017](https://arxiv.org/html/2606.26986#bib.bib25)). Traditional RE is inherently limited to predefined relation types and fails to generalize to the ever-emerging relations in real-world scenarios. To address this, Open Relation Extraction (OpenRE) Zhao et al. ([2023](https://arxiv.org/html/2606.26986#bib.bib28)); Tu et al. ([2025](https://arxiv.org/html/2606.26986#bib.bib19)) has been widely studied, aiming to extract novel relation types unseen during training. Nevertheless, since target relation types are unseen during training, models lack sufficient discriminative features to characterize them, resulting in poor generalization to unseen relations.

Existing OpenRE methods fall broadly into two categories: clustering-based methods and LLM-based direct relation label generation methods. The former typically encode instances and important related information into embeddings within a latent space Hu et al. ([2020](https://arxiv.org/html/2606.26986#bib.bib8)); Zhao et al. ([2021](https://arxiv.org/html/2606.26986#bib.bib27), [2023](https://arxiv.org/html/2606.26986#bib.bib28)); Wang et al. ([2025a](https://arxiv.org/html/2606.26986#bib.bib21), [2024](https://arxiv.org/html/2606.26986#bib.bib20)). These embeddings are subsequently clustered based on their relative distances, with each cluster representing a potentially unseen relation, as illustrated in Figure [1](https://arxiv.org/html/2606.26986#S1.F1)(a). However, assigning semantic labels to each cluster requires labor-intensive post-processing, so these methods essentially perform relation clustering rather than relation extraction. Moreover, their heavy reliance on training data leads to poor generalization. Given the strong capabilities of LLMs across diverse natural language processing tasks Xu et al. ([2024](https://arxiv.org/html/2606.26986#bib.bib23)), they hold considerable potential to provide OpenRE with both prior knowledge and semantic understanding. More importantly, unlike traditional clustering-based methods, the generative nature of LLMs enables them to directly predict relations in natural language form Tu et al. ([2025](https://arxiv.org/html/2606.26986#bib.bib19)), as illustrated in Figure [1](https://arxiv.org/html/2606.26986#S1.F1)(b). However, direct relation generation prevents the model from comprehensively understanding instances and relations from multiple perspectives, resulting in insufficient discriminative capacity to distinguish easily confused relations.
Large Reasoning Models (LRMs) Guo et al. ([2025](https://arxiv.org/html/2606.26986#bib.bib4)); Jaech et al. ([2024](https://arxiv.org/html/2606.26986#bib.bib11)); Team ([2024](https://arxiv.org/html/2606.26986#bib.bib18)) are characterized by their ability to generate explicit, long reasoning chains, which endow models with stronger capacities for analysis, comprehension, and inference, as demonstrated across complex tasks such as mathematical reasoning and code generation Huang et al. ([2025](https://arxiv.org/html/2606.26986#bib.bib9)); Pelaccia et al. ([2025](https://arxiv.org/html/2606.26986#bib.bib16)); Wang et al. ([2025b](https://arxiv.org/html/2606.26986#bib.bib22)). Leveraging this deep understanding capability, LRMs can progressively analyze instances and relations from coarse to fine, achieving more accurate comprehension of relation semantics and instance characteristics. More critically, reasoning is a generalizable capability: by genuinely understanding relation semantics, LRMs can produce well-grounded predictions for unseen relation types, which directly addresses the core generalization requirement of OpenRE. Furthermore, even for easily confused relations, reasoning enables the model to explicitly derive judgment evidence through in-depth analysis, yielding reliable predictions. Therefore, incorporating reasoning into OpenRE is a highly promising research direction.
In this paper, we propose a reasoning\-guided progressive OpenRE framework \(ReaORE\) that performs coarse\-to\-fine relation extraction through explicit relation reasoning\. In ReaORE, the first stage is relation filtering, which finds relations compatible with the input instance by coarse\-grained reasoning about global semantics and entity types, and further supplements and filters them by embedding\-based similarity\. The second stage is relation prediction, which performs fine\-grained contrastive reasoning, explicitly generating exclusive judgment evidence to accurately distinguish easily confused relations and determine the final prediction\.
In summary, our main contributions are as follows:
- We propose ReaORE, a reasoning-based progressive OpenRE framework that enhances the model's generalization capability to unseen relation types through reasoning.
- We introduce a fine-grained contrastive reasoning strategy to effectively distinguish easily confused relations, thereby achieving better relation generalization.
- Extensive experiments on two widely used OpenRE benchmark datasets demonstrate that ReaORE significantly outperforms existing baseline methods, providing a viable solution for empowering OpenRE with deliberate reasoning.
Figure 2: Overview of the ReaORE framework. ReaORE follows a coarse-to-fine process: it first filters unseen relations through the matching-based reasoning model $\mathcal{M}_1$ and the embedding model $\mathcal{M}_2$, and then predicts the final relation via the fine-grained contrastive reasoning model $\mathcal{M}_3$. $\mathcal{M}_1$ and $\mathcal{M}_3$ are fine-tuned with long-reasoning-chain data, whereas $\mathcal{M}_2$ is training-free.

## 2 Task Formulation

Before presenting our framework, we first introduce the preliminary notation. Let $\mathcal{R}_k$ denote the set of known relations observed during training and $\mathcal{R}_u$ denote the set of unseen relations used for evaluation, where $\mathcal{R}_k \cap \mathcal{R}_u = \emptyset$. During training, the model can access labeled instances associated with relations in $\mathcal{R}_k$. At test time, given an unlabeled dataset $\mathcal{D} = \{x_i\}_{i=1}^{N}$, each instance $x_i = \langle s_i, h_i, t_i \rangle$ consists of a sentence $s_i$, a head entity $h_i$, and a tail entity $t_i$. The goal is to predict a relation label $\hat{r}_i \in \mathcal{R}_u$ that best describes the semantic relation expressed between $h_i$ and $t_i$ in $s_i$.

For each extraction, we do not provide additional information such as instance descriptions or supplementary entity information. ReaORE leverages the prior knowledge and semantic understanding of the LRM to infer relation semantics and head and tail entity constraints from the input context. This formulation reflects a real-world OpenRE setting where new, unknown relations emerge continuously, but no labeled training instances are provided for these unseen relations and no supplementary knowledge beyond the original instance text is available to aid analysis. Since clustering-based methods and direct relation label generation methods often suffer from limited generalization and unreliable predictions, ReaORE first derives a compact candidate set $\mathcal{R}_c$ through matching-based reasoning and embedding-based relation reranking, and then performs fine-grained contrastive reasoning over $\mathcal{R}_c$ to determine the final prediction.
## 3 Methodology

In this section, we present ReaORE, a reasoning-guided progressive framework for OpenRE that extracts unseen relations through coarse-to-fine relation reasoning. Rather than directly generating a relation label or clustering instances, ReaORE decomposes prediction into two stages: relation filtering and relation prediction. Relation filtering first identifies relations compatible with the input instance through coarse-grained reasoning and then refines $\mathcal{R}_c$ via embedding-based reranking to improve relation coverage. Relation prediction further applies fine-grained contrastive reasoning over $\mathcal{R}_c$, producing explicit judgment evidence for the final decision. Figure [2](https://arxiv.org/html/2606.26986#S1.F2) provides an overview of the framework.

### 3.1 Relation Filtering

Relation filtering aims to preliminarily construct $\mathcal{R}_c$ over unseen relation types, making it compact while preserving high coverage. It contains two complementary steps: matching-based reasoning first provides coarse-grained binary judgments, and relation reranking then uses embedding-space similarity to recover plausible relations that may be missed by the reasoning model.

Matching-Based Reasoning. For each input instance $x_i = \langle s_i, h_i, t_i \rangle$, the matching-based reasoning model $\mathcal{M}_1$ first performs query analysis on the sentence $s_i$ to better understand its meaning and then analyzes the head and tail entity types within the given context. This step makes the input evidence more explicit, ensuring that subsequent relation matching is based on sentence facts and entity types, not just surface label similarity.

After query analysis, $\mathcal{M}_1$ continues to generate reasoning that performs pattern matching for each relation $r$. The model first interprets the natural-language meaning of $r$ and infers its expected head- and tail-side argument constraints. It then judges whether $r$ matches $x_i$ from three aspects. Semantic matching determines whether the relation meaning is explicitly supported by the facts in $s_i$; the model must identify the key evidence supporting or contradicting $r$, or state what information is missing. Head-entity type matching checks whether $h_i$ satisfies the head-side argument constraint of $r$, while tail-entity type matching checks whether $t_i$ satisfies the tail-side argument constraint. Each matching result is represented by a binary value, where 1 denotes a match and 0 denotes a mismatch, and every judgment is accompanied by an explanatory rationale. Figure [3](https://arxiv.org/html/2606.26986#S3.F3)(a) gives an abbreviated example of this matching-based reasoning record.

The total matching score for relation $r$ is computed as $S(r) = S_s(r) + S_h(r) + S_t(r)$, where $S_s(r), S_h(r), S_t(r) \in \{0, 1\}$ denote the binary judgments for semantic matching, head-entity type matching, and tail-entity type matching, respectively. A higher $S(r)$ means that more matching aspects support relation $r$ being expressed by $x_i$. Based on these multi-aspect judgments, $\mathcal{M}_1$ produces the initial relation tiers in the relation filtering stage: $\mathcal{R}_1 = \{r \mid S(r) < 2\}$, $\mathcal{R}_2 = \{r \mid S(r) = 2\}$, and $\mathcal{R}_3 = \{r \mid S(r) = 3\}$.
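The scoring and tiering rule above can be sketched in a few lines. This is a minimal illustration, not the released implementation: the `MatchJudgment` container, the `build_tiers` helper, and the example judgments are hypothetical, and in ReaORE the three binary values would come from the reasoning output of $\mathcal{M}_1$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchJudgment:
    """Binary judgments for one relation (hypothetical container)."""
    relation: str
    semantic: int   # S_s(r): 1 if the sentence supports the relation meaning
    head_type: int  # S_h(r): 1 if the head entity fits the head-side constraint
    tail_type: int  # S_t(r): 1 if the tail entity fits the tail-side constraint

    @property
    def score(self) -> int:
        # S(r) = S_s(r) + S_h(r) + S_t(r)
        return self.semantic + self.head_type + self.tail_type


def build_tiers(judgments):
    """Split relations into R1 (S < 2), R2 (S = 2), and R3 (S = 3)."""
    tiers = {1: [], 2: [], 3: []}
    for j in judgments:
        tier = 1 if j.score < 2 else j.score
        tiers[tier].append(j.relation)
    return tiers


# Made-up judgments for a single instance, for illustration only.
judgments = [
    MatchJudgment("occupation", 1, 1, 1),
    MatchJudgment("field of work", 0, 1, 1),
    MatchJudgment("location", 0, 0, 1),
    MatchJudgment("sibling", 0, 0, 0),
]
tiers = build_tiers(judgments)
print(tiers)
```

Scores of 0 and 1 share the lowest tier because $\mathcal{R}_1$ is defined by $S(r) < 2$.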

[Figure 3, panels: (a) Matching-based Reasoning Record; (b) Comparative Reasoning Record.]

Figure 3: Examples of the structured reasoning records produced by ReaORE. (a) The matching-based reasoning record analyzes the query and evaluates relations from semantic, head-entity type, and tail-entity type perspectives. (b) The comparative reasoning record summarizes pairwise comparisons over $\mathcal{R}_c$ and the final decision.

Relation Reranking. After $\mathcal{M}_1$ produces the score-based relation tiers, some plausible relations may still be placed in lower tiers because the sentence expresses the relation implicitly or the matching evidence is incomplete. To improve target-relation coverage, ReaORE applies the embedding model $\mathcal{M}_2$ to supplement the tiers through embedding-based similarity. As shown in Figure [2](https://arxiv.org/html/2606.26986#S1.F2), $\mathcal{R}_i$ denotes one of the score-based tiers produced by $\mathcal{M}_1$, where $i \in \{1, 2, 3\}$. For a source tier $\mathcal{S}$, $\mathcal{M}_2$ encodes the input instance $x_i$ and all relations in $\mathcal{S}$. It partitions $\mathcal{S}$ into relation clusters $\mathcal{C}(\mathcal{S}) = \{C_1, \ldots, C_K\}$ based on relation embeddings, where each $C_j \subseteq \mathcal{S}$ is a set of relation labels. Let $\mathbf{m}_j$ denote the centroid embedding of cluster $C_j$, computed by mean pooling the embeddings of relations in $C_j$. The cluster most similar to $x_i$ is selected by:
$$k^{*}(\mathcal{S}) = \arg\max_{j} \cos(\mathbf{m}_j, \mathbf{e}_{x_i}), \tag{1}$$

where $\mathbf{e}_{x_i}$ is the embedding of the input instance. We denote the selected relation cluster as $C^{*}(\mathcal{S}) = C_{k^{*}(\mathcal{S})}$. ReaORE applies this operation progressively:

$$\widetilde{\mathcal{R}}_2 = \mathcal{R}_2 \cup C^{*}(\mathcal{R}_1), \quad \widetilde{\mathcal{R}}_3 = \mathcal{R}_3 \cup C^{*}(\widetilde{\mathcal{R}}_2). \tag{2}$$

In Figure [2](https://arxiv.org/html/2606.26986#S1.F2), relation reranking is applied twice. First, $\mathcal{R}_1$ is reranked by $\mathcal{M}_2$, and the selected cluster $C^{*}(\mathcal{R}_1)$ is merged with $\mathcal{R}_2$ to obtain $\widetilde{\mathcal{R}}_2$. Second, $\widetilde{\mathcal{R}}_2$ is reranked in the same way, and the selected cluster $C^{*}(\widetilde{\mathcal{R}}_2)$ is merged with $\mathcal{R}_3$ to obtain $\widetilde{\mathcal{R}}_3$. ReaORE then ranks the relations in $\widetilde{\mathcal{R}}_3$ by their embedding similarity to $x_i$ and keeps the top-$\xi$ relations as $\mathcal{R}_c$ for $\mathcal{M}_3$.
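The progressive reranking in Eqs. (1) and (2) can be illustrated with a small sketch. It rests on several assumptions: the 2-D embeddings are toy values standing in for $\mathcal{M}_2$ outputs, the clustering routine is passed in as a hypothetical `cluster_fn` because the partitioning algorithm is not specified in this section, and all function names are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def centroid(cluster, rel_emb):
    """Mean-pool the embeddings of the relations in one cluster (m_j)."""
    vectors = [rel_emb[r] for r in cluster]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def select_cluster(clusters, rel_emb, x_emb):
    """Eq. (1): return the cluster whose centroid is most similar to x_i."""
    if not clusters:
        return []
    return list(max(clusters, key=lambda c: cosine(centroid(c, rel_emb), x_emb)))


def build_candidates(tiers, rel_emb, x_emb, cluster_fn, xi=5):
    """Eq. (2), then keep the top-xi relations of the enlarged top tier."""
    # Tiers are disjoint, so list concatenation behaves like set union here.
    r2_tilde = tiers[2] + select_cluster(cluster_fn(tiers[1]), rel_emb, x_emb)
    r3_tilde = tiers[3] + select_cluster(cluster_fn(r2_tilde), rel_emb, x_emb)
    ranked = sorted(r3_tilde, key=lambda r: cosine(rel_emb[r], x_emb), reverse=True)
    return ranked[:xi]


# Toy embeddings (assumed values, not produced by a real embedding model).
rel_emb = {
    "occupation": [1.0, 0.0],
    "field of work": [0.9, 0.1],
    "employer": [0.8, 0.3],
    "location": [0.0, 1.0],
    "country": [0.1, 0.9],
}
x_emb = [1.0, 0.1]
tiers = {1: ["employer", "location", "country"], 2: ["field of work"], 3: ["occupation"]}


def toy_cluster(relations):
    """Hypothetical clustering: group relations by their dominant dimension."""
    groups = {}
    for r in relations:
        v = rel_emb[r]
        groups.setdefault(v.index(max(v)), []).append(r)
    return list(groups.values())


candidates = build_candidates(tiers, rel_emb, x_emb, toy_cluster, xi=2)
print(candidates)
```

In this toy run, "employer" is recovered from the lowest tier in the first pass and carried into $\widetilde{\mathcal{R}}_3$ in the second, but the top-$\xi$ cut with $\xi = 2$ then drops it because its similarity to the instance is the lowest of the three.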
### 3.2 Relation Prediction

After relation filtering, $\mathcal{R}_c$ is passed to the contrastive reasoning model $\mathcal{M}_3$ together with the input instance $x_i$. Since these relations have already been filtered and are all relevant to the instance, they are often semantically close and difficult to distinguish, such as "*location*" vs. "*located in the administrative territorial entity*" or "*field of work*" vs. "*occupation*". A one-step label decision is therefore unreliable, because the model may select a broadly related relation without comparing it with more precise alternatives.

As shown in Figure [2](https://arxiv.org/html/2606.26986#S1.F2), $\mathcal{M}_3$ performs fine-grained comparative reasoning over $\mathcal{R}_c$. For each relation pair $(r_a, r_b)$, $\mathcal{M}_3$ judges which relation is better supported by the sentence $s_i$ and the entity pair $(h_i, t_i)$. Each comparison follows three criteria: (1) Evidence Verification, which checks whether the sentence provides explicit evidence for the relation; (2) Semantic Granularity, which prefers the more specific relation when two relations are both plausible; and (3) Contextual Alignment, which selects the relation that best matches the event or phrase expressed in $s_i$. The output is stored as a comparative reasoning record. Its Pairwise Comparison Log records the compared pair, the winner, and the supporting reason for each comparison, while its Final Decision Summary reports the best relation and the decision logic. Figure [3](https://arxiv.org/html/2606.26986#S3.F3)(b) illustrates this record with abbreviated pairwise rationales, and complete traces are provided in Appendix [E](https://arxiv.org/html/2606.26986#A5).

This stage turns relation prediction from direct label generation into explicit comparison over $\mathcal{R}_c$. By requiring $\mathcal{M}_3$ to justify why one relation is preferred over alternatives, ReaORE better distinguishes easily confused relations and grounds the final answer in sentence evidence.
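To make the structure of a comparative reasoning record concrete, the sketch below enumerates the relation pairs of a candidate set and tallies the winners in a pairwise comparison log. It is only an illustration under stated assumptions: the dictionary fields (`pair`, `winner`, `reason`), the helper names, and the example log are hypothetical, and counting wins is one simple way to sanity-check a record rather than the decision rule of $\mathcal{M}_3$, which produces its final decision through reasoning.

```python
from collections import Counter
from itertools import combinations


def relation_pairs(candidates):
    """All unordered pairs (r_a, r_b) over the candidate set."""
    return list(combinations(candidates, 2))


def tally_wins(comparison_log):
    """Count how often each relation is recorded as the winner of a comparison."""
    return Counter(entry["winner"] for entry in comparison_log)


candidates = [
    "location",
    "located in the administrative territorial entity",
    "country",
]

# Hypothetical log; the reasons are placeholders, not model output.
comparison_log = [
    {"pair": (candidates[0], candidates[1]), "winner": candidates[1],
     "reason": "more specific relation (semantic granularity)"},
    {"pair": (candidates[0], candidates[2]), "winner": candidates[0],
     "reason": "sentence names a place, not a sovereign state"},
    {"pair": (candidates[1], candidates[2]), "winner": candidates[1],
     "reason": "explicit administrative containment in the sentence"},
]

wins = tally_wins(comparison_log)
best = wins.most_common(1)[0][0]
print(best, dict(wins))
```

If every pair is compared, a candidate set of size $\xi$ yields $\xi(\xi - 1)/2$ comparisons, that is, 10 for $\xi = 5$, so the length of the record grows quadratically with the number of candidates.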
### 3.3 Model Training

In ReaORE, $\mathcal{M}_1$ and $\mathcal{M}_3$ denote two LRMs used for different reasoning stages. To train their stage-specific reasoning abilities, we construct separate reasoning-chain data for matching-based reasoning and fine-grained comparative reasoning. For $\mathcal{M}_1$, the training target contains query analysis and relation-wise judgments over semantic matching, head-entity type matching, and tail-entity type matching, enabling the model to construct relation tiers through multi-aspect matching. For $\mathcal{M}_3$, the training target contains pairwise comparisons and a final decision summary, enabling the model to determine the final prediction through fine-grained comparative reasoning.

We construct the training data by using Gemini-2.5 Pro as the teacher model on labeled instances from the known relation set $\mathcal{R}_k$ only; no labeled instances from $\mathcal{R}_u$ are used during training. For each stage, the teacher model is prompted to generate the corresponding stage-specific reasoning chain and answer. The prompts define the task format and output constraints for the two reasoning stages and guide the teacher model in generating reasoning-chain training examples; the complete prompts are provided in Appendix [D](https://arxiv.org/html/2606.26986#A4). We then apply rejection sampling and retain only examples whose reasoning output leads to the ground-truth relation. This procedure yields 1,800 high-quality examples for matching-based reasoning and 1,800 examples for fine-grained comparative reasoning. Finally, we perform Supervised Fine-Tuning (SFT) separately on the two types of reasoning-chain training data, producing the $\mathcal{M}_1$ and $\mathcal{M}_3$ used in our experiments.
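The rejection-sampling filter described above can be sketched as follows. This is a minimal sketch under assumptions: `generate` stands in for a call to the teacher model, the `max_tries` retry budget is hypothetical (the number of samples per instance is not stated here), and the stub teacher replays scripted answers so the example is deterministic.

```python
def rejection_sample(examples, generate, max_tries=3):
    """Keep a reasoning chain only if its final answer equals the gold relation."""
    kept = []
    for ex in examples:
        for _ in range(max_tries):
            chain, answer = generate(ex)
            if answer == ex["relation"]:
                kept.append({**ex, "reasoning": chain})
                break  # one accepted chain per instance is enough
    return kept


def make_stub_teacher(answers):
    """Hypothetical stand-in for the teacher model: replays scripted answers."""
    scripted = iter(answers)
    return lambda ex: ("<reasoning chain>", next(scripted))


# Labeled instances from known relations only (made-up examples).
examples = [
    {"sentence": "Ada Lovelace worked as a mathematician.", "relation": "occupation"},
    {"sentence": "The museum stands in Lyon.", "relation": "location"},
]
teacher = make_stub_teacher(["employer", "occupation", "country", "country", "country"])

train_set = rejection_sample(examples, teacher)
print(len(train_set), [ex["relation"] for ex in train_set])
```

The first instance is accepted on its second sample, while the second is discarded after exhausting its retries, so only chains that reach the ground-truth relation enter the SFT data.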
## 4 Experiments

### 4.1 Settings

Datasets. We conduct experiments on two widely used relation extraction datasets: FewRel Han et al. ([2018](https://arxiv.org/html/2606.26986#bib.bib5)) and TACRED Zhang et al. ([2017](https://arxiv.org/html/2606.26986#bib.bib26)). To ensure fair comparison with prior OpenRE studies, we follow the relation-splitting protocol of ASCORE Zhao et al. ([2023](https://arxiv.org/html/2606.26986#bib.bib28)). FewRel contains 80 relation types with 700 instances for each relation; the first 40 relations are used as known relations, and the remaining 40 are treated as unseen relations. TACRED contains 41 relation types; the first 20 relations are used as known relations, while the remaining 21 form the unseen relation set.

Metrics. Following ASCORE Zhao et al. ([2023](https://arxiv.org/html/2606.26986#bib.bib28)), we evaluate ReaORE with both clustering and classification metrics. For clustering-oriented evaluation, we report B³ precision, recall, and F1 Bagga and Baldwin ([1998](https://arxiv.org/html/2606.26986#bib.bib1)), V-measure homogeneity, completeness, and F1 Rosenberg and Hirschberg ([2007](https://arxiv.org/html/2606.26986#bib.bib17)), and Adjusted Rand Index (ARI) Hubert and Arabie ([1985](https://arxiv.org/html/2606.26986#bib.bib10)). For relation classification, we report precision, recall, and Macro-F1 Opitz and Burst ([2019](https://arxiv.org/html/2606.26986#bib.bib15)).
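For readers unfamiliar with B³, the sketch below computes B³ precision, recall, and F1 from predicted and gold labels using the standard per-instance definition. The function name and toy labels are illustrative assumptions, and this is not the evaluation script used in the experiments. For an instance, precision is the fraction of its predicted cluster that shares its gold label, and recall is the fraction of its gold class that shares its predicted cluster; both are averaged over instances.

```python
from collections import Counter


def b_cubed(pred, gold):
    """Return B-cubed (precision, recall, F1) for parallel label lists."""
    n = len(pred)
    pred_sizes = Counter(pred)        # size of each predicted cluster
    gold_sizes = Counter(gold)        # size of each gold class
    joint = Counter(zip(pred, gold))  # instances sharing both cluster and class

    precision = sum(joint[(p, g)] / pred_sizes[p] for p, g in zip(pred, gold)) / n
    recall = sum(joint[(p, g)] / gold_sizes[g] for p, g in zip(pred, gold)) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: five instances, two gold relations, two predicted clusters.
gold = ["a", "a", "a", "b", "b"]
pred = [0, 0, 1, 1, 1]
precision, recall, f1 = b_cubed(pred, gold)
print(precision, recall, f1)
```

Because the metric depends only on how instances are grouped, predicted relation labels can be treated directly as cluster identifiers, which is what allows a label-generating method to be scored with clustering metrics.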
Baselines. We compare ReaORE with three SLM-based OpenRE methods and three LLM-based methods. The SLM-based baselines include SelfORE Hu et al. ([2020](https://arxiv.org/html/2606.26986#bib.bib8)), RoCORE Zhao et al. ([2021](https://arxiv.org/html/2606.26986#bib.bib27)), and ASCORE Zhao et al. ([2023](https://arxiv.org/html/2606.26986#bib.bib28)). The LLM-based baselines include vanilla Qwen3-8B Yang et al. ([2025](https://arxiv.org/html/2606.26986#bib.bib24)), ORELLM Wang et al. ([2024](https://arxiv.org/html/2606.26986#bib.bib20)), and LLM-OREF Tu et al. ([2025](https://arxiv.org/html/2606.26986#bib.bib19)). More details about the baselines are provided in Appendix [A.1](https://arxiv.org/html/2606.26986#A1.SS1).

Implementation Details. We use Qwen3-8B Yang et al. ([2025](https://arxiv.org/html/2606.26986#bib.bib24)) as the LRM backbone and standardize all LLM-based baselines to the same backbone for fair comparison. The reasoning models are fine-tuned via LLaMA-Factory Zheng et al. ([2024](https://arxiv.org/html/2606.26986#bib.bib29)) with LoRA Hu et al. ([2021](https://arxiv.org/html/2606.26986#bib.bib7)) for 5 epochs. We set the LoRA rank, scaling factor, and dropout to 64, 256, and 0.1, respectively, with a maximum sequence length of 4,096 tokens. Optimization uses AdamW with a learning rate of $1 \times 10^{-4}$, a cosine scheduler, and a warmup ratio of 0.1. BGE-M3 Chen et al. ([2024](https://arxiv.org/html/2606.26986#bib.bib2)) is used as the training-free embedding backbone, and $\xi$ is set to 5 by default. We use vLLM Kwon et al. ([2023](https://arxiv.org/html/2606.26986#bib.bib13)) on 2 A100 80GB GPUs; see Appendix [A.2](https://arxiv.org/html/2606.26986#A1.SS2).

Table 1: Performance comparison on FewRel and TACRED datasets. The best and second-best results are marked in bold and underlined, respectively.
### 4.2 Main Results

Table [1](https://arxiv.org/html/2606.26986#S4.T1) reports the main results on FewRel and TACRED. Overall, ReaORE achieves the best or tied-best performance on most of the main clustering and classification metrics across both datasets, demonstrating the effectiveness of progressive reasoning for unseen relation extraction.

On FewRel, SLM-based methods remain competitive on clustering metrics, with RoCORE tying ReaORE on ARI and ASCORE achieving the second-best V-measure F1. Among LLM-based methods, LLM-OREF obtains the highest B³ F1, but its classification Macro-F1 is lower than that of ReaORE. This indicates that a higher clustering score on a single metric does not necessarily translate into better semantic relation prediction. ReaORE achieves the best V-measure F1 and classification Macro-F1, and ties the best ARI, showing its effectiveness in predicting unseen relation types.

On TACRED, SLM-based baselines show larger performance gaps, especially on classification metrics, indicating that relation extraction becomes more difficult when relation distributions are more imbalanced. LLM-based methods improve over SLM-based methods on several metrics, but their direct prediction or clustering-oriented strategies still lag behind ReaORE. ReaORE obtains the strongest overall performance on TACRED, achieving the best B³ F1, V-measure F1, ARI, and classification Macro-F1. These results suggest that the coarse-to-fine reasoning process helps ReaORE maintain reliable generalization under a more challenging unseen-relation distribution.
### 4.3 Ablation Study

Table 2: Ablation study results on FewRel and TACRED datasets. B³, V-M, and Cls. denote B³ F1, V-Measure F1, and classification Macro-F1.

The ablation study examines whether each part of ReaORE contributes to the intended coarse-to-fine reasoning process, with results shown in Table [2](https://arxiv.org/html/2606.26986#S4.T2). Due to space limitations, we report only F1 scores for B³, V-Measure, and classification metrics. The results show that ReaORE does not rely on a single strong LRM call; instead, its advantage comes from decomposing OpenRE into multi-aspect relation filtering, embedding-based relation supplementation and filtering, and fine-grained contrastive reasoning with exclusive judgment evidence.

Effect of relation reranking. To examine the effectiveness of relation reranking, we remove this step after matching-based reasoning. In this variant, $\mathcal{M}_1$ still produces the score-based relation tiers, but $\mathcal{R}_c$ is constructed only from the preliminary matching results, typically using the most matched tier $\mathcal{R}_3$ without embedding-based supplementation. This setting tests whether binary matching judgments alone are sufficient before fine-grained comparative reasoning. After removing relation reranking, classification Macro-F1 decreases from 84.2 to 83.2 on FewRel and from 71.4 to 70.9 on TACRED, while ARI also drops by 7.2 and 0.9 points, respectively. These results indicate that relation reranking is useful for recovering plausible relations that may not receive the highest matching score, thereby improving the coverage of $\mathcal{R}_c$ before relation prediction.

Effect of relation filtering. We further evaluate whether relation filtering is necessary before relation prediction. The variant without both $\mathcal{M}_1$ and $\mathcal{M}_2$ removes the entire relation filtering stage and directly exposes $\mathcal{M}_3$ to the full unseen relation space. Although $\mathcal{M}_3$ still performs fine-grained comparative reasoning, the relations to be compared have not been compressed and filtered, so the relation space is less concise and performance suffers. As shown in Table [2](https://arxiv.org/html/2606.26986#S4.T2), removing relation filtering leads to consistent drops on all reported metrics for both FewRel and TACRED. On FewRel, B³, V-Measure, ARI, and classification Macro-F1 decrease to 79.8, 84.2, 69.0, and 81.7, respectively; on TACRED, they decrease to 76.8, 81.2, 78.5, and 69.9. The performance drop indicates that fine-grained comparative reasoning still requires a compact input space: too many loosely related relations introduce noise into the comparison process, and the increased number of comparisons also lengthens the reasoning chain, which may increase the risk of relevant information being underused in long contexts Liu et al. ([2024](https://arxiv.org/html/2606.26986#bib.bib14)); He et al. ([2024](https://arxiv.org/html/2606.26986#bib.bib6)). Therefore, relation filtering is necessary for constructing a focused $\mathcal{R}_c$ before relation prediction.

Effect of relation prediction. We next examine whether relation filtering alone is sufficient for OpenRE. The variant without $\mathcal{M}_3$ keeps the relation filtering stage but removes fine-grained comparative reasoning, so the final decision is made without explicit pairwise comparison among the filtered relations. As shown in Table [2](https://arxiv.org/html/2606.26986#S4.T2), removing $\mathcal{M}_3$ causes clear drops on both datasets. On FewRel, V-Measure decreases from 89.6 to 78.6 and ARI decreases from 80.7 to 70.0; on TACRED, B³, V-Measure, ARI, and classification Macro-F1 decrease to 77.0, 81.5, 78.8, and 69.9, respectively. These results indicate that relation filtering can narrow the relation space, but it cannot replace fine-grained comparison when the remaining relations are semantically close. Thus, $\mathcal{M}_3$ is necessary for turning $\mathcal{R}_c$ into a final relation extraction decision.

Effect of intermediate reasoning. We finally evaluate the role of intermediate reasoning chains. The w/o IR variant keeps the same overall prediction pipeline but removes the structured reasoning process, asking the model to output the final label without explicit intermediate analysis. This setting examines whether the gains of ReaORE come only from the pipeline structure, or also from the reasoning chains generated within each stage. As shown in Table [2](https://arxiv.org/html/2606.26986#S4.T2), w/o IR consistently underperforms the full framework on all reported metrics and both datasets. On FewRel, classification Macro-F1 decreases from 84.2 to 78.0; on TACRED, it decreases from 71.4 to 69.5. The degradation suggests that intermediate reasoning is part of the decision process rather than a post-hoc explanation. It makes the evidence used in relation-instance matching, relation reranking, and pairwise comparison explicit, which leads to more reliable final predictions.
### 4\.4Analysis
Effectiveness on LRMs of Varying Sizes\. We evaluate ReaORE with Qwen3\-8B and Qwen3\-14B backbones to study whether the framework benefits from stronger LRMs\. As shown in Figure [4](https://arxiv.org/html/2606.26986#S4.F4), increasing the parameter scale consistently improves performance on both FewRel and TACRED\. For classification Macro\-F1, the score improves from 84\.2% to 85\.3% on FewRel and from 71\.4% to 72\.5% on TACRED\. Similar gains are observed for clustering metrics, indicating that ReaORE can leverage stronger reasoning capacity across both relation discovery and relation classification views\.
Figure 4: Performance comparison on FewRel and TACRED across different model scales\.

Impact of Candidate Set Size\. We further study the sensitivity of $\xi$, which denotes the number of top\-ranked relations kept from $\widetilde{\mathcal{R}}_{3}$ to form $\mathcal{R}_{c}$ for $\mathcal{M}_{3}$\. We vary $\xi$ over $\{3,4,5,6\}$ and report the results in Table [3](https://arxiv.org/html/2606.26986#S4.T3)\.
Table 3: Sensitivity analysis of $\xi$, which controls the number of top\-ranked relations kept in $\mathcal{R}_{c}$\.

The results show that performance is sensitive to the size of $\mathcal{R}_{c}$\. When $\xi$ is too small, $\mathcal{R}_{c}$ may exclude the target relation before relation prediction, limiting the effectiveness of $\mathcal{M}_{3}$\. When $\xi$ is too large, more loosely related relations are introduced into pairwise comparison, increasing the difficulty of fine\-grained comparative reasoning\. The best performance is achieved at $\xi=5$ on both datasets, which provides a suitable balance between coverage and compactness\.
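The role of $\xi$ can be illustrated with a short sketch: given relations that have already been scored by the reranking step, only the $\xi$ highest\-ranked ones are passed on to relation prediction\. The function name `select_candidates` and the scores below are hypothetical and serve only to show the truncation; the actual ranking scores come from the framework’s reranking stage\.

```python
from typing import List, Tuple


def select_candidates(ranked: List[Tuple[str, float]], xi: int = 5) -> List[str]:
    """Keep the xi highest-scoring relations from a scored relation list."""
    if xi < 1:
        raise ValueError("xi must be at least 1")
    # Sort by score, highest first; the sort is stable, so ties keep input order.
    ordered = sorted(ranked, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered[:xi]]


# Hypothetical reranking scores, for illustration only.
scored = [
    ("applies to jurisdiction", 0.91),
    ("location", 0.74),
    ("headquarters location", 0.40),
    ("located in the administrative territorial entity", 0.83),
    ("contains administrative territorial entity", 0.69),
    ("operator", 0.22),
]
candidates = select_candidates(scored, xi=4)
print(candidates)
```

A small `xi` risks cutting the target relation, while a large `xi` lengthens the list that the comparison stage has to separate, which is the trade\-off examined in the sensitivity analysis\.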
## 5Related Work
Traditional OpenRE methods predominantly adopt clustering paradigms based on Small Language Models \(SLMs\), reformulating relation extraction as a problem of unsupervised or weakly supervised representation learning and clustering\. For instance, SelfORE Hu et al\. \([2020](https://arxiv.org/html/2606.26986#bib.bib8)\) proposes a self\-supervised framework that optimizes contextual representations through adaptive soft clustering and an iterative pseudo\-labeling mechanism\. RoCORE Zhao et al\. \([2021](https://arxiv.org/html/2606.26986#bib.bib27)\) leverages labeled data to learn relation\-oriented representations, facilitating the discovery of unseen, unlabeled relation types\. To further improve clustering quality, ASCORE Zhao et al\. \([2023](https://arxiv.org/html/2606.26986#bib.bib28)\) introduces an active learning strategy that dynamically selects representative instances for manual annotation, guiding the model to identify potential relation clusters\.
However, these SLM\-based clustering methods merely reflect the similarities between instances and lack explicit semantic understanding\. Consequently, when handling semantically similar relations, these methods often exhibit weak discriminative power and poor generalization performance\. MixORE Wang et al\. \([2025a](https://arxiv.org/html/2606.26986#bib.bib21)\) introduces LLMs into OpenRE but does not break free from the clustering paradigm\. It therefore inherits the paradigm’s core limitation: clustering alone does not produce relation labels, so manual post\-annotation is still required\. LLM\-OREF Tu et al\. \([2025](https://arxiv.org/html/2606.26986#bib.bib19)\) predicts the final relation label directly by prompting an LLM to generate the semantic relation between an entity pair given an input instance and a few demonstrations\. To the best of our knowledge, our work is the first attempt to introduce LRMs into the field of OpenRE\. Our objective is to leverage the fine\-grained reasoning capabilities of LRMs to achieve a thorough understanding of both input instances and relation labels through explicit, multi\-step reasoning chains\. This enables ReaORE to distinguish between semantically similar relations, thereby enhancing its generalization in complex, open\-world scenarios\.
## 6Conclusion
In this paper, we proposed ReaORE, a reasoning\-guided progressive framework for OpenRE that improves generalization to unseen relation types through coarse\-to\-fine relation reasoning\. Unlike clustering\-based methods that require post\-hoc relation labeling or LLM\-based methods that directly generate relation labels, ReaORE decomposes OpenRE into relation filtering and relation prediction\. In relation filtering, matching\-based reasoning first performs multi\-aspect reasoning over sentence semantics and entity\-type constraints to yield an initial relation set, and relation reranking then supplements and filters relations via embedding\-based similarity to improve target\-relation coverage before prediction\. The relation prediction stage performs fine\-grained comparative reasoning to predict the final relation, aiming to distinguish easily confused relations through explicit pairwise judgment evidence\. To train the stage\-specific reasoning abilities of the two reasoning models, we construct two reasoning\-chain datasets and fine\-tune them separately\. Experiments show that ReaORE outperforms existing baselines on both clustering\-oriented and classification metrics, demonstrating that explicit progressive reasoning improves generalization to unseen relation types\.
## Limitations
ReaORE improves unseen relation extraction through explicit progressive reasoning, but its adaptation process is still not fully dynamic\. ReaORE follows the standard OpenRE setting, where the unseen relation inventory is fixed during evaluation\. Within this setting, however, its adaptation mechanism is still relatively static: relation filtering, relation reranking, and final comparative reasoning are performed with fixed steps and a fixed filtering size, rather than being adjusted according to the difficulty of each instance\. Future work can explore more adaptive OpenRE frameworks that dynamically adjust the filtering granularity and control the depth of comparative reasoning according to instance difficulty\. Finally, the reasoning\-chain training data are generated by a teacher model on known\-relation instances\. Although rejection sampling is used to retain high\-quality examples, the resulting reasoning behavior may still be affected by the teacher model’s coverage and judgment patterns\.
## References
- Bagga and Baldwin \(1998\)Amit Bagga and Breck Baldwin\. 1998\.Algorithms for scoring coreference chains\.In*The first international conference on language resources and evaluation workshop on linguistics coreference*, volume 1, pages 563–566\. Citeseer\.
- Chen et al\. \(2024\)Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu\. 2024\.Bge m3\-embedding: Multi\-lingual, multi\-functionality, multi\-granularity text embeddings through self\-knowledge distillation\.*arXiv preprint arXiv:2402\.03216*\.
- Diaz\-Garcia and Lopez \(2025\)Jose A Diaz\-Garcia and Julio Amador Diaz Lopez\. 2025\.A survey on cutting\-edge relation extraction techniques based on language models\.*Artificial Intelligence Review*, 58\(9\):287\.
- Guo et al\. \(2025\)Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, and 1 others\. 2025\.Deepseek\-r1 incentivizes reasoning in llms through reinforcement learning\.*Nature*, 645\(8081\):633–638\.
- Han et al\. \(2018\)Xu Han, Hao Zhu, Pengfei Yu, Ziyun Wang, Yuan Yao, Zhiyuan Liu, and Maosong Sun\. 2018\.Fewrel: A large\-scale supervised few\-shot relation classification dataset with state\-of\-the\-art evaluation\.In*Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing*, pages 4803–4809\.
- He et al\. \(2024\)Junqing He, Kunhao Pan, Xiaoqun Dong, Zhuoyang Song, Yibo Liu, Qianguo Sun, Yuxin Liang, Hao Wang, Enming Zhang, and Jiaxing Zhang\. 2024\.[Never lost in the middle: Mastering long\-context question answering with position\-agnostic decompositional training](https://doi.org/10.18653/v1/2024.acl-long.736)\.In*Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\)*, pages 13628–13642\.
- Hu et al\. \(2021\)Edward J\. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen\-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen\. 2021\.[Lora: Low\-rank adaptation of large language models](https://arxiv.org/abs/2106.09685)\.*Preprint*, arXiv:2106\.09685\.
- Hu et al\. \(2020\)Xuming Hu, Lijie Wen, Yusong Xu, Chenwei Zhang, and Philip S Yu\. 2020\.Selfore: Self\-supervised relational feature learning for open relation extraction\.In*Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing \(EMNLP\)*, pages 3673–3682\.
- Huang et al\. \(2025\)Kaixuan Huang, Jiacheng Guo, Zihao Li, Xiang Ji, Jiawei Ge, Wenzhe Li, Yingqing Guo, Tianle Cai, Hui Yuan, Runzhe Wang, and 1 others\. 2025\.Math\-perturb: Benchmarking llms’ math reasoning abilities against hard perturbations\.*arXiv preprint arXiv:2502\.06453*\.
- Hubert and Arabie \(1985\)Lawrence Hubert and Phipps Arabie\. 1985\.Comparing partitions\.*Journal of classification*, 2:193–218\.
- Jaech et al\. \(2024\)Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El\-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, and 1 others\. 2024\.Openai o1 system card\.*arXiv preprint arXiv:2412\.16720*\.
- Ji and Grishman \(2011\)Heng Ji and Ralph Grishman\. 2011\.Knowledge base population: Successful approaches and challenges\.In*Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies*, pages 1148–1158\.
- Kwon et al\. \(2023\)Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E\. Gonzalez, Hao Zhang, and Ion Stoica\. 2023\.[Efficient memory management for large language model serving with pagedattention](https://arxiv.org/abs/2309.06180)\.*Preprint*, arXiv:2309\.06180\.
- Liu et al\. \(2024\)Nelson F\. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang\. 2024\.[Lost in the middle: How language models use long contexts](https://doi.org/10.1162/tacl_a_00638)\.*Transactions of the Association for Computational Linguistics*, 12:157–173\.
- Opitz and Burst \(2019\)Juri Opitz and Sebastian Burst\. 2019\.Macro f1 and macro f1\.*arXiv preprint arXiv:1911\.03347*\.
- Pelaccia et al\. \(2025\)Thierry Pelaccia, Jonathan Sherbino, Peter Wyer, and Geoff Norman\. 2025\.Diagnostic reasoning and cognitive error in emergency medicine: Implications for teaching and learning\.*Academic Emergency Medicine*, 32\(3\):320–326\.
- Rosenberg and Hirschberg \(2007\)Andrew Rosenberg and Julia Hirschberg\. 2007\.V\-measure: A conditional entropy\-based external cluster evaluation measure\.*Empirical Methods in Natural Language Processing*\.
- Team \(2024\)Qwen Team\. 2024\.Qwq: Reflect deeply on the boundaries of the unknown\.
- Tu et al\. \(2025\)Hongyao Tu, Liang Zhang, Yujie Lin, Xin Lin, Haibo Zhang, Long Zhang, and Jinsong Su\. 2025\.Llm\-oref: An open relation extraction framework based on large language models\.In*Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing*, pages 9051–9063\.
- Wang et al\. \(2024\)Jiaxin Wang, Lingling Zhang, Wee Sun Lee, Yujie Zhong, Liwei Kang, and Jun Liu\. 2024\.When phrases meet probabilities: Enabling open relation extraction with cooperating large language models\.In*Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\)*, pages 13130–13147\.
- Wang et al\. \(2025a\)Qing Wang, Yuepei Li, Qiao Qiao, Kang Zhou, and Qi Li\. 2025a\.Towards a more generalized approach in open relation extraction\.In*Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\)*, pages 6343–6354\.
- Wang et al\. \(2025b\)Ziyue Wang, Junde Wu, Linghan Cai, Chang Han Low, Xihong Yang, Qiaxuan Li, and Yueming Jin\. 2025b\.Medagent\-pro: Towards evidence\-based multi\-modal medical diagnosis via reasoning agentic workflow\.*arXiv preprint arXiv:2503\.18968*\.
- Xu et al\. \(2024\)Derong Xu, Wei Chen, Wenjun Peng, Chao Zhang, Tong Xu, Xiangyu Zhao, Xian Wu, Yefeng Zheng, Yang Wang, and Enhong Chen\. 2024\.Large language models for generative information extraction: A survey\.*Frontiers of Computer Science*, 18\(6\):186357\.
- Yang et al\. \(2025\)An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, and 1 others\. 2025\.Qwen3 technical report\.*arXiv preprint arXiv:2505\.09388*\.
- Yu et al\. \(2017\)Mo Yu, Wenpeng Yin, Kazi Saidul Hasan, Cicero Dos Santos, Bing Xiang, and Bowen Zhou\. 2017\.Improved neural relation detection for knowledge base question answering\.In*Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\)*, pages 571–581\.
- Zhang et al\. \(2017\)Yuhao Zhang, Victor Zhong, Danqi Chen, Gabor Angeli, and Christopher D Manning\. 2017\.Position\-aware attention and supervised data improve slot filling\.In*Conference on empirical methods in natural language processing*\.
- Zhao et al\. \(2021\)Jun Zhao, Tao Gui, Qi Zhang, and Yaqian Zhou\. 2021\.A relation\-oriented clustering method for open relation extraction\.In*Proceedings of the 2021 conference on empirical methods in natural language processing*, pages 9707–9718\.
- Zhao et al\. \(2023\)Jun Zhao, Yongxin Zhang, Qi Zhang, Tao Gui, Zhongyu Wei, Minlong Peng, and Mingming Sun\. 2023\.Actively supervised clustering for open relation extraction\.In*Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\)*, pages 4985–4997\.
- Zheng et al\. \(2024\)Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, and Zheyan Luo\. 2024\.[LlamaFactory: Unified efficient fine\-tuning of 100\+ language models](https://doi.org/10.18653/v1/2024.acl-demos.38)\.In*ACL 2024*\.
## Appendix AAdditional Experimental Details
### A\.1Baseline Details
We provide detailed descriptions of the baselines used in Table [1](https://arxiv.org/html/2606.26986#S4.T1)\. SelfORE Hu et al\. \([2020](https://arxiv.org/html/2606.26986#bib.bib8)\) is a self\-supervised OpenRE method that uses a pretrained language model to obtain contextualized relational features\. It performs adaptive clustering to produce pseudo labels and iteratively improves the representations through a relation classification objective\. RoCORE Zhao et al\. \([2021](https://arxiv.org/html/2606.26986#bib.bib27)\) is a relation\-oriented clustering method that leverages labeled instances from predefined relations to learn representations more suitable for clustering relation semantics, and transfers this relational knowledge to discover unseen relations in unlabeled data\. ASCORE Zhao et al\. \([2023](https://arxiv.org/html/2606.26986#bib.bib28)\) formulates OpenRE as actively supervised clustering, where clustering learning and relation labeling are alternately performed\. It further designs an active labeling strategy to select representative instances and dynamically discover new relation clusters\.
Qwen3\-8B Yang et al\. \([2025](https://arxiv.org/html/2606.26986#bib.bib24)\) is the vanilla LLM baseline used to evaluate whether a strong backbone can directly predict unseen relations without ReaORE’s progressive reasoning process\. ORELLM Wang et al\. \([2024](https://arxiv.org/html/2606.26986#bib.bib20)\) uses cooperating LLMs for OpenRE\. It extracts relational phrases from input instances and exploits LLM\-based probabilities to estimate semantic similarity between phrases, thereby supporting clustering without relying only on embedding\-space distances\. LLM\-OREF Tu et al\. \([2025](https://arxiv.org/html/2606.26986#bib.bib19)\) directly predicts new relations with LLMs through a relation discoverer and a relation predictor\. It uses demonstrations from known relations and adopts a self\-correcting inference strategy that includes relation discovery, relation denoising, and relation prediction\.
### A\.2Implementation Hyperparameters
Table [A\.1](https://arxiv.org/html/2606.26986#A1.T1) summarizes the key implementation hyperparameters used in our experiments\.
Table A\.1: Key implementation hyperparameters used in ReaORE\.
### A\.3Comparison with MixORE
We additionally compare ReaORE with MixORE Wang et al\. \([2025a](https://arxiv.org/html/2606.26986#bib.bib21)\) under the experimental setting used in the MixORE paper\. This setting is different from the main experiments in Table [1](https://arxiv.org/html/2606.26986#S4.T1)\. MixORE emphasizes that the test data should contain a mixture of known and unseen relation types; specifically, the number of unseen relation types is set to 6, while the remaining relation types are treated as known relations\. MixORE uses BERT\-base as its backbone and therefore belongs to the SLM\-based OpenRE paradigm\. Table [A\.2](https://arxiv.org/html/2606.26986#A1.T2) reports the comparison under this setting\. The MixORE scores are taken from the MixORE paper and converted to percentage values for consistency with our tables\. For ReaORE, we report the results of the 14B backbone under the same setting\.
Table A\.2: Comparison between ReaORE and MixORE under the MixORE setting\. B$^{3}$, V\-M, and Cls\. denote B$^{3}$ F1, V\-Measure F1, and classification Macro\-F1, respectively\.
## Appendix BVisualization of Reasoning Effect
This appendix provides a qualitative error analysis to further examine how explicit reasoning affects relation prediction\. The experiment compares ReaORE with a variant that removes intermediate reasoning and directly predicts relation labels\. We focus on representative FewRel relations that are semantically close or share similar entity\-type patterns, because these cases are where OpenRE models are most likely to confuse relation boundaries\.
Figure [B\.1](https://arxiv.org/html/2606.26986#A2.F1) reports the corresponding error matrices\. Rows denote ground\-truth relations and columns denote predicted relations; therefore, diagonal entries indicate correct predictions, while off\-diagonal entries indicate confusions between relation types\. The left matrix shows the variant without reasoning, and the right matrix shows ReaORE\. This comparison is intended to reveal whether the matching\-based reasoning and fine\-grained comparative reasoning modules merely improve aggregate scores, or whether they actually reduce errors among easily confused relations\.


Figure B\.1: Error matrices on representative FewRel relations\. Left: without reasoning\. Right: ReaORE\. Rows denote ground\-truth labels and columns denote predicted labels\.

The visualization shows that ReaORE produces fewer off\-diagonal errors than the variant without reasoning\. This indicates that explicit reasoning helps the model avoid relying only on shallow lexical or entity\-type similarity\. Instead, ReaORE requires the model to first check relation\-instance matching from multiple aspects and then compare plausible relations before making the final decision\. As a result, the model better distinguishes subtle relation boundaries, especially for semantically adjacent relations that are difficult to separate through direct label generation\.
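An error matrix of the kind described above, with rows as ground\-truth relations and columns as predicted relations, can be built with a few lines of code\. The sketch below is illustrative only: the helper names `error_matrix` and `off_diagonal_total` and the toy predictions are assumptions, not the authors’ analysis script or data\.

```python
from typing import List, Sequence


def error_matrix(gold: Sequence[str], pred: Sequence[str], labels: Sequence[str]) -> List[List[int]]:
    """Count (gold, predicted) pairs; rows are gold labels, columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for g, p in zip(gold, pred):
        # Pairs involving a label outside the inspected subset are skipped.
        if g in index and p in index:
            matrix[index[g]][index[p]] += 1
    return matrix


def off_diagonal_total(matrix: List[List[int]]) -> int:
    """Total number of confusions, i.e. every count that is not on the diagonal."""
    return sum(v for i, row in enumerate(matrix) for j, v in enumerate(row) if i != j)


# Toy predictions over three relation names (hypothetical data).
labels = ["applies to jurisdiction", "location", "headquarters location"]
gold = [labels[0], labels[0], labels[1], labels[2], labels[2]]
pred = [labels[0], labels[1], labels[1], labels[2], labels[0]]

matrix = error_matrix(gold, pred, labels)
for name, row in zip(labels, matrix):
    print(f"{name:<28} {row}")
print("confusions:", off_diagonal_total(matrix))
```

Comparing `off_diagonal_total` for two systems on the same relation subset gives a single number for the reduction in confusions that the figure shows visually\.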
## Appendix CInference Efficiency Comparison
This appendix reports the inference\-time cost of ReaORE\. The purpose is to clarify where the computational overhead comes from and whether explicit reasoning makes the final prediction stage inefficient\. Table [C\.1](https://arxiv.org/html/2606.26986#A3.T1) summarizes the time cost of the three inference components\. According to these timing results, the third component is the largest single cost because $\mathcal{M}_{3}$ performs fine\-grained pairwise comparisons and aggregates the comparative evidence for the final decision\. The first component remains moderate because $\mathcal{M}_{1}$ performs matching\-based reasoning for each relation, but this step can be parallelized across relations\. The second component is lightweight because $\mathcal{M}_{2}$ only performs embedding\-based relation reranking\.
Table C\.1: Inference\-time comparison across the three inference components of ReaORE\.

We further compare different designs for the third stage in Table [C\.2](https://arxiv.org/html/2606.26986#A3.T2)\. Removing pairwise comparison is faster, but it no longer provides explicit comparative judgments among plausible relations\. In this setting, the model reaches its judgment without pairwise comparative reasoning, using conventional CoT\-style generation with no requirements on the CoT output format or the intermediate analysis procedure\. In contrast, multi\-turn pairwise comparison is slower because it requires repeated interactions\. ReaORE adopts a single\-dialogue comparative reasoning design, which preserves explicit pairwise judgments while keeping the additional time cost moderate\.
Table C\.2: Inference\-time comparison across different relation prediction designs in the third stage\.
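To give a sense of why the number of interactions matters, the sketch below enumerates the unordered pairs among a set of filtered relations\. If every pair were compared, $n$ relations would yield $n(n-1)/2$ pairs\. Under the assumption that a multi\-turn design spends one model call per pair \(the source only states that it requires repeated interactions\), this count approximates the number of calls, whereas a single\-dialogue design covers all pairs in one generation\. The function name `pairwise_comparisons` is hypothetical\.

```python
from itertools import combinations
from typing import List, Sequence


def pairwise_comparisons(candidates: Sequence[str]) -> List[str]:
    """List every unordered pair of relations in 'A vs B' form."""
    return [f"{a} vs {b}" for a, b in combinations(candidates, 2)]


# The four relations that appear in the comparative reasoning prompt example.
candidates = [
    "applies to jurisdiction",
    "located in the administrative territorial entity",
    "location",
    "contains administrative territorial entity",
]
pairs = pairwise_comparisons(candidates)
print(len(pairs), "pairs")
for pair in pairs:
    print(" -", pair)
```

With five relations the same enumeration gives ten pairs, so the gap between one call and one call per pair grows quickly as the filtered set gets larger\.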
## Appendix DPrompts
This appendix provides the complete prompts used by the two reasoning models in ReaORE\. The prompts define the input format, reasoning requirements, and output constraints for the two stages that require explicit reasoning\. During data construction, the same prompt formats are used to guide the teacher model in generating reasoning\-chain training data\. During inference, the fine\-tuned reasoning models follow the corresponding formats to produce structured intermediate reasoning\.
### D\.1Matching\-Based Reasoning Prompt
We first provide the complete prompt used by the matching\-based reasoning model $\mathcal{M}_{1}$ for the example shown in Figure [3](https://arxiv.org/html/2606.26986#S3.F3)\(a\)\. The purpose of this prompt is to force the model to make the evidence used for relation filtering explicit before assigning matching scores\. Specifically, the prompt asks $\mathcal{M}_{1}$ to analyze the sentence and entity pair, then evaluate each relation by semantic support, head\-side argument validity, and tail\-side argument validity\. The expected output is a structured reasoning trace: the <think\> block contains the detailed query analysis and per\-relation judgments, while the <summary\> block gives the three Boolean matching scores used to form relation tiers\.
<\|im\_start\|\>system
\#\#\# Role
You are an expert \*\*Relation Extraction Specialist\*\*\. Your task is to analyze a provided text
and a pair of entities \(Head and Tail\) to determine which candidate relations from a given list
are correct\.
\#\#\# Task Logic
1\. \*\*Analyze\*\*: Infer entity types for \*\*Head\*\* and \*\*Tail\*\* based on Query\.
2\. \*\*Evaluate\*\*: Check 3 criteria for each relation \(1=Yes, 0=No\):
\* \*\*Semantic\*\*: Does relation meaning match Query?
\* \*\*Head Constraint\*\*: Is \*\*Head\*\* type valid as the relation’s Source/Domain? \-\> Maps to
‘head\_entity\_evaluation‘\.
\* \*\*Tail Constraint\*\*: Is \*\*Tail\*\* type valid as the relation’s Target/Range? \-\> Maps to
‘tail\_entity\_evaluation‘\.
\#\#\# Output Format
Strictly output \*\*only\*\* the following two parts\. No markdown code blocks\.
\*\*Part 1: Analysis\*\*
Wrap in ‘<think\>‘ tag\. Use valid JSON\.
<think\>
\{
"Query Analysis": \{
"text\_summary": "Overview and understand this text",
"head\_entity\_type": "Inferred type of Head Entity and explanation",
"tail\_entity\_type": "Inferred type of Tail Entity and explanation"
\},
"Candidate Relation Analysis": \[
\{
"relation\_name": "Relation Name",
"relation\_description": "Brief analysis of relation definition",
"semantic\_evaluation": \{
"is\_match": 0 or 1,
"reason": "Reason for semantic match"
\},
"head\_entity\_evaluation": \{
"is\_valid": 0 or 1,
"reason": "Is Head type valid for this relation domain?"
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0 or 1,
"reason": "Is Tail type valid for this relation range?"
\}
\}
\]
\}
</think\>
\*\*Part 2: Summary\*\*
Wrap in ‘<summary\>‘ tag\. CSV format:
‘Name,Semantic\(1/0\),‘
‘HeadValid\(1/0\),TailValid\(1/0\)‘\.
<summary\>
RelationName1,1,1,0
RelationName2,0,1,1
</summary\>
<\|im\_end\|\>
<\|im\_start\|\>user
Text: It is an affiliated member of the Local Government Association of England and Wales and
the regional employers organisation \.
Head entity: local government association\.
Tail entity: england and wales\.
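The CSV summary requested by this prompt is straightforward to consume downstream\. The sketch below parses a `<summary>` block into per\-relation Boolean scores and then groups relations by how many of the three criteria they satisfy\. The grouping rule \(tier equals the number of satisfied criteria\) is an assumption made for illustration, since this appendix does not spell out how tiers are formed; `parse_summary` and `tier_relations` are hypothetical helper names\.

```python
import re
from typing import Dict, List, Tuple

SUMMARY_RE = re.compile(r"<summary>(.*?)</summary>", re.DOTALL)


def parse_summary(output: str) -> Dict[str, Tuple[int, int, int]]:
    """Map each relation name to its (semantic, head_valid, tail_valid) flags."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no <summary> block found")
    scores: Dict[str, Tuple[int, int, int]] = {}
    for line in match.group(1).strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so a comma inside a relation name is preserved.
        name, semantic, head, tail = line.rsplit(",", 3)
        scores[name.strip()] = (int(semantic), int(head), int(tail))
    return scores


def tier_relations(scores: Dict[str, Tuple[int, int, int]]) -> Dict[int, List[str]]:
    """Group relations by the number of satisfied criteria (assumed tiering rule)."""
    tiers: Dict[int, List[str]] = {3: [], 2: [], 1: [], 0: []}
    for name, flags in scores.items():
        tiers[sum(flags)].append(name)
    return tiers


# Summary lines consistent with the prompt's output format (example values).
raw = """<think>{}</think>
<summary>
applies to jurisdiction,1,1,1
headquarters location,0,1,1
residence,0,0,1
military rank,0,0,0
</summary>"""

tiers = tier_relations(parse_summary(raw))
print(tiers)
```

A malformed line \(fewer than three trailing fields or a non\-integer flag\) raises `ValueError`, which makes format violations by the model easy to detect\.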
### D\.2Fine\-Grained Comparative Reasoning Prompt
We next provide the complete prompt used by the fine\-grained comparative reasoning model $\mathcal{M}_{3}$ for the example shown in Figure [3](https://arxiv.org/html/2606.26986#S3.F3)\(b\)\. After relation filtering, the remaining relations are plausible but can still be semantically close\. The goal of this prompt is therefore not to score each relation independently, but to require pairwise comparison under explicit decision rules\. The prompt asks $\mathcal{M}_{3}$ to prioritize sentence evidence, prefer the more precise relation when multiple relations are plausible, and align the final answer with the phrase or event expressed in the sentence\. The expected output contains a comparison\_log for pairwise decisions and a final\_result that states the selected relation and decision logic\.
<\|im\_start\|\>system
You are a relation extraction expert\. Select the best relation from candidate relations\.
Rules:
1\. Evidence priority: Does the text clearly support this relation?
2\. Precision: If both are correct, choose the more specific one
3\. Context matching: Which relation is closest to the verb/phrase in the text?
Output format:
<think\>
\{
"comparison\_log": \[
\{"pair": "A vs B", "winner": "A", "reason": "reason"\},
…
\],
"final\_result": \{
"best\_relation": "best relation name",
"decision\_logic": "selection logic"
\}
\}
</think\>
<summary\>
best relation name
</summary\><\|im\_end\|\>
<\|im\_start\|\>user
Text: It is an affiliated member of the Local Government Association of England and Wales and
the regional employers organisation \.
Head entity: local government association\.
Tail entity: england and wales\.
\*\*Candidate Relations to Compare:\*\*
\- applies to jurisdiction
\- located in the administrative territorial entity
\- location
\- contains administrative territorial entity
Please compare these relations pairwise and identify the single best relation\.<\|im\_end\|\>
<\|im\_start\|\>assistant
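A response that follows this output format can be read with standard JSON and regular\-expression tools\. The sketch below extracts the selected relation from the `<summary>` block and, when the `<think>` block is valid JSON, the list of pairwise decisions from `comparison_log`\. It assumes the model emits well\-formed tags; the function name `parse_comparative_output` and the sample response are hypothetical\.

```python
import json
import re
from typing import Dict, List, Tuple


def parse_comparative_output(output: str) -> Tuple[str, List[Dict[str, str]]]:
    """Return (best_relation, comparison_log) from a formatted model response."""
    summary = re.search(r"<summary>(.*?)</summary>", output, re.DOTALL)
    if summary is None:
        raise ValueError("no <summary> block found")
    best_relation = summary.group(1).strip()

    log: List[Dict[str, str]] = []
    think = re.search(r"<think>(.*?)</think>", output, re.DOTALL)
    if think is not None:
        try:
            payload = json.loads(think.group(1))
        except json.JSONDecodeError:
            payload = None  # Keep the final answer even if the trace is not valid JSON.
        if isinstance(payload, dict):
            log = payload.get("comparison_log", [])
    return best_relation, log


# Hypothetical response, shaped like the output format in the prompt.
sample = """<think>
{
  "comparison_log": [
    {"pair": "applies to jurisdiction vs location", "winner": "applies to jurisdiction", "reason": "more specific"}
  ],
  "final_result": {"best_relation": "applies to jurisdiction", "decision_logic": "wins its comparison"}
}
</think>
<summary>
applies to jurisdiction
</summary>"""

best, log = parse_comparative_output(sample)
print(best, "| pairwise decisions:", len(log))
```

Reading the label from `<summary>` rather than from the JSON trace means a truncated or malformed reasoning block does not prevent the final prediction from being recovered\.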
## Appendix EReasoning Trace Examples
This appendix provides the complete reasoning traces generated from the prompts in Appendix [D](https://arxiv.org/html/2606.26986#A4)\. While Figure [3](https://arxiv.org/html/2606.26986#S3.F3) in the main paper only shows abbreviated examples, the material here presents the full intermediate reasoning for one representative FewRel instance\. The goal is to make the reasoning procedure inspectable: readers can see how the models analyze evidence, assign intermediate judgments, and produce the final relation decision\.
### E\.1Matching\-Based Reasoning Trace
Given the matching\-based reasoning prompt, $\mathcal{M}_{1}$ produces the following reasoning trace\. This trace illustrates how ReaORE converts an input sentence and entity pair into explicit multi\-aspect judgments over relations\. The reader should focus on two parts: the query analysis, which identifies the sentence evidence and entity types, and the relation analysis, which explains why each relation receives its semantic, head\-entity, and tail\-entity matching scores\. These scores are the basis for constructing the initial relation tiers used by the subsequent reranking step\.
—————————————\-
Query Index: 0
—————————————\-
Input Text:
Text: It is an affiliated member of the Local Government Association of England and Wales and the regional employers organisation \.
Head entity: local government association\.
Tail entity: england and wales\.
—————————————\-
Raw Output:
—————————————\-
<think\>
\{
"Query Analysis": \{
"text\_summary": "The text states that an unspecified entity is a member of the ’Local Government Association of England and Wales’\. This implies a relationship between the association and the geographical region it serves, which is England and Wales\.",
"head\_entity\_type": "The head entity ’local government association’ is an organization, specifically a professional or trade association\.",
"tail\_entity\_type": "The tail entity ’england and wales’ is a geographical and political region, a country\."
\},
"Candidate Relation Analysis": \[
\{
"relation\_name": "residence",
"relation\_description": "The place where a person is or has been resident\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text describes an organizational relationship, not a person’s place of residence\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, but the relation’s domain is a person\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The tail entity is a country/place, which is a valid type for this relation’s range\."
\}
\},
\{
"relation\_name": "headquarters location",
"relation\_description": "The specific location where an organization’s headquarters is or has been situated\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text does not state that the headquarters of the Local Government Association is in England and Wales\. It implies the association serves that region\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The head entity is an organization, which is a valid type for this relation’s domain\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The tail entity is a country, which can be a valid location for a headquarters\."
\}
\},
\{
"relation\_name": "military rank",
"relation\_description": "A military rank achieved by a person\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is about civil government and has no connection to the military\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is a civil organization, not a person\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a military rank\."
\}
\},
\{
"relation\_name": "child",
"relation\_description": "The subject has the object as their biological, foster, or adoptive child\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The entities are an organization and a country, not people with a familial relationship\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a person\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a person\."
\}
\},
\{
"relation\_name": "operator",
"relation\_description": "The person, profession, or organization that operates the equipment, facility, or service\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text does not state that the Local Government Association operates England and Wales\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The head entity is an organization, which could be an operator\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not an equipment, facility, or service\."
\}
\},
\{
"relation\_name": "heritage designation",
"relation\_description": "The heritage designation of a cultural or natural site\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is about an association, not a heritage site\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a cultural or natural site\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a heritage designation\."
\}
\},
\{
"relation\_name": "field of work",
"relation\_description": "The specialization of a person or organization\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text does not state that England and Wales is the field of work for the Local Government Association\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The head entity is an organization, which can have a field of work\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a field of work\."
\}
\},
\{
"relation\_name": "position played on team / speciality",
"relation\_description": "The position or specialism of a player on a team\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The entities are not a player and a team\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a player\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a team or position\."
\}
\},
\{
"relation\_name": "characters",
"relation\_description": "The characters which appear in this item \(like plays, operas, operettas, books, comics, films, TV series, video games\)\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The entities are not a creative work and characters\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a creative work\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a character\."
\}
\},
\{
"relation\_name": "applies to jurisdiction",
"relation\_description": "The item \(an institution, law, public office …\) or statement belongs to or has power over or applies to the value \(a territorial jurisdiction: a country, state, municipality, …\)\.",
"semantic\_evaluation": \{
"is\_match": 1,
"reason": "The text implies that the Local Government Association \(an institution\) operates within the jurisdiction of England and Wales\. This relation captures that an organization applies to or has power over a jurisdiction\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The head entity is an organization, which fits the description of an ’institution’ or ’public office’\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The tail entity is a country, which is a ’territorial jurisdiction’\."
\}
\},
\{
"relation\_name": "sibling",
"relation\_description": "The subject has the object as their sibling\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The entities are an organization and a country, not people with a sibling relationship\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a person\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a person\."
\}
\},
\{
"relation\_name": "sport",
"relation\_description": "The sport in which the subject participates or belongs to\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about sports\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a person or team\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a sport\."
\}
\},
\{
"relation\_name": "member of political party",
"relation\_description": "The political party of which this politician is or has been a member\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is about an association, not a political party\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a politician\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a political party\."
\}
\},
\{
"relation\_name": "position held",
"relation\_description": "The subject currently or formerly holds the object position or public office\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text does not state that the Local Government Association holds a position called ’England and Wales’\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The head entity is an organization, which can hold a position\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a position or public office\."
\}
\},
\{
"relation\_name": "notable work",
"relation\_description": "A notable scientific, artistic or literary work, or other work of significance among the subject’s works\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text does not describe England and Wales as a work of the Local Government Association\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The head entity is an organization, which can have notable works\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a work\."
\}
\},
\{
"relation\_name": "voice type",
"relation\_description": "A person’s voice type\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about music or voice types\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a person\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a voice type\."
\}
\},
\{
"relation\_name": "screenwriter",
"relation\_description": "The person\(s\) who wrote the script for the subject item\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about films or screenwriting\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a creative work\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a person\."
\}
\},
\{
"relation\_name": "contains administrative territorial entity",
"relation\_description": "The list of direct subdivisions of an administrative territorial entity\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text does not state that the Local Government Association contains England and Wales as a subdivision\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not an administrative territorial entity\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The tail entity is a country, which is an administrative territorial entity\."
\}
\},
\{
"relation\_name": "location of formation",
"relation\_description": "The location where a group or organization was formed\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text does not state that the Local Government Association was formed in England and Wales\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The head entity is an organization, which can have a location of formation\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The tail entity is a country, which is a valid location\."
\}
\},
\{
"relation\_name": "located in the administrative territorial entity",
"relation\_description": "The item is located on the territory of the following administrative entity\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text describes the association’s area of operation, not its physical location\. The relation is about location, not jurisdiction or service area\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The head entity is an organization, which can be located in a place\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The tail entity is a country, which is an administrative territorial entity\."
\}
\},
\{
"relation\_name": "said to be the same as",
"relation\_description": "This item is said to be the same as that item, but the statement is disputed\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The Local Government Association and England and Wales are distinct entities\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The head entity is an organization, which can be an item\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The tail entity is a country, which can be an item\."
\}
\},
\{
"relation\_name": "country of citizenship",
"relation\_description": "The object is a country that recognizes the subject as its citizen\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about citizenship\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a person\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The tail entity is a country, which is a valid type for this relation’s range\."
\}
\},
\{
"relation\_name": "occupant",
"relation\_description": "A person or organization occupying property\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text does not state that the Local Government Association occupies property called ’England and Wales’\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The head entity is an organization, which can be an occupant\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not property\."
\}
\},
\{
"relation\_name": "location",
"relation\_description": "The location of the item, physical object or event is within\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text describes the association’s area of operation, not its physical location\. The relation is about location, not jurisdiction or service area\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The head entity is an organization, which can have a location\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The tail entity is a country, which is a location\."
\}
\},
\{
"relation\_name": "tributary",
"relation\_description": "A stream or river that flows into this main stem \(or parent\) river\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about geography or rivers\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a river\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a river\."
\}
\},
\{
"relation\_name": "nominated for",
"relation\_description": "Award nomination received by a person, organisation or creative work\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text does not mention any awards\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The head entity is an organization, which can be nominated for an award\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not an award\."
\}
\},
\{
"relation\_name": "original language of film or TV show",
"relation\_description": "The language in which a film or a performance work was originally created\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about films or TV shows\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a film or TV show\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a language\."
\}
\},
\{
"relation\_name": "member of",
"relation\_description": "The organization or club to which the subject belongs\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text does not state that the Local Government Association is a member of England and Wales\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The head entity is an organization, which can be a member of another organization\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not an organization\."
\}
\},
\{
"relation\_name": "sports season of league or competition",
"relation\_description": "The property that shows the competition of which the item is a season\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about sports\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a sports season\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a competition\."
\}
\},
\{
"relation\_name": "instrument",
"relation\_description": "The musical instrument that a person plays\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about music\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a person\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not an instrument\."
\}
\},
\{
"relation\_name": "competition class",
"relation\_description": "The official classification by a regulating body under which the subject qualifies for inclusion\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about sports or competitions\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a competition or class\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a classification\."
\}
\},
\{
"relation\_name": "occupation",
"relation\_description": "The occupation of a person\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about a person’s occupation\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a person\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not an occupation\."
\}
\},
\{
"relation\_name": "original network",
"relation\_description": "The network\(s\) the radio or television show was originally aired on\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about TV or radio\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a TV show\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a network\."
\}
\},
\{
"relation\_name": "mountain range",
"relation\_description": "The range or subrange to which the geographical item belongs\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about geography or mountain ranges\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a geographical feature\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a mountain range\."
\}
\},
\{
"relation\_name": "crosses",
"relation\_description": "The obstacle \(body of water, road, …\) which this bridge crosses over or this tunnel goes under\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about bridges or tunnels\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a bridge or tunnel\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not an obstacle\."
\}
\},
\{
"relation\_name": "operating system",
"relation\_description": "The operating system \(OS\) on which a software works or the OS installed on hardware\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about software or operating systems\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not software or hardware\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not an operating system\."
\}
\},
\{
"relation\_name": "constellation",
"relation\_description": "The area of the celestial sphere of which the subject is a part\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about astronomy\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a celestial object\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a constellation\."
\}
\},
\{
"relation\_name": "record label",
"relation\_description": "The brand and trademark associated with the marketing of subject music recordings and music videos\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about music or record labels\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a music recording\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a record label\."
\}
\},
\{
"relation\_name": "place served by transport hub",
"relation\_description": "The territorial entity or entities served by this transport hub \(airport, train station, etc\.\)\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text is not about transport hubs\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The head entity is an organization, not a transport hub\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The tail entity is a country, which is a territorial entity\."
\}
\},
\{
"relation\_name": "subsidiary",
"relation\_description": "The subsidiary of a company or organization\.",
"semantic\_evaluation": \{
"is\_match": 0,
"reason": "The text does not state that England and Wales is a subsidiary of the Local Government Association\."
\},
"head\_entity\_evaluation": \{
"is\_valid": 1,
"reason": "The head entity is an organization, which can have subsidiaries\."
\},
"tail\_entity\_evaluation": \{
"is\_valid": 0,
"reason": "The tail entity is a country, not a subsidiary organization\."
\}
\}
\]
\}
</think\>
<summary\>
residence,0,0,1
headquarters location,0,1,1
military rank,0,0,0
child,0,0,0
operator,0,1,0
heritage designation,0,0,0
field of work,0,1,0
position played on team / speciality,0,0,0
characters,0,0,0
applies to jurisdiction,1,1,1
sibling,0,0,0
sport,0,0,0
member of political party,0,0,0
position held,0,1,0
notable work,0,1,0
voice type,0,0,0
screenwriter,0,0,0
contains administrative territorial entity,0,0,1
location of formation,0,1,1
located in the administrative territorial entity,0,1,1
said to be the same as,0,1,1
country of citizenship,0,0,1
occupant,0,1,0
location,0,1,1
tributary,0,0,0
nominated for,0,1,0
original language of film or TV show,0,0,0
member of,0,1,0
sports season of league or competition,0,0,0
instrument,0,0,0
competition class,0,0,0
occupation,0,0,0
original network,0,0,0
mountain range,0,0,0
crosses,0,0,0
operating system,0,0,0
constellation,0,0,0
record label,0,0,0
place served by transport hub,0,0,1
subsidiary,0,1,0</summary\>
—————————————\-
Parsed Result:
—————————————\-
A layer \(1 relations\): \[’applies to jurisdiction’\]
B layer \(5 relations\): \[’headquarters location’, ’location of formation’, ’located in the administrative territorial entity’, ’said to be the same as’, ’location’\]
C layer \(34 relations\): \[’residence’, ’military rank’, ’child’, ’operator’, ’heritage designation’, ’field of work’, ’position played on team / speciality’, ’characters’, ’sibling’, ’sport’, ’member of political party’, ’position held’, ’notable work’, ’voice type’, ’screenwriter’, ’contains administrative territorial entity’, ’country of citizenship’, ’occupant’, ’tributary’, ’nominated for’, ’original language of film or TV show’, ’member of’, ’sports season of league or competition’, ’instrument’, ’competition class’, ’occupation’, ’original network’, ’mountain range’, ’crosses’, ’operating system’, ’constellation’, ’record label’, ’place served by transport hub’, ’subsidiary’\]
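The step from the `<summary>` rows to the A/B/C layers can be sketched in a few lines of Python. This is a minimal sketch, not the authors' parser: the function name `parse_layers` is hypothetical, and the layering rule is an assumption inferred from this single trace (A when the semantic flag is 1, B when the semantic flag is 0 but both entity flags are 1, C otherwise).

```python
from typing import Dict, List


def parse_layers(summary: str) -> Dict[str, List[str]]:
    """Group relations into A/B/C layers from 'name,match,head,tail' rows."""
    layers: Dict[str, List[str]] = {"A": [], "B": [], "C": []}
    for raw in summary.strip().splitlines():
        row = raw.strip()
        if not row:
            continue
        # Split from the right so a comma inside a relation name is preserved.
        name, match, head, tail = row.rsplit(",", 3)
        if int(match) == 1:
            layers["A"].append(name)  # assumed rule: semantic match
        elif int(head) == 1 and int(tail) == 1:
            layers["B"].append(name)  # assumed rule: both entity types valid
        else:
            layers["C"].append(name)  # assumed rule: everything else
    return layers


# A few rows copied from the <summary> block above.
example = """
residence,0,0,1
headquarters location,0,1,1
applies to jurisdiction,1,1,1
position played on team / speciality,0,0,0
location,0,1,1
"""
for layer, relations in parse_layers(example).items():
    print(f"{layer} layer ({len(relations)} relations): {relations}")
```

Applied to all 40 summary rows, this assumed rule gives the same 1/5/34 split as the parsed result shown above.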
### E\.2 Fine\-Grained Comparative Reasoning Trace
Given the fine\-grained comparative reasoning prompt, $\mathcal{M}_{3}$ produces the following reasoning trace\. This trace shows how ReaORE makes the final relation prediction by comparing relation pairs rather than directly generating a label\. Each comparison records the pair, the preferred relation, and the reason for the preference\. The final decision summary then aggregates these pairwise judgments into one relation prediction, explaining why the selected relation is more appropriate than the alternatives\.
Query:
Input: Text: It is an affiliated member of the Local Government Association of England and Wales and the regional employers organisation \.
Head entity: local government association\.
Tail entity: england and wales\.
Candidate: \[’applies to jurisdiction’, ’located in the administrative territorial entity’, ’location’, ’contains administrative territorial entity’\]
Output: <think\>
\{
"comparison\_log": \[
\{
"pair": "applies to jurisdiction vs located in the administrative territorial entity",
"winner": "applies to jurisdiction",
"reason": "The text states the Local Government Association is ’of England and Wales’, which means it operates within that jurisdiction\. ’applies to jurisdiction’ correctly captures this relationship\. ’located in the administrative territorial entity’ is incorrect because a national organization is not located within a specific country; it operates within its jurisdiction\."
\},
\{
"pair": "applies to jurisdiction vs location",
"winner": "applies to jurisdiction",
"reason": "While the association is in ’England and Wales’, ’applies to jurisdiction’ is a more precise and formal description of the relationship between an organization and the country it serves\. ’location’ is too generic and less precise for this context\."
\},
\{
"pair": "applies to jurisdiction vs contains administrative territorial entity",
"winner": "applies to jurisdiction",
"reason": "The relation ’contains administrative territorial entity’ would imply that the Local Government Association is a larger entity that contains the country of England and Wales, which is factually incorrect\. ’applies to jurisdiction’ correctly describes the relationship\."
\},
\{
"pair": "located in the administrative territorial entity vs location",
"winner": "located in the administrative territorial entity",
"reason": "Both relations are poor fits\. However, ’located in the administrative territorial entity’ is slightly better than ’location’ because it specifies the type of entity \(administrative territorial\), which is relevant to the context of a country like England and Wales\. ’location’ is too generic\."
\},
\{
"pair": "located in the administrative territorial entity vs contains administrative territorial entity",
"winner": "located in the administrative territorial entity",
"reason": "’contains administrative territorial entity’ is factually incorrect as the association does not contain the country\. ’located in the administrative territorial entity’ is a weak but plausible interpretation of the text, even if not the most precise\."
\},
\{
"pair": "location vs contains administrative territorial entity",
"winner": "location",
"reason": "’contains administrative territorial entity’ is factually incorrect\. ’location’ is a very general but not entirely wrong description of the relationship, as the association is situated in the countries of England and Wales\."
\}
\],
"final\_result": \{
"best\_relation": "applies to jurisdiction",
"decision\_logic": "The text ’of England and Wales’ indicates that the Local Government Association operates within the jurisdiction of England and Wales\. This makes ’applies to jurisdiction’ the most accurate and specific relation\. The other candidates are either too generic \(’location’\), incorrectly describe the relationship \(’contains administrative territorial entity’\), or are factually incorrect \(’located in the administrative territorial entity’\)\."
\}
\}
</think\>
\{
"comparison\_log": \[
\{
"pair": "applies to jurisdiction vs located in the administrative territorial entity",
"winner": "applies to jurisdiction",
"reason": "The phrase ’of England and Wales’ indicates the association operates within that jurisdiction\. ’applies to jurisdiction’ is the correct relation\. ’located in the administrative territorial entity’ is incorrect because a national organization is not located within a country; it operates within its jurisdiction\."
\},
\{
"pair": "applies to jurisdiction vs location",
"winner": "applies to jurisdiction",
"reason": "’applies to jurisdiction’ is more precise than the generic ’location’\. The text implies a functional relationship \(operating within a jurisdiction\), not just a physical location\."
\},
\{
"pair": "applies to jurisdiction vs contains administrative territorial entity",
"winner": "applies to jurisdiction",
"reason": "’contains administrative territorial entity’ is factually incorrect\. The association does not contain the country of England and Wales\. ’applies to jurisdiction’ correctly describes the relationship\."
\},
\{
"pair": "located in the administrative territorial entity vs location",
"winner": "located in the administrative territorial entity",
"reason": "Both are weak fits\. However, ’located in the administrative territorial entity’ is slightly better because it specifies the type of entity \(administrative territorial\), which is relevant to the context of a country like England and Wales\. ’location’ is too generic\."
\},
\{
"pair": "located in the administrative territorial entity vs contains administrative territorial entity",
"winner": "located in the administrative territorial entity",
"reason": "’contains administrative territorial entity’ is factually incorrect\. ’located in the administrative territorial entity’ is a weak but plausible interpretation of the text, even if not the most precise\."
\},
\{
"pair": "location vs contains administrative territorial entity",
"winner": "location",
"reason": "’contains administrative territorial entity’ is factually incorrect\. ’location’ is a very general but not entirely wrong description of the relationship, as the association is situated in the countries of England and Wales\."
\}
\],
"final\_result": \{
"best\_relation": "applies to jurisdiction",
"decision\_logic": "The text ’of England and Wales’ indicates that the Local Government Association operates within the jurisdiction of England and Wales\. This makes ’applies to jurisdiction’ the most accurate and specific relation\. The other candidates are either too generic \(’location’\), incorrectly describe the relationship \(’contains administrative territorial entity’\), or are factually incorrect \(’located in the administrative territorial entity’\)\."
\}
\}
Parsed: \{’comparison\_log’: \[\{’pair’: ’applies to jurisdiction vs located in the administrative territorial entity’, ’winner’: ’applies to jurisdiction’, ’reason’: "The text states the Local Government Association is ’of England and Wales’, which means it operates within that jurisdiction\. ’applies to jurisdiction’ correctly captures this relationship\. ’located in the administrative territorial entity’ is incorrect because a national organization is not located within a specific country; it operates within its jurisdiction\."\}, \{’pair’: ’applies to jurisdiction vs location’, ’winner’: ’applies to jurisdiction’, ’reason’: "While the association is in ’England and Wales’, ’applies to jurisdiction’ is a more precise and formal description of the relationship between an organization and the country it serves\. ’location’ is too generic and less precise for this context\."\}, \{’pair’: ’applies to jurisdiction vs contains administrative territorial entity’, ’winner’: ’applies to jurisdiction’, ’reason’: "The relation ’contains administrative territorial entity’ would imply that the Local Government Association is a larger entity that contains the country of England and Wales, which is factually incorrect\. ’applies to jurisdiction’ correctly describes the relationship\."\}, \{’pair’: ’located in the administrative territorial entity vs location’, ’winner’: ’located in the administrative territorial entity’, ’reason’: "Both relations are poor fits\. However, ’located in the administrative territorial entity’ is slightly better than ’location’ because it specifies the type of entity \(administrative territorial\), which is relevant to the context of a country like England and Wales\. 
’location’ is too generic\."\}, \{’pair’: ’located in the administrative territorial entity vs contains administrative territorial entity’, ’winner’: ’located in the administrative territorial entity’, ’reason’: "’contains administrative territorial entity’ is factually incorrect as the association does not contain the country\. ’located in the administrative territorial entity’ is a weak but plausible interpretation of the text, even if not the most precise\."\}, \{’pair’: ’location vs contains administrative territorial entity’, ’winner’: ’location’, ’reason’: "’contains administrative territorial entity’ is factually incorrect\. ’location’ is a very general but not entirely wrong description of the relationship, as the association is situated in the countries of England and Wales\."\}\], ’final\_result’: \{’best\_relation’: ’applies to jurisdiction’, ’decision\_logic’: "The text ’of England and Wales’ indicates that the Local Government Association operates within the jurisdiction of England and Wales\. This makes ’applies to jurisdiction’ the most accurate and specific relation\. The other candidates are either too generic \(’location’\), incorrectly describe the relationship \(’contains administrative territorial entity’\), or are factually incorrect \(’located in the administrative territorial entity’\)\."\}\}
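In the trace, the model writes `final_result` itself. A simple way to cross-check that result against the pairwise judgments is to count wins per candidate. The sketch below is such a consistency check under stated assumptions, not the authors' aggregation step: the helper `tally_wins` is hypothetical, and it assumes each `pair` string joins two relation names with the literal separator `" vs "`.

```python
from collections import Counter
from typing import Dict, List


def tally_wins(comparison_log: List[Dict[str, str]]) -> Counter:
    """Count how many pairwise comparisons each candidate relation wins."""
    wins: Counter = Counter()
    for entry in comparison_log:
        # Assumes relation names never contain the separator " vs ".
        left, right = entry["pair"].split(" vs ", 1)
        wins[left] += 0   # register candidates that never win
        wins[right] += 0
        if entry["winner"] not in (left, right):
            raise ValueError(f"winner not in pair: {entry['pair']!r}")
        wins[entry["winner"]] += 1
    return wins


# Pairs and winners copied from the trace above (reasons omitted).
LOC_IN = "located in the administrative territorial entity"
CONTAINS = "contains administrative territorial entity"
comparison_log = [
    {"pair": f"applies to jurisdiction vs {LOC_IN}", "winner": "applies to jurisdiction"},
    {"pair": "applies to jurisdiction vs location", "winner": "applies to jurisdiction"},
    {"pair": f"applies to jurisdiction vs {CONTAINS}", "winner": "applies to jurisdiction"},
    {"pair": f"{LOC_IN} vs location", "winner": LOC_IN},
    {"pair": f"{LOC_IN} vs {CONTAINS}", "winner": LOC_IN},
    {"pair": f"location vs {CONTAINS}", "winner": "location"},
]

wins = tally_wins(comparison_log)
best_relation, best_count = wins.most_common(1)[0]
print(best_relation, best_count)
```

For the six comparisons in this trace, `applies to jurisdiction` wins all three of its pairings, which agrees with the `best_relation` the model reports.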