Ontology for Policing: Conceptual Knowledge Learning for Semantic Understanding and Reasoning in Law Enforcement Reports
Summary
This paper proposes a symbolic framework that converts redacted police narratives into evidence-linked facts using ontology, semantic parsing (AMR), and reasoning, enabling structured querying of incident details that are typically only available in free text.
# Ontology for Policing: Conceptual Knowledge Learning for Semantic Understanding and Reasoning in Law Enforcement Reports
Source: [https://arxiv.org/html/2605.15978](https://arxiv.org/html/2605.15978)
###### Abstract
Law enforcement reports contain structured fields and written narratives. However, many incident facts that are needed for review, police training, and investigations are in natural language and require manual reading. We propose a framework using symbolic methods for converting narratives into evidence–linked facts. Our objective is to measure the value of narratives by recovering incident details only from the unstructured text and building temporal graphs with time cues and domain axioms. We achieve this through redaction of personal identifiers, semantic parsing, predicate mapping to an ontology, and reasoning. We evaluate the symbolic approach on 450 property crime reports and a short human review. Of the events extracted by the system, 54.1% had a confidence score ≥ 0.80 and 93.7% were mapped through the PropBank → VerbNet → WordNet semantic path. Agreement of 100% was reached on incident initiation, stolen items, and temporal cues, with lower agreement on forced entry interpretation.
## I Introduction
Law enforcement agencies rely on incident reports to document events and investigations and to keep track of operations. These reports combine structured fields, such as coded categories, checkboxes, and administrative data (case number, offense title, and statute), with natural language written by police officers. The metadata is machine–readable and supports counting, filtering and record management. Agencies use law enforcement records management system data entry tools to input structured data into incident reports, so a clear narrative is expected from law enforcement personnel that organizes incident facts, which are typically in chronological order \[[15](https://arxiv.org/html/2605.15978#bib.bib18)\]. However, many important details of an incident are only in the narrative. These include what happened and in what order, who was involved, and how participants such as officer, victim and suspect are described. Narrative details can also vary across reports. One report may begin with a 911 call, while another may begin with an officer on patrol, and details such as vehicles, stolen items, or where and how someone entered a property may appear only in the free text. These details are harder to access systematically, which creates a major bottleneck when investigators or analysts need to review large numbers of reports. In some public–release settings these narratives are redacted, which makes it harder both to recover important information and to keep traceability to the source text. In this work we focus only on extracting event details, participants, and temporal ordering from police narratives, information that is often missing from structured data. We treat these as evidence–linked facts, meaning facts that are grounded in the narrative and can be traced back to the report text. Extracting this information can make narratives more useful for systematic review, analysis and investigation, which leads to the following research questions:
1. RQ1. Can redacted police narratives be converted into evidence–linked facts that can be traced back to the original sentences with symbolic natural language understanding (NLU) techniques?
2. RQ2. Do redacted police narratives provide enough evidence for event and temporal information that may not be captured in the structured metadata?
Figure 1 (diagram): (1) Privacy and Extraction: Narrative → Redacted Text + Entities; (2) Semantic Layer: Structured Event Semantics (AMR); (3) Analytical Outputs: Queries + Auditable Inference; Ontology (OWL/DL).

Figure 1: Narrative redaction, extraction, AMR semantic normalization, and ontology outputs.

For large–scale use, methods for analyzing police narratives must preserve privacy, handle linguistic variation and maintain traceability to the narratives. Figure [1](https://arxiv.org/html/2605.15978#S1.F1) illustrates the proposed symbolic approach. First, the narratives are redacted, and entities and events are extracted. Next, the text is converted to a semantic representation that keeps all events and participants. Finally, the extracted meaning is mapped to an ontology for reasoning. With this, we address RQ1, while RQ2 is examined with a short human review. The remainder of this paper is organized as follows. Section [II](https://arxiv.org/html/2605.15978#S2) reviews related work on police narratives and redaction, as well as on semantic parsing and linguistic knowledge bases for NLU. Section [III](https://arxiv.org/html/2605.15978#S3) describes the text corpus. Section [IV](https://arxiv.org/html/2605.15978#S4) outlines our methods. Section [V](https://arxiv.org/html/2605.15978#S5) shows the results. Section [VI](https://arxiv.org/html/2605.15978#S6) discusses limitations, usage and next steps, and Section [VII](https://arxiv.org/html/2605.15978#S7) concludes.
## II Related Work
Symbolic NLU. This work uses symbolic NLU since our goal is to extract facts from police narratives that can be typed, checked and audited. Prior work has shown that robust NLU helps move beyond shallow string matching toward text representations that support reasoning \[[4](https://arxiv.org/html/2605.15978#bib.bib17)\]. In our approach, we treat the task as knowledge extraction (KE) from redacted police narratives, following the NLU tradition of representing sentence meaning in a form that supports semantic interpretation before reasoning \[[5](https://arxiv.org/html/2605.15978#bib.bib22)\]. Some information in police narratives is not stated directly and needs background knowledge for interpretation. Prior work has identified the knowledge bottleneck as one of the main issues in symbolic NLU \[[20](https://arxiv.org/html/2605.15978#bib.bib15)\] and has shown that definitions can support the recovery of implicit commonsense relations \[[19](https://arxiv.org/html/2605.15978#bib.bib10)\]. This is relevant to our approach because event interpretation often depends on conceptual relations that give a deeper understanding.

Lexical resources. For many natural language processing applications, semantically meaningful sentence structures are best represented in the form of predicate–argument structure, i.e., “who did what to whom”, and our extraction pipeline depends on identifying such information from police narratives. Much progress has been made in creating resources for consistent role annotation, especially in PropBank (PB) for predicate senses (verbs) and roles (arguments), which are often annotated with labels such as :ARG0 or :ARG1 \[[21](https://arxiv.org/html/2605.15978#bib.bib4)\]. These resources support our semantic parsing, which recovers event structure. In our setting, we use lexical resources for event interpretation and argument typing. In particular, VerbNet (VN) organizes verbs into semantically related classes \[[13](https://arxiv.org/html/2605.15978#bib.bib6)\] and WordNet (WN) is organized by synonym sets and hypernym relations \[[16](https://arxiv.org/html/2605.15978#bib.bib5)\]. We use VerbNet through SemLink \[[26](https://arxiv.org/html/2605.15978#bib.bib21)\] to connect predicate senses to semantic classes, and WordNet to support “is–a” checks over extracted arguments and to help determine types such as vehicles, structures or structure parts.

Semantic layer. As shown in Figure [1](https://arxiv.org/html/2605.15978#S1.F1), our workflow parses narratives into Abstract Meaning Representation (AMR) as an intermediate semantic layer. AMR represents sentence meaning as a graph of predicates and their participants, and prior work has studied sentence meaning and the challenges of accurate parsing \[[14](https://arxiv.org/html/2605.15978#bib.bib3),[27](https://arxiv.org/html/2605.15978#bib.bib1)\]. In our system, AMR provides the semantic input for the ontology. While FrameNet provides a way to represent event structure and frame semantics \[[6](https://arxiv.org/html/2605.15978#bib.bib20)\], we use AMR graphs because they give explicit graph structure and PropBank sense labels that integrate directly with our mapping rules. In general, an event is an action or occurrence and a frame is a record that organizes the details of that event into groups or slots. For example, an $\mathsf{Entry}$ frame may have entry point, method, structure and tool slots, and a $\mathsf{Theft}$ frame may have stolen items and value mentioned. Because sentence meaning must be mapped into entities, events and roles, AMR is a better fit for our pipeline.

Ontologies. In the Semantic Web community, ontologies give a formal schema for encoding constraints that support validation and reasoning \[[1](https://arxiv.org/html/2605.15978#bib.bib2)\], so after semantic parsing, we map extracted entities and events into an ontology. Prior work on ontologies in the policing domain has further shown that explicit schemas and logical constraints can be developed for property crime concepts \[[17](https://arxiv.org/html/2605.15978#bib.bib9)\], which motivates our use of such representations for auditable police narrative analysis.

Temporal event ordering. Part of this work involves ordering events in time so investigators have a clearer picture of the timeline of events. For example, the sentence “a suspect broke a window before entering a home” illustrates a relation covered by a framework of temporal relationships \[[3](https://arxiv.org/html/2605.15978#bib.bib8)\]. Event representations have also been implemented in cognitive systems where some future states are dependent on past events \[[2](https://arxiv.org/html/2605.15978#bib.bib7)\]. While we do not implement the full framework, we borrow some ideas to help extract and verify temporal relationships between events, as reflected in Section [IV-E](https://arxiv.org/html/2605.15978#S4.SS5).

Police narratives and redaction. Police reports combine fixed fields with narratives written by law enforcement officers. We focus only on narratives because they often contain important details that may not be fully captured in structured fields. However, operating at scale with narratives is difficult because incident reports can be incomplete or inconsistent \[[12](https://arxiv.org/html/2605.15978#bib.bib13)\]. Earlier studies argued that textual analysis alone would not be sufficient for real–world challenges \[[9](https://arxiv.org/html/2605.15978#bib.bib12)\]. Recent work in this area has applied text mining and machine learning to other unstructured crime incident narratives (e.g., court documents) to address classification tasks (e.g., offense type). While we do not use machine learning in our approach, these studies show the value of free text in incident reports, while also suggesting that deep patterns cannot be captured from surface structure alone \[[7](https://arxiv.org/html/2605.15978#bib.bib11)\]. Personally Identifiable Information (PII) may appear in police narratives, and since policing is a sensitive domain, a typical preprocessing step is redaction before semantic parsing, ontology population and auditing \[[22](https://arxiv.org/html/2605.15978#bib.bib14)\]. Hence, redaction is a necessary step in our workflow.

OpenBWC. This work extends OpenBWC (footnote 1: OpenBWC is a research open–source initiative for ethical AI and statistical analysis of body–worn camera (BWC) footage: [https://openbwc.org/](https://openbwc.org/)), which is a collaboration between Rochester Institute of Technology, the Rochester Police Department (RPD), and criminologists at the University at Albany, by adding symbolic NLU components for internal use with the RPD through an ontology pipeline for redacted police narratives \[[25](https://arxiv.org/html/2605.15978#bib.bib23)\].
## III Dataset and Preprocessing
This work uses a text corpus of 450 RPD incident reports from five offense categories (Burglary, Larceny, Motor Vehicle Theft, Robbery and Stolen Property) dated between 2014 and 2025. The reports were provided in unredacted form. We only work with property crimes because their narratives often describe how the incident began and include useful details about individuals, vehicles, and objects.
TABLE I: Dataset composition and narrative statistics.

Table [I](https://arxiv.org/html/2605.15978#S3.T1) shows a breakdown of the corpus by category with word count statistics. Narrative length ranges from 40 to 868 words. This is important because denser narratives have more entity and event descriptions. Before analysis, we perform the following steps:
1. Extraction: the source PDFs are processed to extract the NARRATIVE section, which is then transformed to plain text. We perform Optical Character Recognition (OCR): pages are rendered at 300 DPI and passed through Tesseract \[[24](https://arxiv.org/html/2605.15978#bib.bib25)\], after which headings, footers, and artifacts are removed. The extracted text, which is in all uppercase, is then converted to sentence case.
2. OCR: the system corrects recognition errors; for example, we replace \| with I.
3. Redaction: we use spaCy’s named entity recognition (footnote 2: spaCy API: [https://spacy.io/api](https://spacy.io/api)), regular expressions and metadata to redact PII, for example names, addresses, dates of birth and vehicle details. We keep shorthand notations (V (Victim), S (Suspect), W (Witness)), reuse the same placeholder, such as \[PERSON\_1\], for every mention of the same entity, and also handle first–person mentions of reporting officers. This step outputs redacted narrative files and audit files in JSON format, which record the locations of all placeholders.
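The redaction step above can be illustrated with a minimal sketch. It assumes two hypothetical regular-expression patterns in place of the spaCy NER and metadata rules, which are not reproduced here, and it only shows the two properties the text describes: the same entity always receives the same placeholder, and an audit log records where each placeholder lands in the redacted text.

```python
import json
import re

# Hypothetical, illustrative patterns; the NER model and metadata rules
# used in the paper are not reproduced here.
PATTERNS = {
    "PERSON": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
    "DOB": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}


def redact(text):
    """Replace matches with stable placeholders and log placeholder offsets."""
    ids = {}     # (label, surface form) -> placeholder
    counts = {}  # label -> number of distinct entities seen so far
    spans = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label, m.group()))
    spans.sort()

    out, audit, cursor = [], [], 0
    for start, end, label, surface in spans:
        if start < cursor:  # skip matches that overlap an earlier one
            continue
        key = (label, surface)
        if key not in ids:
            counts[label] = counts.get(label, 0) + 1
            ids[key] = f"[{label}_{counts[label]}]"
        out.append(text[cursor:start])
        offset = sum(len(part) for part in out)  # position in redacted text
        out.append(ids[key])
        audit.append({"placeholder": ids[key],
                      "start": offset,
                      "end": offset + len(ids[key])})
        cursor = end
    out.append(text[cursor:])
    return "".join(out), audit


redacted, audit_log = redact(
    "John Doe broke the window. John Doe was born 01/02/1990."
)
print(redacted)
print(json.dumps(audit_log, indent=2))
```

Recording offsets against the redacted text, not the original, keeps the audit file usable without access to the unredacted narrative.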
TABLE II: Redaction summary from audit logs across the corpus.

Note 1: GPE = geopolitical entity in named entity recognition. It refers to geographical locations that also have a governing body, such as countries, cities, states, provinces, and municipalities.

In Table [II](https://arxiv.org/html/2605.15978#S3.T2) we report the total number of placeholders for the redacted entities and the average number of placeholders per report.
Unit of analysis. The incident reports are analyzed as individual documents. Due to the sensitive data in the narratives, the corpus is not publicly available. All incident reports are maintained and processed in a secure and controlled research computing environment for batch execution and large–scale analysis \[[18](https://arxiv.org/html/2605.15978#bib.bib16)\].
## IV Methodology
This paper proposes a framework for converting police narratives into evidence–linked facts through a symbolic pipeline, which Algorithm [1](https://arxiv.org/html/2605.15978#algorithm1) summarizes. The algorithm begins by inducing an ontology $\mathcal{T}$ and a set of logic templates $\mathcal{L}$ from the initial ontology $\mathcal{T}_0$ and the semantic descriptions in a definition corpus $D_{\text{def}}$ (Line 1). For example, a definition of theft can provide a template stating that a taking event with an agent, an item, and lack of permission supports a theft interpretation. For each report $d \in \mathcal{D}$, the narrative $n$ and its metadata $m$ are extracted (Lines 3-4). For example, the narrative may contain “John Doe broke the window and took a wallet,” and the metadata may include the case number, offense type and date. Then the narrative $n$ is redacted using the redaction rules $\mathcal{R}$ together with the metadata, which results in a redacted narrative $n'$ and a redaction log $\ell_d$ (Lines 5-6). Here “John Doe” is replaced with \[PERSON\_1\]. Next, each sentence is parsed with the semantic parser $\mathcal{P}$, which gives a set of AMR graphs $G_d$ (Line 7). For example, the sentence “\[PERSON\_1\] broke the window and took a wallet” may produce predicate nodes such as break-01 and take-01 together with their arguments. The resulting AMR graphs are then transformed into extracted facts mapped to classes and roles, $\mathcal{A}_N(d)$ (Line 8), which can include facts such as $\mathsf{ForcedEntryEvent}$, $\mathsf{TheftEvent}$ or participant links for suspects. Separately, the metadata is transformed into ontology facts $\mathcal{A}_M(d)$ (Line 9). The narrative facts and the metadata–derived facts are merged into a case–level fact set $\mathcal{A}$ (Line 10).

A reasoning step $\mathcal{E}$ then checks the consistency of the fact set $\mathcal{A}$ with the applied ontology (Line 11). For example, if a theft event is extracted, the reasoner can check whether a stolen item is present and whether participants are typed consistently. Inconsistent parts of the ontology are updated using a resolution step (Line 12). For a single report, the algorithm outputs the redacted narrative, the extracted events and participants, the evidence–linked facts, the ontology and the validation output for audit (Line 13).
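The per-report loop described above can be sketched as a small driver. This is only an outline under stated assumptions: every callable passed to `run_pipeline` is a hypothetical stand-in for the corresponding component (extraction, redaction, parsing, fact extraction, metadata mapping, reasoning), sentence splitting is reduced to a naive split on periods, and the ontology induction and inconsistency resolution steps are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class CaseResult:
    redacted: str
    redaction_log: list
    narrative_facts: set
    metadata_facts: set
    validation: list = field(default_factory=list)


def run_pipeline(reports, extract, redact, parse,
                 extract_facts, map_metadata, check):
    """Per-report loop; each callable is a hypothetical stand-in component."""
    results = {}
    for report_id, report in reports.items():
        narrative, metadata = extract(report)
        redacted, log = redact(narrative, metadata)
        # Naive sentence split, used here only to keep the sketch short.
        graphs = [parse(s) for s in redacted.split(". ") if s]
        a_n = extract_facts(graphs)      # narrative facts
        a_m = map_metadata(metadata)     # metadata facts
        validation = check(a_n | a_m)    # merged case-level fact set
        results[report_id] = CaseResult(redacted, log, a_n, a_m, validation)
    return results


# Toy stand-ins, for illustration only.
reports = {
    "case-1": {
        "narrative": "John Doe broke the window. He took a wallet.",
        "offense": "Burglary",
    }
}
results = run_pipeline(
    reports,
    extract=lambda r: (r["narrative"], {"offense": r["offense"]}),
    redact=lambda n, m: (n.replace("John Doe", "[PERSON_1]"),
                         [{"placeholder": "[PERSON_1]"}]),
    parse=lambda s: s.lower().rstrip(".").split(),
    extract_facts=lambda gs: {("TheftEvent", i)
                              for i, g in enumerate(gs) if "took" in g},
    map_metadata=lambda m: {("OffenseType", m["offense"])},
    check=lambda facts: [] if any(f[0] == "TheftEvent" for f in facts)
    else ["no theft event"],
)
print(results["case-1"])
```

Passing the components in as callables keeps the loop itself independent of any particular parser or reasoner.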
Input: Reports $\mathcal{D}$; metadata $\mathcal{M}$; definition corpus $D_{\text{def}}$; linguistic KBs $\mathcal{K}_{\text{lex}}$; ontology $\mathcal{T}_0$; redaction rules $\mathcal{R}$; semantic parser $\mathcal{P}$; reasoner $\mathcal{E}$.

Output: Redacted narratives $\mathcal{N}'$; narrative facts $\mathcal{A}_N$; metadata facts $\mathcal{A}_M$; rules $\mathcal{L}$; ontology $\mathcal{T}$; validation $\mathcal{V}$.

$$
\begin{array}{rl}
 & \text{// Encode domain knowledge} \\
1 & (\mathcal{T},\mathcal{L}) \leftarrow \textbf{InduceLogic}(D_{\text{def}},\mathcal{T}_0) \\
2 & \textbf{foreach } d \in \mathcal{D} \textbf{ do} \\
3 & \quad n \leftarrow \textbf{ExtractNarrative}(d) \\
4 & \quad m \leftarrow \textbf{Metadata}(d,\mathcal{M}) \\
5 & \quad (n',\ell_d) \leftarrow \textbf{Redact}(n,m;\mathcal{R}) \\
6 & \quad \text{add } n' \text{ to } \mathcal{N}'; \text{ store } \ell_d \\
7 & \quad G_d \leftarrow \{\mathcal{P}(s) \mid s \in \textbf{SentSplit}(n')\} \\
8 & \quad \mathcal{A}_N(d) \leftarrow \textbf{ExtractFacts}(G_d;\mathcal{T},\mathcal{L}) \\
9 & \quad \mathcal{A}_M(d) \leftarrow \textbf{MapMetadata}(m,\mathcal{T}) \\
10 & \quad \mathcal{A} \leftarrow \mathcal{A}_N(d) \cup \mathcal{A}_M(d) \\
11 & \quad \mathcal{V}(d) \leftarrow \mathcal{E}(\mathcal{T} \cup \mathcal{A}) \\
12 & \quad (\mathcal{T},\mathcal{L}) \leftarrow \textbf{ResolveInconsistencies}(\mathcal{T},\mathcal{L};\mathcal{V}(d),D_{\text{def}}) \\
13 & \textbf{return } \mathcal{N}',\mathcal{A}_N,\mathcal{A}_M,\mathcal{L},\mathcal{T},\mathcal{V}
\end{array}
$$

Algorithm 1: Symbolic KE

### IV-A Knowledge Sources
Police narratives contain semantics useful for agents, but they are often written with shorthand notation and varied sentence structures, so we employ a set of knowledge sources. We treat the extracted OWL/RDF (footnote 3: OWL: Web Ontology Language; RDF: Resource Description Framework) assertions as a per–report case knowledge base. For each report $d$, the system constructs a case–level set of factual assertions containing event and participant instances linked to sentence evidence and to their extraction source. Our approach incorporates three types of knowledge sources. First, a participation–centric ontology $\mathcal{T}_0$ is used for structuring events that are linked to entities through participation roles, as described in Section [IV-C](https://arxiv.org/html/2605.15978#S4.SS3). Second, a definition corpus $D_{\text{def}}$ (a definition being a formal description of a concept) is used for inducing logic templates based on regularities that constrain the participation of typical individuals in events and the relations between events. Finally, linguistic knowledge bases $\mathcal{K}_{\text{lex}}$ such as PropBank, VerbNet/SemLink and WordNet provide lexical semantics for predicate normalization (mapping different verbs to event types) and argument typing. These lexical resources, including PropBank, VerbNet, and WordNet 3.0, were accessed through the Natural Language Toolkit (NLTK) 3.9.2 \[[8](https://arxiv.org/html/2605.15978#bib.bib26)\].
Semantic interpretation. Semantic templates give some of this guidance for predicate normalization. We can use lexical semantic resources to provide information about the events, roles, and additional constraints that a predicate is likely to express. Semantics can be determined from the main verb, which usually identifies the event, and from the nouns and arguments, which determine the participants and other target objects. For example, the semantic meaning of a theft predicate is a take event that involves an agent, a transferred item and an impacted owner, and that occurs without authorization. In “The suspect stole a wallet from the victim’s vehicle,” stole is a theft event, wallet is the stolen item, suspect is the main agent, victim is the affected entity and vehicle is the source context:
$$
\begin{aligned}
&\textit{steal\_v1}(e) \wedge \textit{agent}(e,x) \wedge \textit{theme}(e,y) \wedge \textit{owner}(y,z) \\
&\qquad \rightarrow \textit{take\_v1}(e) \wedge \textit{agent}(e,x) \wedge \textit{theme}(e,y) \\
&\qquad\qquad \wedge \neg\textit{permission}(z,x,e).
\end{aligned}
$$
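As a rough illustration of how such a template could be applied, the sketch below runs the rule once over a set of tuple-encoded facts. The tuple encoding and the `not_permission` fact name are assumptions made for this example; they are not the paper's logic-template format.

```python
def apply_theft_rule(facts):
    """Derive take_v1 and a negated-permission fact from steal_v1 facts.

    Facts are tuples such as ("steal_v1", e), ("agent", e, x),
    ("theme", e, y) and ("owner", y, z); this encoding is hypothetical.
    """
    derived = set(facts)
    steal_events = [f[1] for f in facts if f[0] == "steal_v1"]
    for e in steal_events:
        agents = [f[2] for f in facts if f[0] == "agent" and f[1] == e]
        themes = [f[2] for f in facts if f[0] == "theme" and f[1] == e]
        for x in agents:
            for y in themes:
                owners = [f[2] for f in facts
                          if f[0] == "owner" and f[1] == y]
                for z in owners:
                    # Consequent of the rule: a taking event without
                    # the owner's permission.
                    derived.add(("take_v1", e))
                    derived.add(("not_permission", z, x, e))
    return derived


facts = {
    ("steal_v1", "e1"),
    ("agent", "e1", "suspect"),
    ("theme", "e1", "wallet"),
    ("owner", "wallet", "victim"),
}
derived = apply_theft_rule(facts)
print(sorted(derived - facts))
```

The rule fires only when the antecedent is fully matched, so a steal event with no known owner yields no new facts.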
### IV-B Semantic Parsing
We use the AMRlib sentence–to–graph parser \[[28](https://arxiv.org/html/2605.15978#bib.bib27)\], which is a BART–large encoder–decoder model that produces AMR graphs with PropBank sense labels (e.g., steal-01). The released checkpoint reports a SMATCH score of 83.7%. The AMR outputs are in PENMAN notation \[[11](https://arxiv.org/html/2605.15978#bib.bib28)\], a text format for writing AMR graphs, so ontology population is reproducible.
Figure 2: AMR graph for the sentence “The suspect broke the rear passenger window of the vehicle and stole a wallet,” showing two events (break-01 and steal-01) with a shared agent (suspect).

Figure [2](https://arxiv.org/html/2605.15978#S4.F2) shows how an AMR graph represents the meaning of a sentence. Each box corresponds to a concept node, where predicates such as break-01 and steal-01 represent events labeled with PropBank senses (“01” indicating a specific sense of the verb). The edges between nodes are the semantic roles. For example, :ARG0 usually denotes the agent (the doer of the action), which in this case is the person (suspect), while :ARG1 is the patient or object affected by the action, such as the window in the breaking event and the wallet in the stealing event. The other nodes represent the entities and their relationships. For example, window is linked to vehicle with a :part-of relation, and modifiers such as rear and passenger describe the object. The person node connected to both predicates means that the same agent participates in both events. The resulting graphs are used as the basis for event extraction. Predicate nodes with PropBank senses are treated as candidate event mentions, and their arguments are passed on for ontology role assignment and event typing (described under Role Extraction in Section [IV-B](https://arxiv.org/html/2605.15978#S4.SS2.SSS0.Px1) and in Section [IV-D](https://arxiv.org/html/2605.15978#S4.SS4)). Each candidate event is scored for confidence at the sentence level: the score is tied to the local evidence span (the sentence) and is assigned before temporal ordering.
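The selection of candidate event mentions can be sketched over a triple view of the graph. The triples below are hand-written to approximate the structure described for Figure 2 (modifiers and the suspect annotation are left out); they are not actual parser output, and the regular expression for sense labels is an assumption of this example.

```python
import re

# PropBank-style sense label such as break-01 (assumed pattern).
SENSE = re.compile(r"^[a-z][a-z-]*-\d\d$")

# Hand-written (source, role, target) triples approximating Figure 2.
TRIPLES = [
    ("a", ":instance", "and"),
    ("a", ":op1", "b"),
    ("a", ":op2", "s"),
    ("b", ":instance", "break-01"),
    ("b", ":ARG0", "p"),
    ("b", ":ARG1", "w"),
    ("s", ":instance", "steal-01"),
    ("s", ":ARG0", "p"),
    ("s", ":ARG1", "w2"),
    ("p", ":instance", "person"),
    ("w", ":instance", "window"),
    ("w", ":part-of", "v"),
    ("v", ":instance", "vehicle"),
    ("w2", ":instance", "wallet"),
]


def candidate_events(triples):
    """Return predicate nodes with sense labels and their :ARGn fillers."""
    concept = {src: tgt for src, role, tgt in triples if role == ":instance"}
    events = []
    for var, label in concept.items():
        if not SENSE.match(label):
            continue  # not a sense-labeled predicate, e.g. "and" or "person"
        args = {
            role: concept.get(tgt, tgt)
            for src, role, tgt in triples
            if src == var and role.startswith(":ARG")
        }
        events.append({"var": var, "predicate": label, "args": args})
    return events


events = candidate_events(TRIPLES)
for event in events:
    print(event)
```

Both events resolve `:ARG0` to the same variable `p`, which is how the shared agent in the figure shows up in the triple view.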
##### Role Extraction\.
The redaction process outputs a pseudonym map that assigns a name to each individual (e.g., Victim\_1, Suspect\_Unknown, Officer) and, for each sentence in the AMR, gives the variable associated with each name. These are used to ground the predicate–argument roles in the AMR to specific participants that are relevant to the policing domain. For each case we specify a function that maps each pseudonym to a unique entity identifier, where each case–linked entity is assigned a semantic type (e.g., $\mathsf{Person}$ or $\mathsf{Vehicle}$), the role associated with that mention, and sentence–indexed evidence showing where it appears in the narrative.
##### Unknown–actor separation\.
When the pseudonym map shows different unknown suspects in the same case $c$, they are stored as separate individuals and not merged:

$$
\begin{aligned}
&\mathsf{PlaysRole}(x,\textit{suspect\_unknown},c) \\
&\quad \wedge\ \mathsf{PlaysRole}(y,\textit{suspect\_unknown},c) \\
&\quad \wedge\ x \neq y \\
&\qquad \Rightarrow\ \mathsf{DifferentFrom}(x,y).
\end{aligned}
$$
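A minimal sketch of this rule, assuming role assertions are available as hypothetical `(individual, role, case)` tuples, groups unknown suspects by case and emits a distinctness pair in both directions for every two individuals in the same case.

```python
from itertools import combinations


def different_from(plays_role):
    """Derive DifferentFrom pairs for unknown suspects within each case.

    plays_role: iterable of (individual, role, case) tuples
    (a hypothetical encoding used only for this sketch).
    """
    by_case = {}
    for individual, role, case in plays_role:
        if role == "suspect_unknown":
            by_case.setdefault(case, set()).add(individual)
    pairs = set()
    for individuals in by_case.values():
        for x, y in combinations(sorted(individuals), 2):
            pairs.add((x, y))
            pairs.add((y, x))  # the relation is symmetric
    return pairs


roles = [
    ("u1", "suspect_unknown", "c1"),
    ("u2", "suspect_unknown", "c1"),
    ("u3", "suspect_unknown", "c2"),
    ("v1", "victim", "c1"),
]
pairs = different_from(roles)
print(sorted(pairs))
```

Unknown suspects from different cases are left unrelated, since the rule is scoped to a single case.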
### IV-C Ontology Construction
We construct an OWL 2 ontology in Protégé (footnote 4: Protégé is an open–source OWL ontology editor: [https://protege.stanford.edu](https://protege.stanford.edu/)) to represent case–level knowledge where facts are structured through events and entities. Our ontology uses a relatively compact schema but many case–specific factual assertions. It comprises a TBox (class schema), an RBox (property schema), and an ABox (case–specific facts), and ABox assertions dominate relative to the TBox and RBox. Figure [3](https://arxiv.org/html/2605.15978#S4.F3) shows a partial class hierarchy of the ontology, which is organized around three main branches, $\mathsf{Event}$, $\mathsf{Entity}$ and $\mathsf{ParticipationRole}$, all of which descend from $\mathsf{owl{:}Thing}$. The “is–a” arrows show subclass relationships from general to more specific concepts. For example, $\mathsf{CrimeEvent}$ is a subclass of $\mathsf{Event}$, and $\mathsf{TheftEvent}$, $\mathsf{EntryEvent}$, $\mathsf{PropertyDamageEvent}$ and $\mathsf{ForcedEntryEvent}$ are subclasses of $\mathsf{CrimeEvent}$. $\mathsf{Person}$, $\mathsf{Vehicle}$, $\mathsf{Weapon}$ and $\mathsf{Location}$ are subclasses of $\mathsf{Entity}$. $\mathsf{SuspectRole}$, $\mathsf{WitnessRole}$ and $\mathsf{VictimRole}$ are subclasses of $\mathsf{ParticipationRole}$. This class hierarchy helps the system reason at different levels of abstraction, for example by recognizing that an extracted theft event should also satisfy the more general constraints defined for crime events. Ontology statistics are reported in Table [VIII](https://arxiv.org/html/2605.15978#S9.T8) in Appendix [IX](https://arxiv.org/html/2605.15978#S9).
Figure 3: Partial class–level view of the ontology, showing event, entity, and role classes.

##### Ontology structure\.
At the ontology level, the schema includes a set of classes and properties. Most of the properties used at the event scope are related to $\mathsf{Event}$ participation. Each $\mathsf{Participation}$ is a relationship between an entity and an event. The property $\mathsf{inEvent}$ captures participants that are part of a given event. Each role of a predicate yields its own role property (e.g., $\mathsf{hasAgent}$ or $\mathsf{hasPatient}$) that links the participation to an entity. The property $\mathsf{supportedBy}$ links extracted assertions back to sentences, and $\mathsf{inCase}$ associates each event with its report case.
##### Ontology mapping from AMR\.
AMR predicate–argument structures are converted into ontology assertions with a small set of recurring mapping patterns. For example, steal-01 :ARG0 $x$ :ARG1 $y$ is mapped to an event instance $e$ such that

$$
\begin{aligned}
&\mathsf{TheftEvent}(e) \wedge \exists p\,(\mathsf{Participation}(p) \wedge \mathsf{inEvent}(p,e) \\
&\quad \wedge\ \mathsf{hasAgent}(p,x) \wedge \mathsf{hasPatient}(p,y)).
\end{aligned}
$$

We keep an inspectable intermediate representation of the meaning graphs that includes case–local pseudonyms for entities and concepts, role assignments, and the verb together with its numbered sense label. This supports auditing: each extracted role and event can be verified against the corresponding AMR node and sentence text. The final lexical typing depends on predicate sense, argument structure and semantic checks over participant types, as described in Section [IV-D](https://arxiv.org/html/2605.15978#S4.SS4).
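The steal-01 mapping pattern above can be sketched as a function that emits subject-predicate-object triples. The property and class names follow the text; the identifier scheme (`event_1`, `participation_1`), the lookup tables, and the use of plain tuples in place of an RDF library are assumptions of this example.

```python
from itertools import count

# Illustrative lookup tables; only the steal-01 pattern from the text is shown.
PREDICATE_TO_CLASS = {"steal-01": "TheftEvent"}
ROLE_TO_PROPERTY = {":ARG0": "hasAgent", ":ARG1": "hasPatient"}


def map_event(predicate, args, case_id, sentence_id, ids):
    """Map one AMR predicate and its arguments to reified assertions."""
    n = next(ids)
    event, participation = f"event_{n}", f"participation_{n}"
    triples = [
        (event, "rdf:type", PREDICATE_TO_CLASS[predicate]),
        (event, "inCase", case_id),
        (event, "supportedBy", sentence_id),  # evidence link to the sentence
        (participation, "rdf:type", "Participation"),
        (participation, "inEvent", event),
    ]
    for role, filler in args.items():
        if role in ROLE_TO_PROPERTY:
            triples.append((participation, ROLE_TO_PROPERTY[role], filler))
    return triples


assertions = map_event(
    "steal-01",
    {":ARG0": "Suspect_Unknown", ":ARG1": "wallet_1"},
    case_id="case_1",
    sentence_id="sent_3",
    ids=count(1),
)
for triple in assertions:
    print(triple)
```

Routing the role properties through the participation node, not the event itself, mirrors the existentially quantified $p$ in the formula.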
Description logic (DL). We apply a small set of DL constraints that capture structural invariants of the ontology. Here $R$ is an object property and $C$ a class; $\sqsubseteq$ denotes subsumption, $\exists R.C$ an existential restriction, $\forall R.C$ a universal restriction, $\sqcap$ concept intersection and $\top$ the universal class. These constraints are written to capture missing entities, events or links, and wrongly typed roles.
1. Participation constraints. Every participation instance must be linked to an event, and every event must be associated with a case: $\mathsf{Participation} \sqsubseteq \exists\,\mathsf{inEvent}.\mathsf{Event}$ and $\mathsf{Event} \sqsubseteq \exists\,\mathsf{inCase}.\mathsf{Case}$.
2. Role constraints. Role assertions are restricted so that only appropriately typed individuals can fill each role: $\exists\,\mathsf{hasAgent}.\top \sqsubseteq \mathsf{Participation}$.
3. Event constraints. Some event types have additional requirements on their participants. For example, theft events are modeled as a subclass of $\mathsf{CrimeEvent}$, must include a stolen item, and require that the stolen item be an instance of $\mathsf{Item}$: $\mathsf{TheftEvent} \sqsubseteq \mathsf{CrimeEvent} \sqcap \exists\,\mathsf{hasStolenItem}.\mathsf{Item}$ and $\mathsf{TheftEvent} \sqsubseteq \forall\,\mathsf{hasStolenItem}.\mathsf{Item}$.
4. Domain and range constraints. Axioms that restrict role properties: $\exists\,\mathsf{hasVictim}.\top \sqsubseteq \mathsf{Event}$.
These are evaluated using the HermiT reasoner\[[10](https://arxiv.org/html/2605.15978#bib.bib19)\]\. The axioms check for missing participants, incorrectly typed role fillers and contradictory role assignments\.
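The kinds of problems these axioms target can be illustrated with plain closed-world checks over a case's triples. This is not how a DL reasoner such as HermiT operates: OWL reasoning uses the open-world assumption, under which a missing filler is not by itself an inconsistency. The sketch below is therefore only an approximation, and the triple encoding and the `EVENT_CLASSES` set are assumptions.

```python
# Classes treated as events in this sketch (no subclass reasoning is done).
EVENT_CLASSES = {"Event", "CrimeEvent", "TheftEvent"}


def check_case(triples):
    """Return human-readable violations for one case's assertions."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    def fillers(subject, prop):
        return [o for s, p, o in triples if s == subject and p == prop]

    problems = []
    for ind, classes in types.items():
        if "Participation" in classes and not fillers(ind, "inEvent"):
            problems.append(f"{ind}: participation without event")
        if classes & EVENT_CLASSES and not fillers(ind, "inCase"):
            problems.append(f"{ind}: event without case")
        if "TheftEvent" in classes:
            items = fillers(ind, "hasStolenItem")
            if not items:
                problems.append(f"{ind}: theft without stolen item")
            for item in items:
                if "Item" not in types.get(item, set()):
                    problems.append(f"{ind}: {item} is not typed as Item")
    return problems


good = [
    ("e1", "rdf:type", "TheftEvent"),
    ("e1", "inCase", "c1"),
    ("e1", "hasStolenItem", "w1"),
    ("w1", "rdf:type", "Item"),
    ("p1", "rdf:type", "Participation"),
    ("p1", "inEvent", "e1"),
]
bad = [
    ("e2", "rdf:type", "TheftEvent"),
    ("p2", "rdf:type", "Participation"),
]
print(check_case(good))
print(check_case(bad))
```

Returning messages keyed by individual keeps the output usable as an audit record for a single report.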
### IV-D Lexical Rules
We represent the meaning of a sentence with predicate–argument structure, where the main predicate is the event trigger and the arguments give the initial participant structure. For example, the sentence “The suspect stole a wallet” gives $\mathsf{TheftEvent}(e) \wedge \mathsf{hasAgent}(e,x) \wedge \mathsf{hasPatient}(e,y) \wedge \mathsf{Suspect}(x) \wedge \mathsf{Item}(y)$, so :ARG0 is the actor and :ARG1 is the affected entity. We use three lexical resources for predicate and role normalization. PropBank gives verb senses and their argument roles, which are represented in the AMR graph. SemLink maps each PropBank sense to a VerbNet class, which proposes broad candidate event types from the verb or verb family. WordNet gives synsets and a hypernym hierarchy for semantic typing of extracted arguments, such as distinguishing structures (e.g., vehicle) from their parts (e.g., car window). A lemma is the basic dictionary form of a word; for example, stole, steals and stealing all reduce to the lemma steal. Normalization begins with PropBank predicate senses, which are mapped through SemLink to VerbNet classes and then to WordNet synsets. Event typing is performed in two stages: the predicate–family stage, which generates a coarse set of candidate event types (e.g., theft, entry, break) for a given predicate or predicate family; and the argument–sensitive refinement stage, where the candidate events are filtered using information about the typed participants and objects. When a predicate sense has no mapping, a lemma–level lexical retrieval is performed and this fallback is recorded in the extracted event.
##### WordNet–supported typing\.
We use WordNet to support "is–a" checks between arguments and lexical entries, where $\textit{is-a}(x,y)$ means that $x$ is a more specific type of $y$ in the hypernym hierarchy[[23](https://arxiv.org/html/2605.15978#bib.bib24)]. For example, if sedan is linked through WordNet to car and car to vehicle, then $\textit{is-a}(\textit{sedan},\textit{vehicle})$ gives evidence that the argument belongs to a vehicle class. In "he broke the window" we link the object window to the noun synset window.n.01 and follow its hypernym path (window → window.n.01 → structure_part) to determine that the window is a structural part. Together with the break–like predicate sense of the verb, this supports a $\mathsf{ForcedEntryEvent}$ or $\mathsf{PropertyDamageEvent}$.
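The is–a check can be pictured as a walk up a hypernym chain. The following is a minimal sketch that uses a tiny hand–written hypernym table in place of WordNet; the table entries, the single–parent simplification and the function name `is_a` are assumptions made for illustration and do not describe the system's actual lexicon access.

```python
# Toy hypernym table standing in for WordNet (illustrative entries only).
# Real WordNet synsets can have several hypernyms; one parent is assumed here.
HYPERNYM = {
    "sedan": "car",
    "car": "vehicle",
    "window": "structure_part",
    "door": "structure_part",
}


def is_a(x: str, y: str) -> bool:
    """Return True if y is reachable from x by following hypernym links."""
    seen = set()
    node = x
    while node is not None and node not in seen:
        if node == y:
            return True
        seen.add(node)
        node = HYPERNYM.get(node)  # None once the top of the chain is reached
    return False


print(is_a("sedan", "vehicle"))          # True
print(is_a("window", "structure_part"))  # True
print(is_a("window", "vehicle"))         # False
```

With a real lexicon the same walk would follow every hypernym of every candidate synset, which is one reason lexical ambiguity has to be handled separately.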
##### Argument–sensitive disambiguation\.
PropBank predicate senses alone are not sufficient to give a stable policing event type\. So, predicate typing is resolved from the predicate family and the typed argument structure\. For example, theft interpretations are strengthened when the affected object belongs to a property–related class and damage interpretations are strengthened when the object is typed as a structure part or vehicle part\. This avoids collapsing semantically different cases under the same predicate\.
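The two typing stages can be sketched as a lookup followed by a filter. The example below is a hedged illustration only: the candidate table, the object classes, the label `NarrativeAction` and the function `type_event` are hypothetical names chosen for the sketch, and the rule that the coarse candidates are kept when no argument evidence applies is an assumption.

```python
# Stage 1: predicate family -> coarse candidate event types (toy table).
CANDIDATES = {
    "take": ["TheftEvent", "NarrativeAction"],
    "break": ["PropertyDamageEvent", "ForcedEntryEvent"],
}

# Toy argument typing; in the paper this comes from WordNet hypernym paths.
OBJECT_CLASS = {
    "wallet": "property",
    "window": "structure_part",
    "door": "structure_part",
}

# Stage 2: object classes that strengthen each candidate event type.
SUPPORTING_CLASSES = {
    "TheftEvent": {"property"},
    "PropertyDamageEvent": {"structure_part", "vehicle_part"},
    "ForcedEntryEvent": {"structure_part"},
}


def type_event(predicate_lemma: str, object_lemma: str) -> list[str]:
    """Return candidate event types, refined by the typed object if possible."""
    candidates = CANDIDATES.get(predicate_lemma, [])
    obj_class = OBJECT_CLASS.get(object_lemma)
    refined = [c for c in candidates
               if obj_class in SUPPORTING_CLASSES.get(c, set())]
    # Assumption: keep the coarse set when the object gives no evidence.
    return refined or candidates


print(type_event("take", "wallet"))   # ['TheftEvent']
print(type_event("break", "window"))  # ['PropertyDamageEvent', 'ForcedEntryEvent']
```

The second call shows why the predicate alone is not enough: a broken window is compatible with both damage and forced entry, so both candidates survive the refinement stage.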
##### Confidence score.
For each event mention $e$, we assign a score $c(e)\in[0,1]$ that indicates how strongly the assigned event type is supported by the available evidence. This is not a learned probability but an exploratory heuristic that summarizes the amount of evidence behind an event typing decision. In [V-C](https://arxiv.org/html/2605.15978#S5.SS3) the confidence score is treated as a reliability–oriented summary: we report the proportion of events whose confidence meets or exceeds a given threshold as an indicator of strongly typed extractions. The score is computed from event categories, lexical grounding quality, structural rule support and penalties for ambiguity. We first compute a raw score
$$c_{\mathrm{raw}}(e)=b(e)+g_{\mathrm{lex}}(e)+g_{\mathrm{struct}}(e)-p(e),$$
where the base term $b(e)$ comes from the initial policing bucket of the event. Events in the incident_core group start at 0.55, police_action events start at 0.50, context_admin events start at 0.12 and uncertain events start at 0.30. These initial values give a starting point and reflect a judgment about how central the event is to the report: the main incident events start higher and uncertain events begin lower.
The lexical term $g_{\mathrm{lex}}(e)$ reflects how strongly the predicate is grounded in the lexical resources. When the event is supported through the full PropBank → VerbNet → WordNet path we add $+0.25$. When the full path is unavailable and the system falls back to a lemma, it adds only $+0.10$, so stronger lexical mapping gives a larger increase.
The term $g_{\mathrm{struct}}(e)$ measures how well the event matches an expected event pattern. If the predicate matches a target event type such as theft, entry or damage, the score increases by $+0.25$, and object evidence can add $+0.15$. For example, if the typing rule contains obj_property and the object lemmas include property–related words such as wallet, the score is increased. Likewise, if the rule contains obj_structure_or_vehicle_part and the object includes words such as door, window or other structure or vehicle terms, the score increases.
The penalty term $p(e)$ reduces the score when the evidence is ambiguous or uncertain. We subtract a small amount based on the number of WordNet synsets and VerbNet senses associated with the predicate, so more lexical ambiguity lowers the confidence. We also subtract 0.35 for negation in core incident events or 0.10 for negation outside that group, and subtract 0.12 for uncertain language such as appears, possibly or likely. When a typing rule is triggered, the final score is
$$c(e)=\alpha\,r(\rho_e)+(1-\alpha)\,\mathrm{bound}_{[0,1]}\big(c_{\mathrm{raw}}(e)\big),$$
and otherwise
$$c(e)=\mathrm{bound}_{[0,1]}\big(c_{\mathrm{raw}}(e)\big).$$
Here $\rho_e$ is the triggered typing rule, $r(\rho_e)$ is the rule prior and $\alpha=0.7$, so the final score is 70% rule prior and 30% evidence–based score. This was done so that when a specific rule strongly supports the typing, that prior has the main influence, while the lexical and structural evidence can still adjust the final value. Also,
$$\mathrm{bound}_{[0,1]}(x)=\max(0,\min(1,x)),$$
so the final output is between 0 and 1. Table[III](https://arxiv.org/html/2605.15978#S4.T3) shows example extracted events with different confidence scores, where higher values of $c(e)$ mean better supported extractions.
For example, consider kick-01 typed as $\mathsf{ForcedEntryEvent}$. This event starts at 0.55 because it is in the incident_core bucket. Grounding through the full semantic path adds $+0.25$, bringing it to 0.80. Since kick matches a damage–related anchor, it gets another $+0.25$, which brings it to 1.05. Because the object is something like door, the rule damage_anchor+obj_structure_or_vehicle_part adds $+0.15$, bringing it to 1.20. The score is then capped at 0.98 for an object–supported rule and blended with the rule prior 0.85 as $0.7(0.85)+0.3(0.98)=0.889$; a further $+0.03$ is added because the rule is highly specific, which gives 0.919. Another example is take-01, typed as $\mathsf{TheftEvent}$ with a score of 0.850, where the narrative supports theft with stolen items. An event like leave-15 is typed as a narrative_action and starts lower because it belongs to the uncertain bucket (at 0.30), and discover-01 has a score of 0.522 because the narrative evidence is less specific.
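The scoring arithmetic can be traced with a short sketch. The constants below are the ones stated in the text; the function names, the `grounding` labels and the `ambiguity_penalty` argument (the text only says "a small amount") are assumptions. The optional `cap` and `bonus` parameters are not part of the stated formulas and are included only so that the kick-01 walk–through, which mentions a 0.98 cap and a $+0.03$ specificity bonus, can be reproduced.

```python
# Base scores per policing bucket, as listed in the text.
BASE = {"incident_core": 0.55, "police_action": 0.50,
        "context_admin": 0.12, "uncertain": 0.30}


def bound01(x: float) -> float:
    """Clamp x to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def raw_score(bucket, grounding, anchor_match=False, object_match=False,
              ambiguity_penalty=0.0, negated=False, hedged=False):
    """c_raw(e) = b(e) + g_lex(e) + g_struct(e) - p(e)."""
    score = BASE[bucket]
    if grounding == "full_path":         # PropBank -> VerbNet -> WordNet
        score += 0.25
    elif grounding == "lemma_fallback":  # lemma-level retrieval only
        score += 0.10
    if anchor_match:                     # predicate matches a target event type
        score += 0.25
    if object_match:                     # object evidence supports the rule
        score += 0.15
    score -= ambiguity_penalty           # size unspecified in the text
    if negated:
        score -= 0.35 if bucket == "incident_core" else 0.10
    if hedged:                           # e.g. "appears", "possibly", "likely"
        score -= 0.12
    return score


def confidence(raw, rule_prior=None, alpha=0.7, cap=1.0, bonus=0.0):
    """Blend the rule prior with the bounded evidence score."""
    evidence = min(cap, bound01(raw))    # cap is an assumption from the example
    if rule_prior is None:
        return evidence
    return alpha * rule_prior + (1 - alpha) * evidence + bonus


raw = raw_score("incident_core", "full_path",
                anchor_match=True, object_match=True)
print(round(raw, 2))                                                     # 1.2
print(round(confidence(raw, rule_prior=0.85, cap=0.98, bonus=0.03), 3))  # 0.919
```

Applying only the two stated formulas, without the cap and bonus, the same event would score $0.7(0.85)+0.3(1.0)=0.895$.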
TABLE III: Example extracted events across different scores.
### IV-E Temporal Reasoning
We construct case temporal graphs for temporal relationships where the nodes in the graphs are individual events, and the edges are precedence relations supported by evidence\. We exploit temporal cues derived from the narrative to construct cue–based edges and complement these with domain–specific rules for precedence\.
#### IV-E1 Timeline Edges.
We aim to establish an explicit timeline of events by using temporal cues such as then, after and before to order closely related event mentions.
TABLE IV: Precedence edges added to the temporal graph.
Local cue rules. For adjacent sentences, if sentence $s_{i+1}$ begins with or contains a cue such as then, after or before, the system links the event mention selected from sentence $s_i$ to the event mention in sentence $s_{i+1}$ with a Precedes edge:
$$\textit{Then}(s_{i+1})\wedge\textit{Ev}(e_i,s_i)\wedge\textit{Ev}(e_{i+1},s_{i+1})\to\textit{Precedes}(e_i,e_{i+1}).$$
For example, in "Suspect (S) entered the home. Then Victim (V) discovered the damage," the cue then places the discovery event after the entry event. Within a single sentence, cues such as before and after are used in the same way: in "the suspect broke the window before entering the home," the breaking event is ordered before the entering event.
Domain axioms. We also apply a small set of local domain precedence axioms over event classes (shown in Table[IV](https://arxiv.org/html/2605.15978#S4.T4)). For example, if a narrative states that someone broke into a house and property was later taken, the system adds an edge placing the forced–entry event before the theft event. These axioms add candidate Precedes edges for events $e_f$ and $e_t$ in the same case $c$:
$$\textit{ForcedEntryEvent}(e_f,c)\wedge\textit{TheftEvent}(e_t,c)\to\textit{Precedes}(e_f,e_t).$$
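The two edge sources can be sketched side by side. This is a simplified illustration under stated assumptions: only the sentence–initial then cue is handled, the first event per sentence is taken as the selected mention, all events are assumed to belong to one case, and the `Event` class, the `EntryEvent` label and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    eid: str    # event identifier
    etype: str  # ontology event class
    sent: int   # index of the source sentence (keeps traceability)


def cue_edges(sentences, events):
    """Then(s_{i+1}) & Ev(e_i, s_i) & Ev(e_{i+1}, s_{i+1}) -> Precedes."""
    selected = {}
    for ev in events:
        selected.setdefault(ev.sent, ev)  # assumption: first event per sentence
    edges = []
    for i in range(len(sentences) - 1):
        words = sentences[i + 1].lower().split()
        starts_with_then = bool(words) and words[0].strip(",.") == "then"
        if starts_with_then and i in selected and i + 1 in selected:
            edges.append((selected[i].eid, selected[i + 1].eid, "cue"))
    return edges


def axiom_edges(events):
    """ForcedEntryEvent(e_f, c) & TheftEvent(e_t, c) -> Precedes(e_f, e_t)."""
    forced = [e for e in events if e.etype == "ForcedEntryEvent"]
    thefts = [e for e in events if e.etype == "TheftEvent"]
    return [(f.eid, t.eid, "axiom") for f in forced for t in thefts]


sentences = [
    "Suspect (S) broke the window.",
    "Then S entered the home.",
    "S took a wallet.",
]
events = [
    Event("e1", "ForcedEntryEvent", 0),
    Event("e2", "EntryEvent", 1),
    Event("e3", "TheftEvent", 2),
]
print(cue_edges(sentences, events))  # [('e1', 'e2', 'cue')]
print(axiom_edges(events))           # [('e1', 'e3', 'axiom')]
```

Labeling each edge with its source keeps cue–based and axiom–based support distinguishable when the edges are later counted or inspected.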
#### IV-E2 Temporal Graph Construction.
By constructing typed event nodes, participants and precedence edges, we build temporal graphs that correspond to redacted narratives \(see[V\-C](https://arxiv.org/html/2605.15978#S5.SS3)\)\. The resulting graphs have links to the sentences for verification of which events were extracted from the corresponding AMR and how local cues and domain axioms contributed to event temporal ordering\.
## V Evaluation
### V-A Experimental Setup
This evaluation measures how well the pipeline structures its output into events, frames and temporal edges. We also include a short human review to assess the practical usefulness of the symbolic approach. To address RQ2, we conducted a short questionnaire with 5 redacted narratives, one from each offense category (Burglary, Larceny, Motor Vehicle Theft, Stolen Property and Robbery), and asked 6 reviewers to answer the same 9 questions for each case. The full questionnaire is provided in Table[IX](https://arxiv.org/html/2605.15978#S10.T9) in Appendix[X](https://arxiv.org/html/2605.15978#S10).
### V-B Metrics
We use two complementary forms of evaluation:
##### Corpus–level.
We report corpus–level results from the symbolic pipeline outputs, including role coverage, event typing, semantic grounding coverage, participant counts, frame slot filling, and temporal edge coverage. The proportion of high–confidence events was measured as
$$\text{HighConf}(\tau)=\frac{|\{e\in\mathcal{E}:c(e)\geq\tau\}|}{|\mathcal{E}|},$$
where $\mathcal{E}$ is the set of extracted events, $c(e)$ is the confidence score assigned to event $e$ and $\tau$ is a threshold (in our case, $\tau=0.80$). PB → VN → WN coverage was computed as the percentage of extracted events whose semantic mapping followed the full path, and likewise the percentage that used the lemma → WN fallback. Participants per case were summarized using the number of unique extracted participants in each case, and we report the median and maximum over all cases. Frame slot filling was measured as the percentage of relevant frames in which a given slot was non–empty. Temporal support was reported as the percentage of cases with at least one extracted temporal edge, together with the average number of edges and the proportion of cue–based versus axiom–based edges.
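As a small worked instance of the HighConf measure, the sketch below counts the share of scores at or above the threshold. The function name `high_conf` and the four scores are illustrative assumptions, not values taken from the corpus.

```python
def high_conf(scores, tau=0.80):
    """Share of events whose confidence c(e) is at least tau."""
    if not scores:
        return 0.0  # convention chosen here for an empty event set
    return sum(1 for c in scores if c >= tau) / len(scores)


# Illustrative scores only.
scores = [0.919, 0.850, 0.522, 0.300]
print(high_conf(scores))  # 0.5
```

Because the comparison is inclusive, an event scoring exactly 0.80 counts as high–confidence.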
TABLE V: Corpus–level extraction and ordering results.
##### Human review.
The reviewers' answers were summarized using majority agreement, confidence and ambiguity measures. For single–choice questions, the most frequently selected answer was taken as the summary human label and ties were labeled as Not clear. For multiple–choice questions, an option was included only if it received a strict majority of reviewer votes, using the threshold $t=\left\lfloor\frac{n}{2}\right\rfloor+1$. With $n=6$, this required at least 4 votes. Human–system agreement was computed only on cases where the human majority was clear. Here, $n_{\text{match}}$ is the number of matches between the system and the human reference and $n_{\text{human-clear}}$ is the number of cases for which the human reference was clear:
$$\text{Agreement}=\frac{n_{\text{match}}}{n_{\text{human-clear}}}\times 100.$$
We also report majority support as the percentage of response outcomes with a clear majority label, and Human Not clear as the percentage of outcomes for which no clear human majority was reached. For the human evaluation, we compared the system's response against the human majority vote for every question. Precision shows how often the system was correct when it predicted that a detail was present, recall shows how often the system found a detail when the human reviewers agreed it was present, and F1 combines precision and recall. These scores were computed only on cases with a clear human majority. For Yes/No questions, we treated the positive case as the presence of the detail.
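The review measures above can be sketched in a few lines. The function names are hypothetical and the label lists are made–up inputs chosen so the arithmetic is easy to check by hand; they are not the study's data.

```python
from collections import Counter


def strict_majority_threshold(n: int) -> int:
    """t = floor(n / 2) + 1."""
    return n // 2 + 1


def majority_label(votes):
    """Most frequent single-choice answer; ties are labeled 'Not clear'."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "Not clear"
    return ranked[0][0]


def clear_pairs(system, human):
    """Keep only cases where the human majority is clear."""
    return [(s, h) for s, h in zip(system, human) if h != "Not clear"]


def agreement(system, human):
    """Percentage of clear cases where system and human majority match."""
    pairs = clear_pairs(system, human)
    if not pairs:
        return None
    return 100.0 * sum(s == h for s, h in pairs) / len(pairs)


def precision_recall_f1(system, human, positive="Yes"):
    pairs = clear_pairs(system, human)
    tp = sum(s == positive and h == positive for s, h in pairs)
    fp = sum(s == positive and h != positive for s, h in pairs)
    fn = sum(s != positive and h == positive for s, h in pairs)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


print(strict_majority_threshold(6))                # 4
print(majority_label(["Yes"] * 3 + ["No"] * 3))    # Not clear

# Hypothetical system answers and human majority labels for five cases.
system = ["Yes", "Yes", "Yes", "Yes", "No"]
human = ["Yes", "Yes", "Yes", "No", "No"]
print(agreement(system, human))                    # 80.0
p, r, f1 = precision_recall_f1(system, human)
print(round(p, 3), round(r, 3), round(f1, 3))      # 0.75 1.0 0.857
```

With only five cases per question, a single disagreement moves agreement by 20 percentage points, which is worth keeping in mind when reading the per–question results.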
### V-C Results
Table[V](https://arxiv.org/html/2605.15978#S5.T5) shows the corpus–level results over 450 reports. The symbolic framework produced 6,686 events and 6,652 frames. ARG1 coverage was much higher than ARG0 coverage (78.8% vs. 29.8%), since affected objects are often stated more explicitly than actor identity. The proportion of high–confidence (≥ 0.80) events was 54.1%, and for most typed events the semantic path used to link them was complete (93.7%). In the frame analysis, important event details such as the entry point (41.8%), entry method (33.4%) and stolen items (22.8%) were recovered. Most temporal ordering edges were introduced by domain axioms (78.4%).
Figure 4: Ranked frequency distribution of extracted AMR predicate senses for Burglary. Purple curves show meaningful senses, yellow curves show trivial senses, and the annotation box summarizes top senses.
To better understand these results, Figure[4](https://arxiv.org/html/2605.15978#S5.F4) shows the ranked frequency distributions of extracted AMR predicate senses for Burglary. The meaningful–sense curve covers predicates that carry the event content of the narratives, and the trivial–sense curve covers predicates that are less useful for offense interpretation. In Burglary the most frequent meaningful predicates are burglarize-01, enter-01 and break-01. Trivial predicates include forms such as be-located-at-91 and have-03, which are common in AMR graphs and contribute mainly to graph structure. The burglary sense distribution shows that these narratives share common patterns but also use diverse wording. A few predicates appear frequently and express common burglary concepts such as breaking, property loss or damage, while the long tail of lower–frequency senses shows that these incidents can be described in many different ways. This matters for scaling to other crime types because a keyword–based approach could miss many of these variations, whereas the symbolic approach groups different words under the same event classes and ontology concepts. The plots for Larceny, Motor Vehicle Theft, Stolen Property and Robbery are provided in Figure[6](https://arxiv.org/html/2605.15978#S11.F6) in Appendix[XI](https://arxiv.org/html/2605.15978#S11).
Figure 5: Case–level symbolic graph from a redacted narrative. Solid edges show narrative sequencing and dashed edges show domain axioms.
TABLE VI: Human agreement in the five–case review.
Agreement between the system and the human majority, computed over cases where reviewers reached a clear majority (Table[VI](https://arxiv.org/html/2605.15978#S5.T6)), was 100% for incident initiation (Q1), stolen items (Q6), time cues (Q7) and participant roles (Q8). Agreement was 80.0% for vehicle involvement (Q2), theft stated (Q5) and answerability (Q9), and 40.0% for forced entry (Q3). Ambiguity was highest for entry point (Q4), where 100.0% of the cases were Not clear (see Figure[7](https://arxiv.org/html/2605.15978#S12.F7) in Appendix[XII](https://arxiv.org/html/2605.15978#S12)). Precision, recall and F1 against the human majority were 75.0/100.0/85.7 for vehicle involvement, 50.0/50.0/50.0 for forced entry, 100.0/80.0/88.9 for theft stated, 100.0/100.0/100.0 for stolen items and time cues, and 80.0/80.0/80.0 for participant roles. Forced entry was challenging: in some cases the pipeline identified forced entry but reviewers marked that evidence as uncertain (Table[VII](https://arxiv.org/html/2605.15978#S5.T7)).
TABLE VII: Precision, recall, and F1 against human–majority votes.
Figure[5](https://arxiv.org/html/2605.15978#S5.F5) shows a partial view of a case temporal graph. The output is mapped to police actions, narrative actions and main incident events, capturing the language describing the building entry point, the specific items that were stolen and the local ordering of events. The full graph is provided in Figure[8](https://arxiv.org/html/2605.15978#S13.F8) in Appendix[XIII](https://arxiv.org/html/2605.15978#S13). Our findings support both research questions. For RQ1, our symbolic pipeline is able to extract evidence–linked facts from cases and relate them to the ontology for reasoning. In many cases the pipeline also recovers item–level and temporal facts; for theft, for example, it recovers what was stolen and how the events of a case are sequenced. For RQ2, the review shows that many questions about entities, events and roles can still be answered from redacted narratives.
## VI Discussion
Police narratives vary in completeness, wording and style, and redaction, OCR errors or abbreviations such as V, S or W can make extraction less reliable. Moreover, anonymized references can cause confusion when interpreting descriptive phrases that are meant to refer to unknown persons. No gold–standard corpus for events, participants and temporal ordering was available in this setting, so the evaluation focuses on traceability rather than on gold–standard benchmark performance. The pipeline is intended to extract entities, events, roles and their ordering; it should not be used to draw conclusions about investigations, rank police officers or make decisions about individuals without human review. In future work we will focus on improving entity linking, so that repeated mentions across reports are more reliably mapped to the same people, vehicles or items. We will also compare the symbolic approach with the outputs produced by a large language model.
## VII Conclusion
Police narratives contain important details beyond the information in structured fields. These incident details can be recovered using a symbolic, ontology–based approach supported by AMR, PropBank, VerbNet and WordNet. Of the extracted events, 54.1% have a confidence score ≥ 0.80 and 93.7% were mapped through the full semantic path. The system agreed with the human majority in 100% of clear cases for incident initiation, stolen items and time cues, while agreement was lowest for the forced entry signal. Overall, these results show that symbolic NLU can expose uncertainty through confidence scores while keeping traceability through lexical and sentence–level evidence.
## VIII Acknowledgments
I thank my faculty advisor Dr\. Jansen Orfan, and the Rochester \(NY\) Police Department Office of Business Intelligence for the data, guidance and support\.
## References
- [1] (2013). Automatically deriving event ontologies for a commonsense knowledge base. In Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers, Potsdam, Germany, pp. 23–34. [Link](https://aclanthology.org/W13-0103/)
- [2] J. Allen, W. Beaumont, N. Blaylock, G. Ferguson, J. Orfan, and M. Swift (2011). Acquiring commonsense knowledge for a cognitive agent. In Advances in Cognitive Systems: Papers from the 2011 AAAI Fall Symposium (FS–11–01). [Link](https://aaai.org/papers/04192-4192-acquiring-commonsense-knowledge-for-a-cognitive-agent/)
- [3] J. Allen and G. Ferguson (1994). Actions and events in interval temporal logic. pp. 531–579. [Link](https://doi.org/10.1093/logcom/4.5.531)
- [4] J. Allen (1993). Natural language, knowledge representation, and logical form. In A Symposium on Future Directions in Natural Language Processing on Challenges in Natural Language Processing, USA, pp. 146–175. ISBN 0521410150. [Link](https://apps.dtic.mil/sti/tr/pdf/ADA247389.pdf)
- [5] J. Allen (1995). Natural language understanding (2nd ed.). Benjamin-Cummings Publishing Co., Inc., USA. ISBN 978-0-8053-0334-6. [Link](https://dl.acm.org/doi/book/10.5555/199291)
- [6] C. F. Baker (2014). FrameNet: a knowledge base for natural language processing. In Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929–2014), Baltimore, MD, USA, pp. 1–5. [Link](https://aclanthology.org/W14-3001/), [DOI](https://dx.doi.org/10.3115/v1/W14-3001)
- [7] E. Bifari, A. Basbrain, R. Mirza, A. Bafail, S. Albaradei, and W. Alhalabi (2024). Text mining and machine learning for crime classification: using unstructured narrative court documents in police academic. Cogent Social Sciences, 11(1). [Link](https://www.tandfonline.com/doi/full/10.1080/23311916.2024.2359850), [DOI](https://dx.doi.org/10.1080/23311916.2024.2359850)
- [8] S. Bird, E. Klein, and E. Loper (2009). Natural language processing with python. O'Reilly Media. [Link](https://www.oreilly.com/library/view/natural-language-processing/9780596803346/)
- [9] M. Chau, J. J. Xu, and H. Chen (2002). Extracting meaningful entities from police narrative reports. In Proceedings of the 2002 Annual National Conference on Digital Government Research, dg.o '02, pp. 1–5. [Link](https://dl.acm.org/doi/10.5555/1123098.1123138)
- [10] B. Glimm, I. Horrocks, B. Motik, G. Stoilos, and Z. Wang (2014). HermiT: an owl 2 reasoner. Journal of Automated Reasoning 53(3), pp. 245–269. [Link](https://doi.org/10.1007/s10817-014-9305-1), [DOI](https://dx.doi.org/10.1007/s10817-014-9305-1)
- [11] M. W. Goodman (2020). Penman: an open-source library and tool for amr graphs. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Online, pp. 312–319. [Link](https://aclanthology.org/2020.acl-demos.35/), [DOI](https://dx.doi.org/10.18653/v1/2020.acl-demos.35)
- [12] C. D. Güss, Ma. T. Tuason, and A. Devine (2020). Problems with police reports as data sources: a researchers' perspective. Frontiers in Psychology, Volume 11. [Link](https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2020.582428/full), [DOI](https://dx.doi.org/10.3389/fpsyg.2020.582428)
- [13] K. Kipper, A. Korhonen, N. Ryant, and M. Palmer (2006). Extending verbnet with novel verb classes. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06), Genoa, Italy, pp. 1027–1032. [Link](https://aclanthology.org/L06-1280/)
- [14] B. Mansouri (2025). Survey of abstract meaning representation: then, now, future. arXiv:2505.03229. [Link](https://arxiv.org/abs/2505.03229), [DOI](https://doi.org/10.48550/arXiv.2505.03229)
- [15] Metropolitan Police Department (Washington, DC) (2025). Basic report writing. [Link](https://mpdc.dc.gov/sites/default/files/dc/sites/mpdc/publication/attachments/3.1%20Basic%20Report%20Writing%20-%20IA_071625.pdf)
- [16] G. A. Miller (1995). WordNet: a lexical database for english. Commun. ACM 38(11), pp. 39–41. ISSN 0001-0782. [Link](https://doi.org/10.1145/219717.219748), [DOI](https://dx.doi.org/10.1145/219717.219748)
- [17] F. Navarrete, L. A. Garrido, C. Bobed, M. Atencia, and A. Vallecillo (2024). Ontology-driven automated reasoning about property crimes. pp. 687–710. [Link](https://link.springer.com/article/10.1007/s12599-024-00886-3), [DOI](https://doi.org/10.1007/s12599-024-00886-3)
- [18] Rochester Institute of Technology (2026). Research computing services. Rochester Institute of Technology. [Link](https://www.rit.edu/researchcomputing/), [DOI](https://dx.doi.org/10.34788/0S3G-QD15)
- [19] J. Orfan and J. Allen (2017). Identifying underlying commonsense knowledge in definitions. In Proceedings of the Thirtieth International Florida Artificial Intelligence Research Society Conference (FLAIRS). [Link](https://aaai.org/papers/688-flairs-2017-15494/)
- [20] J. Orfan (2020). Toward deep language understanding: methods for learning conceptual knowledge from definitions. University of Rochester Institutional Publication Record. [Link](https://urresearch.rochester.edu/institutionalPublicationPublicView.action?institutionalItemId=35547)
- [21] M. Palmer, D. Gildea, and P. Kingsbury (2005). The proposition bank: an annotated corpus of semantic roles. Computational Linguistics 31(1), pp. 71–106. ISSN 0891-2017. [Link](https://doi.org/10.1162/0891201053630264), [DOI](https://dx.doi.org/10.1162/0891201053630264)
- [22] P. Rane, A. Rao, D. Verma, and A. Mhaisgawali (2021). Redacting sensitive information from the data. In 2021 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), pp. 1–5. [Link](https://ieeexplore.ieee.org/document/9645752), [DOI](https://dx.doi.org/10.1109/SMARTGENCON51891.2021.9645752)
- [23] L. K. Schubert (2002). Can we derive general world knowledge from texts?. In Proceedings of the Second International Conference on Human Language Technology Research, HLT '02, San Francisco, CA, USA, pp. 94–97. [Link](https://dl.acm.org/doi/10.5555/1289189.1289263), [DOI](https://dx.doi.org/10.3115/1289189.1289263)
- [24] R. Smith (2007). An overview of the tesseract ocr engine. In Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), pp. 629–633. [Link](https://research.google/pubs/an-overview-of-the-tesseract-ocr-engine/)
- [25] A. Srbinovska, A. Srbinovska, V. Senthil, A. Martin, J. McCluskey, J. Bateman, and E. Fokoué (2025). Towards ai-driven policing: interdisciplinary knowledge discovery from police body-worn camera footage. arXiv. [Link](https://arxiv.org/abs/2504.20007)
- [26] K. Stowe, J. Preciado, K. Conger, S. W. Brown, G. Kazeminejad, and M. Palmer (2021). SemLink 2.0: chasing lexical resources. In Proceedings of the 14th International Conference on Computational Semantics (IWCS), Groningen, The Netherlands (online), pp. 222–227. [Link](https://aclanthology.org/2021.iwcs-1.21/)
- [27] C. Wang and N. Xue (2017). Getting the most out of amr parsing. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, pp. 1257–1268. [Link](https://aclanthology.org/D17-1129/), [DOI](https://dx.doi.org/10.18653/v1/D17-1129)
- [28] S. Zhang, X. Ma, K. Duh, and B. Van Durme (2019). AMR parsing as sequence-to-graph transduction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 80–94. [Link](https://aclanthology.org/P19-1009/), [DOI](https://dx.doi.org/10.18653/v1/P19-1009)
## Appendix
## IX Supplementary Ontology Statistics
TABLE VIII: Derived ontology indicators from Protégé metrics.
## X Human Review Questionnaire
TABLE IX: Questionnaire used for human review of redacted narratives.
## XI Additional sense–frequency plots
Figure 6: Ranked frequency distributions of extracted AMR predicate senses for Larceny, Motor Vehicle Theft, Stolen Property, and Robbery.
## XII Human Review Ambiguity
Figure 7: Reviewer ambiguity in the five–case review. [Bar chart: majority support versus no clear majority (%) for Q1 Initiation, Q2 Vehicle, Q3 Forced entry, Q4 Entry point, Q5 Theft stated, Q6 Items named, Q7 Time cue, Q8 Roles and Q9 Confidence.]
## XIII Temporal Case Graph
Figure 8: Full temporal case graph from an example redacted report, showing police actions, narrative actions, core incident events, entity links, and temporal relations.