Automatic Extraction of Structured Information from Brain MRI Reports Using an Open-Weight Large Language Model
Summary
This paper evaluates the open-weight LLM LLaMA 3.1 for automatically extracting structured data from Dutch brain MRI reports. The model performs well on visual rating scores and detects findings accurately, and few-shot prompting improves extraction of numerical variables.
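Few-shot prompting places a handful of annotated example reports in the prompt ahead of the report to be processed. The Python sketch below illustrates similarity-based example selection. It is a sketch under stated assumptions, not the authors' method. The abstract reports its best numerical results with "structural similarity-based selection" but does not define that measure, so token-set Jaccard similarity stands in for it here. The names `Example`, `jaccard`, `select_examples` and `build_prompt`, the prompt template, and the toy reports are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """An annotated report in the example pool (hypothetical structure)."""
    report: str
    answer: str  # annotated target, e.g. a JSON string of extracted variables


def token_set(text: str) -> set[str]:
    # Lower-cased whitespace tokens: a crude stand-in for report structure.
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two token sets, in [0, 1]."""
    ta, tb = token_set(a), token_set(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def select_examples(query: str, pool: list[Example], k: int = 3) -> list[Example]:
    """Return the k pool examples most similar to the query report."""
    ranked = sorted(pool, key=lambda ex: jaccard(query, ex.report), reverse=True)
    return ranked[:k]


def build_prompt(instruction: str, query: str, examples: list[Example]) -> str:
    """Join the instruction, the worked examples, and the target report."""
    parts = [instruction]
    for ex in examples:
        parts.append(f"Report:\n{ex.report}\nAnswer:\n{ex.answer}")
    parts.append(f"Report:\n{query}\nAnswer:")
    return "\n\n".join(parts)


# Toy reports for illustration only (English, not real clinical text).
pool = [
    Example("two microbleeds left frontal", '{"microbleeds": 2}'),
    Example("no infarcts seen", '{"infarcts": 0}'),
    Example("one microbleed right temporal", '{"microbleeds": 1}'),
]
query = "three microbleeds left temporal"
shots = select_examples(query, pool, k=2)
prompt = build_prompt("Extract the number of microbleeds as JSON.", query, shots)
print(prompt)
```

Swapping `jaccard` for a different scoring function, such as one that compares section layout, leaves the rest of the selection and prompt-assembly logic unchanged.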
Cached at: 06/09/26, 08:52 AM
Source: [https://arxiv.org/abs/2606.07721](https://arxiv.org/abs/2606.07721)

Authors: [Kaouther Mouheb](https://arxiv.org/search/cs?searchtype=author&query=Mouheb,+K), [Amos Pomp](https://arxiv.org/search/cs?searchtype=author&query=Pomp,+A), [Antoine Manenti](https://arxiv.org/search/cs?searchtype=author&query=Manenti,+A), [Romy de Haan](https://arxiv.org/search/cs?searchtype=author&query=de+Haan,+R), [Farog Faghir](https://arxiv.org/search/cs?searchtype=author&query=Faghir,+F), [Joy Martens](https://arxiv.org/search/cs?searchtype=author&query=Martens,+J), [Harro Seelaar](https://arxiv.org/search/cs?searchtype=author&query=Seelaar,+H), [Francesco Mattace-Raso](https://arxiv.org/search/cs?searchtype=author&query=Mattace-Raso,+F), [Meike W. Vernooij](https://arxiv.org/search/cs?searchtype=author&query=Vernooij,+M+W), [Frank J. Wolters](https://arxiv.org/search/cs?searchtype=author&query=Wolters,+F+J), [Stefan Klein](https://arxiv.org/search/cs?searchtype=author&query=Klein,+S), [Esther E. Bron](https://arxiv.org/search/cs?searchtype=author&query=Bron,+E+E)

[View PDF](https://arxiv.org/pdf/2606.07721)

> Abstract:
>
> Objectives: Automatic data extraction from free-text radiology reports enables large-scale research, but few studies have assessed the performance of large language models (LLMs) on Dutch neuroradiology reports.
>
> Methods: We analyzed 947 brain MRI reports from a tertiary memory clinic (2016-2021), authored by consultant neuroradiologists. Trained medical students annotated thirty variables; 100 reports were double-annotated to assess inter-rater reliability. We evaluated the performance of the open-weight LLM LLaMA 3.1 using different languages (Dutch vs. English translation) and few-shot prompting with different example selection strategies. Performance was measured using balanced accuracy for categorical variables, accuracy and mean absolute error for counts, and text similarity for free-text variables. Metrics were computed across 10 random splits of the 947 reports.
>
> Results: LLaMA 3.1 demonstrated high zero-shot performance for visual rating scores (mean [95% CI]): Medial Temporal Atrophy: 90% [77-100%] on the left and 96% [94-99%] on the right; Global Cortical Atrophy: 87% [83-91%]; and Fazekas: 94% [93-96%]. Microbleed mentions were detected with 93% accuracy [92-95%] and infarct mentions with 82% [80-84%]. Text similarity for lesion location reached 0.95 [0.95-0.96]. Performance was lower for numerical variables: 80% [78-82%] for the number of microbleeds and 66% [63-68%] for infarcts. English translation yielded comparable results. Few-shot prompting improved performance for numerical variables, achieving 92% [90-93%] for microbleeds and 81% [77-85%] for infarcts using structural similarity-based selection.
>
> Conclusion: LLaMA 3.1 shows strong potential for extracting data from Dutch neuroradiology reports. Few-shot prompting enhances performance for numerical variables, whereas challenges remain for location-specific variables.

## Submission history

From: Kaouther Mouheb

**[v1]** Fri, 5 Jun 2026 15:57:35 UTC (6,056 KB)
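The abstract evaluates different variable types with different metrics and summarizes each as a mean with a 95% CI over 10 random splits. The plain-Python sketch below illustrates those metrics. It makes assumptions and does not reproduce the paper's evaluation code. Balanced accuracy is the mean of per-class recall, which matches the default definition in scikit-learn's `balanced_accuracy_score`. The abstract does not name its text-similarity measure or its CI method. Here, difflib's `SequenceMatcher.ratio` and a normal-approximation interval over the per-split scores stand in for them. All function names and toy values are hypothetical.

```python
from difflib import SequenceMatcher
from math import sqrt
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Fraction of exact matches, e.g. for extracted lesion counts."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def text_similarity(a, b):
    # Stand-in measure: difflib ratio on lower-cased strings, in [0, 1].
    # The abstract does not say which text-similarity metric the paper uses.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mean_ci(split_scores, z=1.96):
    """Mean and normal-approximation 95% CI of per-split scores (assumed method)."""
    m = mean(split_scores)
    half = z * stdev(split_scores) / sqrt(len(split_scores))
    return m, m - half, m + half


# Toy values for illustration only, not data from the paper.
grades_true = [0, 1, 1, 2]  # e.g. Fazekas-style ordinal grades
grades_pred = [0, 1, 2, 2]
print(balanced_accuracy(grades_true, grades_pred))  # about 0.833
print(mean_absolute_error([0, 2, 1], [0, 1, 3]))  # 1.0
per_split = [0.91, 0.93, 0.92, 0.94, 0.90, 0.93, 0.92, 0.95, 0.91, 0.93]
print(mean_ci(per_split))
```

Averaging recall per class prevents a rare grade, such as severe atrophy, from being drowned out by the majority class. Plain accuracy would reward a model that always predicts the most common grade.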
Similar Articles
Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography
This paper uses sparse autoencoders to decompose LLMs into interpretable features and shows that semantic features explain brain alignment with cortical semantic topography, generalizing across English, Chinese, and French.
@liquidai: Introducing LFM2.5-VL-1.6B-Extract and LFM2.5-VL-450M-Extract: Vision-language models that return structured JSON, not …
Liquid AI released LFM2.5-VL-1.6B-Extract and LFM2.5-VL-450M-Extract, vision-language models that output structured JSON from images and field lists. The models are open-weight and available in two sizes.
Agentic Large Language Models for Automated Structural Analysis of 3D Frame Systems
This paper proposes an agentic LLM framework for automated structural analysis of 3D frame systems from natural language inputs, achieving 90% accuracy on ten representative 3D frames through a multi-agent pipeline.
Brain-CLIPLM: Decoding Compressed Semantic Representations in EEG for Language Reconstruction
Researchers propose Brain-CLIPLM, a two-stage EEG-to-text decoding framework using contrastive learning for semantic anchor extraction and a retrieval-grounded LLM with Chain-of-Thought reasoning, achieving 67.55% top-5 sentence retrieval accuracy and suggesting EEG-to-text decoding should focus on recovering compressed semantic content rather than full sentence reconstruction.
Retrieval-Augmented Large Language Models for Schema-Constrained Clinical Information Extraction
This paper presents a modular retrieval-augmented generation (RAG) pipeline for extracting structured clinical observations from conversational nurse-patient transcripts, using schema-constrained prompting and second-pass auditing with Llama and GPT backbones, achieving an F1 score of 80.36%.