Protocol for evaluating ChatGPT in biomedical association generation and verification using a RAG-enabled, cross-model majority voting workflow

arXiv cs.CL 06/01/26, 04:00 AM Papers

Summary

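This arXiv paper presents a protocol for evaluating how reliably ChatGPT generates disease-centric biomedical associations. Generated entities are validated against biomedical ontologies, associations are verified against the literature, and a self-consistency strategy compares output across ChatGPT models. To work around ontology exact-match limitations, a RAG-enabled workflow built on open-source LLMs performs semantic verification and exposes hallucination.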

arXiv:2605.30400v1 Announce Type: new Abstract: We present a protocol to evaluate ChatGPT's ability to generate disease-centric biomedical associations. It outlines how we generate the associations, validate the biological entities using biomedical ontologies, and verify associations using literature. The protocol includes a self-consistency strategy to assess generative reliability across ChatGPT models. To address ontology exact-match limitations, we provide a use case performing semantic verification through a workflow enabled by Retrieval-Augmented Generation (RAG) powered by open-source large language models (LLMs). This enables LLMs to establish truth over content generated by other LLMs and expose hallucination.

Cached at: 06/01/26, 09:22 AM

# Protocol for evaluating ChatGPT in biomedical association generation and verification using a RAG-enabled, cross-model majority voting workflow
Source: [https://arxiv.org/abs/2605.30400](https://arxiv.org/abs/2605.30400)
[View PDF](https://arxiv.org/pdf/2605.30400)

> Abstract: We present a protocol to evaluate ChatGPT's ability to generate disease-centric biomedical associations. It outlines how we generate the associations, validate the biological entities using biomedical ontologies, and verify associations using literature. The protocol includes a self-consistency strategy to assess generative reliability across ChatGPT models. To address ontology exact-match limitations, we provide a use case performing semantic verification through a workflow enabled by Retrieval-Augmented Generation (RAG) powered by open-source large language models (LLMs). This enables LLMs to establish truth over content generated by other LLMs and expose hallucination.
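
The abstract does not spell out how the cross-model vote named in the title is tallied, so the following is only a minimal sketch of one plausible reading: each verifier model returns a verdict for an association, and a label is accepted only when a strict majority of models agree. The function names, the verdict labels (`supported`, `unsupported`, `undecided`), the model names, and the placeholder associations are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(verdicts):
    """Return the label held by a strict majority of verdicts, else 'undecided'.

    Assumption: verdicts are plain strings such as 'supported' or
    'unsupported'. The paper's actual label set is not given in the abstract.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    label, count = Counter(verdicts).most_common(1)[0]
    # Strict majority: more than half of the voting models must agree.
    return label if count * 2 > len(verdicts) else "undecided"


def verify_associations(votes_by_association):
    """Apply the majority vote to every association.

    votes_by_association maps an association (here a hypothetical
    (disease, entity) tuple) to a dict of {model_name: verdict}.
    """
    return {
        association: majority_vote(list(votes.values()))
        for association, votes in votes_by_association.items()
    }


# Placeholder input: names and verdicts are illustrative, not real results.
example_votes = {
    ("disease_X", "gene_1"): {
        "model_a": "supported",
        "model_b": "supported",
        "model_c": "unsupported",
    },
    ("disease_X", "gene_2"): {
        "model_a": "supported",
        "model_b": "unsupported",
    },
}

results = verify_associations(example_votes)
```

In this sketch a tie yields `undecided` rather than an arbitrary winner, which seems the more conservative choice when the goal is to expose hallucination; an odd number of verifier models avoids ties for binary verdicts.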

## Submission history

From: Ahmed Abdeen Hamed Ph.D \[[view email](https://arxiv.org/show-email/c942b358/2605.30400)\] **\[v1\]** Thu, 28 May 2026 16:01:24 UTC (2,289 KB)
