Protocol for evaluating ChatGPT in biomedical association generation and verification using a RAG-enabled, cross-model majority voting workflow

arXiv cs.CL Papers

Summary

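This arXiv paper presents a protocol for evaluating how reliably ChatGPT generates disease-centric biomedical associations. Entities are validated against biomedical ontologies and associations are verified against the literature. Where exact ontology matching falls short, a RAG-enabled, cross-model majority voting workflow performs semantic verification and helps expose hallucinations.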

arXiv:2605.30400v1

Abstract: We present a protocol to evaluate ChatGPT's ability to generate disease-centric biomedical associations. It outlines how we generate the associations, validate the biological entities using biomedical ontologies, and verify associations using literature. The protocol includes a self-consistency strategy to assess generative reliability across ChatGPT models. To address ontology exact-match limitations, we provide a use case performing semantic verification through a workflow enabled by Retrieval-Augmented Generation (RAG) powered by open-source large language models (LLMs). This enables LLMs to establish truth over content generated by other LLMs and expose hallucination.
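
The cross-model majority voting idea can be illustrated with a minimal sketch. The abstract does not state the voting rule, so the strict-majority threshold, the verdict labels (`supported`, `unsupported`), the model names, and the function `majority_verdict` are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, Optional


def majority_verdict(votes: Dict[str, str], min_share: float = 0.5) -> Optional[str]:
    """Return the verdict held by more than `min_share` of the models.

    `votes` maps a (hypothetical) verifier model name to its verdict on one
    generated association. Returns None when there are no votes or when no
    verdict clears the threshold (for example, a tie).
    """
    if not votes:
        return None
    # Normalize labels so "Supported" and "supported " count as the same vote.
    counts = Counter(v.strip().lower() for v in votes.values())
    verdict, n = counts.most_common(1)[0]
    return verdict if n / len(votes) > min_share else None


# Hypothetical verdicts from three verifier models for a single association.
votes = {
    "model_a": "supported",
    "model_b": "supported",
    "model_c": "unsupported",
}
print(majority_verdict(votes))
```

Returning `None` on a tie, rather than picking a side, keeps undecided associations visible so they can be reviewed separately. This is a design choice of the sketch, not something the abstract specifies.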

Source: [https://arxiv.org/abs/2605.30400](https://arxiv.org/abs/2605.30400)
[View PDF](https://arxiv.org/pdf/2605.30400)

## Submission history

From: Ahmed Abdeen Hamed Ph.D [[view email](https://arxiv.org/show-email/c942b358/2605.30400)] **[v1]** Thu, 28 May 2026 16:01:24 UTC (2,289 KB)
