factual-accuracy


Can AI Agents Synthesize Scientific Conclusions?

arXiv cs.AI · yesterday

This paper introduces SciConBench, a large-scale benchmark with 9.11K questions and expert-written conclusions for evaluating AI agents' ability to synthesize scientific conclusions from open-domain evidence. The study finds that even the best agent achieves only a factual F1 of 0.337 in clean-room settings, highlighting that reliable synthesis remains an open challenge.


Training AI chatbots to be warm and empathetic makes them less factually accurate

Reddit r/artificial · 2026-05-29

New research shows that training AI chatbots to be warmer and more empathetic significantly reduces their factual accuracy, leading to higher error rates in medical advice and increased agreement with user misconceptions. The findings challenge the common assumption that conversational style can be adjusted without compromising factual correctness.


@HEI: Evaluating Commercial AI Chatbots as News Intermediaries Mirac Suzgun, Emily Shen, Federico Bianchi, Alexander Spangher…

X AI KOLs Timeline · 2026-05-28

This study evaluates six commercial AI chatbots on factual questions derived from BBC News across six languages. It finds high multiple-choice accuracy but a significant drop on free-response questions, with retrieval errors driving over 70% of failures, and it reports regional biases.


Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering

Hugging Face Daily Papers · 2026-05-28

CorVer is a lightweight, corpus-grounded reward mechanism that uses Wikipedia co-occurrence statistics to provide efficient sentence-level feedback for reinforcement learning in factual question answering. It outperforms neural verifiers while training 4.8x to 8.4x faster.


Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis and Interpretation

arXiv cs.CL · 2026-04-20

This paper investigates how fine-tuning LLMs on new knowledge induces factual hallucinations. It shows that unfamiliarity within specific knowledge types drives hallucinations by weakening the model's attention to key entities. The authors propose mitigating this by reintroducing known knowledge during later training stages.


WebGPT: Improving the factual accuracy of language models through web browsing

OpenAI Blog · 2021-12-16

OpenAI fine-tuned GPT-3 to answer open-ended questions more accurately by enabling it to use a text-based web browser to search, retrieve, and cite sources. The model's answers are preferred over those of human demonstrators 56% of the time on questions from the ELI5 dataset, but the model shows limitations on out-of-distribution tasks such as TruthfulQA.
