factuality

#factuality

@FinanceYF5: 3/ Improved Accuracy: GPT-5.5 Instant shows significant improvements in factual accuracy, particularly in fields with high accuracy requirements such as medicine, law, and finance.

X AI KOLs Following · 2026-05-10

Report claims that GPT-5.5 Instant shows significant improvements in factual accuracy, particularly in high-stakes fields like medicine, law, and finance.


MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models

arXiv cs.CL · 2026-04-20

MoshiRAG combines a compact full-duplex speech language model with asynchronous retrieval-augmented generation to improve factuality while maintaining real-time interactivity. The approach uses the natural temporal gaps in conversation to retrieve external knowledge without disrupting the flow of dialogue.


FACTS Benchmark Suite: Systematically evaluating the factuality of large language models

Google DeepMind Blog · 2025-12-09

Google DeepMind and Kaggle have launched the FACTS Benchmark Suite, a comprehensive set of evaluations including parametric, search, multimodal, and grounding benchmarks to systematically measure the factuality of large language models.


FACTS Grounding: A new benchmark for evaluating the factuality of large language models

Google DeepMind Blog · 2024-12-17

DeepMind introduces FACTS Grounding, a comprehensive benchmark with 1,719 examples for evaluating how accurately large language models ground their responses in source material and avoid hallucinations. The benchmark includes a public dataset and an online Kaggle leaderboard tracking LLM performance on factual accuracy and grounding tasks.


Introducing SimpleQA

OpenAI Blog · 2024-10-30

OpenAI introduces SimpleQA, a factuality benchmark of 4,326 short fact-seeking questions designed to evaluate whether frontier language models can answer accurately without hallucinating. The dataset relies on dual independent annotation and rigorous question criteria, with an estimated error rate of only ~3%. GPT-4o scores less than 40% on it.
