ai-reliability

Pramaana Labs raises $27M seed round from Khosla Ventures to bring formal verification to AI

TechCrunch AI · 17h ago

Pramaana Labs raised $27M in seed funding led by Khosla Ventures to apply formal verification, using the Lean programming language, to improve AI reliability in high-stakes domains such as law, drug discovery, and tax preparation.

I gave Google AI a simple test and it gave me the wrong answer 3 times in a row in different browsers even though it said it would record the correct answer and remember it for future results.

Reddit r/artificial · 3d ago

A user reports that Google AI repeatedly gave the wrong answer to a query for the 'slimmest laptop ever' and failed to learn from its mistakes even after acknowledging them.

Gemini -- confidently fabricates technical answers

Reddit r/ArtificialInteligence · 2026-06-08

The author reports that Google's Gemini consistently fabricates technical answers, inventing features and instructions rather than admitting uncertainty, posing risks for technical guidance.

An A.I. Aggregator?

Reddit r/AI_Agents · 2026-06-03

A user shares their experience using ChatGPT for complex medical caregiving and proposes the idea of aggregating multiple AI models to improve reliability by seeking consensus among different LLMs.

On SWEBench Pro, 68.5% of GPT 5.5’s failures were caused by broken or incorrect test cases, totaling 28.9% of the entire benchmark

Reddit r/ArtificialInteligence · 2026-05-26

An analysis reports that 68.5% of GPT 5.5's failures on SWEBench Pro are due to broken or incorrect test cases, which amounts to 28.9% of the entire benchmark. Similar issues affect other major AI benchmarks, raising concerns about the accuracy of current evaluation methods.

@pallavishekhar_: https://x.com/pallavishekhar_/status/2058460434035060758

X AI KOLs Timeline · 2026-05-24

Explains what large language models actually do (next-token prediction) and why they sound confident even when wrong. Offers a mental model and verification checklist for using LLMs safely.

Open ai

Reddit r/ArtificialInteligence · 2026-05-21

The article describes an industry consensus that AI is becoming extremely capable but remains unreliable for high-stakes tasks. It emphasizes that current systems optimize for plausibility rather than guaranteed truth, and that the path forward involves layered verification systems rather than a single perfect model.

Measuring AI Faithfulness-For Better or For Worse

Reddit r/AI_Agents · 2026-05-20

This article discusses the importance of faithfulness in LLM optimization, introducing a Structural Fidelity Score that measures drift across word overlap, constraint survival, and task-type match to ensure prompt optimization does not sacrifice intent.

Ontario auditors find doctors' AI note takers routinely blow basic facts

Hacker News Top · 2026-05-14

An audit by the Office of the Auditor General of Ontario found that AI note-taking systems approved for healthcare routinely fabricate information, insert incorrect drug details, and miss critical patient data. Accuracy accounted for only 4% of their evaluation score.

@GigaAI: Introducing hallucination correction. We have reduced hallucination by 70%. Giga's hallucination rate is at ~1%. Better…

X AI KOLs Timeline · 2026-05-07

GigaAI announces a new hallucination correction feature that reduces the model's hallucination rate to approximately 1%, claiming superior reliability compared to frontier models.

@AiwithYasir: Just IN: This paper from Stanford and Harvard explains why most “agentic AI” systems feel impressive in demos and then …

X AI KOLs Timeline · 2026-04-20

A paper from Stanford and Harvard researchers argues that agentic AI systems fail in real-world deployment not because they lack intelligence, but due to fundamental issues that cause demo performance to collapse in practice.

On the Reliability of Computer Use Agents

Hugging Face Daily Papers · 2026-04-20

A preprint analyzing why computer-use agents succeed once but fail on repeated executions, attributing unreliability to execution stochasticity, task ambiguity, and behavioral variability, and advocating repeated evaluation and stable strategies.
