How can we prevent AI models from cannibalizing themselves when human-generated data runs out? Scientists say they've found the answer.

Reddit r/artificial News

Summary

Scientists claim to have found a way to prevent AI models from cannibalizing themselves when human-generated data runs out, addressing the problem of model collapse, in which LLMs trained on synthetic data produce gibberish and hallucinations.


Cached at: 05/22/26, 01:44 PM

# How can we prevent AI models from cannibalizing themselves when human-generated data runs out? Scientists say…

Source: [https://www.livescience.com/technology/artificial-intelligence/how-can-we-prevent-ai-models-from-cannibalizing-themselves-when-human-generated-data-runs-out-scientists-say-theyve-found-the-answer](https://www.livescience.com/technology/artificial-intelligence/how-can-we-prevent-ai-models-from-cannibalizing-themselves-when-human-generated-data-runs-out-scientists-say-theyve-found-the-answer)

While the evolution of [artificial intelligence](https://www.livescience.com/technology/artificial-intelligence/what-is-artificial-intelligence-ai) (AI) systems has shown no sign of slowing, there's a growing concern that large language models (LLMs) will soon run out of human-made data to ingest and learn from. Once this happens, scientists say, AI models will increasingly rely on synthetic, AI-made information, which will lead to an effect called "[model collapse](https://www.livescience.com/technology/artificial-intelligence/ai-models-trained-on-ai-generated-data-could-spiral-into-unintelligible-nonsense-scientists-warn)." In model collapse, LLMs spout gibberish, and the AI systems they underpin deliver inaccurate answers and hallucinate information in response to queries far more often than they do today.
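Model collapse is often explained with a toy experiment that is not taken from the Live Science article: fit a simple model to data, generate new data from that model, fit again using only the generated data, and repeat. The sketch below assumes a one-dimensional Gaussian stands in for the "model." The function name `simulate_collapse` and its parameters (generation count, sample size, seed) are hypothetical. Small estimation errors compound from one generation to the next, so the fitted spread tends to shrink toward zero. This loosely mirrors how rare patterns can disappear when models train on their own output. It is an illustration of the failure mode only, not the researchers' proposed fix.

```python
import random
import statistics


def simulate_collapse(generations=200, n_samples=20, seed=0):
    """Fit a Gaussian, sample from the fit, refit on those samples, and repeat.

    Returns the fitted standard deviation at each generation.
    """
    rng = random.Random(seed)
    # Generation 0: stand-in for human data, drawn from a standard normal.
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    stds = []
    for _ in range(generations):
        mu = statistics.fmean(data)
        # Maximum-likelihood (population) std, which is biased low on small samples.
        sigma = statistics.pstdev(data, mu)
        stds.append(sigma)
        # The next generation sees only synthetic samples from the current fit.
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    return stds
```

Printing `simulate_collapse()[::50]` shows the fitted standard deviation at every 50th generation, which makes the drift easy to see.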

Similar Articles

AI is deteriorating in realtime

Reddit r/ArtificialInteligence

AI models are deteriorating due to training on recursively generated synthetic data, leading to model collapse; multiple studies highlight the risks of scaling with synthetic data.

Agentic AI memory isn't a hoarding problem. It's a pruning problem.

Reddit r/AI_Agents

The author argues that AI agent memory should focus on pruning data rather than hoarding it, drawing parallels to human memory types (sensory, short-term, long-term) and suggesting that modeling agent memory after human memory can reduce token usage while maintaining high-quality context.
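The pruning-over-hoarding idea summarized above can be illustrated with a small sketch. This is hypothetical code and does not come from the Reddit post. It makes several assumptions. Every memory item carries an importance score. A noise floor plays the role of the "sensory" filter, a fixed-size buffer serves as short-term memory, and a token-budgeted store serves as long-term memory. Token counts are approximated by word counts. The class names (`MemoryItem`, `TieredMemory`), thresholds, and budgets are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    text: str
    importance: float  # hypothetical relevance score in [0, 1], e.g. assigned by the agent

    @property
    def tokens(self) -> int:
        # Crude stand-in for a real tokenizer: count whitespace-separated words.
        return len(self.text.split())


@dataclass
class TieredMemory:
    noise_floor: float = 0.1        # "sensory" filter: weaker inputs are never stored
    short_term_size: int = 4        # capacity of the recent-context buffer
    promote_threshold: float = 0.5  # minimum importance to move into long-term memory
    long_term_budget: int = 50      # token budget for long-term memory
    short_term: list[MemoryItem] = field(default_factory=list)
    long_term: list[MemoryItem] = field(default_factory=list)

    def observe(self, item: MemoryItem) -> None:
        """Filter an incoming item, then push it into short-term memory."""
        if item.importance < self.noise_floor:
            return  # dropped at the "sensory" stage
        self.short_term.append(item)
        if len(self.short_term) > self.short_term_size:
            evicted = self.short_term.pop(0)  # oldest item leaves the buffer
            if evicted.importance >= self.promote_threshold:
                self.long_term.append(evicted)
                self._prune_long_term()
            # Evicted items below the threshold are forgotten.

    def _prune_long_term(self) -> None:
        """Greedily keep the most important items that fit in the token budget."""
        kept, used = [], 0
        for item in sorted(self.long_term, key=lambda m: m.importance, reverse=True):
            if used + item.tokens <= self.long_term_budget:
                kept.append(item)
                used += item.tokens
        self.long_term = kept

    def context(self) -> str:
        """Assemble prompt context: pruned long-term items, then recent items."""
        return "\n".join(m.text for m in self.long_term + self.short_term)

    def context_tokens(self) -> int:
        return sum(m.tokens for m in self.long_term + self.short_term)
```

The long-term store is capped by a token budget rather than by item count. As a result, the long-term portion of the context stays bounded no matter how many observations arrive, and low-value items are discarded instead of accumulating.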