How LLMs decide which pages to cite — and how to optimize for it
Summary
The article explains how LLMs such as ChatGPT and Perplexity select sources to cite, highlighting that schema markup (JSON-LD) can raise citation rates from 16% to 54% by making page information easier to extract.
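To make the schema markup point concrete, here is a minimal sketch of what JSON-LD for an article page might look like when generated in Python. The helper names (`article_json_ld`, `to_script_tag`) and the sample values are hypothetical and are not taken from the article. The only parts assumed to be standard are the schema.org `Article` vocabulary and the `application/ld+json` script type.

```python
import json


def article_json_ld(headline, author, date_published, url):
    """Build a schema.org Article description as a JSON-LD dict.

    Hypothetical helper: the field selection is illustrative, not a
    list of what any particular LLM or crawler requires.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        "url": url,
    }


def to_script_tag(data):
    """Serialize a JSON-LD dict into an HTML script tag."""
    # Escape "</" so a value containing "</script>" cannot close the tag early.
    # "\/" is a valid JSON escape, so parsers still read the original string.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder values for illustration only.
example = article_json_ld(
    headline="How LLMs decide which pages to cite",
    author="Jane Doe",
    date_published="2025-01-01",
    url="https://example.com/llm-citations",
)
print(to_script_tag(example))
```

The resulting tag would go in the page's HTML, giving a parser explicit fields such as the headline, author, and date instead of leaving it to infer them from the layout.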
Similar Articles
I spent 40% of my development time preventing an LLM from citing sources wrong. here are the 7 failure modes I found
A developer building an AI legal assistant for a German law firm details seven specific LLM citation failure modes and the prompt-engineering fixes used to meet strict legal citation standards.
Whose Facts Win? LLM Source Preferences under Knowledge Conflicts
This paper investigates how LLMs handle knowledge conflicts in retrieval-augmented generation by studying their preferences for different information sources. The authors find that LLMs prefer institutionally corroborated sources, but that repetition can reverse these preferences. They propose a method that reduces repetition bias while keeping source preferences consistent.
Stop letting LLMs edit your .bib [D]
The article criticizes the reliance on Large Language Models for generating bibliographic entries, highlighting issues with hallucinated citations and incorrect author lists in academic papers.
@omarsar0: LLM Artifacts Connected to @karpathy's LLM Knowledge base idea, I've been building out a fun way to generate dynamic ar…
A developer is building a system to generate dynamic artifacts from LLM knowledge bases inspired by Karpathy's LLM Knowledge Base idea, aiming to surface deeper and more meaningful insights from content that is otherwise hard for humans to consume.
We tried vectors, ASTs, and brute-force context stuffing for code retrieval. Graphs with LLM-generated semantics worked best. Here's what we learned.
The authors detail their experience building a code indexing system, concluding that graph-based retrieval with LLM-generated semantics outperforms vector embeddings and pure AST parsing. They open-sourced the system, Bytebell, which uses Neo4j to store semantic context for efficient and precise code retrieval.