We captured the network traffic of ChatGPT, Gemini and DeepSeek to see how each defines a "source" — they're three completely different mechanisms

Reddit r/artificial 06/11/26, 12:15 PM News

network-traffic ai-sources chatgpt gemini deepseek source-citation transparency

Summary

A technical investigation captured and compared the network traffic of ChatGPT, Gemini, and DeepSeek to understand how each system technically defines and attaches sources to responses, revealing three fundamentally different mechanisms and distinct citation preferences.

Disclosure upfront: I'm the founder of an AI-visibility company, so this research scratches our own itch. Our domain was excluded from all counts before analysis. Not linking anything in the post.

We wanted to answer a simple question: when an AI assistant shows you "sources," what is that, technically? So we opened devtools on the web clients of ChatGPT, Gemini, and DeepSeek, and ran the same 4 queries 10 times through each system. What we found:

**ChatGPT** streams the answer over SSE and attaches citations as `url_citation` objects with `start_ix`/`end_ix`, which are character offsets into the generated text. The offsets count UTF-16 code units, so emoji and CJK text will break your parsing if you count bytes instead. A citation is bound to a specific *fragment* of the answer, not the answer as a whole.

**Gemini** runs on Google's batchexecute/JSPB transport: protobuf-as-JSON-arrays where fields have positions, not names. Next to each cited URL there's a family of short obfuscated fields. Our working hypotheses (not confirmed by Google docs): `rs` ≈ reliability score for the domain, `ls` ≈ last-seen date, `GK` ≈ character range (functional analog of ChatGPT's offsets). The interesting part isn't the exact decoding. It's that Gemini ships what look like internal per-domain trust signals alongside every source.

**DeepSeek** is the most transparent: a plain `search_results[]` array attached to the sub-queries it decomposes your question into. No offsets, no hidden fields.

What they actually cite is just as different. ChatGPT favored arXiv and Wikipedia (one arXiv paper was cited in 10/10 runs). Gemini favored big SaaS/marketing domains and, fun detail, never cited a single Google property in our runs. DeepSeek leaned on press-release wires and news aggregators, including Chinese-language sources the other two never touched.

Bonus finding: we compared all of this against the Google and Bing top 10 for the same queries. URL-level overlap was 3.3% (4 matches out of 120 SERP positions). All four matches were Bing-side. Google: zero.

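
For anyone who wants to run the same kind of URL-level comparison, here is a minimal sketch of how an overlap figure like the one above can be computed. The normalization rules (lowercase host, drop `www.`, drop a trailing slash, ignore scheme and query string) and the function names are assumptions for illustration, not necessarily the rules used in this analysis, and the URLs are placeholders.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivial variants compare equal (assumed rules)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def url_overlap(cited: list[str], serp: list[str]) -> tuple[int, float]:
    """Count SERP positions whose URL was also cited, and the share of all SERP positions."""
    cited_set = {normalize(u) for u in cited}
    matches = sum(1 for u in serp if normalize(u) in cited_set)
    share = matches / len(serp) if serp else 0.0
    return matches, share


# Placeholder data, not taken from the captured sessions.
cited = ["https://www.example.org/paper/", "https://example.com/blog"]
serp = ["http://example.org/paper", "https://example.net/", "https://example.com/pricing"]

matches, share = url_overlap(cited, serp)
print(matches, f"{share:.1%}")

# The figure reported in the post: 4 matches out of 120 SERP positions.
print(f"{4 / 120:.1%}")
```

The denominator here is SERP positions rather than cited URLs, which matches how the 4-out-of-120 figure is stated. Stricter or looser normalization would change the match count, so it is worth stating the rules alongside any overlap number.
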
Caveats: 4 queries from one B2B category, N=10 per system (±15–20 pp), single-day snapshot, field decodings are hypotheses from traffic analysis. Happy to answer anything about the methodology. If anyone has captured different field names in their own sessions, I'd love to compare.
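
On the ChatGPT offsets mentioned above: a minimal sketch of mapping `start_ix`/`end_ix` back to text when the offsets count UTF-16 code units. It assumes `end_ix` is exclusive, which is not confirmed here. The helper name `slice_utf16` and the sample string are made up for illustration.

```python
def slice_utf16(text: str, start_ix: int, end_ix: int) -> str:
    """Return the part of `text` covered by [start_ix, end_ix) in UTF-16 code units.

    Assumes end_ix is exclusive. Raises UnicodeDecodeError if an offset
    lands in the middle of a surrogate pair.
    """
    encoded = text.encode("utf-16-le")  # exactly 2 bytes per code unit, no BOM
    return encoded[2 * start_ix : 2 * end_ix].decode("utf-16-le")


# Hypothetical answer text: the rocket emoji takes 2 UTF-16 code units
# but only 1 Python code point.
answer = "Fast 🚀 models cite arXiv"

utf16_length = len(answer.encode("utf-16-le")) // 2
print(len(answer), utf16_length)

# A citation bound to the last word, expressed in UTF-16 code units.
print(slice_utf16(answer, 20, 25))

# Naive slicing by Python code points is off by one after the emoji.
print(answer[20:25])
```

Python strings index by code point, so any character outside the Basic Multilingual Plane shifts every later offset by one. JavaScript strings already index by UTF-16 code unit, which is probably why the web client can use these offsets directly.
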

