Netflix engineers have open-sourced Headroom, a tool that automatically compresses LLM input context in a local preprocessing step, reducing token consumption by up to 95%. It is compatible with mainstream AI coding tools such as Codex and Cursor and requires no code modifications.
ContextRAG introduces an extraction-free method for constructing hierarchical graph indices for retrieval-augmented generation. It uses Residual-Quantization K-Means and Formal Concept Analysis to cut LLM calls and token usage by orders of magnitude while maintaining competitive F1 scores on multi-hop questions.
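For readers unfamiliar with residual quantization, here is a minimal sketch of the general idea in plain Python. It is not ContextRAG's implementation: the function names (`kmeans`, `nearest`, `residual_quantize`, `reconstruct`), the parameters, and the use of basic Lloyd's k-means are assumptions made for illustration, and the Formal Concept Analysis step is not shown. Each level clusters what the previous level failed to explain, so every vector gets a short code whose prefixes behave like a path in a hierarchy.

```python
import random


def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
    )


def kmeans(points, k, iters=20, seed=0):
    """Basic Lloyd's k-means on lists of floats; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def residual_quantize(points, k, levels):
    """Hypothetical helper: one k-means codebook per level, fitted on residuals."""
    residuals = [list(p) for p in points]
    codebooks = []
    codes = [[] for _ in points]
    for level in range(levels):
        centroids = kmeans(residuals, k, seed=level)
        codebooks.append(centroids)
        for i, r in enumerate(residuals):
            j = nearest(r, centroids)
            codes[i].append(j)
            # Pass on only what this level's centroid did not explain.
            residuals[i] = [a - b for a, b in zip(r, centroids[j])]
    return codebooks, codes


def reconstruct(code, codebooks):
    """Approximate a vector by summing the chosen centroid from each level."""
    out = [0.0] * len(codebooks[0][0])
    for level, j in enumerate(code):
        out = [a + b for a, b in zip(out, codebooks[level][j])]
    return out
```

In this sketch, vectors that share a code prefix fall into the same coarse cluster, which is one way a hierarchy could be built without calling an LLM. How ContextRAG actually turns such codes into a graph index is not described in the summary above.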