OpenAI's GPT-5.5 costs 49–92% more than GPT-5.4 in practice despite claimed token-efficiency improvements, and Anthropic's Claude Opus 4.7 likewise raised effective costs by 12–27% for longer prompts. Both moves reflect a broader trend of rising frontier-model prices as the two companies face massive projected losses.
This paper presents empirical measurements of information density in web pages from the perspective of LLM agents, using a curated benchmark of 100 URLs across five categories. It finds that structural extraction reduces token count by an average of 71.5% while preserving answer quality, and reveals an undocumented compression layer in Claude Code.
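A minimal sketch of the kind of measurement described, assuming the pipeline is roughly "strip a page to its structural text and compare token counts"; the benchmark, the actual extractor, and the exact metric are the paper's own and not reproduced here.

```python
# Sketch: compare token counts of raw HTML vs. structurally extracted text.
# Assumes tiktoken and BeautifulSoup; the paper's real extractor may differ.
import tiktoken
from bs4 import BeautifulSoup

enc = tiktoken.get_encoding("cl100k_base")

def token_count(text: str) -> int:
    return len(enc.encode(text))

def structural_extract(html: str) -> str:
    """Keep headings, paragraphs, and list items; drop scripts, styles, chrome."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()
    parts = [el.get_text(" ", strip=True)
             for el in soup.find_all(["h1", "h2", "h3", "p", "li"])]
    return "\n".join(p for p in parts if p)

def reduction(html: str) -> float:
    raw, extracted = token_count(html), token_count(structural_extract(html))
    return 1 - extracted / raw  # 0.715 would match the reported 71.5% average
```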
GitHub improved token efficiency in their agentic workflows by logging token usage via an API proxy and building daily optimization workflows, reducing overhead from unused MCP tool registrations.
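GitHub's actual proxy isn't public; the sketch below illustrates the idea, assuming an OpenAI-style chat completions endpoint that returns a `usage` object. The upstream URL, auth, and log path are placeholders.

```python
# Sketch: thin wrapper that forwards a chat request and logs token usage as JSONL.
import json, os, time
import requests

UPSTREAM = "https://api.openai.com/v1/chat/completions"
LOG_PATH = "token_usage.jsonl"

def proxied_chat(payload: dict) -> dict:
    resp = requests.post(
        UPSTREAM,
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json=payload,
        timeout=60,
    )
    resp.raise_for_status()
    body = resp.json()
    usage = body.get("usage", {})  # prompt_tokens / completion_tokens / total_tokens
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps({"ts": time.time(),
                            "model": payload.get("model"),
                            **usage}) + "\n")
    return body
```

A daily job can then aggregate the JSONL log to surface overhead such as prompts inflated by unused MCP tool registrations.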
BACR introduces adaptive token budgeting and curriculum-aware scheduling to prevent LLMs from overthinking easy problems and underthinking hard ones, cutting token use by 34% while boosting accuracy by up to 8.3%.
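BACR's scheduler isn't shown in the summary; as a hedged illustration of adaptive token budgeting in general, one could cap the reasoning budget with a difficulty estimate. The heuristic and budget tiers below are invented for illustration, not BACR's method.

```python
# Sketch: pick a reasoning-token budget from a rough difficulty estimate.
def estimate_difficulty(problem: str) -> float:
    """Crude proxy: longer problems with more numbers/operators score higher (0..1)."""
    signals = sum(ch.isdigit() or ch in "+-*/=^" for ch in problem)
    return min(1.0, (len(problem.split()) + 4 * signals) / 400)

def token_budget(problem: str, lo: int = 256, hi: int = 4096) -> int:
    d = estimate_difficulty(problem)
    return int(lo + d * (hi - lo))  # easy -> small budget, hard -> large budget
```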
MiMo-V2.5 & Pro introduce frontier agent capabilities with improved token efficiency.
Ling-2.6-flash is a 104B-total/7.4B-active sparse instruct model optimized for token efficiency, aiming to cut costs and boost throughput on agent tasks.
TACO introduces a self-evolving compression framework that automatically learns to shrink redundant terminal interaction history, cutting token overhead by ~10% while boosting accuracy by 1–4% across TerminalBench and other code-agent benchmarks.
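TACO learns its compressor; a hand-written heuristic aimed at the same goal (shrinking redundant terminal history) might deduplicate repeated steps and truncate long outputs, as in the sketch below. This is illustrative only, not TACO's learned policy.

```python
# Sketch: heuristic compression of terminal interaction history.
def compress_history(history: list[dict], max_output_lines: int = 20) -> list[dict]:
    """history: [{'cmd': str, 'output': str}, ...] in chronological order."""
    compressed, seen = [], set()
    for step in history:
        key = (step["cmd"], step["output"][:200])
        if key in seen:                       # drop exact repeats of cmd + output prefix
            continue
        seen.add(key)
        lines = step["output"].splitlines()
        if len(lines) > max_output_lines:     # keep head and tail of long outputs
            half = max_output_lines // 2
            lines = lines[:half] + ["... [truncated] ..."] + lines[-half:]
        compressed.append({"cmd": step["cmd"], "output": "\n".join(lines)})
    return compressed
```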
YourMemory is an MCP-based memory tool that uses self-pruning to reduce token waste by up to 84%, improving efficiency in AI context management.
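YourMemory's pruning policy isn't described beyond "self-pruning"; a generic sketch of pruning stored memories down to a token budget follows. The scoring function and budget are assumptions, not the tool's actual behavior.

```python
# Sketch: prune memories to a token budget by recency-weighted usefulness.
import time

def prune(memories: list[dict], budget_tokens: int = 2000) -> list[dict]:
    """memories: [{'text': str, 'tokens': int, 'last_used': float, 'hits': int}, ...]"""
    now = time.time()
    def score(m):
        age_days = (now - m["last_used"]) / 86400
        return m["hits"] / (1 + age_days)      # frequently and recently used wins
    kept, used = [], 0
    for m in sorted(memories, key=score, reverse=True):
        if used + m["tokens"] <= budget_tokens:
            kept.append(m)
            used += m["tokens"]
    return kept
```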
This paper proposes a reinforcement learning framework that improves LLM reasoning efficiency by modeling token significance to selectively penalize unimportant tokens while preserving essential reasoning, using both significance-aware and dynamic length rewards to reduce verbosity without sacrificing accuracy.
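The summary doesn't give the exact reward, but the shape of the idea can be sketched: a task reward minus a penalty on low-significance tokens and a length term that only kicks in past a target. The weights are placeholders, and the per-token significance scores would come from the paper's significance model.

```python
# Sketch: reward = task reward - penalty on low-significance tokens - length term.
def shaped_reward(task_reward: float,
                  significances: list[float],   # s_i in [0, 1], one per generated token
                  target_len: int,
                  alpha: float = 0.01,
                  beta: float = 0.001) -> float:
    verbosity_penalty = alpha * sum(1.0 - s for s in significances)  # unimportant tokens cost more
    length_penalty = beta * max(0, len(significances) - target_len)  # only beyond the target length
    return task_reward - verbosity_penalty - length_penalty
```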
A developer shares their experience with Recurrent Language Models (RLMs), claiming they effectively handle extremely long contexts of tens of millions of tokens, a significant advance in context handling.
This paper introduces GenericAgent, a self-evolving LLM agent system designed to maximize context information density. It addresses long-horizon limitations through hierarchical memory, reusable SOPs, and efficient compression, achieving better performance with fewer tokens compared to leading agents.
OpenAI introduces Prompt Caching, an automatic feature that cuts costs and latency by reusing recently seen input tokens, offering a 50% discount on cached input tokens for GPT-4o, GPT-4o mini, o1-preview, and o1-mini. It applies automatically to prompts longer than 1,024 tokens and requires no integration changes.
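Because caching matches on the prompt prefix, the practical takeaway is to keep static content (system prompt, tool definitions) at the front and per-request content at the end. A minimal sketch with the official `openai` Python client; the model name and prompt text are placeholders.

```python
# Sketch: structure requests so the long static prefix is cache-friendly.
# Caching applies automatically to prompts over 1,024 tokens; no extra flags needed.
from openai import OpenAI

client = OpenAI()

STATIC_SYSTEM_PROMPT = "You are a helpful assistant. <long, unchanging instructions go here>"

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # stable prefix, cacheable
            {"role": "user", "content": question},                # varies per call
        ],
    )
    return resp.choices[0].message.content
```

The response's `usage` object reports how many input tokens were served from cache, which makes the savings easy to verify per request.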