efficiency

#efficiency

@DashHuang: Fable 5 feels like it opened a third eye for me. In just a few days, I got my production environment running smoothly. I've burned through 10 billion tokens in the past month. Over 20 years of entrepreneurship, growing the team from 20 to 200 to 2000 people, but I've never enjoyed building products this much, and efficiency and output have never been this high…

X AI KOLs Following · yesterday Cached

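A startup founder says Fable 5 dramatically boosted his productivity: he got his production environment running smoothly within days and consumed 10 billion tokens in a month. After 20 years of entrepreneurship, during which he grew teams from 20 to 2000 people, he reports that he has never enjoyed building products this much or had such high output.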

@BohuTANG: I used to try cross-model mutual review, but that was too slow for me. Now I've discovered a new method: /harden, which achieves great results with two rounds of convergence on the same model. Interested folks can try this skill.

X AI KOLs Timeline · yesterday Cached

BohuTANG introduces /harden, a skill that replaces slower cross-model mutual review with two rounds of convergence on the same model, and highlights the evot agent engine, which is said to complete complex tasks with fewer tokens and at lower cost than alternatives such as Claude Code.

HAKARI-Bench: A Lightweight Benchmark for Comparing Retrieval Architectures and Efficiency Settings under Unified Conditions

Hugging Face Daily Papers · yesterday Cached

HAKARI-Bench is a lightweight benchmark for comparing retrieval methods across multiple configurations and languages, enabling efficient model selection and performance analysis. Its results correlate highly with full benchmarks such as MTEB while being faster to run.

Unlimited OCR Works

Hugging Face Daily Papers · yesterday Cached

Unlimited OCR introduces Reference Sliding Window Attention to eliminate growing memory consumption in long-sequence OCR tasks, enabling efficient transcription of multiple pages in a single forward pass.

PolicyTrim: Boosting Intrinsic Policy Efficiency of Vision-Language-Action Models

Hugging Face Daily Papers · 2d ago Cached

PolicyTrim is a reinforcement learning-based post-training framework that improves action chunk utilization by 3× and reduces physical execution steps by 51.4% in Vision-Language-Action models, delivering up to 5.83× deployment speedup.

@AnatoliKopadze: https://x.com/AnatoliKopadze/status/2068328135611822149

X AI KOLs Timeline · 3d ago Cached

The article explains the concept of using loops in AI interactions, where the AI iterates toward a goal instead of answering one-off prompts, and discusses the three key components of a loop: verification, state, and stop conditions.
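
As a rough illustration of the three components named in this summary, here is a minimal Python sketch of a goal loop. It is not taken from the linked article: the names `LoopState`, `run_loop`, `toy_step`, and `toy_verify` are hypothetical, and the toy step merely stands in for a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopState:
    """State carried from one iteration to the next (hypothetical structure)."""
    attempt: int = 0
    draft: str = ""
    history: List[str] = field(default_factory=list)


def run_loop(
    step: Callable[[LoopState], str],
    verify: Callable[[str], bool],
    max_attempts: int = 5,
) -> LoopState:
    """Iterate until the draft passes verification or the budget runs out."""
    state = LoopState()
    while state.attempt < max_attempts:  # stop condition: attempt budget
        state.attempt += 1
        state.draft = step(state)        # in practice, a model call
        state.history.append(state.draft)
        if verify(state.draft):          # stop condition: goal verified
            break
    return state


# Toy stand-ins so the sketch runs without any model (assumptions).
def toy_step(state: LoopState) -> str:
    return state.draft + "x"


def toy_verify(draft: str) -> bool:
    return len(draft) >= 3


final = run_loop(toy_step, toy_verify)
print(final.attempt, final.draft)
```

The attempt budget matters as much as the verifier: without it, a goal that can never be verified would keep the loop running indefinitely.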

GLM 5.2: 98% of max level intelligence with less than half of tokens usage

Reddit r/LocalLLaMA · 3d ago

GLM 5.2 offers improved token efficiency, allowing users to achieve 98% of max-level intelligence using less than half the tokens. The model's 'high' effort level provides a practical alternative for day-to-day use compared to the resource-intensive 'max' level.

Granularity comes at a cost – Game Theory

Hacker News Top · 4d ago Cached

A blog post discusses how finer granularity in systems, such as tick sizes in financial markets and time slots for booking sports courts, can invite strategic gaming and create inefficiencies, and argues that finer-grained choices are not always beneficial.

A startup claims it broke through a bottleneck that’s holding back LLMs

MIT Technology Review · 4d ago Cached

Miami-based startup Subquadratic claims its new SubQ model solves the quadratic attention bottleneck, making LLMs faster and cheaper. Independent tests from Appen back up many of the claims, though skepticism remains.

Improving Text-to-Music Generation with Human Preference Rewards

Hugging Face Daily Papers · 4d ago Cached

This paper presents a text-to-music generation system that leverages reward conditioning, expert iteration, and preference tuning to improve audio quality within a 120M-parameter model, submitted to the ATTM Grand Challenge at ICME 2026.

Everyone says AI needs more GPUs. I profiled one and it was sitting idle most of the time, just waiting on data. how much of the "GPU shortage" is actually wasted GPUs?

Reddit r/artificial · 5d ago

The author profiled a GPU and found it idle most of the time, waiting on data, and asks how much of the "GPU shortage" is actually wasted GPU capacity.

@rohanpaul_ai: Big claim in this paper, pushes against the common idea that more test-time compute should keep helping. Claims a code …

X AI KOLs Following · 5d ago Cached

This paper introduces LoopCoder-v2, a 7B code model that benefits most from a single rethinking loop; additional loops degrade performance, challenging the assumption that more test-time compute always helps.

Is AI automation actually helping you at work, or are we just burning insane token budgets to justify layoffs?

Reddit r/AI_Agents · 5d ago

The post reflects on the mixed impact of AI automation in enterprises, noting that efficiency gains are often used to justify layoffs while the token budgets behind them may be wasteful. It also raises data privacy concerns about AI agents accessing work communication platforms.

Grouped Query Experts: Mixture-of-Experts on GQA Self-Attention

Hugging Face Daily Papers · 5d ago Cached

Grouped Query Experts (GQE) improves Transformer efficiency by applying a mixture-of-experts layer on top of grouped-query attention, selectively activating query heads per token while keeping key-value cache benefits, matching baseline accuracy with half the query-head compute at 250M parameter scale.

Building an always low-compute intelligence system.

Reddit r/ArtificialInteligence · 5d ago

The article introduces Buddy AI, an always low-compute intelligence system designed to operate within strict compute limits, focusing on efficiency and grounded output instead of scaling models.

@dair_ai: Outstanding paper on computer-using agents. (bookmark it) Computer-using agents drive real software through the screen,…

X AI KOLs Following · 5d ago Cached

PreAct compiles successful agent runs into small state-machine programs, enabling 8.5–13× faster replay on repeated tasks without per-step language model calls, with runtime screen checks to ensure correctness.

@vikingmute: Found that Ponytail and Codex are a perfect match https://github.com/DietrichGebert/ponytail… GPT is addicted to writing fallback code; without clear instructions, it always writes a ton of defensive code, which is heartbreaking to read. Its core philosophy is “The…

X AI KOLs Timeline · 6d ago Cached

Ponytail is an AI agent skill that significantly reduces over-engineering by forcing the agent to first check whether new code is needed, claiming to reduce code volume by 80-94% and cost by 42-75%. The author recommends using it with Codex and has open-sourced it on GitHub.

@DorothyDDU: LoopCoder-v2 is out Loop Transformers reuse the same block for recurrent hidden-state refinement — letting models “thin…

X AI KOLs Timeline · 6d ago Cached

This paper introduces LoopCoder-v2, a family of 7B parameter parallel loop transformers for code generation, and studies the optimal number of loops, finding that two loops yield significant gains while more loops cause degradation.

AI made me more productive, but somehow more tired

Reddit r/artificial · 6d ago

A personal reflection on how AI tools boost productivity but also raise expectations, leading to more work and psychological fatigue rather than free time.

PreAct: Computer-Using Agents that Get Faster on Repeated Tasks

arXiv cs.AI · 6d ago Cached

PreAct compiles successful task runs of computer-using agents into small state-machine programs, allowing fast replay (8.5–13× faster) on repeated tasks by skipping per-step language model calls, while verifying screen states at each step and falling back to the agent when mismatches occur.
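
The replay-with-fallback idea in this summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PreAct's actual implementation: `replay`, `ToyScreen`, the screen names, and the actions are all hypothetical, and screen states are reduced to plain strings.

```python
from typing import Callable, Dict, List, Tuple

# One compiled step: the screen we expect to see, then the action to take.
Step = Tuple[str, str]


def replay(
    program: List[Step],
    observe: Callable[[], str],
    act: Callable[[str], None],
    fallback: Callable[[int], None],
) -> bool:
    """Replay recorded steps without model calls, checking the screen first."""
    for index, (expected_screen, action) in enumerate(program):
        if observe() != expected_screen:
            fallback(index)  # mismatch: hand control back to the agent
            return False
        act(action)
    return True


class ToyScreen:
    """Toy environment standing in for real software (assumption)."""

    def __init__(self, transitions: Dict[Tuple[str, str], str]) -> None:
        self.current = "login"
        self.transitions = transitions
        self.log: List[str] = []

    def observe(self) -> str:
        return self.current

    def act(self, action: str) -> None:
        self.log.append(action)
        self.current = self.transitions.get((self.current, action), "unknown")


program: List[Step] = [("login", "type_password"), ("home", "open_report")]

# Case 1: the software behaves as it did in the recorded run.
stable = ToyScreen({("login", "type_password"): "home",
                    ("home", "open_report"): "report"})
stable_ok = replay(program, stable.observe, stable.act, lambda i: None)

# Case 2: an unexpected screen appears, so replay stops and falls back.
fallback_calls: List[int] = []
drifted = ToyScreen({("login", "type_password"): "captcha"})
drifted_ok = replay(program, drifted.observe, drifted.act, fallback_calls.append)
print(stable_ok, drifted_ok, fallback_calls)
```

Checking the screen before every action is what keeps the fast path safe: the replay never acts on a state it did not see in the recorded run.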
