A startup founder describes how adopting Fable 5 dramatically boosted productivity: the team consumed 10 billion tokens in one month, scaled from 20 to 2000, and achieved record output.
BohuTANG introduces /harden, a method for same-model, two-round convergence, and highlights the evot agent engine, which completes complex tasks with fewer tokens and at lower cost than alternatives such as Claude Code.
HAKARI-Bench is a lightweight benchmark for comparing retrieval methods across multiple configurations and languages, supporting efficient model selection and performance analysis. Its results correlate highly with those of full benchmarks such as MTEB, while it is faster to run.
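The summary does not say how HAKARI-Bench measures its correlation with MTEB. One common choice is Spearman rank correlation between the scores two benchmarks assign to the same set of models, which checks whether the lightweight benchmark ranks models in the same order. A minimal sketch in plain Python, assuming that metric; the model scores in the usage line are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks (1 = highest score); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied scores.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors.

    Undefined (division by zero) if every score in either list is tied.
    """
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical scores for five models on a full benchmark and a lightweight one.
full = [0.62, 0.58, 0.55, 0.51, 0.47]
lite = [0.70, 0.66, 0.67, 0.59, 0.52]
print(spearman(full, lite))  # two adjacent models swap places, so rho is 0.9
```

A rho near 1 means the cheaper benchmark would lead to the same model choice; swapping two neighbors out of five already lowers it to 0.9.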
Unlimited OCR introduces Reference Sliding Window Attention, which keeps memory consumption from growing with sequence length in long-sequence OCR tasks, enabling efficient transcription of multiple pages in a single forward pass.
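The summary does not explain what the "Reference" part of Reference Sliding Window Attention adds. The sketch below illustrates only the plain sliding-window idea it builds on (an assumption, not the Unlimited OCR method): each new token attends to at most the last `window` cached keys and values, so the cache stays a fixed size however many tokens or pages are decoded. `SlidingWindowKVCache` and `attend` are hypothetical names.

```python
import math
from collections import deque


class SlidingWindowKVCache:
    """Hypothetical KV cache that keeps only the most recent `window` entries."""

    def __init__(self, window):
        # deque(maxlen=...) silently drops the oldest entry once full,
        # so memory is bounded by `window` regardless of sequence length.
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def __len__(self):
        return len(self.keys)


def attend(query, cache):
    """Softmax dot-product attention of one query over the cached window."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in cache.keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(cache.values[0])
    return [sum(w * v[d] for w, v in zip(weights, cache.values)) / total
            for d in range(dim)]


cache = SlidingWindowKVCache(window=4)
for t in range(1000):          # simulate a long transcription
    cache.append([float(t), 1.0], [float(t)])
print(len(cache))              # 4: only the last window is kept
print(attend([0.0, 1.0], cache))  # equal scores, so the mean of values 996..999
```

A pure window forgets everything older than `window` tokens; handling that loss is presumably where the method's "reference" mechanism comes in, which this sketch does not model.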
PolicyTrim is a reinforcement learning-based post-training framework that improves action chunk utilization by 3× and reduces physical execution steps by 51.4% in Vision-Language-Action models, delivering up to 5.83× deployment speedup.
The article explains the idea of structuring AI interactions as loops, in which the AI iterates toward a goal instead of answering one-off prompts, and discusses three key components: verification, state, and stop conditions.
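The three components can be made concrete with a small loop skeleton. This is a minimal sketch, not the article's code: `step` stands in for one model call and `verify` for whatever check (tests, a linter, a rubric) decides whether the goal is met; both are hypothetical callables. State is the attempt count plus a history of outputs and feedback, and the loop stops either when verification passes or when the attempt budget runs out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopState:
    attempt: int = 0
    history: List[str] = field(default_factory=list)  # outputs and feedback so far
    result: Optional[str] = None                       # set only when verification passes


def run_loop(step: Callable[[LoopState], str],
             verify: Callable[[str], Optional[str]],
             max_attempts: int = 5) -> LoopState:
    """Act, verify, record state; stop on a verified result or an exhausted budget."""
    state = LoopState()
    while state.attempt < max_attempts:        # stop condition 1: attempt budget
        state.attempt += 1
        output = step(state)                   # hypothetical: one model call given the history
        feedback = verify(output)              # None means the check passed
        state.history.append(output)
        if feedback is None:                   # stop condition 2: goal verified
            state.result = output
            break
        state.history.append(f"feedback: {feedback}")  # carried into the next attempt
    return state


# Toy usage: the "model" produces numbered drafts and the check accepts draft 3.
state = run_loop(lambda s: f"draft {s.attempt}",
                 lambda out: None if out == "draft 3" else "not yet")
print(state.result, state.attempt)
```

Without the budget, a check that never passes would loop forever, which is why a hard stop condition sits alongside the success condition.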
GLM 5.2 offers improved token efficiency: users can reach 98% of max-level intelligence with less than half the tokens. Its 'high' effort level is a practical day-to-day alternative to the resource-intensive 'max' level.
A blog post discussing how increased granularity in systems, such as tick sizes in financial markets and time slots for booking sports courts, can introduce strategic gaming and inefficiencies, arguing that finer choices are not always beneficial.
Miami-based startup Subquadratic claims its new SubQ model solves the quadratic attention bottleneck, making LLMs faster and cheaper. Independent tests from Appen back up many of the claims, though skepticism remains.
This paper presents a text-to-music generation system that leverages reward conditioning, expert iteration, and preference tuning to improve audio quality within a 120M-parameter model, submitted to the ATTM Grand Challenge at ICME 2026.
Analysis showing that GPUs used for AI training often sit idle waiting for data, questioning the severity of the GPU shortage.
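The summary does not say what remedy the analysis proposes. A standard mitigation for accelerators waiting on data is to prefetch: load the next batches in the background while the current one is being processed. A minimal sketch using only the standard library; `prefetching_loader` and `load_batch` are hypothetical names, and real training stacks usually provide background data-loading workers for the same purpose.

```python
import queue
import threading


def prefetching_loader(load_batch, num_batches, depth=2):
    """Yield batches loaded by a background thread, keeping up to `depth` ready ahead of time."""
    q = queue.Queue(maxsize=depth)
    sentinel = object()  # marks the end of the stream

    def producer():
        for i in range(num_batches):
            q.put(load_batch(i))  # blocks once `depth` batches are already waiting
        q.put(sentinel)

    # Daemon thread: if the consumer stops early, the blocked producer
    # does not keep the process alive.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            return
        yield batch


# Usage: the training step runs while the next batch is being loaded.
for batch in prefetching_loader(lambda i: [i] * 4, num_batches=3):
    print(batch)
```

When loading and compute take similar time per batch, overlapping them roughly halves wall-clock time compared with doing them one after the other; if loading is much slower than compute, the GPU still waits and the fix has to come from faster storage or preprocessing.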
This paper introduces LoopCoder-v2, a 7B code model that benefits most from a single rethinking loop; additional loops degrade performance, challenging the assumption that more test-time compute always helps.
A reflection on the mixed impact of AI automation in enterprises, noting that efficiency gains are often used to justify layoffs while token spending may be wasteful. It also raises data privacy concerns about AI agents accessing workplace communication platforms.
Grouped Query Experts (GQE) improves Transformer efficiency by applying a mixture-of-experts layer on top of grouped-query attention, selectively activating query heads per token while keeping key-value cache benefits, matching baseline accuracy with half the query-head compute at 250M parameter scale.
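The summary gives the idea but not the routing details, so the following is a hedged sketch under stated assumptions: a per-token router produces one logit per query head, only the top-k heads are computed, and each active head reads the shared key/value group it belongs to, as in grouped-query attention. The names (`select_query_heads`, `gqe_token`) are hypothetical, and whether GQE scales active head outputs by gate weights is not stated in the summary; this sketch does not.

```python
import math


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def select_query_heads(router_logits, k):
    """Indices of the top-k query heads for one token (MoE-style gating, assumed)."""
    return sorted(range(len(router_logits)),
                  key=lambda h: router_logits[h], reverse=True)[:k]


def gqe_token(q_heads, router_logits, k, keys, values, heads_per_group):
    """One token's attention output with only k query heads active.

    q_heads: one query vector per query head.
    keys, values: per KV group, a list of key (value) vectors over the sequence.
    Query head h reads KV group h // heads_per_group, so the KV cache is shared
    exactly as in grouped-query attention.
    """
    dim = len(q_heads[0])
    v_dim = len(values[0][0])
    active = set(select_query_heads(router_logits, k))
    outputs = []
    for h, q in enumerate(q_heads):
        if h not in active:
            outputs.append([0.0] * v_dim)  # skipped head: no attention scores computed
            continue
        g = h // heads_per_group
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(dim) for key in keys[g]]
        w = softmax(scores)
        outputs.append([sum(wi * v[d] for wi, v in zip(w, values[g])) for d in range(v_dim)])
    return outputs
```

With 4 query heads and k = 2, each token computes attention for half the heads, which is the "half the query-head compute" setting the summary reports; the key/value cache is unchanged because KV groups are never dropped.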
The article introduces Buddy AI, a low-compute intelligence system designed to always operate within strict compute limits, prioritizing efficiency and grounded output over scaling up models.
PreAct compiles successful agent runs into small state-machine programs, enabling 8.5-13x faster replay on repeated tasks without per-step language model calls, with runtime screen checks to ensure correctness.
Ponytail is an AI agent skill that aims to curb over-engineering by forcing the agent to first check whether new code is needed at all; it claims to reduce code volume by 80-94% and cost by 42-75%. The author recommends using it with Codex and has open-sourced it on GitHub.
This paper introduces LoopCoder-v2, a family of 7B parameter parallel loop transformers for code generation, and studies the optimal number of loops, finding that two loops yield significant gains while more loops cause degradation.
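The summaries do not describe LoopCoder-v2's "parallel" loop design, so this sketch shows only the generic looped-transformer idea, as an assumption: the same weight-shared block is applied several times with a residual connection, so the loop count adds test-time compute without adding parameters. `looped_forward` and `make_toy_block` are hypothetical, and the toy linear block stands in for real attention and MLP layers.

```python
from typing import Callable, List

Vector = List[float]


def looped_forward(h: Vector, block: Callable[[Vector], Vector], n_loops: int) -> Vector:
    """Apply one weight-shared block n_loops times: h <- h + block(h)."""
    for _ in range(n_loops):
        update = block(h)                        # same parameters on every pass
        h = [x + u for x, u in zip(h, update)]   # residual connection
    return h


def make_toy_block(weights: List[Vector]) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for a transformer block: a single linear map."""
    def block(h: Vector) -> Vector:
        return [sum(w * x for w, x in zip(row, h)) for row in weights]
    return block


block = make_toy_block([[0.5, 0.0], [0.0, 0.5]])
print(looped_forward([1.0, 2.0], block, n_loops=1))
print(looped_forward([1.0, 2.0], block, n_loops=2))
```

The toy only shows the mechanics of looping; the finding that two loops help and more loops hurt is an empirical result about trained models, not something this sketch reproduces.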
A personal reflection on how AI tools boost productivity but also raise expectations, leading to more work and psychological fatigue rather than free time.
PreAct compiles successful task runs of computer-using agents into small state-machine programs, allowing fast replay (8.5–13× faster) on repeated tasks by skipping per-step language model calls, while verifying screen states at each step and falling back to the agent when mismatches occur.
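A minimal sketch of the replay mechanism described above, in its simplest form: a linear chain of states, each pairing an expected screen with the action recorded in the successful run. It assumes screens can be compared as simple signatures (here, strings); `Step`, `compile_run`, `replay`, and the callables passed in are hypothetical, and PreAct's actual programs may branch and use richer screen checks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    expected_screen: str  # hypothetical screen signature this state expects
    action: str           # action recorded from the successful run


def compile_run(trace: List[Tuple[str, str]]) -> List[Step]:
    """Turn a recorded (screen, action) trace into a chain of checked states."""
    return [Step(screen, action) for screen, action in trace]


def replay(program: List[Step],
           observe: Callable[[], str],
           execute: Callable[[str], None],
           fallback_agent: Callable[[int, str], str]) -> str:
    """Replay recorded actions with no model calls; hand off to the agent on a mismatch."""
    for i, step in enumerate(program):
        screen = observe()
        if screen != step.expected_screen:   # runtime screen check failed
            return fallback_agent(i, screen)
        execute(step.action)
    return "replayed"


program = compile_run([("login", "click sign-in"),
                       ("home", "open settings"),
                       ("settings", "toggle dark mode")])
screens = iter(["login", "error dialog"])   # the second screen differs from the recording
print(replay(program, lambda: next(screens), print,
             lambda i, s: f"fallback at step {i} on {s}"))
```

The speedup comes from the happy path, where every check passes and no language model is called; the screen check is what makes it safe to skip those calls, because any divergence returns control to the full agent.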