optimization

#optimization

kv-cache : avoid kv cells copies by ggerganov · Pull Request #24277 · ggml-org/llama.cpp

Reddit r/LocalLLaMA · 2026-06-08

This pull request by ggerganov changes the KV cache in llama.cpp, the open-source LLM inference library, so that KV cells are no longer copied unnecessarily, with the aim of improving inference performance.

#optimization

@steeve: aaaaaand we're faster (i know i know)

X AI KOLs Following · 2026-06-08

Steeve Morin reports that after 5 days of work, his implementation is now within 10% of llama.cpp's speed, achieving 64 tok/s vs 70 tok/s, with more work to do.

#optimization

Accelerating Multi-Objective Bayesian Optimisation via Predictive-Gradient Catalysts

arXiv cs.LG · 2026-06-08

This paper introduces a general acceleration mechanism for multi-objective Bayesian optimisation that uses Gaussian process predictive gradients as auxiliary signals to augment existing acquisition functions, enabling faster convergence to the global Pareto set under limited evaluation budgets.

#optimization

Flatland: The Adventures of Gradient Descent with Large Step Sizes

arXiv cs.LG · 2026-06-08

This paper addresses the open question of maximum step size for gradient descent convergence on non-L-smooth objectives, introducing adaptive methods that operate at the edge of stability and can minimize sharpness globally.
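
For orientation, the classical step-size limit in the L-smooth setting can be seen on a one-dimensional quadratic. This is a standard textbook illustration, not a result taken from the paper, and the symbols $L$ (curvature) and $\eta$ (step size) are notation chosen here:

```latex
\begin{aligned}
f(x) &= \tfrac{L}{2}\,x^2, \qquad f'(x) = L\,x,\\
x_{k+1} &= x_k - \eta\, f'(x_k) = (1 - \eta L)\,x_k,\\
x_k \to 0 \ \text{(for } x_0 \neq 0\text{)} &\iff |1 - \eta L| < 1 \iff 0 < \eta < \tfrac{2}{L}.
\end{aligned}
```

At $\eta = 2/L$ the iterates flip between $x_0$ and $-x_0$ without converging. This threshold, $\eta L = 2$, is what the term "edge of stability" refers to. On objectives that are not L-smooth there is no single global $L$ to plug into this bound.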

#optimization

Principles and Practice of Deep Representation Learning: or a Mathematical Theory of Memory

arXiv cs.LG · 2026-06-08

This book presents a mathematical theory of deep representation learning, aiming to demystify the internal mechanisms of large deep networks using optimization and information theory, making architecture design a matter of linear algebra and calculus.

#optimization

Coordinated optimization of departure sequencing and section-track allocation in railway short-term concentrated departure scenarios based on qubo and hybrid quantum algorithms

arXiv cs.AI · 2026-06-08

This paper presents a QUBO-based model for coordinating departure sequencing and track allocation in railway short-term concentrated departure scenarios, evaluated using simulation and hybrid quantum algorithms. Results show quantum-enhanced methods reduce cost and delay under dynamic conditions.

#optimization

Dopamine Fracking

Hacker News Top · 2026-06-08

The article coins the term 'dopamine fracking' to describe the process of pumping excessive resources into casual activities to extract maximum dopamine, ignoring long-term harm. It critiques the commodification of online culture, hobbies, and relationships in the digital age.

#optimization

How's Linear so fast? A technical breakdown

Hacker News Top · 2026-06-07

This article provides a technical breakdown of how the project management tool Linear achieves its fast performance by using a browser-side database (IndexedDB), local-first mutations, and a sync engine, eliminating network latency from user interactions.

#optimization

Moving beyond fork() + exec()

Lobsters Hottest · 2026-06-07

A proposal to add spawn templates to the Linux kernel aims to optimize the fork+exec pattern by caching executable information, though the current patch set is unlikely to be accepted as-is.

#optimization

Adopting the Parallel DWARF linker in dsymutil

Hacker News Top · 2026-06-06

Apple's dsymutil tool, which links DWARF debug info into self-contained bundles, is adopting a parallel DWARF linker to address the single-threaded bottleneck in type deduplication, despite challenges in qualification due to non-binary-identical output.

#optimization

Life is too short for a slow terminal

Lobsters Hottest · 2026-06-06

This article details practical techniques to speed up terminal startup by avoiding frameworks, caching completions, and lazy-loading tools, achieving a 30ms shell start.

#optimization

Running Qwen3.6-35B-A3B on a laptop RTX 4060 (8GB) — what worked, what didn't, and a surprising speculative-decoding result

Reddit r/LocalLLaMA · 2026-06-05

A detailed account of running the Qwen3.6-35B-A3B MoE model on an 8GB laptop GPU, covering effective optimizations like --no-mmap and VRAM headroom, unexpected findings where speculative decoding improved speed by 26% contrary to benchmarks, and pitfalls with Windows and CPU bottlenecks.

#optimization

Rotation revisited: Avoiding having to calculate the gcd when doing cycle decomposition

The Old New Thing (Raymond Chen) · 2026-06-05

This article explains how to avoid calculating the greatest common divisor when rotating a range by cycle decomposition, one of the strategies a std::rotate implementation can use. The same technique appears in OpenJDK's Collections.rotate method. The article provides a C++ implementation that counts the elements moved so far and stops once every element has been placed, which is when all cycles are complete.
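
The counting idea can be sketched as follows. This is a minimal illustration written for this listing, not the article's code; the function name `rotate_left_by_cycles` and the choice of `std::vector` are assumptions made here.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Rotate v left by `shift` positions using cycle decomposition.
// Rather than computing gcd(n, shift) to learn how many cycles exist,
// count the elements placed so far and stop once all n are in position.
// (Hypothetical helper for illustration only.)
template <typename T>
void rotate_left_by_cycles(std::vector<T>& v, std::size_t shift) {
    const std::size_t n = v.size();
    if (n == 0) return;
    shift %= n;
    if (shift == 0) return;

    std::size_t moved = 0;
    for (std::size_t start = 0; moved < n; ++start) {
        // Lift out the first element of this cycle, leaving a hole at `start`.
        T displaced = std::move(v[start]);
        std::size_t dst = start;
        while (true) {
            std::size_t src = dst + shift;
            if (src >= n) src -= n;      // wrap around without using %
            if (src == start) break;     // the cycle has closed
            v[dst] = std::move(v[src]);  // fill the hole from `shift` ahead
            ++moved;
            dst = src;
        }
        v[dst] = std::move(displaced);   // last slot of the cycle
        ++moved;
    }
}
```

Calling `rotate_left_by_cycles(v, 2)` on `{0, 1, 2, 3, 4, 5}` walks two cycles (starting at indices 0 and 1) and leaves `{2, 3, 4, 5, 0, 1}`. The outer loop ends after the second cycle because `moved` has reached 6, so the gcd is never computed.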

#optimization

Qwen 3.5 122B MoE OC on a single 3090 at 35 t/s — full local stack breakdown

Reddit r/openclaw · 2026-06-05

Detailed breakdown of running Qwen 3.5 122B MoE on a single RTX 3090 at 35 t/s using a custom llama.cpp fork (ik_llama.cpp) with fused MoE operations and expert offloading to CPU RAM, significantly outperforming stock llama.cpp MTP.

#optimization

Dominant-Layer ZO: A Single Layer Dominates Zeroth-Order Fine-Tuning of LLMs

arXiv cs.LG · 2026-06-05

This paper reveals that zeroth-order fine-tuning of LLMs is dominated by a single decoding layer, which can be identified by activation outliers, and fine-tuning only that layer matches or exceeds full-model fine-tuning with up to 4.52x speedup.
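
For readers unfamiliar with the term, "zeroth-order" means the gradient is estimated from loss evaluations alone, with no backpropagation. The sketch below shows a generic two-point (SPSA-style) estimator restricted to a chosen subset of parameters. It is not the paper's algorithm: the function `zo_gradient`, the `trainable` mask standing in for a single selected layer, and the flat parameter vector are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Two-point zeroth-order gradient estimate (hypothetical helper):
//   g ~ [(f(x + eps*z) - f(x - eps*z)) / (2*eps)] * z,   z ~ N(0, I)
// Only entries with trainable[i] == true are perturbed, so every other
// entry of the returned estimate is zero and stays fixed in an update.
std::vector<double> zo_gradient(
    const std::function<double(const std::vector<double>&)>& loss,
    const std::vector<double>& x,
    const std::vector<bool>& trainable,
    double eps,
    std::mt19937& rng) {
    std::normal_distribution<double> gauss(0.0, 1.0);

    // Random direction, masked to the trainable subset.
    std::vector<double> z(x.size(), 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (trainable[i]) z[i] = gauss(rng);
    }

    // Evaluate the loss at the two perturbed points.
    std::vector<double> plus = x, minus = x;
    for (std::size_t i = 0; i < x.size(); ++i) {
        plus[i] += eps * z[i];
        minus[i] -= eps * z[i];
    }
    const double scale = (loss(plus) - loss(minus)) / (2.0 * eps);

    // Project the scalar finite difference back onto the direction z.
    std::vector<double> g(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) g[i] = scale * z[i];
    return g;
}
```

Each estimate costs two loss evaluations regardless of how many parameters are perturbed. Masking only shrinks the subspace being searched, which is the general setting in which tuning a single layer can be compared against tuning the full model.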

#optimization

Sharp First-Order Lower Bounds for Higher-Order Smooth Nonconvex Optimization

arXiv cs.LG · 2026-06-05

This paper proves sharp dimension-free first-order lower bounds for finding epsilon-stationary points in higher-order smooth nonconvex optimization, resolving open problems for Hessian-Lipschitz and third-order smooth cases.

#optimization

DP-MacAdam: Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum

arXiv cs.LG · 2026-06-05

DP-MacAdam combines adaptive clipping and adaptive momentum to improve differentially private SGD, achieving better model utility without manual tuning of the clipping threshold.

#optimization

Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway

arXiv cs.LG · 2026-06-05

This paper shows that discrete Gradient Descent with large step sizes restores symmetry in multi-pathway Deep Linear Networks, countering the symmetry-breaking predicted by Gradient Flow, and leads to signal re-balancing across pathways. The authors theoretically prove that balanced solutions are flatter (less sharp) than sparse ones, and large learning rates drive the network toward stable, balanced configurations.

#optimization

@sydneyrunkle: let's assume agent = model + harness unfortunately, good models are getting really expensive! so you need a great harne…

X AI KOLs Following · 2026-06-04

A guide on optimizing AI agent performance by improving the harness component to compensate for expensive model costs, focusing on hill climbing techniques.

#optimization

@vivekgalatage: Memory organization with Algorithmica is one resource that keeps shining. https://en.algorithmica.org/hpc/cpu-cache/

X AI KOLs Timeline · 2026-06-04

A recommendation for the Algorithmica resource on CPU cache memory organization, which provides detailed experimental analysis and optimization techniques for in-memory algorithms.
