optimization

#optimization

AI Treats Humanity Like a Draft

Reddit r/ArtificialInteligence · 2026-05-16

The author argues that AI models like GPT and Claude over-optimize human creations, missing the value of imperfection, messiness, and emotional depth in art and life.


Luce Megakernel: Why is nobody talking about this?

Reddit r/LocalLLaMA · 2026-05-15

Lucebox Hub provides optimized CUDA kernels (Megakernel, DFlash, PFlash) for local LLM inference, achieving significant speedups (2-10x) over llama.cpp on various models and GPUs.


5× faster fast_blur in image-rs

Lobsters Hottest · 2026-05-15

Arthur Pastel optimized the fast_blur function in the Rust image-rs crate, achieving up to 5.9x speedup on u8 images by using box blur approximations for faster Gaussian-like blurs.


@eigensteve: I Wrote a New Book!!! Optimization: A Bootcamp for Machine Learning, Inverse Problems, and Control Pre-Order Now (July …

X AI KOLs Timeline · 2026-05-15

Steven Brunton announces his new book, 'Optimization: A Bootcamp for Machine Learning, Inverse Problems, and Control', now available for pre-order, with an accompanying free PDF, YouTube videos, and Python code.


@nicekate8888: For the past twenty days, I've been obsessing over one thing — how to make Qwen3.6-27B run fast and well on my Mac. I started with Unsloth Q5, got 18 tok/s, and the fan was roaring. Then I switched to MLX 6bit + DFlash, hitting 22 tok/s, still not fast enough. Eventually I found MTPLX 4bit: 43 tok/s with good quality.

X AI KOLs Timeline · 2026-05-15

The user shares their experience optimizing Qwen3.6-27B inference speed on a Mac using different quantization methods (Unsloth Q5, MLX 6bit + DFlash, MTPLX 4bit), ultimately reaching 43 tok/s.


@RisingSayak: The kernels project at Hugging Face has been growing! We want it to be the go-to place for kernel devs and kernel users…

X AI KOLs Following · 2026-05-15

Hugging Face's kernels project is expanding and seeking contributors for agentic kernel development to provide real optimization value to models.


EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization

arXiv cs.LG · 2026-05-15

EnergyLens is an end-to-end framework for predictive energy-aware optimization of multi-GPU LLM inference, validated on Llama3 and Qwen3-MoE, achieving mean absolute percentage errors between 9.25% and 13.19% and revealing significant energy variation across configurations.


How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization

arXiv cs.LG · 2026-05-15

This paper develops a principled scaling theory for Mixture-of-Experts (MoE) architectures, introducing the Maximally Scale-Stable Parameterization (MSSP) that ensures stable training and hyperparameter transfer across width, depth, expert width, and number of experts, validated by experiments.


Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing

arXiv cs.CL · 2026-05-15

Proposes PPOW, a reinforcement learning framework for optimizing draft models in speculative decoding using window-level objectives and adaptive windowing, achieving significant speedups across multiple benchmarks.
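For readers unfamiliar with the setting, the generic draft-and-verify loop that speculative decoding builds on can be sketched as follows. This is a greedy toy with a fixed window, not the PPOW method: there is no reinforcement learning and no adaptive windowing here, and `target_next`, `draft_next`, and `speculative_decode` are hypothetical names introduced only for this example.

```python
def speculative_decode(target_next, draft_next, prompt, max_new_tokens, window=4):
    """Greedy speculative decoding with a fixed draft window.

    target_next / draft_next: callables mapping a token list to the next token.
    Returns (tokens, accepted), where accepted counts draft tokens kept.
    """
    tokens = list(prompt)
    accepted = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The cheap draft model proposes `window` tokens autoregressively.
        ctx = list(tokens)
        proposal = []
        for _ in range(window):
            tok = draft_next(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target model verifies the proposal. A real system scores all
        #    positions in one batched forward pass; this toy calls it per token.
        for tok in proposal:
            expected = target_next(tokens)
            if tok != expected:
                tokens.append(expected)  # first mismatch: keep the target's token
                break
            tokens.append(tok)
            accepted += 1
        else:
            # Every draft token matched, so the target contributes one more token.
            tokens.append(target_next(tokens))
    return tokens[: len(prompt) + max_new_tokens], accepted


if __name__ == "__main__":
    target = lambda ctx: (ctx[-1] + 1) % 10
    # The draft agrees with the target except after a token ending a run of four.
    draft = lambda ctx: 0 if ctx[-1] % 4 == 3 else (ctx[-1] + 1) % 10
    print(speculative_decode(target, draft, [0], max_new_tokens=12))
```

With greedy verification the output matches what the target model alone would generate; any speedup comes from how many draft tokens are accepted per verification step, which is the quantity that draft-model training and window selection aim to improve.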


@no_stp_on_snek: meanwhile the rest of us just plug away. im trying to solve sparse attention for mlx-swift-lm right now. making good pr…

X AI KOLs Following · 2026-05-14

A developer reports progress implementing sparse attention for mlx-swift-lm, with only 4% overhead relative to dense attention on an M5 Max.


If you're using Windows, disable memory compression to stop bottlenecks!

Reddit r/LocalLLaMA · 2026-05-14

A user shares a fix for performance bottlenecks when running AI models on AMD GPUs in Windows 11 by disabling memory compression via the command 'Disable-mmagent -mc'.


Bayesian Model Merging

arXiv cs.LG · 2026-05-14

Introduces Bayesian Model Merging (BMM), a plug-and-play bi-level optimization framework for combining multiple task-specific experts into a single model, achieving state-of-the-art performance on vision and language benchmarks.


IGT-OMD: Implicit Gradient Transport for Decision-Focused Learning under Delayed Feedback

arXiv cs.LG · 2026-05-14

This paper identifies 'staleness amplification' in bilevel optimization under delayed feedback and proposes IGT-OMD, which uses Implicit Gradient Transport to achieve sublinear regret and improve decision loss on benchmarks like Warcraft shortest-path and LQR.


@LakshyAAAgrawal: Learning from rich textual feedback (errors, traces, partial reasoning) beats scalar reward alone for LLM optimization.…

X AI KOLs Following · 2026-05-13

Fast-Slow Training (FST) interleaves context optimization (via GEPA) with model weight updates via RL, achieving 3× sample efficiency over RL alone on math, code, and physics reasoning while preserving plasticity and enabling continual learning.


The biggest AI risk may not be superintelligence — but optimized misunderstanding

Reddit r/artificial · 2026-05-13

The article argues that the primary AI risk may not be superintelligence but rather systems that optimize flawed, incomplete representations of reality, leading to institutional drift, automated misclassification, and invisible governance failures.


@tan_stack: TanStack Devtools just migrated to @OxcProject parser + magic-string! The results: Per-file transform: 1.65 ms → 0.46 m…

X AI KOLs Following · 2026-05-13

TanStack Devtools migrated to OxcProject parser and magic-string, achieving a 3.56× speedup with per-file transform dropping from 1.65 ms to 0.46 ms.


Crustimate

Product Hunt · 2026-05-13

Crustimate is a tool that helps optimize your LinkedIn profile to be discovered by AI-powered recruiters.


@_vmlops: MICROSOFT RESEARCHERS BUILT THIS TO TRAIN 530B PARAMETER MODELS Deepspeed is a deep learning optimization library that …

X AI KOLs Timeline · 2026-05-13

DeepSpeed is an open-source deep learning optimization library from Microsoft that enables efficient distributed training and inference of large-scale models with features like ZeRO, 3D parallelism, and Mixture-of-Experts.


Using OR-Tools CP-SAT for Scheduling Problems

Hacker News Top · 2026-05-13

The article discusses using Google's OR-Tools CP-SAT solver to optimize maintenance scheduling for cloud infrastructure at Akamai, addressing complex constraints like capacity and concurrency.


Partial static single information form

Lobsters Hottest · 2026-05-13

The article discusses Partial Static Single Information (SSI) form, an extension to SSA in compilers that captures path-dependent type information. It proposes a practical shortcut for implementing Partial SSI during SSA construction in dynamic languages, specifically referencing an implementation in Ruby's ZJIT.
