simd

#simd

Improving std::simd::swizzle_dyn

Lobsters Hottest · 14h ago

A detailed analysis of Rust's std::simd::swizzle_dyn implementation, exposing performance shortcomings and proposing optimizations to better leverage hardware shuffle instructions.


#simd

Everyone Should Know SIMD

Hacker News Top · yesterday

A blog post by Mitchell Hashimoto arguing that SIMD (Single Instruction, Multiple Data) is simpler than often assumed, demonstrating a common pattern for using SIMD in loops with Zig examples.


#simd

SIMD for Collision

This post discusses the use of SIMD to optimize collision detection for convex hulls in Box3D, particularly using wide SIMD and the Separating Axis Test to improve performance for hulls with many edges.


#simd

Gigatoken: A new open source tokenizer ~100x faster than Tiktoken, ~500-1000x faster than Huggingface

Reddit r/LocalLLaMA · 2d ago

Gigatoken is an open-source tokenizer that achieves up to 1000x speedup over HuggingFace tokenizers and 100x over Tiktoken, using SIMD and caching optimizations. It supports drop-in replacement for existing tokenizer APIs.


#simd

Why l (new runtime for k and q)

Lobsters Hottest · 3d ago

A new runtime for the K and Q programming languages that reimagines execution using SIMD, parallelism, fusion, and compression to better leverage modern hardware.


#simd

How to pack ternary numbers in 8-bit bytes

Hacker News Top · 2026-07-14

A blog post describing an efficient method to pack ternary numbers into 8-bit bytes using SIMD-friendly unpacking, achieving 1.6 bits per trit, with applications in LLM weight quantization like BitNet b1.58.


#simd

Quadrupling code performance with a "useless" if

Lobsters Hottest · 2026-07-13

A blog post demonstrates how adding a conditional check that appears useless can dramatically improve loop performance by allowing the CPU's branch predictor to eliminate data dependencies, achieving up to 4x speedup in a specific compression algorithm.


#simd

Talos-XII: hand-written autograd + small RL/MLP stack in Rust, applied to gacha probability modeling (no tch-rs/ndarray/PyTorch) — looking for benchmark help on ARM/AVX-512/GPU [P]

Reddit r/MachineLearning · 2026-07-09

Talos-XII is a CLI simulator for Arknights: Endfield's gacha system, built entirely in Rust with a custom autograd engine and small RL/MLP stack (no external ML frameworks). It uses neural networks for environment modeling and pull-decision policy, and includes sophisticated SIMD dispatch and an open experiment called ACHF for adaptive caching.


#simd

l: A new runtime for k and q

Hacker News Top · 2026-07-07

l is a new runtime for k4, q, and qSQL that provides transparent SIMD, compressed vectors, and automatic parallelism while maintaining full compatibility with existing code. It targets high-performance computing on Wall Street.


#simd

Finding a needle in a 4 GB haystack: from 0.75 GB/s to 49 GB/s in Go

Lobsters Hottest · 2026-07-07

A developer details the process of optimizing a Go file search from 0.75 GB/s to 49 GB/s, leveraging techniques like SIMD and understanding memory hierarchy, including Go 1.26's new `simd/archsimd` package.


#simd

@Shanshrew: 1 Year of Research coming to an end. 8 Months seeing no results, Almost quitting twice. November/December of last year …

X AI KOLs Following · 2026-07-06

After a year of research, a universal 2x performance improvement for all modern JavaScript parsers is being implemented, starting with oxc_parser, which will speed up tools like OXLint, Vite, and Deno, and potentially major browser engines.


#simd

Single header Parser Combinators for C

Hacker News Top · 2026-07-01

CParseC is a single-header C99 library for parser combinators inspired by Haskell's Parsec, offering zero-copy parsing, no hidden allocations, and SIMD-optimized combinators. It aims to provide a flexible, performant alternative to handwritten parsers and lex/yacc tools.


#simd

A Tiny Compiler for Data-Parallel Kernels

Hacker News Top · 2026-06-25

A blog post describing a tiny compiler that demonstrates how to lower data-parallel kernels by converting for loops into vectorized loops with lanes and masks, implemented in ~180 lines of Python.


#simd

@shubh6200: To understand how massive files are processed, read "Parsing Gigabytes of JSON per Second" by @geofflangdale and @lemir…

X AI KOLs Timeline · 2026-06-24

The paper presents simdjson, the first validating JSON parser capable of processing gigabytes per second on a single core using SIMD instructions, achieving substantial speedups over existing parsers like RapidJSON.


#simd

Safe SIMD in Rust, even on the inside

Lobsters Hottest · 2026-06-20

Rust's SIMD abstractions now allow safe usage without unsafe code by leveraging CPU feature tokens introduced in Rust 1.87, enabling concise and portable vector operations.


#simd

@Modular: Two paths feed the game's fields. One precomputes the fields with SIMD. The other compiles a Mojo kernel to WebAssembly…

X AI KOLs Following · 2026-06-15

Modular demonstrates two approaches for computing game fields: using SIMD precomputation or compiling a Mojo kernel to WebAssembly for live browser rendering.


#simd

@Modular: A Mojo program generates unique levels 8 SIMD lanes at a time representing the thermal field of a GPU. Heat creates big…

X AI KOLs Following · 2026-06-15

A Mojo program uses 8 SIMD lanes to generate game levels based on GPU thermal fields, with heat creating obstacles and coolant adding boosts.


#simd

Clojure is almost as fast as C (with some help)

Lobsters Hottest · 2026-06-15

This article details how Clojure, with the JVM's Vector API and careful optimization, achieved frame rates within 20% of C for a 3D stress test, demonstrating that a dynamic language can approach low-level performance on hot loops.


#simd

PivCo-Huffman

Lobsters Hottest · 2026-06-05

This paper presents PivCo-Huffman, a new approach to Huffman coding using pivot coding from wavelet trees, enabling high-performance SIMD-friendly encoding and decoding. It consistently outperforms state-of-the-art Huffman codecs and shows how ANS coding can be selectively applied to skewed nodes to approach ANS compression ratios while preserving high decompression speeds.


#simd

Accelerating std::copy_if using SIMD

Lobsters Hottest · 2026-05-26

Blog post analyzing and implementing a SIMD-accelerated version of std::copy_if using AVX-512 instructions on AMD Zen 4, with performance analysis and comparisons to compiler auto-vectorization.

