Tag
This paper accelerates the NeurASP neurosymbolic AI framework through vectorization, batch processing, and caching, achieving speedups of multiple orders of magnitude on larger tasks.
This blog post implements a SIMD-accelerated version of std::copy_if using AVX-512 instructions on AMD Zen 4, analyzes its performance, and compares it with compiler auto-vectorization.
This article explores the fastest methods for matching characters on ARM processors using SIMD instructions, comparing traditional NEON approaches with the newer SVE2 capabilities available on modern ARM chips such as AWS Graviton4 and Google Axion.