cuda-kernels


@hardmaru: The human brain is incredibly efficient because it only activates the specific neurons needed for a thought. Modern LLM…

X AI KOLs Timeline · 2026-05-08 Cached

This paper introduces TwELL and Hybrid sparse formats with custom CUDA kernels to efficiently leverage unstructured sparsity in LLMs, achieving over 20% faster training and inference on H100 GPUs while reducing energy and memory usage.


@QingQ77: Pure Rust LLM inference engine with custom CUDA kernels for each hardware × model × quantization combination, achieving higher inference speed than vLLM and TensorRT-LLM. https://github.com/Avarok-Cybersecurity/a…

X AI KOLs Timeline · 2026-05-08 Cached

Atlas is a pure Rust LLM inference engine that claims faster inference than vLLM and TensorRT-LLM by tailoring custom CUDA kernels to each hardware × model × quantization combination.
