Tag
zml/llmd now runs fully on Apple's Metal API, serving 8 simultaneous requests at full bf16 precision, with continuous batching and other modern features.
The author open-sourced atik, a custom AI accelerator implemented on an FPGA with native BF16 and attention support, and reports significant speedups over PyTorch for various models.
A page from Modal's LLM Engineer's Almanac that provides an interactive explorer for understanding low-precision floating-point formats like bf16 and fp4.
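For readers new to the format these entries share, here is a minimal Python sketch of how bf16 relates to float32: bf16 keeps the sign bit, all 8 exponent bits, and only the top 7 mantissa bits. It is not taken from any of the linked projects; the function names are hypothetical, and NaN and infinity handling is deliberately left out.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bf16 pattern for x (hypothetical helper).

    Assumes x is finite and fits in float32; NaN/inf are not handled.
    """
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, before dropping the low 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Expand a 16-bit bf16 pattern back to a Python float (hypothetical helper)."""
    # bf16 is the upper half of a float32, so pad the low 16 bits with zeros.
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159, 1.00390625):
        b = float_to_bf16_bits(value)
        print(f"{value!r} -> 0x{b:04X} -> {bf16_bits_to_float(b)!r}")
```

With only 7 stored mantissa bits, 3.14159 round-trips to 3.140625, which shows why bf16 trades precision for the full float32 exponent range.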