Built an AI Accelerator and open-sourced it. [P]

Reddit r/MachineLearning Tools

Summary

The author open-sourced a custom AI accelerator (atik) implemented on FPGA with native BF16 and attention support, demonstrating significant speedups over PyTorch for various models.

There is a huge gap in open-source AI accelerators, so I implemented [mine](https://github.com/AhmedZeer/atik). The popular, well-known ones are already legacy and don't support contemporary operations like attention. Here is what makes mine special:

* **Attention** mechanism baked directly into silicon
* Prototyped end-to-end on **FPGA** (AWS F2)
* Benchmarked against **PyTorch**-based workloads
* Built on the **RocketChip** architecture (RISC-V)
* Native **BF16** support
* Up to **225×** speedup on vanilla attention mechanism
* Up to **96×** speedup on TinyBERT
* Up to **50×** speedup on ViT
* Up to **30×** speedup on GPT-2 prefill

I would really appreciate it if you check out the [repo](https://github.com/AhmedZeer/atik) and give me feedback!
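For readers unfamiliar with the workload behind the "vanilla attention" and BF16 numbers, here is a minimal pure-Python sketch of scaled dot-product attention with BF16 rounding emulated in software. It is not taken from the atik repo and says nothing about how the hardware is built. The names `to_bf16` and `attention` are hypothetical, and rounding only the scores, weights, and outputs is an assumption made to keep the example short.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest BF16 value (ties to even) and return it as a float.

    BF16 keeps the top 16 bits of an IEEE-754 float32. NaN handling is omitted.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a bias so that dropping the low 16 bits rounds to nearest, ties to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def attention(q, k, v):
    """Vanilla attention, softmax(Q K^T / sqrt(d)) V, on nested lists of floats."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        # Scaled dot product of this query with every key.
        scores = [to_bf16(sum(a * b for a, b in zip(qi, kj)) * scale) for kj in k]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [to_bf16(e / total) for e in exps]
        # Weighted sum of the value rows.
        out.append([
            to_bf16(sum(w * vj[c] for w, vj in zip(weights, v)))
            for c in range(len(v[0]))
        ])
    return out


# Two identical keys get equal weight, so the output equals the shared value row.
result = attention(q=[[1.0, 0.0]], k=[[1.0, 0.0], [1.0, 0.0]], v=[[2.0, 4.0], [2.0, 4.0]])
print(result)  # [[2.0, 4.0]]
```

BF16 keeps float32's 8-bit exponent but only 7 mantissa bits, which is why `to_bf16(3.14159)` comes back as `3.140625`.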