At least seven Chinese companies are shipping H100/H200-class AI accelerators. Most have recently gone public, and several were founded by former NVIDIA or AMD architects. Huawei's Ascend 950 targets H200-class performance, and domestic vendors' share of the Chinese market is rising as NVIDIA's declines.
A user asks about buying Chinese AI accelerators or GPUs for inference, specifically Huawei hardware as an alternative to NVIDIA, with support for vLLM or llama.cpp.
KForge is a cross-platform framework that uses two collaborating LLM-based agents to automatically generate and optimize high-performance compute kernels for diverse AI accelerators, achieving significant speedups on NVIDIA B200 and Intel Arc B580 hardware.
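The following minimal sketch shows the general shape of a two-agent loop like the one described for KForge. It assumes one agent proposes kernel variants and the other reads profiling results and returns feedback. Both agents are deterministic stubs, not LLM calls. A single hypothetical `tile` parameter stands in for an entire kernel, and `toy_runtime_us` is a made-up cost model that replaces real profiling on B200 or Arc hardware. None of the names or interfaces here come from KForge.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Candidate:
    tile: int    # hypothetical tunable knob standing in for a whole kernel variant
    source: str  # placeholder for the generated kernel source


def toy_runtime_us(tile: int) -> float:
    # Made-up cost model: an overhead term that shrinks with tile size plus a
    # resource-pressure term that grows with it. Among powers of two, tile = 32 is cheapest.
    return 1000.0 / tile + tile


def generation_agent(feedback: Optional[str], tile: int) -> Candidate:
    # Stub for the agent that writes kernels: it acts on the analysis agent's feedback.
    if feedback == "increase_tile":
        tile *= 2
    return Candidate(tile=tile, source=f"kernel<tile={tile}>")


def analysis_agent(cand: Candidate, best_runtime: Optional[float]) -> Tuple[float, str]:
    # Stub for the agent that reads profiling data and suggests the next step.
    runtime = toy_runtime_us(cand.tile)
    if best_runtime is None or runtime < best_runtime:
        return runtime, "increase_tile"
    return runtime, "stop"


def optimize(start_tile: int = 1, max_iters: int = 10) -> Optional[Candidate]:
    best: Optional[Candidate] = None
    best_runtime: Optional[float] = None
    feedback: Optional[str] = None
    tile = start_tile
    for _ in range(max_iters):
        cand = generation_agent(feedback, tile)
        runtime, feedback = analysis_agent(cand, best_runtime)
        if feedback == "stop":
            break  # no improvement: keep the previous best candidate
        best, best_runtime, tile = cand, runtime, cand.tile
    return best


if __name__ == "__main__":
    result = optimize()
    print(result.source, toy_runtime_us(result.tile))
```

Starting from `tile=1`, the loop doubles the tile while the analysis agent reports an improvement. It stops at `tile=32`, the cheapest power of two under this cost model. In a real system, the feedback would be free-form profiler output, and the generator would rewrite code rather than adjust one number.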
This paper introduces TRAM, a method that jointly optimizes approximate multiplier structures and AI model parameters to reduce power consumption in AI accelerators while maintaining accuracy.
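To make the idea of an approximate multiplier concrete, here is a toy sketch of one simple design: ignore the `k` low-order bits of each unsigned operand. Hardware built this way has fewer partial products to sum, at the cost of numerical error. This design is chosen only for illustration and is not necessarily one of the structures TRAM considers. The sketch does not implement TRAM's structure search or its joint optimization with model parameters. It only makes the accuracy side of the trade-off measurable. The function names are hypothetical.

```python
def approx_mul(a: int, b: int, k: int) -> int:
    """Toy approximate unsigned multiplier: ignore the k low-order bits of each operand.

    In hardware, dropping these bits removes partial products from the multiplier,
    trading accuracy for lower power and area.
    """
    mask = ~((1 << k) - 1)  # clears bits 0..k-1
    return (a & mask) * (b & mask)


def mean_relative_error(k: int, bits: int = 8) -> float:
    # Exhaustive average over all nonzero unsigned operand pairs of the given width.
    total, count = 0.0, 0
    for a in range(1, 1 << bits):
        for b in range(1, 1 << bits):
            exact = a * b
            total += abs(exact - approx_mul(a, b, k)) / exact
            count += 1
    return total / count


if __name__ == "__main__":
    print(approx_mul(13, 11, 2), 13 * 11)  # prints "96 143": truncated 12 * 8 vs exact product
    for k in range(4):
        print(k, round(mean_relative_error(k), 4))
```

The error grows with `k`, and how much it hurts a model depends on the operand values the model actually produces. For example, operands that are already multiples of `2**k` pass through unchanged. This is one intuition for why tuning the multiplier and the model parameters together, rather than separately, can pay off.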
AccelOpt is a self-improving, LLM-based agentic system that autonomously optimizes AI accelerator kernels through iterative generation and an optimization memory. On AWS Trainium, it raises the average percentage of peak throughput from 49% to 61%, at 26x lower cost than Claude Sonnet 4.
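Throughput figures like these are usually quoted as a fraction of the hardware's theoretical peak. The sketch below shows one common way to compute that fraction. It assumes a dense matrix multiply (2·M·N·K FLOPs) and a made-up peak of 100 TFLOP/s, which is not a Trainium specification. AccelOpt's benchmark may count work differently, and the function names are illustrative.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    # (m x k) @ (k x n): each of the m*n outputs needs k multiply-adds, i.e. 2k FLOPs.
    return 2 * m * n * k


def percent_of_peak(flops: float, runtime_s: float, peak_flops_per_s: float) -> float:
    # Achieved throughput as a percentage of the hardware's theoretical peak.
    return 100.0 * (flops / runtime_s) / peak_flops_per_s


def speedup(before_pct: float, after_pct: float) -> float:
    # For a fixed workload on fixed hardware, runtime is inversely proportional
    # to the attained fraction of peak, so the speedup is the ratio of the two.
    return after_pct / before_pct


if __name__ == "__main__":
    HYPOTHETICAL_PEAK = 100e12  # 100 TFLOP/s, an illustrative number only
    work = matmul_flops(4096, 4096, 4096)
    print(f"{percent_of_peak(work, 2.8e-3, HYPOTHETICAL_PEAK):.1f}% of peak")
    print(f"49% -> 61% of peak: {speedup(49, 61):.2f}x faster, +{61 - 49} points")
```

If the 49% and 61% figures are read as average fractions of peak before and after optimization, the same kernels would run roughly 1.24x faster (61/49 ≈ 1.24). That is a gain of 12 percentage points.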