Andrew Ng has launched a new course on deploying LLMs in production. The free version includes all videos and the base code. The course covers LLM internals, inference optimization (quantization, KV caching, FlashAttention, and speculative decoding), and hardware-aware optimization. Taught by AMD's VP of Engineering, it aims to help developers turn the Transformer from an academic concept into an engineering tool they can debug and optimize.