A review of Philip Kiely's book 'Inference Engineering' that recommends it as a way to avoid common mistakes in AI inference engineering.
This guide explains the discipline of AI inference engineering, covering the split between prefill and decoding phases, the shift from closed to open models, and optimization techniques for latency, throughput, and cost.