Tag
The author compares various GPUs for LLM inference, critiquing common benchmarks and emphasizing the importance of prefill performance over generation speed, offering recommendations for different budgets and use cases.