This post from NVIDIA explains how to use the NVIDIA Model Optimizer library to quantize a CLIP model to FP8 using post-training quantization, reducing VRAM usage and improving inference performance on consumer GPUs.
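To show what FP8 post-training quantization does numerically, here is a minimal pure-Python sketch. It is not the Model Optimizer API. The library applies quantization to a real PyTorch model through its own calls (documented as `modelopt.torch.quantization.quantize` with a config such as `FP8_DEFAULT_CFG`; check the current release). The sketch rests on several assumptions of its own:

- It uses the E4M3 FP8 variant, which has 4 exponent bits, 3 mantissa bits, and a largest finite value of 448.
- It uses a single per-tensor scale derived from the maximum absolute value seen during calibration ("max" calibration).
- It simulates quantization in software with "fake quantization" rather than storing real 8-bit tensors.
- The helper names `round_to_e4m3`, `calibrate_scale`, and `fake_quantize` are hypothetical.

```python
import math

E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3 (assumed format)


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value, saturating at +/-448 (hypothetical helper)."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so the binade exponent is e - 1.
    # Exponents below -6 are clamped: that range is E4M3's subnormal region.
    _, e = math.frexp(mag)
    exp = max(e - 1, -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits -> 8 representable steps per binade
    q = round(mag / step) * step  # Python's round() is round-half-to-even, like IEEE
    return sign * min(q, E4M3_MAX)


def calibrate_scale(calibration_batches) -> float:
    """Max calibration (assumed): map the largest observed |value| onto E4M3_MAX."""
    amax = max(abs(v) for batch in calibration_batches for v in batch)
    return amax / E4M3_MAX


def fake_quantize(values, scale: float):
    """Scale into FP8 range, round to E4M3, then scale back (simulated quantization)."""
    return [round_to_e4m3(v / scale) * scale for v in values]


if __name__ == "__main__":
    # Hypothetical activations observed while running a few calibration inputs.
    calib = [[0.5, -896.0, 12.0], [3.0, 100.0]]
    scale = calibrate_scale(calib)
    print(scale)  # 896 / 448 = 2.0
    print(fake_quantize([896.0, 0.6, -37.0], scale))
```

The per-tensor scale is what makes post-training quantization work without retraining. Calibration data fixes the scale once. At inference, values are divided by it so they fit E4M3's narrow range, and only the rounding error introduced by the 3-bit mantissa remains.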