Tag
This article explores the GGUF file format used by llama.cpp for language models, highlighting its single-file convenience and the role of embedded chat templates and special tokens. It also compares different Jinja implementations and discusses what is still missing from the format.