This pull request adds optimized x86 and generic CPU dot-product kernels for the q1_0 quantization format to ggml-cpu, improving quantized LLM inference speed.
GGML and llama.cpp have joined Hugging Face to ensure the long-term sustainability of local AI development. Georgi Gerganov's team will retain full autonomy over the projects while receiving resources to scale community support and improve integration between llama.cpp inference and transformers model definitions.