模型:Granite4 Vision,作者 gabe-l-hart · 拉取请求 #23545 · ggml-org/llama.cpp
摘要
此拉取请求为 llama.cpp(一个开源 LLM 推理引擎)增加了对 Granite4 Vision 模型的支持。
查看缓存全文
缓存时间: 2026/06/05 19:15
ggml-org/llama.cpp 源代码:https://github.com/ggml-org/llama.cpp # llama.cpp llama 许可证:MIT (https://opensource.org/licenses/MIT) 发布 (https://github.com/ggml-org/llama.cpp/releases) 服务端 (https://github.com/ggml-org/llama.cpp/actions/workflows/server.yml) Docker (https://github.com/ggml-org/llama.cpp/actions/workflows/docker.yml) Winget (https://github.com/ggml-org/llama.cpp/actions/workflows/winget.yml) 宣言 (https://github.com/ggml-org/llama.cpp/discussions/205) / ggml (https://github.com/ggml-org/ggml) / 运算 (https://github.com/ggml-org/llama.cpp/blob/master/docs/ops.md) 用 C/C++ 进行 LLM 推理
近期 API 变更
libllamaAPI 更新日志 (https://github.com/ggml-org/llama.cpp/issues/9289)llama-serverREST API 更新日志 (https://github.com/ggml-org/llama.cpp/issues/9291)
热门话题
- Hugging Face 缓存迁移:使用
-hf下载的模型现在存储在标准的 Hugging Face 缓存目录中,便于与其他 HF 工具共享。 - 指南:使用 llama.cpp 的新 WebUI (https://github.com/ggml-org/llama.cpp/discussions/16938)
- 指南:使用 llama.cpp 运行 gpt-oss (https://github.com/ggml-org/llama.cpp/discussions/15396)
- [反馈] 更好的打包方式以支持下游消费者 🤗
- 已添加对原生 MXFP4 格式的
gpt-oss模型的支持 | PR (https://github.com/ggml-org/llama.cpp/pull/15091) | 与 NVIDIA 合作 (https://blogs.nvidia.com/blog/rtx-ai-garage-openai-oss) | 评论 (https://github.com/ggml-org/llama.cpp/discussions/15095) - 多模态支持已登陆
llama-server:#12898 (https://github.com/ggml-org/llama.cpp/pull/12898) | 文档 - 用于 FIM 补全的 VS Code 扩展:https://github.com/ggml-org/llama.vscode
- 用于 FIM 补全的 Vim/Neovim 插件:https://github.com/ggml-org/llama.vim
- Hugging Face 推理端点现可直接支持 GGUF!https://github.com/ggml-org/llama.cpp/discussions/9669
- Hugging Face GGUF 编辑器:讨论 (https://github.com/ggml-org/llama.cpp/discussions/9268) | 工具 (https://huggingface.co/spaces/CISCai/gguf-editor)
- 浏览器中现已支持 WebGPU,请参见此处的介绍博客/演示 (https://reeselevine.github.io/llamas-on-the-web/)。
快速开始
开始使用 llama.cpp 非常简单。以下是几种在你的机器上安装的方式:
- 使用 brew、nix 或 winget 安装
llama.cpp - 使用 Docker 运行——请参阅我们的 Docker 文档
- 从发布页面 (https://github.com/ggml-org/llama.cpp/releases) 下载预编译二进制文件
- 克隆此仓库并从源码构建——请查看 我们的构建指南
安装完成后,你需要一个模型来使用。前往获取和量化模型部分了解更多。
示例命令:
# 使用本地模型文件
llama-cli -m my_model.gguf
# 或者直接下载并运行来自 Hugging Face 的模型
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
# 启动兼容 OpenAI 的 API 服务器
llama-server -hf ggml-org/gemma-3-1b-it-GGUF
描述
llama.cpp 的主要目标是在各种硬件上(本地和云端)以最少的设置和最先进的性能实现 LLM 推理。
- 纯 C/C++ 实现,无任何依赖
- Apple silicon 是一等公民——通过 ARM NEON、Accelerate 和 Metal 框架进行优化
- 对 x86 架构支持 AVX、AVX2、AVX512 和 AMX
- 对 RISC-V 架构支持 RVV、ZVFH、ZFH、ZICBOP 和 ZIHINTPAUSE
- 1.5 位、2 位、3 位、4 位、5 位、6 位和 8 位整数量化,实现更快的推理和更低的内存使用
- 自定义 CUDA 内核,用于在 NVIDIA GPU 上运行 LLM(通过 HIP 支持 AMD GPU,通过 MUSA 支持 Moore Threads GPU)
- Vulkan 和 SYCL 后端支持
- CPU+GPU 混合推理,可部分加速超过总 VRAM 容量的模型
llama.cpp 项目是开发 ggml (https://github.com/ggml-org/ggml) 库新功能的主要试验场。
支持的模型
通常也支持以下基础模型的微调版本。添加新模型支持的说明:HOWTO-add-model.md
纯文本模型
- LLaMA 🦙
- LLaMA 2 🦙🦙
- LLaMA 3 🦙🦙🦙
- Mistral 7B (https://huggingface.co/mistralai/Mistral-7B-v0.1)
- Mixtral MoE (https://huggingface.co/models?search=mistral-ai/Mixtral)
- DBRX (https://huggingface.co/databricks/dbrx-instruct)
- Jamba (https://huggingface.co/ai21labs)
- Falcon (https://huggingface.co/models?search=tiiuae/falcon)
- Chinese LLaMA / Alpaca (https://github.com/ymcui/Chinese-LLaMA-Alpaca) 和 Chinese LLaMA-2 / Alpaca-2 (https://github.com/ymcui/Chinese-LLaMA-Alpaca-2)
- Vigogne(法语)(https://github.com/bofenghuang/vigogne)
- BERT (https://github.com/ggml-org/llama.cpp/pull/5423)
- Koala (https://bair.berkeley.edu/blog/2023/04/03/koala/)
- Baichuan 1 & 2 (https://huggingface.co/models?search=baichuan-inc/Baichuan) + 衍生版本 (https://huggingface.co/hiyouga/baichuan-7b-sft)
- Aquila 1 & 2 (https://huggingface.co/models?search=BAAI/Aquila)
- Starcoder 模型 (https://github.com/ggml-org/llama.cpp/pull/3187)
- Refact (https://huggingface.co/smallcloudai/Refact-1_6B-fim)
- MPT (https://github.com/ggml-org/llama.cpp/pull/3417)
- Bloom (https://github.com/ggml-org/llama.cpp/pull/3553)
- Yi 模型 (https://huggingface.co/models?search=01-ai/Yi)
- StableLM 模型 (https://huggingface.co/stabilityai)
- Deepseek 模型 (https://huggingface.co/models?search=deepseek-ai/deepseek)
- Qwen 模型 (https://huggingface.co/models?search=Qwen/Qwen)
- PLaMo-13B (https://github.com/ggml-org/llama.cpp/pull/3557)
- Phi 模型 (https://huggingface.co/models?search=microsoft/phi)
- PhiMoE (https://github.com/ggml-org/llama.cpp/pull/11003)
- GPT-2 (https://huggingface.co/gpt2)
- Orion 14B (https://github.com/ggml-org/llama.cpp/pull/5118)
- InternLM2 (https://huggingface.co/models?search=internlm2)
- CodeShell (https://github.com/WisdomShell/codeshell)
- Gemma (https://ai.google.dev/gemma)
- Mamba (https://github.com/state-spaces/mamba)
- Grok-1 (https://huggingface.co/keyfan/grok-1-hf)
- Xverse (https://huggingface.co/models?search=xverse)
- Command-R 模型 (https://huggingface.co/models?search=CohereForAI/c4ai-command-r)
- SEA-LION (https://huggingface.co/models?search=sea-lion)
- GritLM-7B (https://huggingface.co/GritLM/GritLM-7B) + GritLM-8x7B (https://huggingface.co/GritLM/GritLM-8x7B)
- OLMo (https://allenai.org/olmo)
- OLMo 2 (https://allenai.org/olmo)
- OLMoE (https://huggingface.co/allenai/OLMoE-1B-7B-0924)
- Granite 模型 (https://huggingface.co/collections/ibm-granite/granite-code-models-6624c5cec322e4c148c8b330)
- GPT-NeoX (https://github.com/EleutherAI/gpt-neox) + Pythia (https://github.com/EleutherAI/pythia)
- Snowflake-Arctic MoE (https://huggingface.co/collections/Snowflake/arctic-66290090abe542894a5ac520)
- Smaug (https://huggingface.co/models?search=Smaug)
- Poro 34B (https://huggingface.co/LumiOpen/Poro-34B)
- Bitnet b1.58 模型 (https://huggingface.co/1bitLLM)
- Flan T5 (https://huggingface.co/models?search=flan-t5)
- Open Elm 模型 (https://huggingface.co/collections/apple/openelm-instruct-models-6619ad295d7ae9f868b759ca)
- ChatGLM3-6b (https://huggingface.co/THUDM/chatglm3-6b) + ChatGLM4-9b (https://huggingface.co/THUDM/glm-4-9b) + GLMEdge-1.5b (https://huggingface.co/THUDM/glm-edge-1.5b-chat) + GLMEdge-4b (https://huggingface.co/THUDM/glm-edge-4b-chat)
- GLM-4-0414 (https://huggingface.co/collections/THUDM/glm-4-0414-67f3cbcb34dd9d252707cb2e)
- SmolLM (https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)
- EXAONE-3.0-7.8B-Instruct (https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct)
- FalconMamba 模型 (https://huggingface.co/collections/tiiuae/falconmamba-7b-66b9a580324dd1598b0f6d4a)
- Jais (https://huggingface.co/inceptionai/jais-13b-chat)
- Bielik-11B-v2.3 (https://huggingface.co/collections/speakleash/bielik-11b-v23-66ee813238d9b526a072408a)
- RWKV-7 (https://huggingface.co/collections/shoumenchougou/rwkv7-gxx-gguf)
- RWKV-6 (https://github.com/BlinkDL/RWKV-LM)
- QRWKV-6 (https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1)
- GigaChat-20B-A3B (https://huggingface.co/ai-sage/GigaChat-20B-A3B-instruct)
- Trillion-7B-preview (https://huggingface.co/trillionlabs/Trillion-7B-preview)
- Ling 模型 (https://huggingface.co/collections/inclusionAI/ling-67c51c85b34a7ea0aba94c32)
- LFM2 模型 (https://huggingface.co/collections/LiquidAI/lfm2-686d721927015b2ad73eaa38)
- Hunyuan 模型 (https://huggingface.co/collections/tencent/hunyuan-dense-model-6890632cda26b19119c9c5e7)
- BailingMoeV2 (Ring/Ling 2.0) 模型 (https://huggingface.co/collections/inclusionAI/ling-v2-68bf1dd2fc34c306c1fa6f86)
- Mellum 模型 (https://huggingface.co/JetBrains/models?search=mellum)
多模态模型
- LLaVA 1.5 模型 (https://huggingface.co/collections/liuhaotian/llava-15-653aac15d994e992e2677a7e),LLaVA 1.6 模型 (https://huggingface.co/collections/liuhaotian/llava-16-65b9e40155f60fd046a5ccf2)
- BakLLaVA (https://huggingface.co/models?search=SkunkworksAI/Bakllava)
- Obsidian (https://huggingface.co/NousResearch/Obsidian-3B-V0.5)
- ShareGPT4V (https://huggingface.co/models?search=Lin-Chen/ShareGPT4V)
- MobileVLM 1.7B/3B 模型 (https://huggingface.co/models?search=mobileVLM)
- Yi-VL (https://huggingface.co/models?search=Yi-VL)
- Mini CPM (https://huggingface.co/models?search=MiniCPM)
- Moondream (https://huggingface.co/vikhyatk/moondream2)
- Bunny (https://github.com/BAAI-DCAI/Bunny)
- GLM-EDGE (https://huggingface.co/models?search=glm-edge)
- Qwen2-VL (https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d)
- LFM2-VL (https://huggingface.co/collections/LiquidAI/lfm2-vl-68963bbc84a610f7638d5ffa)
绑定
- Python:ddh0/easy-llama (https://github.com/ddh0/easy-llama)
- Python:abetlen/llama-cpp-python (https://github.com/abetlen/llama-cpp-python)
- Go:go-skynet/go-llama.cpp (https://github.com/go-skynet/go-llama.cpp)
- Node.js:withcatai/node-llama-cpp (https://github.com/withcatai/node-llama-cpp)
- JS/TS(llama.cpp 服务端客户端):lgrammel/modelfusion (https://modelfusion.dev/integration/model-provider/llamacpp)
- JS/TS(可编程提示引擎 CLI):offline-ai/cli (https://github.com/offline-ai/cli)
- JavaScript/Wasm(可在浏览器中运行):tangledgroup/llama-cpp-wasm (https://github.com/tangledgroup/llama-cpp-wasm)
- Typescript/Wasm(更友好的 API,可在 npm 上获取):ngxson/wllama (https://github.com/ngxson/wllama)
- Ruby:yoshoku/llama_cpp.rb (https://github.com/yoshoku/llama_cpp.rb)
- Ruby:docusealco/rllama (https://github.com/docusealco/rllama)
- Rust(更多功能):edgenai/llama_cpp-rs (https://github.com/edgenai/llama_cpp-rs)
- Rust(更友好的 API):mdrokz/rust-llama.cpp (https://github.com/mdrokz/rust-llama.cpp)
- Rust(更直接的绑定):utilityai/llama-cpp-rs (https://github.com/utilityai/llama-cpp-rs)
- Rust(从 crates.io 自动构建):ShelbyJenkins/llm_client (https://github.com/ShelbyJenkins/llm_client)
- C#/.NET:SciSharp/LLamaSharp (https://github.com/SciSharp/LLamaSharp)
- C#/VB.NET(更多功能 - 社区许可证):LM-Kit.NET (https://docs.lm-kit.com/lm-kit-net/index.html)
- Scala 3:donderom/llm4s (https://github.com/donderom/llm4s)
- Clojure:phronmophobic/llama.clj (https://github.com/phronmophobic/llama.clj)
- React Native:mybigday/llama.rn (https://github.com/mybigday/llama.rn)
- Java:kherud/java-llama.cpp (https://github.com/kherud/java-llama.cpp)
- Java:QuasarByte/llama-cpp-jna (https://github.com/QuasarByte/llama-cpp-jna)
- Zig:deins/llama.cpp.zig (https://github.com/Deins/llama.cpp.zig)
- Flutter/Dart:netdur/llama_cpp_dart (https://github.com/netdur/llama_cpp_dart)
- Flutter:xuegao-tzx/Fllama (https://github.com/xuegao-tzx/Fllama)
- PHP(基于 llama.cpp 构建的 API 绑定和功能):distantmagic/resonance (https://github.com/distantmagic/resonance)(更多信息)(https://github.com/ggml-org/llama.cpp/pull/6326)
- Guile Scheme:guile_llama_cpp (https://savannah.nongnu.org/projects/guile-llama-cpp)
- Swift:srgtuszy/llama-cpp-swift (https://github.com/srgtuszy/llama-cpp-swift)
- Swift:ShenghaiWang/SwiftLlama (https://github.com/ShenghaiWang/SwiftLlama)
- Delphi:Embarcadero/llama-cpp-delphi (https://github.com/Embarcadero/llama-cpp-delphi)
- Go(无需 CGo):hybridgroup/yzma (https://github.com/hybridgroup/yzma)
- Android:llama.android
用户界面
(要使项目列于此,它应明确声明依赖于 llama.cpp)
- AI Sublime Text 插件 (https://github.com/yaroslavyaroslav/OpenAI-sublime-text) (MIT)
- BonzAI App (https://apps.apple.com/us/app/bonzai-your-local-ai-agent/id6752847988)(专有)
- cztomsik/ava (https://github.com/cztomsik/ava) (MIT)
- Dot (https://github.com/alexpinel/Dot) (GPL)
- eva (https://github.com/ylsdamxssjxxdd/eva) (MIT)
- iohub/collama (https://github.com/iohub/coLLaMA) (Apache-2.0)
- janhq/jan (https://github.com/janhq/jan) (AGPL)
- johnbean393/Sidekick (https://github.com/johnbean393/Sidekick) (MIT)
- KanTV (https://github.com/zhouwg/kantv?tab=readme-ov-file) (Apache-2.0)
- KodiBot (https://github.com/firatkiral/kodibot) (GPL)
- llama.vim (https://github.com/ggml-org/llama.vim) (MIT)
- LARS (https://github.com/abgulati/LARS) (AGPL)
- Llama Assistant (https://github.com/vietanhdev/llama-assistant) (GPL)
- LlamaLib (https://github.com/undreamai/LlamaLib) (Apache-2.0)
- LLMFarm (https://github.com/guinmoon/LLMFarm?tab=readme-ov-file) (MIT)
- LLMUnity (https://github.com/undreamai/LLMUnity) (MIT)
- LMStudio (https://lmstudio.ai/)(专有)
- LocalAI (https://github.com/mudler/LocalAI) (MIT)
- LostRuins/koboldcpp (https://github.com/LostRuins/koboldcpp) (AGPL)
- MindMac (https://mindmac.app)(专有)
- MindWorkAI/AI-Studio (https://github.com/MindWorkAI/AI-Studio) (FSL-1.1-MIT)
- Mobile-Artificial-Intelligence/maid (https://github.com/Mobile-Artificial-Intelligence/maid) (MIT)
- Mozilla-Ocho/llamafile (https://github.com/Mozilla-Ocho/llamafile) (Apache-2.0)
- nat/openplayground (https://github.com/nat/openplayground) (MIT)
- nomic-ai/gpt4all (https://github.com/nomic-ai/gpt4all) (MIT)
- ollama/ollama (https://github.com/ollama/ollama) (MIT)
- oobabooga/text-generation-webui (https://github.com/oobabooga/text-generation-webui) (AGPL)
- PocketPal AI (https://github.com/a-ghorbani/pocketpal-ai) (MIT)
- psugihara/FreeChat (https://github.com/psugihara/FreeChat) (MIT)
- ptsochantaris/emeltal (https://github.com/ptsochantaris/emeltal) (MIT)
- pythops/tenere (https://github.com/pythops/tenere) (AGPL)
- ramalama (https://github.com/containers/ramalama) (MIT)
- semperai/amica (https://github.com/semperai/amica) (MIT)
- withcatai/catai (https://github.com/withcatai/catai) (MIT)
- Autopen (https://github.com/blackhole89/autopen) (GPL)
工具
- akx/ggify (https://github.com/akx/ggify) – 从 Hugging Face Hub 下载 PyTorch 模型并转换为 GGML
- akx/ollama-dl (https://github.com/akx/ollama-dl) – 从 Ollama 库下载模型,直接与 llama.cpp 一起使用
- crashr/gppm (https://github.com/crashr/gppm) – 启动利用 NVIDIA Tesla P40 或 P100 GPU 的 llama.cpp 实例,降低空闲功耗
- gpustack/gguf-parser (https://github.com/gpustack/gguf-parser-go/tree/main/cmd/gguf-parser) – 查看/检查 GGUF 文件并估算内存使用
- Styled Lines (https://marketplace.unity.com/packages/tools/generative-ai/styled-lines-llama-cpp-model-292902)(专有许可,推理的异步封装)
相似文章
特性:AesSedai 为 llama.cpp 添加 Mimo v2.5 模型支持 · 拉取请求 #22493 · ggml-org/llama.cpp
一个拉取请求已合并到 llama.cpp 中,用于添加对 Mimo v2.5 模型的支持,增强了该框架对此特定 AI 架构的兼容性。
server, webui: 支持在推理模型上继续生成,由 ServeurpersoCom · 拉取请求 #22727 · ggml-org/llama.cpp
此拉取请求在 llama.cpp 服务器和 WebUI 中添加了对推理模型继续生成的支持。
Granite 4.1 LLMs:技术架构解析
本文详细介绍了 IBM Granite 4.1 大语言模型的技术架构与训练流程,涵盖预训练、SFT(监督微调)及 RL(强化学习)阶段。文章指出,该 8B 稠密模型在性能上超越了更大的 MoE(混合专家)模型,并提及模型以 Apache 2.0 许可证开源发布。
StepFun 3.5 MTP 由 pwilkin 提交 · 拉取请求 #23274 · ggml-org/llama.cpp
为 llama.cpp 添加 StepFun 3.5 MTP 模型支持的拉取请求。
llama : 网站 + 统一的 `llama` 二进制文件 · ggml-org/llama.cpp · 讨论 #23875
Llama.cpp 宣布推出新网站和统一的 'llama' 二进制文件,以简化 LLM 推理,同时还包括 Hugging Face 缓存迁移和多模态支持等更新。