@UnslothAI: GLM-5.2 can now be run locally! The 2-bit model retains ~82% accuracy after shrinking from 1.51TB to 238GB (-84% size)…

X AI KOLs Timeline 2026/06/18 12:40 Models

glm-5.2 unsloth open-source quantization local-inference large-language-model gguf

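Summary

UnslothAI announced that GLM-5.2, Z.ai's strongest open model with 744B parameters, can now be run locally through Dynamic GGUF quantization, which cuts its size by about 84% to 239GB while retaining about 82% accuracy. It fits on a 256GB Mac or a combined RAM/VRAM setup, and supports long-context, reasoning, and agentic tasks.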

GLM-5.2 can now be run locally! The 2-bit model retains ~82% accuracy after shrinking from 1.51TB to 238GB (-84% size). Run it on a 256GB Mac or a RAM/VRAM setup. GLM-5.2 is the strongest open model to date. Guide: https://unsloth.ai/docs/models/glm-5.2… GGUF: https://huggingface.co/unsloth/GLM-5.2-GGUF…
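
The size reduction quoted in the post can be checked with a line of arithmetic. The sketch below is only illustrative: it assumes decimal units (1TB = 1000GB), and the function name `size_reduction_percent` is made up for this example.

```python
def size_reduction_percent(original_gb: float, quantized_gb: float) -> float:
    """Return how much smaller the quantized file is, as a percentage."""
    return (1.0 - quantized_gb / original_gb) * 100.0


FULL_MODEL_GB = 1.51 * 1000  # 1.51TB from the post, assuming 1TB = 1000GB
TWO_BIT_GB = 238             # 2-bit size quoted in the post

reduction = size_reduction_percent(FULL_MODEL_GB, TWO_BIT_GB)
print(f"2-bit quant is {reduction:.1f}% smaller than the full model")
```

Rounded to a whole number, this agrees with the "-84% size" figure in the post.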

Cached: 2026/06/18 14:16

# GLM-5.2 - How to Run Locally | Unsloth Documentation

Source: https://unsloth.ai/docs/models/glm-5.2

## GLM-5.2 - How to Run Locally

Run Zhipu AI's new GLM-5.2 model on your own hardware!

GLM-5.2 is a new open model from Zhipu AI that delivers SOTA performance on long-horizon coding, reasoning, and agentic tasks. With 744B parameters, 40B active parameters, and a 1M context window, it can now be run locally through Unsloth Dynamic (https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs) GGUFs.

GLM-5.2 is the strongest open model to date, matching Claude 4.8 Opus, GPT-5.5 and Gemini 3.1 Pro on Artificial Analysis and many other benchmarks. The full model needs 1.51TB of disk space. The Unsloth Dynamic 2-bit GGUF upcasts important layers to 8 or 16-bit and cuts that requirement to 239GB (84% smaller); Dynamic 1-bit goes further, to 217GB (86% smaller). Thanks to Zhipu AI for giving Unsloth day-zero access.

- GLM-5.2-GGUF (https://huggingface.co/unsloth/GLM-5.2-GGUF)
- Run GLM-5.2 tutorials (https://unsloth.ai/docs/models/glm-5.2#run-glm-5.2-tutorials)
- Quantization results (https://unsloth.ai/docs/models/glm-5.2#quantization-analysis)

The 2-bit dynamic quant UD-IQ2_M uses only 239GB of disk space. It fits directly on a Mac with 256GB of unified memory, and also works well on 1x24GB GPU plus 256GB RAM with MoE offloading. The 1-bit quant fits in 223GB of RAM, while 8-bit needs 810GB of RAM.

Table: Inference hardware requirements (units = total memory: RAM + VRAM, or unified memory). The table rows are not preserved in this cached copy.

For best performance, make sure your total available memory (VRAM plus system RAM) comfortably exceeds the size of the quantized model file.

### Recommended settings

GLM-5.2 offers three modes: non-thinking, plus two thinking modes, High and Max. Use Max Thinking for complex tasks. In Unsloth Studio (https://unsloth.ai/docs/models/glm-5.2#run-glm-5.2-in-unsloth-studio) you can switch between High, Max Thinking and non-thinking from the UI.

For most use cases, we recommend these settings:

Default settings (most tasks)

- Maximum context window: 1,048,576

GLM 5.2 uses thinking mode by default, and supports setting reasoning_effort to "high" or "max", or disabling thinking. To disable thinking, use `--chat-template-kwargs '{"enable_thinking":false}'`. On Windows PowerShell, use `--chat-template-kwargs "{\"enable_thinking\":false}"`. Swap 'true' and 'false' as needed. You can now also use `--reasoning on` or `--reasoning off` in llama.cpp!

### 📈 Quantization analysis

We also ran KLD (KL divergence) to evaluate the accuracy of the GLM-5.2-GGUF quants. Overall, Dynamic 4-bit UD-Q4_K_XL and Dynamic 5-bit UD-Q5_K_XL are generally lossless, and the smaller quants hold up well too. On pure top-1 accuracy, Dynamic 1-bit reaches about 76.2% while being 86% smaller, and Dynamic 2-bit reaches about 82% while being 84% smaller.

99.9% KLD is generally good as well, though there is a larger improvement from 4-bit onward, so Dynamic 4-bit may be the best choice for large out-of-distribution workloads. Mean KLD generally follows a clear monotonic trend with disk size, which indicates that GLM 5.2 works well even at 1-bit!

### Run GLM-5.2 tutorials

You can now run GLM-5.2 in llama.cpp (https://unsloth.ai/docs/models/glm-5.2#run-in-llama.cpp) and Unsloth Studio (https://unsloth.ai/docs/models/glm-5.2#run-glm-5.2-in-unsloth-studio). We will use the 239GB UD-IQ2_M (https://huggingface.co/unsloth/GLM-5.2-GGUF/tree/main/UD-IQ2_M) quant for the best trade-off between accessibility and accuracy.

### 🦥 Run GLM-5.2 in Unsloth Studio

GLM-5.2 can run in Unsloth Studio (https://unsloth.ai/docs/new/studio), an open-source web UI for local AI. Unsloth Studio automatically offloads to RAM and detects multi-GPU setups. With Unsloth Studio you can run models locally on MacOS, Windows and Linux, with support for:

- Searching, downloading and running GGUFs (https://unsloth.ai/docs/new/studio#run-models-locally) and safetensor models
- Fast CPU + GPU inference via llama.cpp

#### Install and launch Unsloth

To install, run in your terminal:

MacOS, Linux, WSL:

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

Windows PowerShell:

```powershell
irm https://unsloth.ai/install.ps1 | iex
```

To launch Unsloth on MacOS, Linux, WSL and Windows:

```bash
unsloth studio -H 0.0.0.0 -p 8888
```

Then open http://127.0.0.1:8888 (or your specific URL) in your browser.

#### Launch Unsloth securely over HTTPS with Cloudflare

New! Unsloth now offers a way to launch Studio securely over HTTPS through a free Cloudflare tunnel, using a single command that works on Windows, Mac and Linux (the command is not preserved in this cached copy).

#### Search for and download GLM-5.2

On first launch you will need to create a password to secure your account, and sign in again later. Then go to the Studio Chat (https://unsloth.ai/docs/new/studio/chat) tab, search for GLM-5.2 in the search bar, and download the model and quant you want. Make sure you have enough compute to run the model.

#### Run GLM-5.2

Inference parameters are set automatically when using Unsloth Studio, but you can still change them manually. You can also edit the context length, chat template and other settings. For more information, see our Unsloth Studio inference guide (https://unsloth.ai/docs/new/studio/chat).

Image: example of Qwen3.6 running tool calls.

### 🦙 Run GLM-5.2 in llama.cpp

In this guide we run the UD-IQ2_M quant, which needs at least 245GB of RAM. Feel free to change the quantization type. These tutorials use llama.cpp for fast local inference.

GGUF: GLM-5.2-GGUF (https://huggingface.co/unsloth/GLM-5.2-GGUF)

Obtain the latest llama.cpp from GitHub (https://github.com/ggml-org/llama.cpp). You can also follow the build instructions below. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you have no GPU or only want CPU inference. For Apple Mac / Metal devices, set -DGGML_CUDA=OFF and continue as usual; Metal support is on by default.

```bash
apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp
```

You can now use llama.cpp directly to load and download models, much like ollama run. First choose the quant you want, for example UD-IQ2_M. Also use export LLAMA_CACHE="unsloth/GLM-5.2-GGUF" to force llama.cpp to save to a specific location. Note that this download path can be very slow, so the manual download described next is usually the better option.

```bash
export LLAMA_CACHE="unsloth/GLM-5.2-GGUF"
./llama.cpp/llama-cli \
    -hf unsloth/GLM-5.2-GGUF:UD-IQ2_M \
    --temp 1.0 \
    --top-p 0.95 \
    --min-p 0.01
```

To download the model manually (faster!), use the commands below after running pip install huggingface_hub. If downloads get stuck, see: Hugging Face Hub, XET debugging (https://unsloth.ai/docs/basics/troubleshooting-and-faqs/hugging-face-hub-xet-debugging)

```bash
hf download unsloth/GLM-5.2-GGUF \
    --local-dir unsloth/GLM-5.2-GGUF \
    --include "*UD-IQ2_M*" # Use "*UD-Q8_K_XL*" for a near full-precision version
```

If you want to use Dynamic 1-bit, run:

```bash
hf download unsloth/GLM-5.2-GGUF \
    --local-dir unsloth/GLM-5.2-GGUF \
    --include "*UD-IQ1_S*"
```

Then run the model in conversation mode. Use unsloth/GLM-5.2-GGUF/UD-IQ2_M/GLM-5.2-UD-IQ2_M-00001-of-00006.gguf for 2-bit, or unsloth/GLM-5.2-GGUF/UD-IQ1_S/GLM-5.2-UD-IQ1_S-00001-of-00006.gguf for 1-bit.

```bash
./llama.cpp/llama-cli \
    --model unsloth/GLM-5.2-GGUF/UD-IQ2_M/GLM-5.2-UD-IQ2_M-00001-of-00006.gguf \
    --temp 1.0 \
    --top-p 0.95 \
    --min-p 0.01
```

After prompting it to make a simple Flappy Bird game, we got a working game. The full conversation and game are linked here:

- Full game in HTML: https://unsloth.ai/docs/models/glm-5.2#full-game-in-html
- Full conversation: https://unsloth.ai/docs/models/glm-5.2#full-conversation

The game has sound and runs perfectly! Remember, this is the 1-bit quant, and it works really well!

The cached copy keeps the game's stylesheet, its visible page text and the start of its script, with the HTML tags stripped. Page title: Sunset Flier.

```css
:root {
  --sunset-1: #ff6b6b;
  --sunset-2: #feca50;
  --sunset-3: #ff9ff3;
  --dusk: #36306b;
  --night: #1a1746;
  --accent: #ffd93b;
  --coral: #ff6b6b;
  --pipe: #4a902b;
  --pipe-dark: #2d5a1a;
}
* { margin: 0; padding: 0; box-sizing: border-box; }
html, body {
  height: 100%;
  width: 100%;
  overflow: hidden;
  background: var(--night);
  font-family: 'Fred', sans-serif;
  -webkit-user-select: none;
  user-select: none;
  touch-action: manipulation;
}
#game-wrap {
  position: relative;
  width: 100vw;
  height: 100vh;
  display: flex;
  justify-content: center;
  align-items: center;
  background: linear-gradient(180deg, #1a1746 0%, #36306b 40%, #ff6b6b 70%, #feca50 100%);
}
#game-frame {
  position: relative;
  width: min(100vw, 480px);
  height: min(100vh, 720px);
  max-height: 100vh;
  box-shadow: 0 30px 80px rgba(0,0,0,0.6), inset 0 0 0 1px rgba(255,255,255,0.05);
  overflow: hidden;
  background: linear-gradient(180deg, #4a3a8e 0%, #ff6b6b 60%, #feca50 100%);
}
canvas { position: absolute; inset: 0; width: 100%; height: 100%; display: block; }
.overlay {
  position: absolute;
  inset: 0;
  display: flex;
  flex-direction: column;
  justify-content: center;
  align-items: center;
  pointer-events: none;
  z-index: 10;
  transition: opacity 0.3s ease;
}
.overlay.hidden { opacity: 0; pointer-events: none; }
.overlay.visible { opacity: 1; pointer-events: auto; }
.panel {
  background: rgba(26, 23, 70, 0.85);
  border: 3px solid var(--accent);
  border-radius: 16px;
  padding: 28px 36px;
  text-align: center;
  color: #fff;
  box-shadow: 0 12px 0 rgba(0,0,0,0.3), 0 0 40px rgba(255, 217, 59, 0.4);
  backdrop-filter: blur(4px);
  transform: translateY(0);
  animation: bob 3s ease-in-out infinite;
}
@keyframes bob {
  0%, 100% { transform: translateY(0); }
  50% { transform: translateY(-6px); }
}
.title {
  font-family: 'Press Start 2P', monospace;
  font-size: 26px;
  color: var(--accent);
  text-shadow: 3px 3px 0 #b87c0a, 6px 6px 0 rgba(0,0,0,0.3);
  letter-spacing: 1px;
  margin-bottom: 6px;
  line-height: 1.3;
}
.subtitle { font-size: 14px; color: #ffe8a8; margin-bottom: 20px; font-weight: 700; }
.tap-icon { font-size: 42px; margin: 8px 0; animation: tap 1.2s ease-in-out infinite; }
@keyframes tap {
  0%, 50% { transform: translateY(0) scale(1); }
  20% { transform: translateY(-8px) scale(1.1); }
}
.instructions { font-size: 13px; color: #fff; opacity: 0.85; margin-top: 10px; font-weight: 400; }
.score-row { display: flex; gap: 24px; justify-content: center; margin: 12px 0 20px; }
.score-box {
  background: rgba(0,0,0,0.4);
  border: 2px solid var(--accent);
  border-radius: 10px;
  padding: 10px 18px;
  min-width: 80px;
}
.score-box .label {
  font-family: 'Press Start 2p', monospace;
  font-size: 9px;
  color: var(--accent);
  margin-bottom: 4px;
  letter-spacing: 1px;
}
.score-box .value { font-family: 'Press Start 2p', monospace; font-size: 18px; color: #fff; }
.score-box.best .value { color: var(--coral); }
.btn {
  font-family: 'Press Start 2p', monospace;
  font-size: 12px;
  color: var(--night);
  background: var(--accent);
  border: none;
  padding: 12px 22px;
  border-radius: 8px;
  cursor: pointer;
  letter-spacing: 1px;
  box-shadow: 0 6px 0 #b87c0a, 0 8px 12px rgba(0,0,0,0.3);
  transition: transform 0.1s, box-shadow 0.1s;
  pointer-events: auto;
}
.btn:hover { transform: translateY(2px); box-shadow: 0 4px 0 #b87c0a, 0 6px 10px rgba(0,0,0,0.3); }
.btn:active { transform: translateY(6px); box-shadow: 0 0 0 #b87c0a, 0 2px 6px rgba(0,0,0,0.3); }
#hud {
  position: absolute;
  top: 24px;
  left: 50%;
  transform: translateX(-50%);
  z-index: 5;
  font-family: 'Press Start 2P', monospace;
  font-size: 36px;
  color: #fff;
  text-shadow: 3px 3px 0 #b87c0a, 5px 5px 0 rgba(0,0,0,0.5);
  pointer-events: none;
  transition: opacity 0.3s;
  opacity: 0;
}
#hud.visible { opacity: 1; }
#hud .new-best {
  font-size: 11px;
  color: var(--coral);
  text-shadow: 2px 2px 0 #000;
  margin-top: 8px;
  opacity: 0;
  transition: opacity 0.3s;
}
#hud.has-new-best .new-best { opacity: 1; animation: pulse 0.6s ease infinite alternate; }
@keyframes pulse {
  from { transform: translateX(-50%) scale(1); }
  to { transform: translateX(-50%) scale(1.15); }
}
.medal {
  font-family: 'Press Start 2P', monospace;
  font-size: 48px;
  margin: 10px 0;
  text-shadow: 3px 3px 0 #000;
}
.flash { position: absolute; inset: 0; background: #fff; opacity: 0; pointer-events: none; z-index: 8; }
.footer {
  position: absolute;
  bottom: 12px;
  left: 50%;
  transform: translateX(-50%);
  font-size: 11px;
  color: rgba(255,255,255,0.7);
  font-weight: 400;
  text-align: center;
  pointer-events: none;
  z-index: 9;
}
```

```text
0 NEW BEST!
SUNSETFLIER — dusk flight — ✊ TAP / SPACE / CLICK to flap
GAME OVER ★ SCORE 0 BEST 0 RETRY
Sound on • Tap to fly
```

```js
(() => {
  // ============ Setup ============
  const canvas = document.getElementById('canvas');
  const ctx = canvas.getContext('2d');
  const frame = document.getElementById('game-frame');
  const hud = document.getElementById('hud');
  const hudScore = document.getElementById('hud-score');
  const startScreen = document.getElementById('start-screen');
  const endScreen = document.getElementById('end-screen');
  const endScore = document.getElementById('end-score');
  const endBest = document.getElementById('end-best');
  const medalEl = document.getElementById('medal');
  const flashEl = document.getElementById('flash');
  const restartBtn = document.getElementById('restart-btn');

  let W = 480, H = 720;
  const dpr = window.devicePixelRatio || 1;

  function resize() {
    const rect = frame.getBoundingClientRect();
    W = rect.width;
    H = rect.height;
    canvas.width = W * dpr;
    canvas.height = H * dpr;
    ctx.setTransform(dpr, 0, 0, dpr, 0, 0);
  }
  window.add
```

### 📐 Long context via KV cache quantization

To make use of long context in llama.cpp, we need KV cache quantization to reduce memory usage. llama.cpp recently added higher-accuracy tricks for KV cache quantization; see https://github.com/ggml-org/llama.cpp/pull/21038 and other PRs! Several KV cache data types are currently supported, with f16 as the default. With q4_0 (about 4.5 bits per weight) you can extend context length by roughly 16 / 4.5 ≈ 3.5x, so a setup that previously handled 10K might reach 35K! q4_1 may be better since it also carries a shift parameter, at 5 bits per weight, which gives a 3.2x longer context.

### 📊 Benchmarks

Benchmarks for GLM-5.2 are given in table format in the original documentation. The table is not preserved in this cached copy; its surviving row labels are Terminal Bench 2.1 (Terminus-2) and Terminal Bench 2.1 (best reported harness).
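
The KV cache arithmetic from the long-context section (16 / 4.5 for q4_0, 16 / 5 for q4_1) can be reproduced with a short sketch. It rests on a simplifying assumption, namely that the KV cache is the only thing limiting context, so context length scales inversely with bits per cached value; real headroom also depends on model weights and other buffers. The names `context_multiplier` and `extended_context` are hypothetical helpers, not llama.cpp APIs.

```python
BASELINE_BITS = 16.0  # f16, the default KV cache type named in the guide

# Bits per cached value as quoted in the guide.
KV_BITS = {"f16": 16.0, "q4_0": 4.5, "q4_1": 5.0}


def context_multiplier(bits_per_value: float) -> float:
    """How many times more context fits in the same KV cache memory."""
    return BASELINE_BITS / bits_per_value


def extended_context(base_tokens: int, bits_per_value: float) -> int:
    """Approximate context length reachable from a given f16 baseline."""
    return round(base_tokens * context_multiplier(bits_per_value))


for name, bits in KV_BITS.items():
    print(f"{name}: {context_multiplier(bits):.2f}x, "
          f"10K baseline -> about {extended_context(10_000, bits):,} tokens")
```

The guide rounds the q4_0 ratio down to 3.5x, which is where its 10K to 35K example comes from.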

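The guide's sizing advice (total RAM plus VRAM, or unified memory, should exceed what the quant needs) can be turned into a quick check. This is a rough sketch only: the memory figures are the ones quoted in the guide (223GB for 1-bit, at least 245GB for UD-IQ2_M, 810GB for 8-bit), while the dictionary labels and the `quants_that_fit` helper are made up for illustration.

```python
# Memory needed per quant in GB, as quoted in the guide.
# The keys are labels invented for this sketch.
REQUIRED_MEMORY_GB = {
    "dynamic 1-bit": 223,
    "dynamic 2-bit (UD-IQ2_M)": 245,
    "8-bit": 810,
}


def quants_that_fit(ram_gb: float, vram_gb: float = 0.0) -> list[str]:
    """List the quants whose quoted memory need is covered by RAM + VRAM."""
    total_gb = ram_gb + vram_gb
    return [name for name, need in REQUIRED_MEMORY_GB.items() if total_gb >= need]


print(quants_that_fit(256))      # 256GB unified-memory Mac
print(quants_that_fit(256, 24))  # 256GB RAM plus one 24GB GPU
print(quants_that_fit(128))      # too small for any listed quant
```

Because the guide asks for memory that comfortably exceeds the file size, a quant that only just fits may still run slowly.
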