I benchmarked PrismML's 1-bit Bonsai-8B against IBM's Granite on CPU tool calling. The 1-bit model won, but only with grammar-constrained decoding
Summary
An independent benchmark of PrismML's 1-bit Bonsai-8B against IBM's Granite and other models on CPU tool calling shows that, with grammar-constrained decoding, Bonsai-8B reaches a 92% pass rate and outperforms larger models, but it fails without the constraints. Granite is the best "raw" (unconstrained) model, with a 72% pass rate.
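To make the term concrete, here is a minimal sketch of what grammar-constrained decoding does for tool calls: at each step, tokens that could not lead to a valid call are masked out before the highest-scoring one is chosen. Everything in it is an assumption for illustration. The vocabulary, the two tool names, and the `chatty_model` scoring function are hypothetical stand-ins, and the "grammar" is reduced to a finite set of valid strings. The summary does not say which tooling or grammar format the benchmark used.

```python
import json
from typing import Callable

# Hypothetical finite "grammar": the only strings accepted as tool calls.
VALID_CALLS = [
    '{"tool":"get_weather"}',
    '{"tool":"get_time"}',
]

# Hypothetical toy vocabulary, mixing JSON fragments with chatty prose.
VOCAB = ['{"tool":"', 'get_', 'weather', 'time', '"}', 'Sure, ', 'I think']


def allowed(prefix: str, token: str) -> bool:
    """True if prefix + token is still a prefix of some valid call."""
    candidate = prefix + token
    return any(call.startswith(candidate) for call in VALID_CALLS)


def decode(score: Callable[[str, str], float], constrained: bool,
           max_steps: int = 8) -> str:
    """Greedy decoding; when constrained, invalid tokens are masked first."""
    out = ""
    for _ in range(max_steps):
        if constrained and out in VALID_CALLS:
            break  # a complete, valid call has been produced
        candidates = [t for t in VOCAB if not constrained or allowed(out, t)]
        if not candidates:
            break
        out += max(candidates, key=lambda t: score(out, t))
    return out


def chatty_model(prefix: str, token: str) -> float:
    """Stand-in for model logits: prefers prose over JSON fragments."""
    prefs = {'Sure, ': 3.0, 'I think': 2.5, '{"tool":"': 2.0, 'get_': 1.5,
             'weather': 1.4, 'time': 1.0, '"}': 0.5}
    return prefs[token]


if __name__ == "__main__":
    raw = decode(chatty_model, constrained=False, max_steps=3)
    fixed = decode(chatty_model, constrained=True)
    print(repr(raw))          # prose, not a parseable tool call
    print(json.loads(fixed))  # parses as JSON
```

In this toy setup the same scoring function yields unparseable prose when unconstrained and a well-formed call when constrained, which is the kind of gap the summary describes. The mask only guarantees the output's form, so choosing the right tool still depends on the model's scores.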