Lemonade v10.8: auto memory management, cloud offload, Omni improvements, and call your local models as MCP tools
Summary
Lemonade v10.8 introduces auto memory management, cloud offload, improvements to Omni, and the ability to call local AI models as MCP tools.
Similar Articles
Lemonade v10.7 release and project organization update
The Lemonade v10.7 release introduces LMX-Omni virtual models for omni-modal chat, a bench CLI tool for comparing LLM performance across backends, and expanded GPU support on AMD, Apple Silicon, NVIDIA, and Intel systems.
macOS support in Lemonade has graduated out of beta!
Lemonade, an open-source local AI solution, has graduated macOS support from beta and now offers all major capabilities on macOS, including OmniRouter, coding, image and speech generation, and transcription.
AMD's Lemonade SDK for local AI adds NVIDIA CUDA support
AMD's Lemonade SDK for local AI adds NVIDIA CUDA support in version 10.7, enabling the same local AI server experience on competitor GPUs. The release also introduces lemonade bench for cross-backend LLM benchmarking and broader Vulkan support.
Lemonade v10.5.1: an MTP + ROCm 7.13 quick start for Strix Halo
Lemonade v10.5.1 adds MTP support and a ROCm 7.13 quick start for Strix Halo, along with a Fedora 43 fix.
@vllm_project: Meet vLLM-Omni v0.22.0, a major upgrade for omnimodal world models and production-grade multimodal serving. Day-0 @NVID…
vLLM-Omni v0.22.0 is a major upgrade that adds robust support for NVIDIA Cosmos world models, production TTS (Qwen3-TTS, Qwen3-Omni, VoxCPM2), faster diffusion model serving (Wan 2.2, HunyuanVideo 1.5, LTX-2.3), and broader quantization and hardware coverage. The release comprises 339 commits from 124 contributors.