LiquidAI/LFM2.5-8B-A1B-GGUF

Hugging Face Models Trending 05/24/26, 10:16 PM Models

liquid-ai lfm2.5-8b gguf-format llama-cpp vllm ollama

Summary

Liquid AI has released GGUF builds of its LFM2.5-8B-A1B model (the snippets below reference the BF16 and Q4_K_M files), with usage instructions for llama-cpp-python, llama.cpp, vLLM, Ollama, and several other local runtimes.

Task: text-generation
Tags: gguf, liquid, lfm2, edge, llama.cpp, text-generation, en, ar, zh, fr, de, ja, ko, es, base_model:LiquidAI/LFM2.5-8B-A1B, base_model:quantized:LiquidAI/LFM2.5-8B-A1B, license:other, endpoints_compatible, region:us, conversational

Cached at: 05/29/26, 08:10 AM

LiquidAI/LFM2.5-8B-A1B-GGUF · Hugging Face

Source: https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF

Libraries

llama-cpp-python

How to use LiquidAI/LFM2.5-8B-A1B-GGUF with llama-cpp-python:

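# Install the bindings first (from_pretrained also needs huggingface-hub):
#   pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
    repo_id="LiquidAI/LFM2.5-8B-A1B-GGUF",
    filename="LFM2.5-8B-A1B-BF16.gguf",
)

# Send a single-turn chat request; the result is an OpenAI-style completion dict.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)

# Print only the assistant's reply text.
print(response["choices"][0]["message"]["content"])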
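
The snippet above sends a single user message. For a multi-turn conversation, the usual pattern is to keep the message history and resend it on every call. The following is a minimal sketch of that pattern; the helper names `extract_reply` and `chat_turn` are hypothetical and not part of llama-cpp-python, and the code only assumes that `llm` exposes `create_chat_completion(messages=...)` returning an OpenAI-style dict.

```python
from typing import Any, Dict, List

Message = Dict[str, str]


def extract_reply(response: Dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    return response["choices"][0]["message"]["content"]


def chat_turn(llm: Any, history: List[Message], user_text: str) -> str:
    """Send one user turn, record both sides in `history`, and return the reply.

    `llm` is any object with a create_chat_completion(messages=...) method,
    for example the Llama instance created with Llama.from_pretrained.
    """
    history.append({"role": "user", "content": user_text})
    response = llm.create_chat_completion(messages=history)
    reply = extract_reply(response)
    history.append({"role": "assistant", "content": reply})
    return reply
```

With a loaded model, `history = []` followed by repeated `chat_turn(llm, history, "...")` calls would carry earlier turns into each new request.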

Notebooks: Google Colab, Kaggle

Local Apps (https://huggingface.co/settings/local-apps#local-apps)

llama.cpp

How to use LiquidAI/LFM2.5-8B-A1B-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
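
Once `llama-server` is running, its OpenAI-compatible endpoint can be called from a script. The sketch below streams a reply by reading server-sent event lines. It assumes the server listens on `http://localhost:8080/v1` (the address used in the Pi configuration on this page) and that streamed chunks follow the OpenAI `delta` layout; `iter_deltas` and `stream_chat` are illustrative names, not part of llama.cpp.

```python
import json
import urllib.request
from typing import Iterable, Iterator

BASE_URL = "http://localhost:8080/v1"  # assumed llama-server address
MODEL_ID = "LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M"


def iter_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


def stream_chat(prompt: str, base_url: str = BASE_URL) -> Iterator[str]:
    """POST a streaming chat request and yield reply fragments as they arrive."""
    body = json.dumps({
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_deltas(response)
```

A caller could print fragments as they arrive with `for piece in stream_chat("What is the capital of France?"): print(piece, end="", flush=True)`.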

LM Studio, Jan

vLLM

How to use LiquidAI/LFM2.5-8B-A1B-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LiquidAI/LFM2.5-8B-A1B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LiquidAI/LFM2.5-8B-A1B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
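
The same request as the curl call above can be made from Python with only the standard library. This is a sketch under the same assumptions as the curl example (server on `localhost:8000`, model name as passed to `vllm serve`); `build_chat_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "LiquidAI/LFM2.5-8B-A1B-GGUF"


def build_chat_request(
    prompt: str, url: str = VLLM_URL, model: str = MODEL_ID
) -> urllib.request.Request:
    """Build the POST request that mirrors the curl example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

Keeping request construction separate from the network call makes the payload easy to inspect before a server is available.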

Use Docker

docker model run hf.co/LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M

Ollama

How to use LiquidAI/LFM2.5-8B-A1B-GGUF with Ollama:

ollama run hf.co/LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
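
After the model has been pulled with the command above, Ollama's local REST API can also be used from a script. The sketch below assumes Ollama is listening at its usual local address `http://localhost:11434`, that the model is registered under the same `hf.co/...` name used on the command line, and that `/api/chat` streams newline-delimited JSON chunks ending with `"done": true`. The function names are illustrative.

```python
import json
import urllib.request
from typing import Iterable

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed local Ollama address
MODEL_NAME = "hf.co/LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M"  # assumed registered name


def join_ndjson_reply(lines: Iterable[bytes]) -> str:
    """Concatenate message fragments from newline-delimited JSON chunks."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # ignore empty lines
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # final chunk of the stream
    return "".join(parts)


def ask_ollama(prompt: str) -> str:
    """Send one chat message and return the full reply text."""
    body = json.dumps({
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return join_ndjson_reply(response)
```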

Unsloth Studio

How to use LiquidAI/LFM2.5-8B-A1B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for LiquidAI/LFM2.5-8B-A1B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for LiquidAI/LFM2.5-8B-A1B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for LiquidAI/LFM2.5-8B-A1B-GGUF to start chatting

Pi

How to use LiquidAI/LFM2.5-8B-A1B-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
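
If `~/.pi/agent/models.json` already lists other providers, pasting the block above over the file would drop them. The following sketch merges the `llama-cpp` provider into an existing file instead. It assumes the file has the top-level `providers` object shown above; `merge_provider` and `update_models_file` are hypothetical helpers, not part of Pi.

```python
import json
from pathlib import Path
from typing import Any, Dict

MODELS_PATH = Path.home() / ".pi" / "agent" / "models.json"

# Provider entry copied from the configuration shown above.
LLAMA_CPP_PROVIDER: Dict[str, Any] = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M"}],
}


def merge_provider(
    config: Dict[str, Any], name: str, provider: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of `config` with `provider` stored under providers[name]."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers[name] = provider
    merged["providers"] = providers
    return merged


def update_models_file(path: Path = MODELS_PATH) -> None:
    """Add or replace the llama-cpp provider, keeping any other providers."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    merged = merge_provider(existing, "llama-cpp", LLAMA_CPP_PROVIDER)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2) + "\n")
```

Calling `update_models_file()` once writes the merged configuration; rerunning it leaves the file unchanged apart from formatting.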

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use LiquidAI/LFM2.5-8B-A1B-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner

How to use LiquidAI/LFM2.5-8B-A1B-GGUF with Docker Model Runner:

docker model run hf.co/LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M

Lemonade

How to use LiquidAI/LFM2.5-8B-A1B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.LFM2.5-8B-A1B-GGUF-Q4_K_M

List all available models

lemonade list

Similar Articles

LiquidAI/LFM2.5-230M

Liquid AI releases LFM2.5-8B-A1B

@liquidai: Introducing LFM2.5-230M: our smallest model yet, built to run fast anywhere (CPUs, NPUs, and GPUs) to enable agentic ta…

When you don't have a data center GPU

@noctus91: I recently switched from Qwen 3.5 9B to LFM2.5-8B-A1B by @liquidai, and it's quickly become my default local model in H…
