Granite 4.1 LLMs: How They’re Built
Summary
This article details the technical architecture and training pipeline of IBM's Granite 4.1 LLMs, covering pre-training, SFT, and RL stages. It highlights that the 8B dense model outperforms larger MoE counterparts and notes the release under Apache 2.0 license.
Source: https://huggingface.co/blog/ibm-granite/granite-4-1
- Overview
- Model Architecture
- Pre-Training
  - Phase 1: General Pre-Training (10T tokens)
  - Phase 2: Math/Code Pre-Training (2T tokens)
  - Phase 3: High-Quality Data Annealing (2T tokens)
  - Phase 4: High-Quality Data Annealing — Refinement (0.5T tokens)
  - Phase 5: Long Context Training (LCE)
- SFT: Data Preparation & Quality Control
  - SFT Training Details
- Reinforcement Learning: Multi-Stage RL Pipeline
  - Training Methodology
  - RL Pipeline
- Results
  - Base Model Benchmarks
  - Instruct Model Benchmarks
  - Granite 4.1 Comparison with Leading Open‑Source Models
  - Granite 4.1-8B vs. Granite 4.0-H-Small (32B-A9B)
  - Granite 4.1 Model Family Comparison
- FP8 Quantization
- Infrastructure
- Getting Started
An in-depth technical walkthrough of data engineering, pre-training, supervised fine-tuning, and reinforcement learning behind the Granite 4.1 LLMs.
**Authors:** Granite Team, IBM
TL;DR: Granite 4.1 is a family of dense, decoder‑only LLMs (3B, 8B, and 30B) trained on ~15T tokens using a multi‑stage pre‑training pipeline, including long‑context extension up to 512K tokens. The models are further refined with supervised fine‑tuning on ~4.1M high‑quality curated samples and reinforcement learning via on‑policy GRPO with DAPO loss (Yu et al., 2025). Notably, the 8B instruct model matches or surpasses the previous Granite 4.0‑H‑Small (32B‑A9B MoE) despite using a simpler dense architecture with fewer parameters. All Granite 4.1 models are released under the Apache 2.0 license.
## Overview
Building high‑quality small language models goes beyond simply scaling compute—it requires rigorous data curation throughout training. For Granite 4.1, we prioritized data quality over quantity, progressively refining the data mixture across five pre‑training stages. We further curated supervised fine‑tuning data using an LLM‑as‑Judge framework and applied a multi‑stage reinforcement learning pipeline to systematically strengthen performance in math, coding, instruction following, and general chat.
## Model Architecture
Granite 4.1 models use a decoder-only dense transformer architecture. The core design choices include Grouped Query Attention (GQA), Rotary Position Embeddings (RoPE), SwiGLU activations, RMSNorm, and shared input/output embeddings.
| Component | 3B Dense | 8B Dense | 30B Dense |
|---|---|---|---|
| Embedding size | 2560 | 4096 | 4096 |
| Number of layers | 40 | 40 | 64 |
| Attention head size | 64 | 128 | 128 |
| Number of attention heads | 40 | 32 | 32 |
| Number of KV heads | 8 | 8 | 8 |
| MLP hidden size | 8192 | 12800 | 32768 |
| MLP activation | SwiGLU | SwiGLU | SwiGLU |
| Position embedding | RoPE | RoPE | RoPE |

All three model sizes share the same training pipeline and data strategy, differing only in architecture dimensions.
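As a point of reference, the 8B column of this table can be captured in a small configuration object. This is a minimal sketch using a hypothetical dataclass, not the actual Granite configuration class; the field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class GraniteConfigSketch:
    # Hypothetical container mirroring the 8B column above;
    # the real Granite config class and field names may differ.
    embedding_size: int = 4096
    num_layers: int = 40
    attention_head_size: int = 128
    num_attention_heads: int = 32
    num_kv_heads: int = 8             # GQA: 32 query heads share 8 KV heads
    mlp_hidden_size: int = 12800
    mlp_activation: str = "swiglu"
    position_embedding: str = "rope"
    tie_word_embeddings: bool = True  # shared input/output embeddings

cfg = GraniteConfigSketch()
# GQA group size: how many query heads share each KV head
print(cfg.num_attention_heads // cfg.num_kv_heads)  # 4
```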
## Pre-Training
Granite 4.1 is trained from scratch on approximately 15 trillion tokens using a five‑phase training strategy. Phases 1–2 focus on foundational pre‑training, phases 3–4 perform mid‑training with progressively higher‑quality data annealing, and phase 5 introduces long‑context training, extending the context window to 512K tokens. Each phase employs a distinct data mixture and learning‑rate schedule, gradually shifting from broad web‑scale data to more curated, domain‑specific content.
**Figure 2:** The five-phase pre-training pipeline. Phases 1–2 are pre-training, Phases 3–4 are mid-training (high-quality data annealing), and Phase 5 is long context training (LCE).
### Phase 1: General Pre-Training (10T tokens)
The first phase establishes broad language understanding using a general mixture of training data with a power learning rate schedule and warmup.
Data composition:
- CommonCrawl ~59% — general web data
- Code ~20% — programming languages and repositories
- Math ~7% — mathematical reasoning data
- Technical ~10.5% — scientific papers, technical documentation and manuals
- Multilingual ~2% — non-English language data
- Domain Specific ~1.5% — domain-specific content
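To make the mixture concrete, the sketch below shows simple weighted sampling over the Phase 1 sources. The sampling mechanism is an illustrative assumption; the post only specifies the target proportions.

```python
import random

# Phase 1 target proportions from the list above (fractions of tokens).
phase1_mix = {
    "commoncrawl": 0.59,
    "code": 0.20,
    "technical": 0.105,
    "math": 0.07,
    "multilingual": 0.02,
    "domain_specific": 0.015,
}

def sample_source(mix, rng=random):
    # Draw a data source with probability proportional to its mixture weight.
    sources, weights = zip(*mix.items())
    return rng.choices(sources, weights=weights, k=1)[0]

# Rough check that the weights describe a complete mixture.
assert abs(sum(phase1_mix.values()) - 1.0) < 1e-6
print(sample_source(phase1_mix))
```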
### Phase 2: Math/Code Pre-Training (2T tokens)
Phase 2 sharply increases the proportion of code and mathematical data, pivoting toward stronger reasoning capabilities while still maintaining general language coverage.
Data composition:
- Math ~35% — a 5x increase over Phase 1
- Code ~30% — a 1.5x increase
- CommonCrawl-HQ ~12% — high-quality Common Crawl subset
- Synthetic ~9% — synthetic high-quality data
- Technical ~10%
- Multilingual ~3%
- Domain ~1%
### Phase 3: High-Quality Data Annealing (2T tokens)
Phase 3 transitions into mid-training with a more balanced, high-quality mixture and an exponential decay learning rate schedule. This is where we start blending in chain-of-thought and synthetic instruction data.
Data composition:
- CommonCrawl-HQ ~16.67%
- Math ~16.67%
- Code ~16.67%
- Synthetic ~8.5%
- Technical ~12.5%
- Multilingual ~4.5%
- Long Chain-of-Thought ~12.5% — reasoning trajectories
- Language Instructions ~7.5% — instruction tuning data
- Code Instructions ~4.5% — instruction tuning data
### Phase 4: High-Quality Data Annealing — Refinement (0.5T tokens)
The fourth phase continues mid-training with a linear learning rate decay to zero, focusing the model on the highest-quality data available.
Data composition:
- CommonCrawl-HQ ~40%
- Code ~20%
- Math ~20%
- Long Chain-of-Thought ~6%
- Code Instructions ~5%
- Language Instructions ~9%
**Figure 3:** How the data mix evolves across the pre-training phases. Notice the progressive shift from web-heavy (Phase 1) to quality-heavy with instruction and reasoning data (Phases 3–4).
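The learning-rate schedule changes alongside the data mix: Phases 1–2 use a power schedule with warmup, Phase 3 an exponential decay, and Phase 4 a linear decay to zero. The sketch below illustrates these three shapes; the peak rates, exponents, and step counts are placeholder assumptions, since the post does not specify them.

```python
def warmup_power(step, warmup_steps, peak_lr, power=0.5):
    # Phases 1-2: linear warmup followed by a power-law decay.
    # The exponent and peak LR here are illustrative only.
    if step < warmup_steps:
        return peak_lr * step / max(warmup_steps, 1)
    return peak_lr * (warmup_steps / step) ** power

def exponential_decay(step, total_steps, start_lr, end_lr=1e-6):
    # Phase 3: exponential decay from start_lr toward an assumed floor end_lr.
    frac = step / max(total_steps, 1)
    return start_lr * (end_lr / start_lr) ** frac

def linear_to_zero(step, total_steps, start_lr):
    # Phase 4: linear decay all the way to zero.
    return start_lr * max(0.0, 1.0 - step / max(total_steps, 1))
```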
### Phase 5: Long Context Training (LCE)
The fifth and final phase, also part of mid-training, extends the context window from 4K to 512K through a staged long-context extension process:
- 32K extension — using the same data mix as Phase 4
- 128K extension — same data mix as Phase 4
- 512K extension — 80% books + 20% code repository data (8B and 30B only)
The LCE phase uses an exponential learning rate schedule starting at 1e-4 and decaying to 0. To ensure the model natively handles long sequences without degrading short-context performance, we do a model merge after each LCE stage. RULER benchmark scores of the base models:
| Model name | 32K | 64K | 128K |
|---|---|---|---|
| granite-4.1-3b-base | 75.0 | 66.6 | 58.0 |
| granite-4.1-8b-base | 83.6 | 79.1 | 73.0 |
| granite-4.1-30b-base | 85.2 | 84.6 | 76.7 |
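The post notes a model merge after each LCE stage but does not describe the merge recipe. A common approach is a simple weight-space average of the pre- and post-extension checkpoints, sketched below purely for illustration.

```python
import torch

def average_checkpoints(state_dict_a, state_dict_b, alpha=0.5):
    """Linearly interpolate two compatible state dicts: alpha*A + (1-alpha)*B.

    Illustrative only; the Granite team's actual merging method and mixing
    weights are not specified in the post.
    """
    merged = {}
    for name, param_a in state_dict_a.items():
        param_b = state_dict_b[name]
        merged[name] = alpha * param_a.float() + (1.0 - alpha) * param_b.float()
    return merged
```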
## SFT: Data Preparation & Quality Control
Supervised fine‑tuning (SFT) is what turns the base model into a reliable instruction‑following assistant, making data quality critically important—since even a small number of incorrect or hallucinated samples can instill undesirable behaviors. To address this, we apply a rigorous LLM‑as‑Judge framework alongside rule‑based filtering to curate high-quality samples. Together, these pipelines automatically assess each sample against structural, semantic, and behavioral criteria, fixing issues when possible and filtering out samples that fail to meet our quality standards.
**Figure 4:** The SFT data quality pipeline. Raw conversation data passes through an LLM-as-Judge with a multi-dimensional rubric, producing accept/borderline/reject verdicts. Hard-reject defects (hallucination, false premise, incorrect computation) trigger automatic rejection regardless of score.
Our rigorous LLM‑as‑Judge framework evaluates only assistant responses, treating system prompts, user inputs, retrieved documents, and tool outputs strictly as contextual information. This ensures that the judge assesses what the model says, rather than what it was asked to do. In RAG settings, responses that are not grounded in the retrieved context are flagged as hallucinations, while tool‑use outputs are validated against the set of allowed tools and their parameter schemas.
We employ specialized judge prompts tailored to different SFT data types, including multi‑turn dialogue, RAG‑augmented responses, tool‑calling interactions, and multilingual conversations. Each response is scored across six weighted dimensions—instruction following, correctness, completeness, conciseness, naturalness, and calibration (with optional critical‑thinking checks). Samples are accepted, flagged as borderline, or rejected based on deterministic score thresholds, with hard‑reject rules overriding scores for severe defects such as hallucinations, false premises, or incorrect computations.
To complement semantic evaluation, we apply a deterministic rule‑based pipeline that enforces structural integrity through text normalization, truncation and length filtering, schema validation, and leakage detection. A final global deduplication step ensures dataset‑wide uniqueness. All filtering and correction actions are fully auditable.
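As a rough illustration of the accept/borderline/reject logic, the sketch below combines weighted dimension scores with hard-reject overrides. The weights, thresholds, and defect labels are hypothetical; the post names only the six dimensions and the hard-reject categories.

```python
HARD_REJECT_DEFECTS = {"hallucination", "false_premise", "incorrect_computation"}

# Hypothetical weights over the six scoring dimensions named in the post.
WEIGHTS = {
    "instruction_following": 0.25,
    "correctness": 0.30,
    "completeness": 0.15,
    "conciseness": 0.10,
    "naturalness": 0.10,
    "calibration": 0.10,
}

def judge_verdict(scores, defects, accept_at=8.0, borderline_at=6.0):
    # Hard-reject defects override the score-based decision entirely.
    if HARD_REJECT_DEFECTS & set(defects):
        return "reject"
    weighted = sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)  # scores on a 0-10 scale
    if weighted >= accept_at:
        return "accept"
    if weighted >= borderline_at:
        return "borderline"
    return "reject"

print(judge_verdict(
    {"instruction_following": 9, "correctness": 9, "completeness": 8,
     "conciseness": 8, "naturalness": 9, "calibration": 8},
    defects=[],
))  # accept
```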
### SFT Training Details
After passing through the LLM-as-Judge, rule-based filtering, and global deduplication pipeline, we fine-tune base models on these approximately 4.1 million high-quality samples. The following details apply to all three model variants:
Training Configuration:
| Parameter | Value |
|---|---|
| Compute | 16 nodes, 4x GB200 per node |
| Epochs | 3 |
| Learning rate | 5e-6 (linear warmup 3%, linear decay over ~25K steps) |
| Sequence length | 16,384 tokens |
| Total samples | ~4.1M |
| Effective batch size | 256 samples/iter (~4.2M tokens/iter) |
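A quick sanity check of the effective token batch size in the last row, from the sequence length and sample batch size above:

```python
samples_per_iteration = 256
sequence_length = 16_384  # tokens per sample
print(samples_per_iteration * sequence_length)  # 4,194,304, i.e. ~4.2M tokens per iteration
```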
## Reinforcement Learning: Multi-Stage RL Pipeline
After SFT, we apply a multi-stage reinforcement learning pipeline to further improve the model’s capabilities across specific domains. Rather than a single RL pass, we run multiple targeted RL stages, each optimizing for different capabilities.
### Training Methodology
We use on-policy GRPO (Group Relative Policy Optimization) (Shao et al., 2024) with the DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) loss (Yu et al., 2025), which provides more stable training signals compared to standard GRPO. However, due to the computationally intensive nature of dynamic sampling, we switch it off during our training runs.
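For intuition, the sketch below shows a group-relative advantage combined with a DAPO-style token-level loss with decoupled clipping. It is a simplified illustration rather than the SkyRL implementation used in training: padding masks are omitted, the clip ranges are typical defaults rather than Granite's settings, and, as in the post, dynamic sampling is left out.

```python
import torch

def grpo_dapo_loss(logprobs, old_logprobs, rewards, group_size=16,
                   eps_low=0.2, eps_high=0.28):
    # Group-relative advantages: normalize scalar rewards within each group of
    # rollouts sampled from the same prompt (rewards are ordered by group).
    rewards = rewards.view(-1, group_size)
    adv = (rewards - rewards.mean(dim=1, keepdim=True)) / (
        rewards.std(dim=1, keepdim=True) + 1e-6)
    adv = adv.view(-1, 1)  # one scalar advantage per sampled response

    # Token-level importance ratios against the behavior (old) policy.
    ratio = torch.exp(logprobs - old_logprobs)  # [num_responses, seq_len]

    # DAPO-style decoupled clipping: a wider upper bound (eps_high) than
    # lower bound (eps_low), applied per token.
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * adv
    per_token = torch.minimum(unclipped, clipped)

    # Token-level averaging across the whole batch, as in the DAPO objective.
    return -per_token.mean()
```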
#### RL training configuration
| Parameter | Value |
|---|---|
| Algorithm | On-policy GRPO with DAPO loss |
| Training stack | SkyRL (NovaSky-AI, 2025) |
| Samples per prompt | 16 |
| Train batch size | 1024 |
| Context length | 8,192 |
### RL Pipeline
Figure 10 depicts our reinforcement learning pipeline for training Granite 4.1 models. Through extensive experimentation with a variety of reinforcement learning recipes, we found that this sequence of steps minimizes catastrophic forgetting while simultaneously maximizing performance across multiple domains.
**Figure 10:** The Granite 4.1 reinforcement learning pipeline, consisting of four sequential stages: Multi-domain RL, RLHF, Identity and Knowledge-Calibration RL, and Math RL.
#### Multi-domain RL
In this stage, the model is trained jointly on a unified mixture of data drawn from multiple domains. Every gradient update therefore reflects the full diversity of tasks, which prevents catastrophic forgetting, boosts overall benchmark performance, and minimizes regressions on any individual task.
The different domains covered in this stage include:
| Domain | Description |
|---|---|
| Math | Mathematical reasoning and computation |
| Science | Scientific knowledge and reasoning |
| Logical Reasoning | Deductive and inductive logic |
| Instruction Following (IF) | Adherence to complex instructions |
| Structured Output | Structured data output |
| Text2SQL | Database query generation |
| Temporal Reasoning | Time-based logic and ordering |
| General Chat | General conversational quality |
| In-context Learning | Learning from in-context examples |
During this stage, we trained the models on 45,504 unique prompts (averaged across all Granite 4.1 models) and found that a learning rate of 5e-7 with a KL-loss coefficient (β) of 0.05 performed best for multi-domain reinforcement learning.
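The post does not describe how rewards are computed in this stage. Purely as a hypothetical example of the kind of rule-based, verifiable signal often paired with GRPO on math prompts:

```python
import re

def math_exact_match_reward(completion: str, gold_answer: str) -> float:
    """Hypothetical verifiable reward for the math domain.

    The post does not specify the rewards used in multi-domain RL; this is
    only a generic illustration: 1.0 if the last number in the completion
    matches the reference answer, else 0.0.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == gold_answer.strip() else 0.0

print(math_exact_match_reward("... so the answer is 42.", "42"))  # 1.0
```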
#### RLHF
To further improve the model’s helpfulness and chat ability, we train our model on generic-chat prompts using a multilingual scalar reward model. With this stage, we observed an average improvement of **~18.9 points** (averaged across the three Granite 4.1 models) in Alpaca-Eval compared to the SFT checkpoints.
To mitigate policy drift from its previously learned knowledge, we use a conservative learning rate of 3e-7 and a higher KL-loss coefficient β of 0.09 in this stage. We use an average of 17,920 unique prompts in this RLHF stage.
#### Identity & Knowledge-Calibration RL
In this stage, we train the model for a few steps (~40 training steps) on identity and knowledge calibration prompts. We observed that this small training stage significantly improves the model’s self-identification capabilities.
Similar to the RLHF stage, we use a learning rate of 3e-7 and a KL-loss coefficient β of 0.09, with 1,728 unique prompts in this stage.
#### Math RL
During our RL training, we found that the RLHF stage causes a drop in math benchmark scores (e.g., on GSM8K and DeepMind-Math). The Math RL stage enables the model to recover from this drop and surpass the original SFT performance on math benchmarks: **~3.8 points** on average for GSM8K and **~23.48 points** on average for DeepMind-Math. We use an average of 13,504 unique prompts in this stage and, similar to the multi-domain RL stage, a learning rate of 5e-7 and a KL-loss coefficient β of 0.05.
## Results

### Base Model Benchmarks
| Benchmark | Metric | 3B | 8B | 30B |
|---|---|---|---|---|
| **General Tasks** | | | | |
| MMLU | 5-shot | 66.47 | 73.60 | 78.44 |
| MMLU-Pro | 5-shot, CoT | 37.16 | 44.58 | 49.51 |
| BBH | 3-shot, CoT | 63.84 | 73.83 | 80.66 |
| AGI EVAL | 3-shot | 54.32 | 61.68 | 69.20 |
| DROP | 5-shot | 66.04 | 72.36 | 78.57 |
| **Math Tasks** | | | | |
| GSM8K | 8-shot | 72.93 | 73.54 | 83.78 |
| Minerva Math | 4-shot | 38.00 | 43.42 | 45.66 |
| **Code Tasks** | | | | |
| HumanEval | pass@1 (StarCoder) | – | – | – |
| Eval+ Avg | | 65.94 | 62.05 | 63.90 |
| **Multilingual Tasks** | | | | |
| MMMLU | 5-shot | 56.59 | 64.73 | 73.36 |
| INCLUDE | 5-shot | 51.77 | 57.60 | 67.07 |
| MGSM | 8-shot | 58.48 | 63.68 | 74.40 |
### Instruct Model Benchmarks
| Benchmark | Metric | 3B | 8B | 30B |
|---|---|---|---|---|
| **General Tasks** | | | | |
| MMLU | 5-shot | 67.02 | 73.84 | 80.16 |
| MMLU-Pro | 5-shot, CoT | 49.83 | 55.99 | 64.09 |
| BBH | 3-shot, CoT | 75.83 | 80.51 | 83.74 |
| AGI EVAL | 0-shot, CoT | 65.16 | 72.43 | 77.80 |
| GPQA | 0-shot, CoT | 31.70 | 41.96 | 45.76 |
| SimpleQA | | 3.68 | 4.82 | 6.81 |
| **Alignment Tasks** | | | | |
| AlpacaEval 2.0 | | 38.57 | 50.08 | 56.16 |
| IFEval Avg | | 82.30 | 87.06 | 89.65 |
| ArenaHard | | 37.80 | 68.98 | 71.02 |
| MTBench Avg | | 7.53 | 8.50 | 8.53 |
| **Math Tasks** | | | | |
| GSM8K | 8-shot | 86.88 | 92.49 | 94.16 |
| GSM Symbolic | 8-shot | 81.32 | 83.70 | 75.70 |
| Minerva Math | 0-shot, CoT | 67.94 | 80.10 | 81.32 |
| DeepMind Math | 0-shot, CoT | 64.64 | 80.07 | 81.93 |
| **Code Tasks** | | | | |
| – | pass@1 | 52.54 | 60.26 | 62.31 |
| Eval+ | | – | – | – |
| **Tool Calling** | | | | |
| BFCL v3 | | 60.80 | 68.27 | 73.68 |
| **Multilingual Tasks** | | | | |
| MMMLU | 5-shot | 57.61 | 64.84 | 73.71 |
| INCLUDE | 5-shot | 52.05 | 58.89 | 67.26 |
| MGSM | 8-shot | 70.00 | 82.32 | 71.12 |
| **Safety** | | | | |
| SALAD-Bench | | 93.95 | 95.80 | 96.41 |
| AttaQ | | 81.88 | 81.19 | 85.76 |
| Tulu3 Safety Eval Avg | | 66.84 | 75.57 | 78.19 |

**Supported languages:** English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese.
### Granite 4.1 Comparison with Leading Open‑Source Models
Granite 4.1 delivers competitive instruction‑following and tool‑calling capabilities without relying on long chains of thought. By avoiding extended reasoning traces, it provides predictable latency, stable token usage, and lower operational cost. This makes Granite 4.1 a production‑ready, open‑source choice for enterprise workloads where efficiency, reliability, and cost control are critical.
### Granite 4.1-8B vs. Granite 4.0-H-Small (32B-A9B)
A striking result: the Granite 4.1-8B dense model consistently matches or outperforms the previous-generation Granite 4.0-H-Small, a 32B-parameter Mixture-of-Experts model with 9B active parameters.
**Figure 13:** Granite 4.1-8B (dark blue) vs. Granite 4.0-H-Small 32B-A9B (light blue) across benchmarks. The 8B dense model matches or exceeds the larger MoE model on IFEval, AlpacaEval, MMLU-Pro, BBH, GSM8K, DeepMind-Math, Evalplus, ArenaHard, BFCL v3, and MBPP(+).
### Granite 4.1 Model Family Comparison
**Figure 14:** Comparison across the Granite 4.1 family — 30B, 8B, and 3B models. Scores scale predictably with model size, with the 30B model leading across all benchmarks.
## FP8 Quantization
We also released FP8-quantized variants of the Granite 4.1 models, optimized for inference with vLLM. The precision is reduced from 16‑bit to 8‑bit, resulting in approximately a 50% reduction in both disk footprint and GPU memory usage. Quantization is applied only to the weights and activations of linear operators within the transformer blocks using LLM Compressor, while all other layers are preserved at their original precision.
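As a rough sketch of what such a recipe typically looks like with LLM Compressor (the exact recipe, scheme, and API details may differ by library version, and this is not necessarily the configuration IBM used):

```python
# Import paths may vary across llmcompressor versions.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
from transformers import AutoModelForCausalLM

model_id = "ibm-granite/granite-4.1-8b"  # any Granite 4.1 checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Quantize only Linear layers to FP8; keep the output head at original precision.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

oneshot(model=model, recipe=recipe)
model.save_pretrained("granite-4.1-8b-fp8", save_compressed=True)
```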
## Infrastructure
We trained the Granite 4.1 Language Models on an NVIDIA GB200 NVL72 cluster hosted on CoreWeave:
- **Intra-rack communication:** 72-GPU NVLink domain
- **Inter-rack communication:** Non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network
- **Scale:** Thousands of GPUs across the cluster
This infrastructure provides the scalable, high-bandwidth interconnect needed for efficient distributed training at the token volumes required (15T+ tokens across pre-training alone).
## Getting Started
Granite 4.1 models are available under the Apache 2.0 license. Here’s how to get started with the 30B instruct model, using a tool-calling example:
```shell
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
```
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda"
model_path = "ibm-granite/granite-4.1-30b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather for a specified city.",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "Name of the city"
}
},
"required": ["city"]
}
}
}
]
# change input text as desired
chat = [
{ "role": "user", "content": "What's the weather like in London right now?" },
]
chat = tokenizer.apply_chat_template(
    chat,
    tokenize=False,
    tools=tools,
    add_generation_prompt=True,
)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens,
max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output[0])
```
Expected Output:
```
<|start_of_role|>system<|end_of_role|>You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather for a specified city.", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "Name of the city"}}, "required": ["city"]}}}
</tools>
For each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>What's the weather like in London right now?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|><tool_call>
{"name": "get_current_weather", "arguments": {"city": "London"}}
</tool_call><|end_of_text|>
```
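To act on the emitted tool call, the JSON between the `<tool_call>` tags can be parsed and dispatched to the matching function. A minimal sketch, with a stand-in weather implementation:

```python
import json
import re

def extract_tool_calls(generated_text):
    # Pull every JSON object wrapped in <tool_call>...</tool_call> tags.
    pattern = r"<tool_call>\s*(\{.*?\})\s*</tool_call>"
    return [json.loads(m) for m in re.findall(pattern, generated_text, re.DOTALL)]

# Stand-in implementation; a real application would call an actual weather API.
def get_current_weather(city):
    return {"city": city, "temperature_c": 17, "condition": "cloudy"}

available_tools = {"get_current_weather": get_current_weather}

for call in extract_tool_calls(output[0]):
    fn = available_tools.get(call["name"])
    if fn is not None:
        result = fn(**call["arguments"])
        print(result)
        # The result would then be appended to the conversation as a tool-role
        # message and the model called again to produce its final answer.
```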
Resources:
- Granite 4.1 HF Collection
- PRISM: Demystifying Retention and Interaction in Mid-Training
- GitHub: ibm-granite/granite-4.1-language-models
- Granite Documentation
- Granite Community Resources
Granite 4.1 marks a significant step forward for high‑quality, open‑source language models. By prioritizing data quality and rigor at every stage—from pre‑training curation to supervised fine‑tuning and multi‑stage reinforcement learning—we deliver a substantially improved post‑training pipeline. The result is stronger instruction following, tool use, and conversational performance, showing that carefully trained dense 8B models can rival much larger MoE architectures. We’re excited to see how the community adopts and builds on these models.