training-methodology

Granite 4.1 LLMs: How They’re Built

Hugging Face Blog · 2026-04-29

This article details the technical architecture and training pipeline of IBM's Granite 4.1 LLMs, covering the pre-training, supervised fine-tuning (SFT), and reinforcement learning (RL) stages. It highlights that the 8B dense model outperforms larger mixture-of-experts (MoE) counterparts and notes that the models are released under the Apache 2.0 license.

SPS: Steering Probability Squeezing for Better Exploration in Reinforcement Learning for Large Language Models

arXiv cs.CL · 2026-04-21

Researchers propose SPS (Steering Probability Squeezing), a training paradigm that combines reinforcement learning with inverse reinforcement learning to address probability squeezing in LLM reasoning training. Probability squeezing occurs when probability mass concentrates too narrowly on high-reward trajectories, which limits exploration and multi-sample performance (Pass@k). Experiments on five reasoning benchmarks demonstrate improved exploration and higher Pass@k.
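
For readers unfamiliar with the Pass@k metric mentioned above, the following is a minimal sketch of the widely used unbiased estimator, 1 - C(n - c, k) / C(n, k), computed from n sampled solutions of which c are correct. This summary does not say how the SPS paper computes Pass@k, so treating this estimator as the paper's protocol is an assumption; the function name `pass_at_k` and the example numbers are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k from n samples per problem, c of which are correct.

    Assumption: this is the common unbiased estimator, not necessarily
    the exact evaluation protocol used in the SPS paper.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no correct sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts for one problem: 16 samples, 2 correct.
    for k in (1, 4, 8):
        print(k, round(pass_at_k(16, 2, k), 3))
```

Under this estimator, a policy that concentrates its probability mass on a few trajectories can keep Pass@1 high while gaining little as k grows, which is the multi-sample limitation the summary attributes to probability squeezing.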
