@SergioPaniego: you can now train @liquidai's LFM2-VL in TRL. GRPO and RLOO included, with an example script
Summary
You can now train Liquid AI's LFM2-VL model using TRL's GRPO and RLOO methods, with an example script provided.
View Cached Full Text
Cached at: 06/26/26, 02:10 PM
you can now train @liquidai’s LFM2-VL in TRL
GRPO and RLOO included, with an example script https://t.co/H65pK20Q7H
Similar Articles
Liquid AI releases LFM2.5-8B-A1B
Liquid AI released LFM2.5-8B-A1B, an edge model with a 128K context window, 38T tokens of pre-training, and large-scale reinforcement learning, capable of tool calling and complex tasks while fitting on an entry-level laptop.
@SergioPaniego: https://x.com/SergioPaniego/status/2067270222671741360
OpenReward environments now integrate directly into TRL's GRPOTrainer via a single OpenRewardSpec, allowing zero-glue-code training against a catalog of RL environments. The integration is experimental and part of a broader effort to make environment and agent RL first-class in TRL.
@didier_lopes: Incredible how Z.ai literally has their RL infrastructure open source. The entire OPD post-training of GLM-5.2 took on…
Z.ai has open-sourced its RL infrastructure, the slime framework, which enabled efficient OPD post-training of GLM-5.2 in about two days. slime is an LLM post-training framework for RL scaling that integrates Megatron and SGLang, and has been battle-tested by frontier models like GLM, Qwen, DeepSeek, and Llama.
When you don't have a data center GPU
Liquid AI releases LFM2.5-230M, a 230M-parameter language model designed to run on limited hardware, with support for transformers, vLLM, and SGLang.
Developing an open source LLM from the ground up, from pretraining to RLHF (PPO/GRPO)
A developer shares progress on training a 7B parameter open source LLM from scratch using a DeepSeek architecture optimized for low VRAM, with the goal of democratizing AI development and eventually surpassing large proprietary models.