Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning
Summary
Stable-Layers is a reinforcement learning framework that fine-tunes a pretrained image layer decomposition model using VLM feedback instead of paired supervision. It uses Flow-GRPO with LoRA and a two-stage reward calibration pipeline to improve layer quality on the Crello dataset.
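As a rough illustration of the GRPO-style signal that such a setup relies on, the sketch below normalizes rewards within a group of samples drawn for the same input. It is not the authors' implementation: the function name, the `eps` constant, and the example VLM scores are assumptions made for illustration, and the Flow-GRPO sampling, LoRA updates, and reward calibration stages are not shown.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampling group (GRPO-style).

    Each sample's advantage is its reward minus the group mean,
    divided by the group standard deviation (eps avoids division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical VLM scores for four layer decompositions of the same image.
vlm_scores = [0.2, 0.5, 0.5, 0.8]
advantages = group_relative_advantages(vlm_scores)
```

Samples scored above the group mean receive positive advantages and are reinforced, while those below the mean are discouraged. No learned value function is needed, which is one reason group-relative methods pair naturally with an external scorer such as a VLM.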
Similar Articles
Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training
ART (Art-based Reinforcement Training) enables parameter-efficient fine-tuning of frozen multimodal LLMs by optimizing raw visual input via gradient backpropagation, achieving performance comparable to LoRA while supporting pre-compiled computational graphs for high-throughput engines like vLLM.
Skip a Layer or Loop It? Learning Program-of-Layers in LLMs
This paper introduces Program-of-Layers (PoLar), a method that allows LLMs to dynamically skip or loop pretrained layers per input, improving accuracy and efficiency over fixed-depth inference.
When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL
This paper frames LLM-generated reward shaping for sparse structured RL as a debugging problem, identifying failure modes such as reward flooding and semantic misunderstanding. The authors propose diagnostic-driven iterative refinement, which raises success rates well above one-shot generation (e.g., DoorKey-8×8 from 2.3% to 97.6%).
@HuggingPapers: Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance Naver AI eliminates unsta…
Naver AI introduces Stable-GFlowNet, a method to improve LLM red-teaming by eliminating unstable partition function estimation in Generative Flow Networks through contrastive trajectory balance.
Seeing Before Colliding: Anticipatory Safe RL with Frozen Vision-Language Models
This paper presents VLM-Safe-RL, a framework that integrates frozen vision-language models into constrained MDP Lagrangian updates to provide anticipatory cost signals for safe reinforcement learning in high-speed visual control tasks. The method outperforms standard constraint-aware baselines on Safety-Gymnasium FormulaOne L2 and generalizes to held-out environments.