NemoStation/Marlin-2B
Summary
NemoStation/Marlin-2B is a fine-tuned model based on Qwen3.5-2B for video-text-to-text tasks, supporting video captioning and temporal grounding.
Similar Articles
@HappyyPablo: open sourcing Marlin-2B a tiny VLM to extract structured information from videos Marlin is finetuned for two questions …
Marlin-2B is an open-source tiny VLM for extracting structured information from videos, fine-tuned to answer 'what is happening and when'. The announcement describes it as the best open model in its weight class and competitive with Gemini-2.5-flash.
Motif-Video 2B: Technical Report
Motif-Video 2B is a 2B-parameter text-to-video generation model that achieves 83.76% on VBench, surpassing Wan2.1 14B with 7x fewer parameters. It was trained on fewer than 10M clips using fewer than 100,000 H200 GPU hours. The architecture uses shared cross-attention and a three-part backbone that separates prompt alignment, temporal consistency, and detail refinement.
nvidia/nemotron-3.5-asr-streaming-0.6b
NVIDIA releases Nemotron 3.5 ASR, a 600M-parameter multilingual streaming speech recognition model that supports 40 language-locales and uses a Cache-Aware FastConformer-RNNT architecture for low-latency transcription. The model supports configurable chunk sizes and is ready for commercial use under the OpenMDW-1.1 license.
nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
NVIDIA releases Nemotron 3 Nano Omni, a 30B-parameter multimodal model that processes video, audio, image, and text, with integrated reasoning capabilities for enterprise workflows.
Mellum 2 12B A2.5B
JetBrains released Mellum 2 12B A2.5B, a coding-focused small MoE model with reasoning performance comparable to Qwen 3.5 9B but weaker in other tasks.