Tag
The author demonstrates that the Gemma-4-26B-A4B model runs efficiently on a CPU-only system using Koboldcpp, achieving 7 tokens per second on an old desktop, suggesting that powerful GPUs may not be necessary for local LLM inference.
We just launched Gemma 4 12B, a mid-sized multimodal model with native audio inputs, requiring only 16GB memory and released under Apache 2.0.
Presents WeCon, a weight-conditioned neural solver for multi-objective combinatorial optimization problems that achieves comparable hypervolume to the state-of-the-art while reducing inference time by 40%.
OpenBMB releases MiniCPM-V 4.6, a 1.3B-parameter multimodal LLM with 262k context and significantly reduced visual encoding FLOPs, achieving strong benchmark performance and broad inference framework support.
Proposes delta-Mem, a lightweight online memory mechanism that uses a compact state matrix updated by delta-rule learning to improve long-context performance of frozen LLMs without full fine-tuning or context extension.
SANA World Model is a new AI model that uses hybrid linear attention for efficiency and speed.
SANA-WM is a 2.6B-parameter open-source world model that generates high-fidelity 720p minute-scale videos with precise camera control, achieving industrial-level quality while significantly reducing computational requirements.
The paper introduces δ-mem, a lightweight online memory mechanism that augments frozen LLMs with a compact associative memory state updated by delta-rule learning, achieving significant improvements on memory-heavy benchmarks without fine-tuning or context extension.
OpenAI released a 1.5B-parameter MoE model with only 50M active parameters that can filter private data from trillion-token datasets while maintaining 128k context length.
NVIDIA introduces Nemotron OCR v2, a fast multilingual OCR model built using synthetic data generation. The model achieves 34.7 pages/second on a single A100 GPU by using a unified FOTS-based architecture with feature reuse across detection, recognition, and relational components.