Tag: #efficient

You don't need a GPU to run gemma-4-26B-A4B

Reddit r/LocalLLaMA · yesterday

The author shows that the Gemma-4-26B-A4B model runs efficiently on a CPU-only system using KoboldCpp, reaching 7 tokens per second on an old desktop, which suggests that a powerful GPU may not be necessary for local LLM inference.


@_philschmid: We just launched a Gemma 4 12B! Our first mid-sized model with native audio inputs. Gemma 4 12 B is a unified, encoder-…

X AI KOLs Following · 5d ago

Google has launched Gemma 4 12B, a mid-sized multimodal model with native audio inputs that requires only 16 GB of memory and is released under Apache 2.0.


WeCon: An Efficient Weight-Conditioned Neural Solver for Multi-Objective Combinatorial Optimization Problems

arXiv cs.LG · 2026-05-25

Presents WeCon, a weight-conditioned neural solver for multi-objective combinatorial optimization problems that achieves comparable hypervolume to the state-of-the-art while reducing inference time by 40%.


@FeitengLi: OpenBMB open-sources MiniCPM-V 4.6, 1.3B parameters (SigLIP2-400M + Qwen3.5-0.8B), 262k context, visual encoding FLOPs 50%+ less than previous generation. Token cost for the same task is lower than Qwen3.5-0…

X AI KOLs Timeline · 2026-05-16

OpenBMB releases MiniCPM-V 4.6, a 1.3B-parameter multimodal LLM with 262k context and significantly reduced visual encoding FLOPs, achieving strong benchmark performance and broad inference framework support.


Δ-Mem: Efficient Online Memory for Large Language Models

Hacker News Top · 2026-05-16

Proposes Δ-Mem, a lightweight online memory mechanism that uses a compact state matrix, updated by delta-rule learning, to improve the long-context performance of frozen LLMs without full fine-tuning or context extension.
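
For readers unfamiliar with the term, the generic delta rule for an associative state matrix can be sketched as below. This is a minimal illustration of the classic update S <- S + beta * (v - S k) k^T, not the paper's implementation; the function names, the step size `beta`, and the toy 2x2 example are all hypothetical.

```python
def read(state, key):
    """Retrieve the value currently associated with a key: S k."""
    return [sum(s * k for s, k in zip(row, key)) for row in state]


def delta_rule_update(state, key, value, beta=0.5):
    """One generic delta-rule step: S <- S + beta * (v - S k) k^T.

    `beta` is an assumed step size, not a value taken from the paper.
    """
    # Error between the target value and what the state currently recalls
    error = [v - p for v, p in zip(value, read(state, key))]
    # Rank-one correction along the key direction
    return [
        [s + beta * e * k for s, k in zip(row, key)]
        for row, e in zip(state, error)
    ]


# Hypothetical toy example: a 2x2 state and one unit-norm key
state = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]
value = [2.0, -1.0]
for _ in range(20):
    state = delta_rule_update(state, key, value)
recalled = read(state, key)  # approaches `value` as updates accumulate
```

Each step only corrects the part of the state that the key addresses, so the memory stays a fixed size however many updates are applied.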


@songhan_mit: Explore SANA World Model, using hybrid linear attention, efficient and fast!

X AI KOLs Following · 2026-05-15

SANA World Model is a new AI model that uses hybrid linear attention for efficiency and speed.


SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

Hugging Face Daily Papers · 2026-05-14

SANA-WM is a 2.6B-parameter open-source world model that generates high-fidelity 720p minute-scale videos with precise camera control, achieving industrial-level quality while significantly reducing computational requirements.


@dair_ai: // δ-mem: Efficient Online Memory for LLMs // One of the more elegant memory mechanisms I've seen this month. Most long…

X AI KOLs Following · 2026-05-13

The paper introduces δ-mem, a lightweight online memory mechanism that augments frozen LLMs with a compact associative memory state updated by delta-rule learning, achieving significant improvements on memory-heavy benchmarks without fine-tuning or context extension.


@eliebakouch: very nice release by @OpenAI! a 50M active, 1.5B total gpt-oss arch MoE, to filter private information from trillion sc…

X AI KOLs Following · 2026-04-22

OpenAI released a 1.5B-parameter MoE model with only 50M active parameters that can filter private data from trillion-token datasets while maintaining 128k context length.


Building a Fast Multilingual OCR Model with Synthetic Data

Hugging Face Blog · 2026-04-17

NVIDIA introduces Nemotron OCR v2, a fast multilingual OCR model built using synthetic data generation. The model achieves 34.7 pages/second on a single A100 GPU by using a unified FOTS-based architecture with feature reuse across detection, recognition, and relational components.
