efficient-ai


Stratum: System-Hardware Co-Design with 3D-Stackable DRAM for Efficient MoE

Hacker News Top · 3d ago

Introduces Stratum, a system-hardware co-design that uses 3D-stackable DRAM to accelerate Mixture of Experts (MoE) models efficiently.


@antoine_chaffin: Reason-ModernColBERT nearly solved BrowseComp-Plus, smashing SOTA and outperforming models 54× bigger Not bad fo…

X AI KOLs Following · 6d ago

Reason-ModernColBERT achieves near-perfect results on BrowseComp-Plus, surpassing the previous SOTA and models 54× larger. Agent-ModernColBERT then improves on it further with minimal training.


MiniCPM-V 4.6

Product Hunt · 2026-05-12

MiniCPM-V 4.6 is an ultra-efficient 1.3B vision-language model optimized for mobile devices.


@ickma2311: Efficient AI Lecture 12: Transformer and LLM This lecture is not only about how LLMs work. It also explains the buildin…

X AI KOLs Timeline · 2026-05-09

Lecture notes from an Efficient AI course covering Transformer and LLM fundamentals, including multi-head attention, positional encoding, KV cache, and the connection between model architecture and inference efficiency. The content explains how design choices in transformers affect memory, latency, and hardware efficiency.


11.67% ARC-AGI-2 Local Eval on a Single 4090: The TOPAS Recursive Architecture

Reddit r/LocalLLaMA · 2026-05-07

The authors present TOPAS, a recursive AI architecture achieving 11.67% on ARC-AGI-2 using a single RTX 4090, aiming to demonstrate that architectural efficiency can outweigh raw compute power.


Ternary Bonsai: Top Intelligence at 1.58 Bits

Hacker News Top · 2026-04-18

A highly efficient AI model architecture using ternary weights (-1, 0, 1) that achieves competitive performance while requiring only 1.58 bits per parameter, enabling deployment on extremely constrained devices.
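
The 1.58-bit figure matches the information content of a three-valued weight, log2(3) ≈ 1.585 bits. The sketch below illustrates that arithmetic and one simple way to approach it in storage: five ternary weights fit in one byte because 3^5 = 243 ≤ 256, which costs 1.6 bits per weight. This is an illustration only, not the storage format Ternary Bonsai uses; `pack5` and `unpack5` are hypothetical helper names introduced here.

```python
import math

# Information content of one weight drawn from {-1, 0, 1}.
BITS_PER_TERNARY = math.log2(3)  # about 1.585


def pack5(trits):
    """Pack five weights from {-1, 0, 1} into one integer in 0..242 (base 3)."""
    if len(trits) != 5 or any(t not in (-1, 0, 1) for t in trits):
        raise ValueError("expected exactly five values from {-1, 0, 1}")
    value = 0
    for t in reversed(trits):
        value = value * 3 + (t + 1)  # map -1, 0, 1 to base-3 digits 0, 1, 2
    return value


def unpack5(value):
    """Inverse of pack5: recover the five ternary weights."""
    trits = []
    for _ in range(5):
        value, digit = divmod(value, 3)
        trits.append(digit - 1)
    return trits


if __name__ == "__main__":
    weights = [-1, 0, 1, 1, -1]
    packed = pack5(weights)
    assert unpack5(packed) == weights
    print(f"theoretical bits per weight: {BITS_PER_TERNARY:.3f}")
    print(f"bits per weight with 5-per-byte packing: {8 / 5:.1f}")
```

Packing five weights per byte stays within about 1% of the theoretical bound while keeping decoding to a few integer divisions.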


MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

Papers with Code Trending · 2025-09-16

MiniCPM-V 4.5 is an 8B multimodal large language model that achieves high efficiency and strong performance through a unified 3D-Resampler architecture, a novel data strategy, and a hybrid reinforcement learning approach. The model reportedly surpasses larger proprietary and open-source models on benchmarks while significantly reducing GPU memory usage and inference time.
