The author successfully ran GLM-5.2 with MTP speculative decoding on a 4× DGX Spark (GB10) setup, revealing a missing component in the public build recipe.
Nvidia's AI chips are selling at record high prices in China due to US export restrictions, while the company also announced a new liquid-cooling system to reduce data center water usage.
A sell-off in AI-related tech stocks is raising doubts about whether the massive spending on artificial intelligence will yield returns. The article highlights the market volatility, with major companies such as Micron, Nvidia, and Alphabet experiencing significant drops.
NVIDIA and AWS announce new EC2 G7 instances with NVIDIA RTX PRO 4500 Blackwell GPUs and GPU-accelerated vector search in Amazon OpenSearch Serverless, enabling enterprises to deploy AI at production scale with improved performance and reduced operational complexity.
NVIDIA announces DFlash, an open-source block diffusion model for speculative decoding that achieves up to 15x higher inference throughput on Blackwell GPUs while maintaining interactivity.
NVIDIA's new chips make it possible to run 500B-parameter models locally. The piece argues that this shows AI safety measures to be merely behavioral speed bumps that vanish offline, posing unprecedented risks of deception and manipulation at scale.
Nvidia has quietly acqui-hired the team from Essential AI, including Transformer paper coauthor Ashish Vaswani, who had been struggling to raise funds for the startup. Vaswani will work on Nvidia's Nemotron open-source models.
NVIDIA launches the BioNeMo Agent Toolkit, an open toolkit that enables AI agents to perform tasks like protein structure prediction, molecular docking, and generative chemistry, accelerating programmable biology in collaboration with Arc Institute.
Nvidia claims a 15x speedup in text generation from a diffusion model that generates entire blocks of text at once.
An analysis of hiring data across major AI labs infers their strategic directions, noting xAI's focus on scientific tutors, Nvidia's data center push, and OpenAI's engineering growth.
New GGUF quantizations of Qwen3.6-27B optimized for 16GB VRAM NVIDIA GPUs, including an experimental Trellis variant, with perplexity benchmarks.
Spark Doctor is an open-source diagnostic CLI for NVIDIA DGX Spark that collects system, GPU, memory, Docker, and recipe data, applies specific rules, and outputs the likely cause and next steps for common issues.
SGLang provided Day-0 support for DeepSeek-V4, and collaboration between LMSys and NVIDIA engineering teams achieved up to 5x throughput increase in production, with improvements shown on the SemiAnalysis InferenceX dashboard.
NVIDIA introduces the Agent Toolkit, an open modular foundation with models, tools, skills, and a secure runtime to help businesses build specialized, trustworthy AI agents for various industries.
Valve is working with Intel and Nvidia to expand SteamOS support to more GPUs and handhelds, with initial firmware for Intel handhelds and ongoing driver work for Nvidia.
NVIDIA quietly released Nemotron-3.5-ASR, a lightweight 0.6B parameter open-source speech recognition model designed for real-time streaming with support for 40+ languages, low latency, and cache-aware architecture.
NVIDIA technology now powers over 400 of the world's 500 fastest supercomputers (81% of the TOP500), with record GPU and networking adoption and top efficiency on the Green500 list.
NVIDIA announces new AI agents and tools for telecom operations, including synthetic data generation and secure agent runtimes, showcased at DTW Ignite 2026. The platform aims to enable autonomous networks by combining domain-specific models, privacy-safe synthetic data, and policy-based guardrails.
The post discusses using NVFP4 4-bit floating-point weights for maximum performance, produced by in-house quantization from FP8 with NVIDIA ModelOpt, and highlights the format's dual scale factors, which give it high dynamic range.
Baseten announces the world's fastest API for the GLM-5.2 open model, achieving over 280 tokens per second via NVFP4 quantization, disaggregated inference, and other optimizations.