A half-day tutorial at ISC High Performance 2026 on using compiler-assisted tools (FPChecker/LLVM) for floating-point error analysis and profiling in C/C++ scientific codes.
Chinese supercomputer LineShine has claimed the top spot in the global supercomputer rankings, marking a significant achievement in high-performance computing.
The LineShine supercomputer in Shenzhen, China, claims the No. 1 spot on the TOP500 list with 2.198 exaflops of sustained FP64 performance, powered by a custom Armv9 CPU with 13 million cores. It also leads the HPCG benchmark, surpassing El Capitan.
China has built the world's fastest supercomputer, LineShine, overtaking the US system El Capitan in the TOP500 ranking. The system uses only CPUs and entirely Chinese hardware/software, demonstrating technological self-sufficiency despite US export restrictions.
China's LineShine supercomputer becomes the world's fastest, displacing the US's El Capitan for the first time since 2017, marking a significant shift in high-performance computing rankings.
China has surpassed the US with the world's fastest supercomputer, though the machine is not optimized for AI workloads.
NVIDIA technology now powers over 400 of the world's 500 fastest supercomputers (81% of the TOP500), with record GPU and networking adoption and top efficiency on the Green500 list.
NVIDIA announces its Vera CPU will power new supercomputers at Los Alamos National Laboratory, delivering significant performance improvements for agentic AI simulations and scientific workloads.
A discussion of whether foundational AI research can be done without access to high-performance computing, given that early work like "Attention Is All You Need" used consumer GPUs.
This paper argues that using FP8 tensor cores with Ozaki Scheme II can replace native FP64 hardware for high-performance scientific computing on AI-optimized GPUs like NVIDIA's B300, achieving full double-precision accuracy at much higher throughput. The authors present a Tensor-Memory Equilibrium model and show that emulated FP64 performance can exceed native FP64 by orders of magnitude across all workloads.
Expanse is a startup that improves GPU/HPC cluster utilization by predicting job resource needs and suggesting optimizations. It targets the common problem of users over-requesting resources, which leaves clusters at 30-40% effective utilization.
EngiAI introduces a multi-agent framework and benchmark suite for LLM-driven engineering design, evaluating workflow, RAG, and HPC dimensions. Proprietary models achieve 96-97% task completion on Beams2D, while conditional branching remains challenging, at 20-53% on Photonics2D.
This paper presents HPC-LLM, a retrieval-augmented and domain-adapted assistant for HPC workflows, fine-tuning Llama 3.1 8B with QLoRA on HPC documentation. It demonstrates performance comparable to larger general-purpose models with significantly lower resource requirements.
This paper introduces the Generative Quantum-inspired Kolmogorov-Arnold Eigensolver (GQKAE), a parameter-efficient architecture that replaces traditional neural components with Kolmogorov-Arnold modules to significantly reduce memory usage and improve convergence in quantum chemistry simulations.
A review of the DeskPi Super4C, a 4-node Raspberry Pi CM5 cluster board, highlighting its improved remote management and redundant power and Ethernet, while noting that SBC clusters are poor value but fun for hobbyist HPC tinkering.
An OpenAI backend engineer shares their personal journey into programming and describes their work maintaining and optimizing OpenAI's large-scale supercomputing clusters used for AI model training. The post highlights the complexity and scale of infrastructure challenges encountered at OpenAI.