high-performance-computing

#high-performance-computing

HPC-LLM: Practical Domain Adaptation and Retrieval-Augmented Generation for HPC Support

arXiv cs.LG · 2026-05-19

This paper presents HPC-LLM, a retrieval-augmented, domain-adapted assistant for HPC workflows, built by fine-tuning Llama 3.1 8B with QLoRA on HPC documentation. It reports performance comparable to larger general-purpose models at significantly lower resource requirements.

@zostaff: 20 years ago Jane Street's entire compute cluster was six Dell boxes stacked on the floor at the end of an office row. …

X AI KOLs Timeline · 2026-05-18

Jane Street gave Dwarkesh Patel a tour of its new Texas data center, which houses 4,032 GPUs in racks pulling 140 kilowatts each. The tour highlights the facility's massive scale and its unique networking choices.

@Merocle: M5 Max cluster 72 CPU and 128 GPU cores, 512GB unified Ram Each MacBook is connected to all the others with Thunderbolt…

X AI KOLs Timeline · 2026-05-11

A user showcases a DIY cluster of M5 Max MacBooks connected via Thunderbolt 5, highlighting the aggregate compute power and connectivity challenges.

Sparse Cholesky Elimination Tree

Hacker News Top · 2026-05-10

The article derives the column elimination tree for the right-looking sparse Cholesky algorithm, explaining how it predicts fill-in and task dependencies without performing dense factorization.
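
To make the idea concrete, here is a minimal sketch of one standard way to compute an elimination tree from the sparsity pattern alone: Liu's algorithm with path compression, the same approach used by CSparse's `cs_etree`. The linked article may derive the tree differently. The function name `elimination_tree` and the `upper_cols` input layout are assumptions made for this illustration.

```python
def elimination_tree(n, upper_cols):
    """Return parent[] of the elimination tree of a symmetric n x n matrix.

    upper_cols[k] lists the row indices i < k with A[i, k] != 0
    (the strictly upper-triangular pattern, column by column).
    parent[j] == -1 marks a root. Only the pattern is used; no
    numerical factorization is performed.
    """
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed shortcut toward the current root
    for k in range(n):
        for i in upper_cols[k]:
            # Walk from i up to the root of its current subtree,
            # redirecting every visited node's shortcut to k.
            while i != -1 and i < k:
                nxt = ancestor[i]
                ancestor[i] = k
                if nxt == -1:
                    # i was a root, so k becomes its parent.
                    parent[i] = k
                i = nxt
    return parent


# Hypothetical 4 x 4 pattern: A[0,2], A[1,2], and A[0,3] are nonzero.
# Eliminating column 0 creates fill at (3, 2), so node 2's parent is 3
# even though A[2,3] is zero in the original matrix.
parent = elimination_tree(4, [[], [], [0, 1], [0]])
# parent == [2, 2, 3, -1]
```

In the tree, a column can be eliminated only after all of its descendants, so independent subtrees (here, nodes 0 and 1) indicate tasks that could run in parallel.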

@vivekgalatage: Roadmap from Cornell - Introduction to CUDA http://cvw.cac.cornell.edu/cuda-intro

X AI KOLs Timeline · 2026-05-07

This article introduces the Cornell Virtual Workshop's free online tutorial on basic CUDA programming using C, covering prerequisites and additional resources.

Making Julia as Fast as C++ (2019)

Hacker News Top · 2026-05-06

A 2019 blog post from the FLOW Lab at BYU explores how to optimize Julia code to match C++ performance, using a real-world aerodynamics application (a vortex particle method) as the benchmark. The author shares lessons learned about achieving high performance in Julia through type declarations, JIT compilation, and code optimization techniques.

deepseek-ai/DeepGEMM

GitHub Trending (daily) · 2026-04-21

DeepSeek releases DeepGEMM, a high-performance CUDA kernel library for LLM computation primitives, including FP8/FP4/BF16 GEMMs, fused MoE with overlapped communication, and MQA scoring. Kernels are JIT-compiled at runtime, so no CUDA compilation is required at installation time. The library achieves up to 1550 TFLOPS on H800 and matches or exceeds expert-tuned libraries across various matrix shapes.
