An open-source tool converts PDF, DOCX, PPTX, XLSX, EPUB, and image files to Markdown at 122 pages per second; it handles tables, equations, and forms and runs on GPU, CPU, or Mac.
Hugging Face has added a hardware-compatibility filter (GPU, CPU, Apple Silicon) to its Models page, so users see only models that can run on their machine. The filter stacks with other filters and can be shared via URL.
OpenAI co-founder Andrej Karpathy released llm.c, an open-source project for training LLMs from scratch in plain C/CUDA code that can run on a wide range of hardware, including CPUs and MacBooks, and is reportedly 7% faster than standard approaches.
A discussion of multi-tier caching strategies for MoE models that speed up inference by keeping frequently activated experts on the GPU, with references to existing implementations such as PowerInfer and llama.cpp branches.
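A minimal sketch of the expert-caching idea, assuming a least-recently-used (LRU) eviction policy and a hypothetical `load_fn` callback that copies one expert's weights to the fast tier. It is not the policy PowerInfer or any llama.cpp branch actually uses: resident experts are served directly, and a miss loads the expert and evicts the one used longest ago.

```python
from collections import OrderedDict
from typing import Any, Callable


class ExpertCache:
    """LRU cache of MoE experts on a fast tier (e.g. GPU memory).

    Experts that are not resident are fetched from a slower tier
    (e.g. CPU RAM) through `load_fn`, a hypothetical caller-supplied callback.
    """

    def __init__(self, capacity: int, load_fn: Callable[[int], Any]):
        self.capacity = capacity
        self.load_fn = load_fn
        self.resident: "OrderedDict[int, Any]" = OrderedDict()  # least recent first
        self.hits = 0
        self.misses = 0

    def get(self, expert_id: int) -> Any:
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)  # mark as most recently used
            self.hits += 1
            return self.resident[expert_id]
        self.misses += 1
        weights = self.load_fn(expert_id)  # slow path: copy to the fast tier
        self.resident[expert_id] = weights
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)  # evict the least recently used expert
        return weights
```

Replaying a recorded router trace through `get` and comparing `hits` and `misses` is a cheap way to compare cache sizes or alternative policies (for example, frequency-based pinning) before touching real kernels.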
The author shares progress on building a CPU-only tensor library in C, covering basics like add/mul, reduce, strides, and 2D matmul, along with insights from reading Arcee's technical blogs on foundation models.
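The stride bookkeeping such a library needs can be shown in a few lines. This is a Python sketch of the indexing arithmetic only (the library itself is written in C), and `Tensor2D` and `matmul` are hypothetical names: element (i, j) lives at `offset + i*strides[0] + j*strides[1]` in flat storage, so a transpose is just a swap of shape and strides with no data copy.

```python
class Tensor2D:
    """2D view over flat storage, addressed through strides."""

    def __init__(self, data, shape, strides=None, offset=0):
        self.data = data      # flat storage, possibly shared with other views
        self.shape = shape    # (rows, cols)
        # Row-major (C-contiguous) default: the next row is `cols` elements away.
        self.strides = strides if strides is not None else (shape[1], 1)
        self.offset = offset

    def __getitem__(self, idx):
        i, j = idx
        return self.data[self.offset + i * self.strides[0] + j * self.strides[1]]

    def transpose(self):
        # No copy: swapping shape and strides reads the same storage column-first.
        return Tensor2D(self.data, (self.shape[1], self.shape[0]),
                        (self.strides[1], self.strides[0]), self.offset)


def matmul(a, b):
    """Naive 2D matmul that works on any stride layout; returns a contiguous result."""
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError(f"shape mismatch: {a.shape} @ {b.shape}")
    out = [0.0] * (n * m)
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i, p] * b[p, j]
            out[i * m + j] = acc
    return Tensor2D(out, (n, m))
```

Because `transpose` shares storage, `matmul(a, a.transpose())` reads the same numbers through two different stride patterns.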
A detailed guide to building and understanding the TD4, a 4-bit DIY CPU kit sold on AliExpress, covering soldering, schematics, and operating principles.
A Bernstein research report predicts that agentic AI will reverse the role of CPUs in data centers, with the server CPU addressable market possibly reaching $223 billion by 2030; the stocks it favors include Haiguang Information and Arm.
A blog post explaining a counterintuitive optimization in which replacing integer division (IDIVQ) with floating-point division (DIVSD) makes code faster on modern CPUs, with benchmarks and assembly analysis.
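Swapping IDIVQ for DIVSD is only safe when the double-precision quotient truncates to the same integer. A sufficient condition (our own derivation, not taken from the post) is that both operands have magnitude below 2^53: they then convert to doubles exactly, and the rounding error of one correctly rounded division, at most |a/b|·2^-53, is smaller than 1/|b|, the minimum distance from a non-integer quotient to the next integer. The Python sketch below checks agreement on 32-bit values and shows it breaking once an operand is no longer representable; it says nothing about speed, which is what the post benchmarks.

```python
def div_trunc(a: int, b: int) -> int:
    """Reference integer division truncating toward zero, like x86 IDIV.

    Python's // floors instead, so the sign is handled separately.
    """
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def div_via_float(a: int, b: int) -> int:
    """Divide as doubles (one correctly rounded division, as DIVSD performs),
    then truncate toward zero, as CVTTSD2SI would."""
    return int(float(a) / float(b))
```

Beyond 2^53 the conversion itself rounds: `float(2**53 + 1)` is `2**53`, so `div_via_float(2**53 + 1, 1)` comes out one too small.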
NVIDIA introduces the Vera CPU with a neural branch predictor to accelerate agentic AI and reinforcement learning workloads by reducing CPU execution time and increasing throughput in AI factories.
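The announcement summary does not say how Vera's predictor works. For orientation, the classic neural branch predictor is the perceptron predictor of Jiménez and Lin, sketched below as that textbook design rather than NVIDIA's (table size and history length are illustrative): each branch address selects a weight vector, the prediction is the sign of its dot product with the global history of outcomes, and the weights are trained on a misprediction or whenever the output magnitude falls below a threshold.

```python
class PerceptronPredictor:
    """Perceptron branch predictor (Jiménez & Lin style), illustrative sizes."""

    def __init__(self, history_len: int = 8, table_size: int = 64):
        # One weight vector per table entry; index 0 is the bias weight.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        self.history = [1] * history_len           # +1 = taken, -1 = not taken; newest first
        self.theta = int(1.93 * history_len + 14)  # training-threshold heuristic

    def _output(self, pc: int) -> int:
        w = self.weights[pc % len(self.weights)]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc: int) -> bool:
        return self._output(pc) >= 0

    def update(self, pc: int, taken: bool) -> None:
        t = 1 if taken else -1
        y = self._output(pc)
        # Train on a misprediction, or when the prediction was not confident enough.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % len(self.weights)]
            w[0] += t
            for i, xi in enumerate(self.history):
                w[i + 1] += t * xi
        self.history = [t] + self.history[:-1]  # shift in the newest outcome
```

On a strictly alternating branch, the weight on the most recent history bit turns negative, so the predictor learns "do the opposite of last time", a pattern a simple two-bit counter cannot capture.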
A new PyLate release introduces MaxSim kernels for GPU-accelerated training with lower memory requirements, plus TACHIOM for fast multi-vector indexing and search on CPU.
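MaxSim is the late-interaction score used by ColBERT-style multi-vector models: each query token embedding is matched to its most similar document token embedding, and those maxima are summed. A plain-Python reference version, assuming dot-product similarity over already-normalized vectors (PyLate's actual kernels are GPU code and are not reproduced here):

```python
from typing import Sequence

Vector = Sequence[float]


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token, take its best dot product
    against any document token, then sum over query tokens."""
    score = 0.0
    for q in query_tokens:
        score += max(sum(qi * di for qi, di in zip(q, d)) for d in doc_tokens)
    return score
```

Ranking a corpus is then `sorted(docs, key=lambda d: maxsim(query, d), reverse=True)`. Materializing the full query-token by document-token similarity matrix is the memory cost that a dedicated kernel can avoid.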
TorchCodec 0.14 adds HDR video decoding for CPU and CUDA, along with a fast WAV decoder, enabling efficient conversion of video and audio data into PyTorch tensors for ML workflows.
A detailed breakdown of the original PlayStation's hardware architecture, covering the CPU design and its historical context.
NVIDIA announced the RTX Spark CPU for AI agent PCs, partnering with Microsoft and major PC makers such as Dell and HP, with the aim of capturing a $200B market.
At Computex 2026, AMD announced extended AM5 socket support through at least 2029 and re-released the Ryzen 7 5800X3D as a 10th Anniversary Edition for AM4, offering budget-friendly CPU upgrade options.
AMD at Computex 2026 promises AM5 motherboard support through 2029, relaunches the Ryzen 7 5800X3D for AM4's 10th anniversary, introduces the Ryzen 7 7700X3D, and expands the Radeon RX 9070 GRE globally, emphasizing the longevity of its existing platforms.
An exploration of how floor and ceil behave on denormalized (subnormal) floating-point numbers, highlighting differences between CPU and GPU implementations and the pitfalls they create.
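A small illustration of the pitfall, assuming the GPU (or a CPU in a fast-math mode) treats subnormal inputs as zero ("denormals-are-zero", DAZ). Python floats are IEEE-754 doubles and always honor subnormals, so the hypothetical `daz` helper emulates the flushing explicitly.

```python
import math
import sys


def daz(x: float) -> float:
    """Emulate denormals-are-zero: a subnormal input becomes a zero of the same sign."""
    # sys.float_info.min is the smallest *normal* positive double (about 2.2e-308).
    if x != 0.0 and abs(x) < sys.float_info.min:
        return math.copysign(0.0, x)
    return x


TINY = 5e-324  # smallest positive subnormal double
```

With subnormals honored, `math.ceil(TINY)` is 1 and `math.floor(-TINY)` is -1; after `daz`, both become 0, so the same expression can differ by a whole unit between the two environments. Single-precision GPU code hits the same split at a much larger magnitude, below about 1.18e-38.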
A benchmark of Needle 26M versus Qwen3-0.6B on CPU function calling finds that the smaller Needle model wins on both accuracy and speed, but the two fail differently: Needle picks the wrong tool, while Qwen3 often fails to emit a tool call at all.
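The two failure modes described for this benchmark are easy to separate when scoring. A minimal sketch, assuming a hypothetical parsed-output format in which each model response is either `None` (no tool call emitted) or a dict with a `"tool"` key; the tool names in any usage are illustrative.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def classify(expected_tool: str, output: Optional[dict]) -> str:
    """Label one benchmark case as correct, wrong_tool, or no_call."""
    if output is None or "tool" not in output:
        return "no_call"      # the model answered without emitting a tool call
    if output["tool"] != expected_tool:
        return "wrong_tool"   # a call was emitted, but for the wrong tool
    return "correct"


def summarize(cases: Iterable[Tuple[str, Optional[dict]]]) -> Counter:
    """Count outcomes over (expected_tool, parsed_output) pairs."""
    return Counter(classify(expected, output) for expected, output in cases)
```

Reporting `no_call` separately from `wrong_tool` keeps a model that stays silent from looking like one that misroutes, which matters when choosing a fallback (re-prompting versus re-ranking tools).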
The article details z386, an open-source FPGA implementation of an 80386 CPU built using the original Intel microcode. It can boot DOS 6/7, run protected-mode programs, and play classic games like Doom, serving as both an educational reconstruction and a usable FPGA CPU.
A blog post detailing the successful disassembly and analysis of the Intel 80386 microcode, which reveals 215 instruction entry points and the chip's complex internal architecture.
A deep dive into how L1 instruction cache set conflicts and code alignment caused an unexpected performance regression in Go, and into how it was investigated.
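Set conflicts follow from simple address arithmetic: a cache line's set index is `(address // line_size) % num_sets`. The sketch assumes a common L1 instruction cache geometry (32 KiB, 8-way, 64-byte lines, hence 64 sets) and ignores virtual-to-physical translation. Under those assumptions, code placed a multiple of 4 KiB apart lands in the same set, and more than eight hot lines in one set can keep evicting each other even though the cache as a whole has room.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed geometry: 32 KiB, 8-way, 64-byte lines -> 32768 / (8 * 64) = 64 sets.
LINE_SIZE = 64
NUM_SETS = 64
WAYS = 8


def l1i_set(addr: int, line_size: int = LINE_SIZE, num_sets: int = NUM_SETS) -> int:
    """Set index of the cache line containing `addr`."""
    return (addr // line_size) % num_sets


def overcommitted_sets(hot_addrs: Iterable[int], ways: int = WAYS) -> Dict[int, int]:
    """Sets that more distinct hot lines map to than the cache has ways."""
    lines = {a // LINE_SIZE for a in hot_addrs}  # distinct cache lines
    per_set = Counter(line % NUM_SETS for line in lines)
    return {s: n for s, n in per_set.items() if n > ways}
```

Feeding it the start addresses of hot loops before and after an alignment change shows whether the regression coincides with a set becoming overcommitted.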