cupy/cupy
Cached at: 06/28/26, 11:18 AM
Source: https://github.com/cupy/cupy

CuPy : NumPy & SciPy for GPU
CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python. CuPy acts as a drop-in replacement for running existing NumPy/SciPy code on NVIDIA CUDA or AMD ROCm platforms.
```py
>>> import cupy as cp
>>> x = cp.arange(6).reshape(2, 3).astype('f')
>>> x
array([[ 0.,  1.,  2.],
       [ 3.,  4.,  5.]], dtype=float32)
>>> x.sum(axis=1)
array([  3.,  12.], dtype=float32)
```
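The drop-in claim is easiest to see in code that never names a backend. The sketch below relies on `cupy.get_array_module`, which returns the `cupy` module for CuPy arrays and `numpy` otherwise, so a single function serves both. The `row_softmax` and `get_xp` helpers, and the fallback to plain NumPy when CuPy cannot be imported, are illustrative assumptions rather than part of CuPy itself.

```python
import numpy as np

try:
    import cupy as cp
except ImportError:  # CuPy not installed: everything stays on the CPU
    cp = None


def get_xp(*arrays):
    """Return the module (cupy or numpy) that owns the given arrays."""
    if cp is not None:
        return cp.get_array_module(*arrays)
    return np


def row_softmax(x):
    """Numerically stable softmax over the last axis of a NumPy or CuPy array.

    Hypothetical helper, used only to illustrate backend-agnostic code.
    """
    xp = get_xp(x)
    shifted = x - x.max(axis=-1, keepdims=True)  # subtract the row max to avoid overflow
    e = xp.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


x_cpu = np.arange(6, dtype=np.float32).reshape(2, 3)
print(row_softmax(x_cpu).sum(axis=1))  # each row sums to approximately 1.0
```

With a working GPU, passing `cp.asarray(x_cpu)` instead runs the same lines on the device, and `cp.asnumpy` copies the result back to host memory.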
CuPy also provides access to low-level CUDA features.
You can pass `ndarray`s to existing CUDA C/C++ programs via RawKernels, use Streams for performance, or even call CUDA Runtime APIs directly.
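The sketch below touches all three features: it compiles a small CUDA C kernel with `cupy.RawKernel`, launches it inside a `cupy.cuda.Stream`, and calls the CUDA Runtime API through `cupy.cuda.runtime.getDeviceCount()` to check for a device. The kernel name `vector_add`, the helper functions, and the block size of 128 are illustrative choices, not anything CuPy prescribes; for a plain element-wise add, `x + y` or `cupy.ElementwiseKernel` would normally be the better tool.

```python
import functools

import numpy as np

try:
    import cupy as cp
except ImportError:
    cp = None

# CUDA C source for a bounds-checked element-wise add (illustrative kernel).
ADD_SOURCE = r'''
extern "C" __global__
void vector_add(const float* x, const float* y, float* out, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        out[i] = x[i] + y[i];
    }
}
'''


def gpu_available():
    """True only if CuPy imported and the CUDA runtime reports a device."""
    if cp is None:
        return False
    try:
        return cp.cuda.runtime.getDeviceCount() > 0  # direct CUDA Runtime API call
    except Exception:  # e.g. no driver or no visible device
        return False


@functools.lru_cache(maxsize=None)
def _add_kernel():
    """Build the RawKernel object once; CuPy compiles it on first launch."""
    return cp.RawKernel(ADD_SOURCE, "vector_add")


def vector_add(x, y, threads_per_block=128):
    """Add two same-shaped arrays with a RawKernel launched on a dedicated stream."""
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    # Blocking stream: implicitly synchronizes with the legacy default stream.
    stream = cp.cuda.Stream()
    with stream:
        x = cp.ascontiguousarray(x, dtype=cp.float32)  # kernel expects contiguous float*
        y = cp.ascontiguousarray(y, dtype=cp.float32)
        out = cp.empty_like(x)
        n = x.size
        if n > 0:
            blocks = (n + threads_per_block - 1) // threads_per_block  # ceil(n / block size)
            # grid shape, block shape, kernel arguments; n is int32 to match `int n`
            _add_kernel()((blocks,), (threads_per_block,), (x, y, out, np.int32(n)))
    stream.synchronize()  # wait for the launch before the caller reads `out`
    return out


if __name__ == "__main__" and gpu_available():
    a = cp.arange(1000, dtype=cp.float32)
    b = cp.ones(1000, dtype=cp.float32)
    assert bool(cp.allclose(vector_add(a, b), a + b))
```

Using a blocking stream keeps the example safe when the inputs were produced on the default stream; a `non_blocking=True` stream would need explicit synchronization with that earlier work.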
Installation
Pip
Binary packages (wheels) are available for Linux and Windows on PyPI. Choose the right package for your platform.
| Platform | Architecture | Command |
|---|---|---|
| CUDA 12.x | x86_64 / aarch64 | pip install cupy-cuda12x |
| CUDA 13.x | x86_64 / aarch64 | pip install cupy-cuda13x |
| ROCm 7.0 (experimental) | x86_64 | pip install cupy-rocm-7-0 |
[!NOTE]
To install pre-releases, append `--pre -U -f https://pip.cupy.dev/pre` (e.g., `pip install cupy-cuda12x --pre -U -f https://pip.cupy.dev/pre`).
Conda
Binary packages are also available for Linux and Windows on Conda-Forge.
| Platform | Architecture | Command |
|---|---|---|
| CUDA | x86_64 / aarch64 / ppc64le | conda install -c conda-forge cupy |
If you need a slim installation (without also installing the CUDA dependencies), run `conda install -c conda-forge cupy-core`.
If you need a particular CUDA version (e.g., 12.0), use the `cuda-version` metapackage to select it: `conda install -c conda-forge cupy cuda-version=12.0`.
[!NOTE]
If you encounter any problems with CuPy installed from `conda-forge`, please report them to cupy-feedstock, and we will help determine whether it is a packaging issue in `conda-forge`'s recipe or a real issue in CuPy.
Docker
Use the NVIDIA Container Toolkit to run CuPy container images:
```
$ docker run --gpus all -it cupy/cupy
```
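Whichever installation method you use, a quick sanity check is to import CuPy and query the CUDA runtime it loaded. `cupy.show_config()` prints a full report; the hypothetical helper below instead returns a few fields as a dict (or `None` when CuPy or a GPU is unusable), which can be convenient in scripts. It is written with CUDA builds in mind; the helper name and the dict keys are assumptions for illustration.

```python
def cupy_runtime_info():
    """Return basic CuPy/CUDA version info, or None if CuPy or a GPU is unusable.

    Hypothetical convenience helper; cupy.show_config() prints a fuller report.
    """
    try:
        import cupy as cp
    except ImportError:  # CuPy is not installed in this environment
        return None
    try:
        rt = cp.cuda.runtime
        return {
            "cupy_version": cp.__version__,
            # CUDA encodes versions as 1000 * major + 10 * minor (12020 means 12.2).
            "runtime_version": rt.runtimeGetVersion(),
            "driver_version": rt.driverGetVersion(),
            "device_count": rt.getDeviceCount(),
        }
    except Exception:  # e.g. no driver, no device, or missing CUDA libraries
        return None


info = cupy_runtime_info()
print(info if info is not None else "CuPy or a CUDA device is not available")
```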
Resources
- Installation Guide - instructions on building from source
- Release Notes
- Projects using CuPy
- Contribution Guide
- GPU Acceleration in Python using CuPy and Numba (GTC November 2021 Technical Session)
- [GPU-Acceleration of Signal Processing Workflows using CuPy and cuSignal (ICASSP’21 Tutorial)](https://github.com/awthomp/cusignal-icassp-tutorial) (cuSignal has been part of CuPy since v13.0.0)
License
MIT License (see the LICENSE file).
CuPy's design is based on NumPy's and SciPy's APIs (see the docs/source/license.rst file).
CuPy is being developed and maintained by Preferred Networks and community contributors.
Reference
Ryosuke Okuta, Yuya Unno, Daisuke Nishino, Shohei Hido and Crissman Loomis. CuPy: A NumPy-Compatible Library for NVIDIA GPU Calculations. Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS), (2017).
```bibtex
@inproceedings{cupy_learningsys2017,
  author = "Okuta, Ryosuke and Unno, Yuya and Nishino, Daisuke and Hido, Shohei and Loomis, Crissman",
  title = "CuPy: A NumPy-Compatible Library for NVIDIA GPU Calculations",
  booktitle = "Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS)",
  year = "2017",
  url = "http://learningsys.org/nips17/assets/papers/paper_16.pdf"
}
```