46 releases

0.13.0 Jan 8, 2025
0.12.2 Dec 17, 2024
0.12.1 Sep 5, 2024
0.11.8 Jul 16, 2024
0.2.0 Sep 27, 2022

#10 in Machine learning

Download history 2899/week @ 2024-09-27 2544/week @ 2024-10-04 2964/week @ 2024-10-11 3058/week @ 2024-10-18 2778/week @ 2024-10-25 3285/week @ 2024-11-01 2323/week @ 2024-11-08 2625/week @ 2024-11-15 3002/week @ 2024-11-22 3014/week @ 2024-11-29 4308/week @ 2024-12-06 4544/week @ 2024-12-13 2730/week @ 2024-12-20 2304/week @ 2024-12-27 3973/week @ 2025-01-03 3957/week @ 2025-01-10

13,615 downloads per month
Used in 94 crates (26 directly)

MIT/Apache

24MB
717K SLoC

cudarc: minimal and safe api over the cuda toolkit

crates.io docs.rs

Checkout cudarc on crates.io and docs.rs.

Safe abstractions over:

  1. CUDA driver API
  2. NVRTC API
  3. cuRAND API
  4. cuBLAS API
  5. cuBLASLt API
  6. NCCL API

Pre-alpha state, expect breaking changes and not all cuda functions contain a safe wrapper. Contributions welcome for any that aren't included!

Design

Goals are:

  1. As safe as possible (there will still be a lot of unsafe due to ffi & async)
  2. As ergonomic as possible
  3. Allow mixing of high level safe apis, with low level sys apis

To that end there are three levels to each wrapper (by default the safe api is exported):

use cudarc::driver::{safe, result, sys};
use cudarc::nvrtc::{safe, result, sys};
use cudarc::cublas::{safe, result, sys};
use cudarc::cublaslt::{safe, result, sys};
use cudarc::curand::{safe, result, sys};
use cudarc::nccl::{safe, result, sys};

where:

  1. sys is the raw ffi apis generated with bindgen
  2. result is a very small wrapper around sys to return Result from each function
  3. safe is a wrapper around result/sys to provide safe abstractions

Heavily recommend sticking with safe APIs

API Preview

It's easy to create a new device and transfer data to the gpu:

let dev = cudarc::driver::CudaDevice::new(0)?;

// allocate buffers
let inp = dev.htod_copy(vec![1.0f32; 100])?;
let mut out = dev.alloc_zeros::<f32>(100)?;

You can also use the nvrtc api to compile kernels at runtime:

let ptx = cudarc::nvrtc::compile_ptx("
extern \"C\" __global__ void sin_kernel(float *out, const float *inp, const size_t numel) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numel) {
        out[i] = sin(inp[i]);
    }
}")?;

// and dynamically load it into the device
dev.load_ptx(ptx, "my_module", &["sin_kernel"])?;

cudarc provides a very simple interface to launch kernels, tuples are the arguments!

let sin_kernel = dev.get_func("my_module", "sin_kernel").unwrap();
let cfg = LaunchConfig::for_num_elems(100);
unsafe { sin_kernel.launch(cfg, (&mut out, &inp, 100usize)) }?;

And of course it's easy to copy things back to host after you're done:

let out_host: Vec<f32> = dev.dtoh_sync_copy(&out)?;
assert_eq!(out_host, [1.0; 100].map(f32::sin));

License

Dual-licensed to be compatible with the Rust project.

Licensed under the Apache License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0 or the MIT license http://opensource.org/licenses/MIT, at your option. This file may not be copied, modified, or distributed except according to those terms.

Dependencies