2 unstable releases
| Version | Release date |
|---|---|
| new 0.4.0 | Jan 14, 2025 |
| 0.3.0 | Oct 28, 2024 |
#285 in Science
1,806 downloads per month
Used in 19 crates (via cubecl)
1MB, 25K SLoC
ROCm HIP runtime
A runtime for AMD GPUs supported by ROCm HIP.
Matrix multiplication acceleration is based on rocwmma by default. Note that kernel compilation time with rocwmma might be slow.
For RDNA3 GPUs, a dedicated compiler using WMMA intrinsics is available with the feature wmma-intrinsics.
It offers much faster kernel compilation and better performance on some kernels. Feel free to benchmark it with your own use cases.
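
Since kernel compilation time is one of the things that differs between the two compilers, a benchmark should probably report the first call separately from the steady-state calls. The following is a minimal sketch of such a harness using only the Rust standard library. The names `bench` and `BenchReport` are hypothetical and are not part of this crate, and the workload closure is assumed to block until the GPU results are available, since otherwise only the launch overhead would be timed.

```rust
use std::time::{Duration, Instant};

/// Timing summary for one workload (hypothetical helper type, not part of the crate).
#[derive(Debug)]
pub struct BenchReport {
    /// Duration of the first call, which would include any kernel compilation.
    pub first_call: Duration,
    /// Mean duration of the calls made after the first one.
    pub steady_mean: Duration,
}

/// Runs `workload` once untimed-from-the-rest (the "cold" call), then
/// `iterations` more times, and reports both timings.
///
/// Assumption: `workload` launches the kernel AND waits for its result,
/// for example by reading the output buffer back to the host.
pub fn bench<F: FnMut()>(mut workload: F, iterations: u32) -> BenchReport {
    assert!(iterations > 0, "need at least one timed iteration");

    // Cold call: kernel compilation, if any, happens here.
    let start = Instant::now();
    workload();
    let first_call = start.elapsed();

    // Warm calls: compiled kernels are expected to be reused.
    let start = Instant::now();
    for _ in 0..iterations {
        workload();
    }
    let steady_mean = start.elapsed() / iterations;

    BenchReport { first_call, steady_mean }
}
```

Running the same workload once with the default build and once with the wmma-intrinsics feature enabled, then comparing `first_call` and `steady_mean` between the two runs, gives a rough view of both the compilation cost and the kernel performance on a given use case.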
Dependencies
~7–21MB
~225K SLoC