5 releases (1 stable)

Version | Released |
---|---|
1.1.0 | Nov 17, 2024 |
1.0.0 | |
0.11.0 | Oct 8, 2024 |
0.10.0 | Jun 21, 2024 |
0.1.0 | |

kmeans
kmeans is a small and fast library for k-means clustering calculations. It requires a nightly compiler with the portable_simd feature to work.
Here is a small example that uses k-means++ for initialization and Lloyd's algorithm as the k-means variant:
    use kmeans::*;

    fn main() {
        let (sample_cnt, sample_dims, k, max_iter) = (20000, 200, 4, 100);

        // Generate some random data
        let mut samples = vec![0.0f64; sample_cnt * sample_dims];
        samples.iter_mut().for_each(|v| *v = rand::random());

        // Calculate k-means, using k-means++ as the initialization method.
        // KMeans<f64, 8, _> selects f64 SIMD vectors with 8 lanes (e.g. AVX512).
        let kmean: KMeans<f64, 8, _> = KMeans::new(samples, sample_cnt, sample_dims, EuclideanDistance);
        let result = kmean.kmeans_lloyd(k, max_iter, KMeans::init_kmeanplusplus, &KMeansConfig::default());

        println!("Centroids: {:?}", result.centroids);
        println!("Cluster-Assignments: {:?}", result.assignments);
        println!("Error: {}", result.distsum);
    }
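
To get a quick feel for a clustering result, it can help to count how many samples ended up in each cluster. The helper below is a minimal sketch, not part of the crate: `cluster_sizes` is a hypothetical name, and it assumes the assignments can be viewed as a slice holding one `usize` cluster index per sample, which the example above suggests but does not confirm.

```rust
/// Counts how many samples were assigned to each of the `k` clusters.
/// Hypothetical helper: assumes one `usize` cluster index per sample.
pub fn cluster_sizes(assignments: &[usize], k: usize) -> Vec<usize> {
    let mut sizes = vec![0usize; k];
    for &cluster in assignments {
        // Panics if an index is not smaller than `k`.
        sizes[cluster] += 1;
    }
    sizes
}
```

Under that assumption, it could be called as `cluster_sizes(&result.assignments, k)` after the clustering step.
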
Data structures
For performance reasons, all calculations are done on bare vectors, using hand-written SIMD code built on the portable_simd feature.
All vectors are stored row-major, so each sample occupies one consecutive block of memory.
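
The row-major layout can be illustrated with plain slices. This is a sketch using only the standard library; `sample_at` and `iter_samples` are illustrative names, not functions of the crate.

```rust
/// Returns sample `i` from a row-major buffer of `sample_dims`-dimensional samples.
pub fn sample_at(samples: &[f64], sample_dims: usize, i: usize) -> &[f64] {
    // Sample i starts at offset i * sample_dims and spans sample_dims values.
    &samples[i * sample_dims..(i + 1) * sample_dims]
}

/// Iterates over all samples of a row-major buffer, one slice per sample.
pub fn iter_samples(samples: &[f64], sample_dims: usize) -> std::slice::ChunksExact<'_, f64> {
    samples.chunks_exact(sample_dims)
}
```

With `sample_dims = 3`, the buffer `[1, 2, 3, 4, 5, 6]` holds two samples, and sample 1 is `[4, 5, 6]`.
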
Supported variants / algorithms
- Lloyd (standard k-means)
- minibatch
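
For readers unfamiliar with the Lloyd variant, the following is a scalar sketch of a single iteration on row-major data. It is not the crate's SIMD implementation; `lloyd_step` and `sq_dist` are illustrative names, and keeping the old centroid for an empty cluster is an assumption made here for simplicity. The minibatch variant is not covered.

```rust
/// Squared Euclidean distance between two equally long slices.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// One Lloyd iteration: assign every sample to its nearest centroid, then
/// move each centroid to the mean of its assigned samples.
/// Returns the assignments and the summed squared distance before the update.
pub fn lloyd_step(samples: &[f64], dims: usize, centroids: &mut [f64]) -> (Vec<usize>, f64) {
    let k = centroids.len() / dims;
    let mut assignments = Vec::with_capacity(samples.len() / dims);
    let mut sums = vec![0.0; k * dims];
    let mut counts = vec![0usize; k];
    let mut distsum = 0.0;

    for sample in samples.chunks_exact(dims) {
        // Assignment step: find the nearest centroid.
        let (best, dist) = centroids
            .chunks_exact(dims)
            .map(|c| sq_dist(sample, c))
            .enumerate()
            .fold((0, f64::INFINITY), |acc, cur| if cur.1 < acc.1 { cur } else { acc });
        assignments.push(best);
        distsum += dist;
        counts[best] += 1;
        for (s, v) in sums[best * dims..(best + 1) * dims].iter_mut().zip(sample) {
            *s += v;
        }
    }

    // Update step: empty clusters keep their previous centroid (an assumption).
    for c in 0..k {
        if counts[c] > 0 {
            for d in 0..dims {
                centroids[c * dims + d] = sums[c * dims + d] / counts[c] as f64;
            }
        }
    }
    (assignments, distsum)
}
```

Calling `lloyd_step` repeatedly until the assignments stop changing, or until an iteration limit such as `max_iter` is reached, gives the full algorithm.
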
Supported centroid initialization methods
- k-means++
- random partition
- random sample
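
As an illustration of the first method in this list, here is a scalar sketch of k-means++ seeding on row-major data. It is not the crate's implementation: `init_kmeanspp`, `row` and `sq_dist` are illustrative names, and the random source is passed in as a closure returning values in [0, 1) so the sketch needs no external crate. The two random methods are not covered.

```rust
/// Squared Euclidean distance between two equally long slices.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Sample `i` of a row-major buffer.
fn row(samples: &[f64], dims: usize, i: usize) -> &[f64] {
    &samples[i * dims..(i + 1) * dims]
}

/// k-means++ seeding: the first centroid is a uniformly drawn sample, every
/// further centroid is drawn with probability proportional to its squared
/// distance from the nearest centroid chosen so far.
/// `rng` must return uniform values in [0, 1). Returns `k * dims` values.
pub fn init_kmeanspp(samples: &[f64], dims: usize, k: usize, mut rng: impl FnMut() -> f64) -> Vec<f64> {
    let n = samples.len() / dims;
    let first = ((rng() * n as f64) as usize).min(n - 1);
    let mut centroids = row(samples, dims, first).to_vec();
    // Squared distance of every sample to its nearest chosen centroid.
    let mut d2: Vec<f64> = (0..n)
        .map(|i| sq_dist(row(samples, dims, i), row(samples, dims, first)))
        .collect();

    while centroids.len() < k * dims {
        // Walk the cumulative distribution until the drawn target is reached.
        let total: f64 = d2.iter().sum();
        let mut target = rng() * total;
        let mut next = n - 1;
        for (i, &d) in d2.iter().enumerate() {
            if target < d {
                next = i;
                break;
            }
            target -= d;
        }
        centroids.extend_from_slice(row(samples, dims, next));
        for i in 0..n {
            d2[i] = d2[i].min(sq_dist(row(samples, dims, i), row(samples, dims, next)));
        }
    }
    centroids
}
```

Samples that are already centroids have distance zero and are therefore not drawn again, unless all remaining distances are zero.
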
Supported distance functions
- Euclidean distance
- Histogram distance
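
The Euclidean case can be sketched in stable Rust with lane-wise accumulation, which mirrors the idea behind the 8-lane SIMD vectors in the example above. This is an illustration only: `sq_euclidean` and `LANES` are names chosen here, the function returns the squared distance (whether the crate's `EuclideanDistance` does the same is not stated in this text), and the histogram distance is left out because its definition is not given here.

```rust
/// Number of accumulator lanes; 8 mirrors the `KMeans<f64, 8, _>` example.
const LANES: usize = 8;

/// Squared Euclidean distance, accumulated in LANES independent partial sums
/// so that the compiler has a chance to vectorize the inner loop.
pub fn sq_euclidean(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0f64; LANES];
    let (a_chunks, b_chunks) = (a.chunks_exact(LANES), b.chunks_exact(LANES));

    // Elements that do not fill a whole chunk are handled separately.
    let tail: f64 = a_chunks
        .remainder()
        .iter()
        .zip(b_chunks.remainder())
        .map(|(x, y)| (x - y) * (x - y))
        .sum();

    for (ca, cb) in a_chunks.zip(b_chunks) {
        for lane in 0..LANES {
            let d = ca[lane] - cb[lane];
            acc[lane] += d * d;
        }
    }
    acc.iter().sum::<f64>() + tail
}
```

Because the partial sums are added in a different order than in a simple left-to-right loop, results may differ from a naive implementation in the last floating-point digits.
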