1 unstable release: 0.0.1 (Dec 13, 2024)
fast-distances
Rust Similarity and Distance Metrics Library
This Rust package provides a wide range of functions for computing various distance and similarity metrics between vectors or points in a high-dimensional space. These metrics are widely used in fields such as machine learning, statistics, data science, and computational biology.
Modules
Each module in this package implements a specific distance or similarity measure, some with gradient computations for optimization tasks. Below is a list of available modules:
- approx_log_gamma: Approximation of the logarithm of the Gamma function.
- bray_curtis: Bray-Curtis dissimilarity, a measure for ecological distance.
- bray_curtis_grad: Gradient of the Bray-Curtis dissimilarity.
- canberra: Canberra distance, a weighted variant of the Manhattan distance in which each coordinate difference is normalized by the sum of the coordinates' absolute values.
- canberra_grad: Gradient of the Canberra distance.
- chebyshev: Chebyshev distance (L∞ distance), the maximum absolute difference along any coordinate axis.
- chebyshev_grad: Gradient of the Chebyshev distance.
- correlation: Pearson correlation coefficient, a measure of linear correlation between two vectors.
- cosine: Cosine similarity, measuring the cosine of the angle between two vectors.
- cosine_grad: Gradient of the cosine similarity.
- dice: Dice coefficient, a similarity measure often used in bioinformatics.
- euclidean: Euclidean distance, the straight-line distance between two points.
- euclidean_grad: Gradient of the Euclidean distance.
- hamming: Hamming distance, the number of differing positions between two strings of equal length.
- haversine: Haversine distance, used to calculate the great-circle distance between two points on a sphere.
- haversine_grad: Gradient of the Haversine distance.
- hellinger: Hellinger distance, a measure for comparing probability distributions.
- hellinger_grad: Gradient of the Hellinger distance.
- hyperboloid_grad: Gradient of the hyperboloid distance, a metric on hyperbolic spaces.
- jaccard: Jaccard similarity coefficient, the size of the intersection of two sets divided by the size of their union.
- kulsinski: Kulsinski coefficient, a measure for comparing binary vectors.
- ll_dirichlet: Log-Likelihood of the Dirichlet distribution, used for probabilistic comparison of Dirichlet-distributed data.
- log_beta: Logarithm of the Beta function, used in statistical modeling.
- log_single_beta: Logarithmic computation of a single Beta distribution.
- mahalanobis: Mahalanobis distance, a distance metric that accounts for correlations between variables.
- mahalanobis_grad: Gradient of the Mahalanobis distance.
- manhattan: Manhattan distance (L1 distance), the sum of the absolute differences between coordinates.
- manhattan_grad: Gradient of the Manhattan distance.
- matching: Matching distance, a measure based on the proportion of positions at which two binary vectors disagree.
- minkowski: Minkowski distance of order p, a generalization of both the Euclidean (p = 2) and Manhattan (p = 1) distances.
- minkowski_grad: Gradient of the Minkowski distance.
- poincare: Poincaré distance, used for hyperbolic spaces and geometries.
- rogers_tanimoto: Rogers-Tanimoto coefficient, a measure for comparing binary data.
- russellrao: Russell-Rao similarity, a measure for binary vectors.
- sokal_michener: Sokal-Michener coefficient, a measure for binary data.
- sokal_sneath: Sokal-Sneath coefficient, another measure for binary data.
- standardised_euclidean: Standardized Euclidean distance, which scales each squared coordinate difference by that coordinate's variance.
- standardised_euclidean_grad: Gradient of the standardized Euclidean distance.
- weighted_minkowski: Weighted Minkowski distance, a variant of Minkowski with weightings for each dimension.
- weighted_minkowski_grad: Gradient of the weighted Minkowski distance.
- yule: Yule's coefficient, used to measure association between two binary vectors.
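Several of the entries above are closely related, and the gradient modules pair a distance with its derivative. The sketch below illustrates those relationships with plain reference implementations. The names minkowski_ref, chebyshev_ref and euclidean_grad_ref are hypothetical helpers written for this example, not the crate's API, and the small epsilon in the gradient is an assumption made here to avoid dividing by zero.

```rust
/// Minkowski distance of order `p` (hypothetical helper, not the crate's API).
/// p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
fn minkowski_ref(x: &[f64], y: &[f64], p: f64) -> f64 {
    x.iter()
        .zip(y)
        .map(|(a, b)| (a - b).abs().powf(p))
        .sum::<f64>()
        .powf(1.0 / p)
}

/// Chebyshev distance: the largest absolute coordinate difference.
/// This is the limit of the Minkowski distance as p grows without bound.
fn chebyshev_ref(x: &[f64], y: &[f64]) -> f64 {
    x.iter()
        .zip(y)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0, f64::max)
}

/// Euclidean distance together with its gradient with respect to `x`.
/// For d = sqrt(sum_i (x_i - y_i)^2), the partial derivative is (x_i - y_i) / d.
fn euclidean_grad_ref(x: &[f64], y: &[f64]) -> (f64, Vec<f64>) {
    let dist = x
        .iter()
        .zip(y)
        .map(|(a, b)| (a - b).powi(2))
        .sum::<f64>()
        .sqrt();
    // Assumption: a small epsilon keeps the division finite when x == y.
    let grad: Vec<f64> = x.iter().zip(y).map(|(a, b)| (a - b) / (dist + 1e-6)).collect();
    (dist, grad)
}

fn main() {
    let x = [1.0, 2.0, 3.0];
    let y = [4.0, 6.0, 3.0];
    println!("Minkowski, p = 1: {}", minkowski_ref(&x, &y, 1.0));
    println!("Minkowski, p = 2: {}", minkowski_ref(&x, &y, 2.0));
    println!("Chebyshev:        {}", chebyshev_ref(&x, &y));
    let (dist, grad) = euclidean_grad_ref(&x, &y);
    println!("Euclidean: {}, gradient w.r.t. x: {:?}", dist, grad);
}
```

For these inputs the coordinate differences are (-3, -4, 0), so the three distances are 7, 5 and 4 up to floating-point rounding, and the gradient is close to (-0.6, -0.8, 0).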
Installation
Add this package to your Cargo.toml to use it in your project:
[dependencies]
fast-distances = "0.0.1"
Usage
To use one of the available distance or similarity metrics, import the respective module in your Rust code:
use fast_distances::{cosine, euclidean, manhattan};

fn main() {
    let vector1 = vec![1.0, 2.0, 3.0];
    let vector2 = vec![4.0, 5.0, 6.0];

    // Compute cosine similarity
    let cosine_sim = cosine(&vector1, &vector2);
    println!("Cosine Similarity: {}", cosine_sim);

    // Compute Euclidean distance
    let euclidean_dist = euclidean(&vector1, &vector2);
    println!("Euclidean Distance: {}", euclidean_dist);

    // Compute Manhattan distance
    let manhattan_dist = manhattan(&vector1, &vector2);
    println!("Manhattan Distance: {}", manhattan_dist);
}
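To sanity-check the output of the example above, the same three quantities can be computed from their textbook definitions. The following is a minimal sketch using only the standard library. The functions cosine_similarity_ref, euclidean_ref and manhattan_ref are hypothetical names introduced here and make no claim about how the crate implements its metrics. In particular, whether the crate's cosine returns a similarity or a distance (1 minus the similarity) is not confirmed by this README.

```rust
/// Dot product of two equal-length slices.
fn dot(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

/// Cosine similarity: dot(x, y) / (|x| * |y|).
fn cosine_similarity_ref(x: &[f64], y: &[f64]) -> f64 {
    dot(x, y) / (dot(x, x).sqrt() * dot(y, y).sqrt())
}

/// Euclidean distance: square root of the summed squared differences.
fn euclidean_ref(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| (a - b).powi(2)).sum::<f64>().sqrt()
}

/// Manhattan distance: sum of the absolute differences.
fn manhattan_ref(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| (a - b).abs()).sum()
}

fn main() {
    let vector1 = vec![1.0, 2.0, 3.0];
    let vector2 = vec![4.0, 5.0, 6.0];
    println!("Cosine similarity (reference):  {:.4}", cosine_similarity_ref(&vector1, &vector2));
    println!("Euclidean distance (reference): {:.4}", euclidean_ref(&vector1, &vector2));
    println!("Manhattan distance (reference): {:.4}", manhattan_ref(&vector1, &vector2));
}
```

For these two vectors the reference values are roughly 0.9746 for the cosine similarity (32 divided by the square root of 1078), roughly 5.1962 for the Euclidean distance (the square root of 27), and 9 for the Manhattan distance.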
Contributing
Contributions are welcome! If you'd like to contribute a new metric or improve an existing one, feel free to open an issue or a pull request.
- Fork the repository.
- Clone your fork locally.
- Make changes and run tests to ensure they pass.
- Submit a pull request with a clear description of your changes.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgements
This package draws from many well-established distance and similarity metrics commonly used in data analysis, machine learning, and information retrieval.