#hasher #hash #hash-map

no-std gxhash

GxHash non-cryptographic algorithm

12 stable releases

3.4.1 May 27, 2024
3.3.2 May 27, 2024
3.1.1 Jan 21, 2024
3.1.0 Dec 30, 2023
1.1.1 Nov 17, 2023

#117 in Algorithms

Download history 1426/week @ 2024-06-11 988/week @ 2024-06-18 1158/week @ 2024-06-25 1610/week @ 2024-07-02 996/week @ 2024-07-09 791/week @ 2024-07-16 910/week @ 2024-07-23 959/week @ 2024-07-30 1261/week @ 2024-08-06 800/week @ 2024-08-13 1274/week @ 2024-08-20 1141/week @ 2024-08-27 1133/week @ 2024-09-03 1101/week @ 2024-09-10 900/week @ 2024-09-17 1075/week @ 2024-09-24

4,392 downloads per month
Used in 17 crates (16 directly)

MIT license

40KB
646 lines

GxHash

Build & Test Cross Compile Rust Version Compatibility

GxHash is a blazingly fast and robust non-cryptographic hashing algorithm.

Usage

cargo add gxhash

Used directly as a hash function:

let bytes: &[u8] = "hello world".as_bytes();
let seed = 1234;
println!(" 32-bit hash: {:x}", gxhash::gxhash32(&bytes, seed));
println!(" 64-bit hash: {:x}", gxhash::gxhash64(&bytes, seed));
println!("128-bit hash: {:x}", gxhash::gxhash128(&bytes, seed));

GxHash provides an implementation of the Hasher trait. For convenience, this crate also provides the type aliases gxhash::HashMap and gxhash::HashSet.

use gxhash::{HashMap, HashMapExt};

let mut map: HashMap<&str, i32> = HashMap::new();
map.insert("answer", 42);

Features

Blazingly Fast 🚀

Up to this date, GxHash is the fastest non-cryptographic hashing algorithm of its class, for all input sizes. This performance is possible mostly thanks to heavy usage of SIMD intrinsics, high ILP construction, a small bytecode (easily inlined and cached) and some (outrageously unsafe) tricks.

See the benchmarks.

Highly Robust 🗿

GxHash uses several rounds of hardware-accelerated AES block cipher for efficient bit mixing.
Thanks to this, GxHash passes all SMHasher tests, which is the de facto quality benchmark for non-cryptographic hash functions, gathering most of the existing algorithms. GxHash has low collisions, uniform distribution and high avalanche properties.

Check out the paper for more technical details.

0 Dependencies 📦

GxHash has 0 cargo dependency. The Hasher and Hashset/Hashmap convenience types require the standard library, enabled by default with the std feature.

Portability

Important Because GxHash relies on aes hardware acceleration, you must make sure the aes feature is enabled when building (otherwise it won't build). This can be done by setting the RUSTFLAGS environment variable to -C target-feature=+aes or -C target-cpu=native (the latter should work if your CPU is properly recognized by rustc, which is the case most of the time).

Architecture Compatibility

GxHash is compatible with:

  • X86 processors with AES-NI & SSE2 intrinsics
  • ARM processors with AES & NEON intrinsics

Warning Other platforms are currently not supported (there is no fallback). GxHash will not build on these platforms.

Hashes Stability

All generated hashes for a given version of GxHash are stable, meaning that for a given input the output hash will be the same across all supported platforms.

no_std

The std feature flag enables the HashMap/HashSet container convenience type aliases. This is on by default. Disable to make the crate no_std:

[dependencies.gxhash]
...
default-features = false

hybrid

The hybrid feature flag enables a hybrid implementation of GxHash. This is disabled by default. When hybrid feature is enabled and for CPUs that supports it, GxHash will use wider registers and instructions (VAES + AVX2), which can lead to a significant performance improvement for large inputs. This preserves hashes stability, meaning that hashes generated with or without the hybrid feature are the same for a given input and seed.

Note: Even without this feature enabled GxHash is already the fastest option out there. We recommend enabling this feature only when inputs can be larger than a few hundred bytes, see the benchmarks below.

Benchmarks

Benchmark
GxHash is continuously benchmarked on X86 and ARM Github runners.

To run the benchmarks locally use one of the following:

# Benchmark throughput
# Add --features bench-md for markdown output or --features bench-plot for .svg plots
cargo bench --bench throughput

# Benchmark performance of GxHash's Hasher when used in a HashSet
cargo bench --bench hashset

Throughput

Throughput is measured as the number of bytes hashed per second.

Some prefer talking latency (time for generating a hash) or hashrate (the number of hashes generated per second) for measuring hash function performance, but those are all equivalent in the end as they all boil down to measuring the time it takes to hash some input and then apply different scalar transformation. For instance, if latency for a 4 bytes hash is 1 ms, then the throughput is 1 / 0.001 * 4 = 4000 bytes per second. Throughput allows us to conveniently compare the performance of a hash function for any input size on a single graph.

Lastest Benchmark Results:
aarch64 x86_64 x86_64-hybrid

Security

DOS Resistance

GxHash is a seeded hashing algorithm, meaning that depending on the seed used, it will generate completely different hashes. The default HasherBuilder (GxHasherBuilder::default()) uses seed randomization, making any HashMap/HashSet more DOS resistant, as it will make it much more difficult for attackers to be able to predict which hashes may collide without knowing the seed used. This does not mean however that it is completely DOS resistant. This has to be analyzed further.

Multicollisions Resistance

GxHash uses a 128-bit internal state. This makes GxHash a widepipe construction when generating hashes of size 64-bit or smaller, which had amongst other properties to be inherently more resistant to multicollision attacks. See this paper for more details.

Cryptographic Properties

GxHash is a non-cryptographic hashing algorithm, thus it is not recommended to use it as a cryptographic algorithm (it is not a replacement for SHA). It has not been assessed if GxHash is preimage resistant and how difficult it is to be reversed.

Contributing

  • Feel free to submit PRs
  • Repository is entirely usable via cargo commands
  • Versioning is the following
    • Major for stability breaking changes (output hashes for a same input are different after changes)
    • Minor for API changes/removal
    • Patch for new APIs, bug fixes and performance improvements

ℹ️ cargo-asm is an easy way to view the actual generated assembly code (cargo asm gxhash::gxhash::gxhash64) (method #[inline] should be removed otherwise it won't be seen by the tool)
ℹ️ AMD μProf gives some useful insights on time spent per instruction.

Publication

Author note: I'm committed to the open dissemination of scientific knowledge. In an era where access to information is more democratized than ever, I believe that science should be freely available to all – both for consumption and contribution. Traditional scientific journals often involve significant financial costs, which can introduce biases and can shift the focus from purely scientific endeavors to what is currently trendy.

To counter this trend and to uphold the true spirit of research, I have chosen to share my work on "gxhash" directly on GitHub, ensuring that it's openly accessible to anyone interested. Additionally, the use of a free Zenodo DOI ensures that this research is citable and can be referenced in other works, just as traditional publications are.

I strongly believe in a world where science is not behind paywalls, and I am in for a more inclusive, unbiased, and open scientific community.

Publication:
PDF

Cite this publication / algorithm:
DOI

Dependencies