#floating-point #compression #lossless #adaptive #et #al #integer-compression

nightly alp

A pure Rust implementation of Adaptive Lossless floating-Point Compression by Afroozeh et al.

1 unstable release

0.0.1 Oct 11, 2024
0.0.0 Oct 11, 2024

#9 in #et

Apache-2.0

34KB
714 lines

ALP: Adaptive Lossless floating-Point

As modern data and analytics workloads have shifted from SQL to general-purpose programming languages such as Python, the amount of floating-point data has grown massively. Modern database systems struggle to compress this data effectively without losing precision while preserving desirable traits such as random access and auto-vectorization.

In 2023, Afroozeh et al. published ALP in response to these issues. The code was written in C++ and integrated into DuckDB. To ease integration into other tools, we present a Rust implementation of both variants of ALP (classic ALP and ALP for "real doubles").


lib.rs:

This crate contains an implementation of the floating-point compression algorithm from the paper "ALP: Adaptive Lossless floating-Point Compression" by Afroozeh et al.

The compressor has two variants: classic ALP, which is well suited to data that does not use the full precision of the floating-point type, and ALP for "real doubles" (ALP-RD), which targets values that do.

Classic ALP returns small integers and is meant to be cascaded with other integer compression techniques such as bit-packing and frame-of-reference encoding. Combined, these allow for significant compression, on the order of what you can get for integer values.
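
To make the cascade concrete, here is a minimal sketch of the idea, not this crate's API. It assumes a single decimal exponent `e`, scales each value by 10^e, and keeps the result only if it decodes back to the identical bit pattern. The function names (`pow10`, `alp_encode`, `alp_decode`, `frame_of_reference`) are hypothetical, and the sketch rejects the whole input on a failed round-trip instead of handling such values separately as a full implementation would.

```rust
/// 10^e by repeated multiplication (exact for the small exponents used here).
fn pow10(e: u32) -> f64 {
    (0..e).fold(1.0, |s, _| s * 10.0)
}

/// Scale every value by 10^e and round to an integer.
/// Returns `None` if any value does not decode back to the same bits.
fn alp_encode(values: &[f64], e: u32) -> Option<Vec<i64>> {
    let scale = pow10(e);
    values
        .iter()
        .map(|&v| {
            let i = (v * scale).round() as i64; // saturating cast; NaN becomes 0
            // Lossless check: decoding must reproduce the exact bit pattern.
            if (i as f64 / scale).to_bits() == v.to_bits() {
                Some(i)
            } else {
                None
            }
        })
        .collect()
}

/// Inverse of `alp_encode` for the same exponent.
fn alp_decode(ints: &[i64], e: u32) -> Vec<f64> {
    let scale = pow10(e);
    ints.iter().map(|&i| i as f64 / scale).collect()
}

/// Frame-of-reference: subtract the minimum, then report how many bits
/// a bit-packer would need per value.
fn frame_of_reference(ints: &[i64]) -> (i64, Vec<u64>, u32) {
    let min = ints.iter().copied().min().unwrap_or(0);
    let deltas: Vec<u64> = ints.iter().map(|&i| i.wrapping_sub(min) as u64).collect();
    let max = deltas.iter().copied().max().unwrap_or(0);
    let width = 64 - max.leading_zeros();
    (min, deltas, width)
}
```

For the hypothetical input `[12.34, 12.5, 11.99, 13.07]` with `e = 2`, this sketch produces the integers 1234, 1250, 1199 and 1307, which need 7 bits each once the minimum (1199) is subtracted, compared with 64 bits for each raw `f64`.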

ALP-RD is generally terminal, and in the ideal case it can represent an f64 in just 49 bits, though generally it is closer to 54 bits per value or ~12.5% compression.
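
One way to see where a figure like 49 bits can come from is the sketch below. It is an illustration under stated assumptions, not this crate's implementation: each `f64` is cut into a 16-bit left part (sign, exponent, top mantissa bits) and a 48-bit right part stored verbatim, and the left parts are replaced by indices into a small dictionary. The cut position, the names (`RIGHT_BITS`, `split`, `join`, `rd_encode`, `rd_decode`, `bits_per_value`) and the unbounded dictionary are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Number of low bits stored verbatim (assumed cut position for this sketch).
const RIGHT_BITS: u32 = 48;

/// Split a double into its top 16 bits and its low 48 bits.
fn split(v: f64) -> (u16, u64) {
    let bits = v.to_bits();
    ((bits >> RIGHT_BITS) as u16, bits & ((1u64 << RIGHT_BITS) - 1))
}

/// Reassemble a double from the two parts.
fn join(left: u16, right: u64) -> f64 {
    f64::from_bits(((left as u64) << RIGHT_BITS) | right)
}

/// Dictionary-encode the left parts; keep the right parts as they are.
/// Returns (dictionary, one code per value, one right part per value).
fn rd_encode(values: &[f64]) -> (Vec<u16>, Vec<usize>, Vec<u64>) {
    let mut dict: Vec<u16> = Vec::new();
    let mut index: HashMap<u16, usize> = HashMap::new();
    let mut codes = Vec::with_capacity(values.len());
    let mut rights = Vec::with_capacity(values.len());
    for &v in values {
        let (left, right) = split(v);
        // Assign the next free code the first time a left part is seen.
        let code = *index.entry(left).or_insert_with(|| {
            dict.push(left);
            dict.len() - 1
        });
        codes.push(code);
        rights.push(right);
    }
    (dict, codes, rights)
}

/// Inverse of `rd_encode`.
fn rd_decode(dict: &[u16], codes: &[usize], rights: &[u64]) -> Vec<f64> {
    codes.iter().zip(rights).map(|(&c, &r)| join(dict[c], r)).collect()
}

/// Bits per value when codes are packed at the minimum width for the dictionary.
fn bits_per_value(dict_len: usize) -> u32 {
    let code_bits = if dict_len <= 1 {
        0
    } else {
        usize::BITS - (dict_len - 1).leading_zeros()
    };
    RIGHT_BITS + code_bits
}
```

Under these assumptions, a dictionary with two entries costs 48 + 1 = 49 bits per value and one with eight entries costs 48 + 3 = 51 bits. Values whose left part is too rare to earn a dictionary slot would have to be stored separately in a real encoder, which pushes the average cost above these figures.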

Dependencies

~165–400KB