#local #lookup #sequences #space #element #repeating #log

prefix-sum-vec

Compressed storage for highly repeating elements, with O(log n) lookups

3 releases

0.1.2 Jan 23, 2023
0.1.1 Oct 19, 2022
0.1.0 Oct 11, 2022

#451 in Data structures

Download history 2905/week @ 2024-07-22 3468/week @ 2024-07-29 3094/week @ 2024-08-05 2306/week @ 2024-08-12 2268/week @ 2024-08-19 2228/week @ 2024-08-26 2533/week @ 2024-09-02 2156/week @ 2024-09-09 2355/week @ 2024-09-16 2841/week @ 2024-09-23 1667/week @ 2024-09-30 1578/week @ 2024-10-07 3600/week @ 2024-10-14 4127/week @ 2024-10-21 3641/week @ 2024-10-28 3888/week @ 2024-11-04

15,286 downloads per month
Used in 36 crates (4 directly)

Apache-2.0 OR BSD-3-Clause

21KB
379 lines

prefix-sum-vec

A Rust crate for compressed storage of highly repeating elements, with O(log n) lookups

What is this?

The data structure provided by this crate allow users to space-efficiently store repeating sequences of the same value. An example of this sort of data comes up in WebAssembly, where the locals for a function are represented as a sequence of types and their repetition count. The binary encoding of function locals such as these

(locals i32 i32 i32 i32 i64 i64 f64)

is actually encoded as a sequence along the lines of 0x04 i32 0x02 i64 0x01 f64. Were the decoded representation of the locals naively stuff everything into a vector, a potential denial-of-service hazard could arise. A crafted input with a representation of millions of locals in just a couple of bytes of encoded space would force a naive implementation to allocate gigabytes of memory space.

The data structure provided by this crate stores just one copy of the repeating element alongside with a prefix-sum of the indices at which the element can be found. So, given the locals example above, the storage would look something like this:

[(4, i32), (6, i64), (7, f64)]

From there on, looking up an element with an index 5 is a binary search away over the prefix-sum indices.

Usage

First, specify this crate in your Cargo.toml:

[dependencies]
prefix-sum-vec = "0.1"

This crate aims to be extremely lightweight and straightforward. As such there are no optional features to enable at this time. Then, refer to the documentation of the PrefixSumVec type to discover its usage in your code.

License

This project is licensed under the Apache 2.0 OR BSD 3-clause license at your choice.

No runtime deps