3 unstable releases

new 0.7.7 Feb 17, 2025
0.7.6 Jan 29, 2025
0.6.2 Jan 14, 2025
0.5.1 Nov 26, 2024
0.1.2 Sep 26, 2024

#305 in Algorithms

Download history 104/week @ 2024-10-30 169/week @ 2024-11-06 307/week @ 2024-11-13 360/week @ 2024-11-20 111/week @ 2024-11-27 251/week @ 2024-12-04 113/week @ 2024-12-11 7/week @ 2024-12-18 236/week @ 2025-01-01 299/week @ 2025-01-08 64/week @ 2025-01-15 833/week @ 2025-01-29 45/week @ 2025-02-05 28/week @ 2025-02-12

912 downloads per month
Used in 2 crates

BSD-3-Clause

190KB
3K SLoC

Parallel Construction of Suffix Arrays in Rust

See https://docs.rs/libsufr/latest/libsufr/ for API documentation.

This is a Rust library crate for creating suffix arrays.

The basic ideas are as follow:

  • Read the input file as u8 (unsigned 8-bit integer values).
  • Select the suffixes, which are normally 0 to the length of the text but there is the option to skip suffixes that don't start with A/C/G/T if the input is DNA. Note: Following the C++ implementation, we use 32-bit integers if the input text length is less than 2^32 and 64-bit integers, otherwise.
  • Create partitions by randomly choosing suffixes and copying the suffixes to the highest possible partition bounded by any pivot.
  • Sort the partitions in parallel.
  • Merge the partitions. Because the values fall into nonoverlapping ranges, these subarrays can be appended in order to produce the final SA.
  • Produce a binary-encoded output file with the suffix/LCP arrays and other metadata.

Some advantages to this algorithm:

  • The various partitioned subarrays can be processed independently by separate threads, and no thread will ever have to merge the entire input.
  • Suffix comparisons are made faster by caching LCPs.
  • Using u8 for the input text and 32-bits (when possible) for SA/LCP results in lower memory usage.

See the repository for documentation: https://github.com/TravisWheelerLab/sufr

See Also

Use cargo install sufr for a CLI.

Authors

Dependencies

~9–20MB
~266K SLoC