app sortpar

Unix sort but in parallel

2 releases

0.1.1 Aug 30, 2018
0.1.0 Aug 8, 2018

MIT license

17KB
364 lines

sortpar

sortpar is a command-line tool that sorts text files in parallel. It does this by using the parallel sort that the rayon crate implements for slices.
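To make the idea concrete, here is a minimal sketch of sorting lines in parallel. It is not sortpar's actual code and it does not use rayon: to stay dependency-free it substitutes a simple scheme built on standard-library scoped threads (sort fixed-size chunks concurrently, then merge them). The names `parallel_sort_lines` and `merge` are hypothetical.

```rust
use std::thread;

/// Merge two already-sorted vectors into one sorted vector.
/// Ties are taken from `a` first, so the merge is stable.
fn merge(a: Vec<String>, b: Vec<String>) -> Vec<String> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    let mut a = a.into_iter().peekable();
    let mut b = b.into_iter().peekable();
    loop {
        // Decide which side holds the smaller head element.
        let take_a = match (a.peek(), b.peek()) {
            (Some(x), Some(y)) => x <= y,
            (Some(_), None) => true,
            (None, Some(_)) => false,
            (None, None) => break,
        };
        // `extend` with an Option pushes the element if there is one.
        out.extend(if take_a { a.next() } else { b.next() });
    }
    out
}

/// Hypothetical helper: split `lines` into roughly `workers` chunks,
/// sort each chunk on its own thread, then merge the sorted chunks.
fn parallel_sort_lines(lines: Vec<String>, workers: usize) -> Vec<String> {
    let workers = workers.max(1);
    let chunk_len = ((lines.len() + workers - 1) / workers).max(1);

    // Move the lines into owned chunks of at most `chunk_len` items.
    let mut chunks: Vec<Vec<String>> = Vec::new();
    let mut iter = lines.into_iter().peekable();
    while iter.peek().is_some() {
        chunks.push(iter.by_ref().take(chunk_len).collect());
    }

    // Sort every chunk concurrently; the scope joins all threads on exit.
    thread::scope(|s| {
        for chunk in chunks.iter_mut() {
            s.spawn(move || chunk.sort());
        }
    });

    // Merge the sorted chunks one after another (simple, not optimal).
    chunks.into_iter().fold(Vec::new(), merge)
}

fn main() {
    let input = "pear\napple\nfig\nbanana\ncherry";
    let lines: Vec<String> = input.lines().map(String::from).collect();
    for line in parallel_sort_lines(lines, 4) {
        println!("{line}");
    }
}
```

With rayon as a dependency, the chunking and merging above collapse into a single call to `par_sort` on the vector of lines, which is the slice method the description refers to.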

Installation

sortpar requires a nightly Rust compiler because it opts into Rust's editions feature. You can install it by running:

cargo +nightly install sortpar

This places a binary called sp in the $HOME/.cargo/bin directory on your machine. Add that directory to your $PATH to run sp from anywhere. More installation options are planned if the tool proves useful to people without Rust on their system.

Usage

sortpar currently supports a subset of the Unix sort command's options. List them by running:

sp --help

Benchmarks

More benchmarks would be welcome, but it is hard to get an accurate measure across multiple cases. To give an idea of current performance, I sorted Peter Norvig's big text file (big.txt). Warning: the link leads to a 6.2MB file.

Using hyperfine I got these results:

Benchmark #1: sp big.txt
  Time (mean ± σ):     445.1 ms ±   7.6 ms    [User: 857.0 ms, System: 90.8 ms]
  Range (min … max):   436.6 ms … 457.4 ms

Benchmark #1: gsort --parallel=4 big.txt
  Time (mean ± σ):      2.604 s ±  0.023 s    [User: 2.550 s, System: 0.032 s]
  Range (min … max):    2.558 s …  2.632 s

Why didn't I use LC_ALL=C for the GNU sort benchmark? Because it would be unfair to allow GNU sort to avoid the overhead of UTF-8 decoding. Perhaps in the future sortpar can have an option to do this as well.
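As a rough illustration of what such an option might look like, the sketch below sorts lines purely by byte value and never checks that the input is valid UTF-8. This is an assumption about one possible design, not a description of sortpar or of GNU sort internals, and the function name `sort_lines_bytewise` is hypothetical.

```rust
/// Hypothetical helper: split raw input on b'\n' and sort the lines by
/// byte value, without validating that they are UTF-8.
fn sort_lines_bytewise(input: &[u8]) -> Vec<&[u8]> {
    let mut lines: Vec<&[u8]> = input.split(|&b| b == b'\n').collect();
    // A trailing newline leaves one empty piece at the end; drop it.
    if lines.last().map_or(false, |l| l.is_empty()) {
        lines.pop();
    }
    // Byte slices compare lexicographically by byte value.
    lines.sort_unstable();
    lines
}

fn main() {
    let text = "zebra\néclair\napple\nEcho\n";

    // Byte-wise path: no UTF-8 validation at all.
    let by_bytes = sort_lines_bytewise(text.as_bytes());

    // String path: `text` is already a validated &str here.
    let mut by_str: Vec<&str> = text.lines().collect();
    by_str.sort_unstable();
    let by_str_bytes: Vec<&[u8]> = by_str.iter().map(|s| s.as_bytes()).collect();

    // Rust orders strings by their bytes, so both paths agree on valid UTF-8.
    assert_eq!(by_bytes, by_str_bytes);

    for line in by_bytes {
        println!("{}", String::from_utf8_lossy(line));
    }
}
```

Because Rust compares strings by their byte values, the two paths produce the same order for valid UTF-8 input; the byte-wise version only saves the validation pass and also accepts input that is not valid UTF-8.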

Issues

Please feel free to open issues if any bugs are encountered or if you would like to add a feature.

License

MIT
