Version 0.1.0, released Nov 28, 2021 (1 unstable release)
samplr
samplr is a CLI tool for randomly sampling data. It generates a fixed-size sample of input lines, with every line equally likely to be selected.
Installation
Source
Requires Rust to be installed.
git clone https://github.com/SteadBytes/sample.git
cd sample
cargo install --path .
Examples
Sample 15 lines from a file:
sample -n 15 things.txt
Sample 15 lines from standard input:
sample -n 15 < things.txt
Sample 15 lines from multiple files:
sample -n 15 things.txt other_things.txt
Sampling Algorithm
samplr uses a reservoir sampling algorithm to generate fixed-size samples from an input stream of unknown length, reading the input in a single pass. For more details, see the implementation and the linked blog article.
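To give a feel for the idea, here is a minimal sketch of one common reservoir sampling variant (often called Algorithm R). It is an illustration only and is not taken from samplr's source, which may use a different variant. The `XorShift64` generator and the `reservoir_sample` function are hypothetical names introduced here so the example needs no external crates.

```rust
/// Tiny xorshift64 generator so the sketch has no dependencies.
/// Hypothetical helper for illustration; not for cryptographic use.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift state must never be zero, so substitute a fixed constant.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns a value in `0..bound` (requires `bound > 0`).
    /// The slight modulo bias is ignored to keep the sketch short.
    fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

/// Algorithm R: keep the first `k` items, then let item `i` (0-based)
/// replace a random reservoir slot with probability `k / (i + 1)`.
fn reservoir_sample<T, I>(items: I, k: usize, rng: &mut XorShift64) -> Vec<T>
where
    I: IntoIterator<Item = T>,
{
    let mut reservoir: Vec<T> = Vec::with_capacity(k);
    for (i, item) in items.into_iter().enumerate() {
        if i < k {
            // Fill phase: the reservoir is not yet full.
            reservoir.push(item);
        } else {
            // Pick a slot in 0..=i; only slots below k are in the reservoir.
            let j = rng.below(i as u64 + 1) as usize;
            if j < k {
                reservoir[j] = item;
            }
        }
    }
    reservoir
}
```

Calling `reservoir_sample(0..1000, 15, &mut XorShift64::new(42))` returns 15 of the 1000 values. Only the `k` sampled items are held in memory at any time, which is why the input length does not need to be known in advance. If the input has fewer than `k` items, all of them are returned.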
Development
Tests
Run unit tests:
cargo test
Run all tests (including potentially CPU-intensive statistical tests):
cargo test --all-features --release