#line #reader #fs-file #file-io #io-write

ripline

This is not the greatest line reader in the world, this is just a tribute. Fast line-based iteration almost entirely lifted from ripgrep's grep_searcher. All credit to Andrew Gallant and the ripgrep contributors.

1 unstable release

0.1.0 Jul 3, 2021

#9 in #lines


90 downloads per month
Used in 2 crates

Unlicense/MIT

52KB
808 lines

🌊 ripline

This is not the greatest line reader in the world, this is just a tribute.

Fast line-based iteration almost entirely lifted from ripgrep's grep_searcher.

All credit to Andrew Gallant and the ripgrep contributors.

Why?

  • Doesn't rely on a closure like the bstr::for_line* methods (useful in some awkward lifetime scenarios).
  • No silently capped line lengths, unlike rust-linereader.
  • Brings along LineIter for working with memory-mapped files.
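
To make the first point concrete, here is a minimal std-only sketch of the kind of lifetime scenario where a closure gets in the way. Neither function below is part of ripline or bstr: `for_each_line` is a hypothetical stand-in for a closure-style API, and `lines` is a hypothetical stand-in for an iterator-style one. Both yield lines with the terminator included.

```rust
/// Hypothetical closure-style API: the callback only sees each line for the
/// duration of a single call, so the slice cannot be stored outside of it.
fn for_each_line<F>(buf: &[u8], mut f: F)
where
    F: FnMut(&[u8]) -> bool,
{
    for line in buf.split_inclusive(|&b| b == b'\n') {
        if !f(line) {
            break;
        }
    }
}

/// Hypothetical iterator-style API: every yielded line borrows from `buf`
/// for the full lifetime `'a`.
fn lines<'a>(buf: &'a [u8]) -> impl Iterator<Item = &'a [u8]> + 'a {
    buf.split_inclusive(|&b| b == b'\n')
}

/// Returns the longest line as a slice into `buf`, without copying.
fn longest_line<'a>(buf: &'a [u8]) -> Option<&'a [u8]> {
    lines(buf).max_by_key(|line| line.len())
}

fn main() {
    let buf = b"a\nbbb\ncc\n";

    // Counting works fine with a closure, because nothing borrowed escapes.
    let mut count = 0;
    for_each_line(buf, |_line| {
        count += 1;
        true
    });

    // Keeping a borrowed line around is where the iterator helps. Writing
    // `longest = line` inside the closure above would be rejected by the
    // borrow checker, since `line` is only valid for one callback call.
    let longest = longest_line(buf);
    println!("{} lines, longest = {:?}", count, longest);
}
```

With the closure form, the usual workaround is to copy each line into an owned buffer, which is exactly the cost a borrowed iterator avoids.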

Not all of this functionality was exposed in the grep_searcher crate, and rightly so, as a lot of it had grep-specific configuration embedded in the logic (e.g. binary detection).

What have I changed?

Not much. I took out some of the ripgrep-specific logic, such as the binary detection and some search-related configs, and consolidated a few of the helper structs from the other grep_* crates.

Example

See examples for more.

use grep_cli::stdout;
use ripline::{
    line_buffer::{LineBufferBuilder, LineBufferReader},
    lines::LineIter,
    LineTerminator,
};
use std::{env, error::Error, fs::File, io::Write, path::PathBuf};
use termcolor::ColorChoice;

fn main() -> Result<(), Box<dyn Error>> {
    let path = PathBuf::from(env::args().nth(1).expect("Failed to provide input file"));

    let mut out = stdout(ColorChoice::Never);

    let reader = File::open(&path)?;
    let terminator = LineTerminator::byte(b'\n');
    let mut line_buffer = LineBufferBuilder::new().build();
    let mut lb_reader = LineBufferReader::new(reader, &mut line_buffer);

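    // Refill the buffer until the reader is exhausted.
    while lb_reader.fill()? {
        // Iterate over the lines currently held in the buffer.
        let lines = LineIter::new(terminator.as_byte(), lb_reader.buffer());
        for line in lines {
            out.write_all(line)?;
        }
        // Mark the whole buffer as processed before the next fill.
        lb_reader.consume_all();
    }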

    Ok(())
}
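
For readers wondering what a fill-then-iterate loop has to do under the hood, the following is a rough std-only sketch of the general rolling-buffer idea: read a chunk, expose only the bytes up to the last terminator, and carry the partial tail over to the next read. `RollingBuffer` and its fields are hypothetical names for illustration, not ripline's or grep_searcher's implementation. As a simplification, `fill` discards the previously exposed lines itself instead of requiring a separate consume call.

```rust
use std::io::{self, Read};

/// Hypothetical, simplified rolling line buffer.
struct RollingBuffer<R> {
    reader: R,
    buf: Vec<u8>,
    scratch: Vec<u8>,
    /// Number of bytes at the front of `buf` that form complete lines.
    complete: usize,
    eof: bool,
}

impl<R: Read> RollingBuffer<R> {
    fn new(reader: R, chunk_size: usize) -> Self {
        RollingBuffer {
            reader,
            buf: Vec::new(),
            scratch: vec![0u8; chunk_size],
            complete: 0,
            eof: false,
        }
    }

    /// Drops the lines handed out last time, then reads until the buffer
    /// holds at least one complete line. Returns `Ok(false)` when done.
    fn fill(&mut self) -> io::Result<bool> {
        // Roll the partial tail to the front of the buffer.
        self.buf.drain(..self.complete);
        self.complete = 0;
        loop {
            if self.eof {
                // Whatever remains is a final line without a terminator.
                self.complete = self.buf.len();
                return Ok(!self.buf.is_empty());
            }
            let n = self.reader.read(&mut self.scratch)?;
            if n == 0 {
                self.eof = true;
                continue;
            }
            self.buf.extend_from_slice(&self.scratch[..n]);
            // Only bytes up to and including the last terminator are exposed.
            if let Some(last) = self.buf.iter().rposition(|&b| b == b'\n') {
                self.complete = last + 1;
                return Ok(true);
            }
            // No terminator yet: keep reading, so long lines are never cut.
        }
    }

    /// The complete lines currently available.
    fn buffer(&self) -> &[u8] {
        &self.buf[..self.complete]
    }
}

fn main() -> io::Result<()> {
    let input: &[u8] = b"alpha\nbeta\ngamma";
    // A tiny chunk size forces lines to straddle several reads.
    let mut rb = RollingBuffer::new(input, 4);
    while rb.fill()? {
        for line in rb.buffer().split_inclusive(|&b| b == b'\n') {
            print!("{}", String::from_utf8_lossy(line));
        }
    }
    println!();
    Ok(())
}
```

Because the buffer grows until a terminator shows up, a line longer than the chunk size is still delivered whole, which is the behaviour the "no silently capped line lengths" point refers to.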

Crude and untrustworthy benchmarks

From examples/ripline_benchmarks.rs. The initial benchmark script was taken from rust-linereader, which is also included in the benchmarks as LR:*.

The input used was all_train.csv, unzipped and concatenated five times, creating a ~25G file.

| Method                | Time  | Lines/sec  | Bandwidth     |
|-----------------------|-------|------------|---------------|
| read()                | 2.01s | 17439155/s | 12303.42 MB/s |
| LR::next_batch()      | 2.11s | 16576174/s | 11694.59 MB/s |
| LR::next_line()       | 2.65s | 13196734/s | 9310.37 MB/s  |
| ripline_line_buffer() | 2.64s | 13277194/s | 9367.14 MB/s  |
| ripline_mmap()        | 2.16s | 16183503/s | 11417.55 MB/s |
| bstr_for_line()       | 2.47s | 14174502/s | 10000.19 MB/s |
| read_until()          | 2.86s | 12230594/s | 8628.75 MB/s  |
| read_line()           | 4.16s | 8417415/s  | 5938.53 MB/s  |
| lines()               | 5.05s | 6930901/s  | 4889.79 MB/s  |

Note that read and next_batch are not counting lines.
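
As a rough illustration of how figures like Lines/sec and Bandwidth can be derived, here is a small std-only sketch that times a `read_until` loop and converts the counts into rates. It is not the code from examples/ripline_benchmarks.rs. The helper names are hypothetical, and the 1 MB = 1024 * 1024 bytes convention is an assumption that may not match the numbers above.

```rust
use std::io::{self, BufRead, Cursor};
use std::time::{Duration, Instant};

/// Assumed convention for "MB"; the original benchmark may use another one.
const BYTES_PER_MB: f64 = 1024.0 * 1024.0;

/// Counts lines with `BufRead::read_until`, reusing one buffer across calls.
/// Returns `(lines, bytes)`.
fn count_lines_read_until<R: BufRead>(mut reader: R) -> io::Result<(u64, u64)> {
    let mut buf = Vec::new();
    let (mut lines, mut bytes) = (0u64, 0u64);
    loop {
        buf.clear();
        let n = reader.read_until(b'\n', &mut buf)?;
        if n == 0 {
            break; // end of input
        }
        lines += 1;
        bytes += n as u64;
    }
    Ok((lines, bytes))
}

/// Converts raw counts into `(lines per second, MB per second)`.
fn throughput(lines: u64, bytes: u64, elapsed: Duration) -> (f64, f64) {
    let secs = elapsed.as_secs_f64();
    (lines as f64 / secs, bytes as f64 / BYTES_PER_MB / secs)
}

fn main() -> io::Result<()> {
    // Synthetic in-memory input standing in for a real file.
    let data = "some,csv,row,of,text\n".repeat(1_000_000).into_bytes();

    let start = Instant::now();
    let (lines, bytes) = count_lines_read_until(Cursor::new(&data))?;
    let elapsed = start.elapsed();

    let (lines_per_sec, mb_per_sec) = throughput(lines, bytes, elapsed);
    println!(
        "read_until() {:.2?} {:.0}/s {:.2} MB/s",
        elapsed, lines_per_sec, mb_per_sec
    );
    Ok(())
}
```

An in-memory input mostly measures the line-splitting cost. Timings against a real file also include disk and page-cache effects, which is one reason to treat such numbers as crude.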

Hardware: Ubuntu 20 AMD Ryzen 9 3950X 16-Core Processor w/ 64 GB DDR4 memory and 1TB NVMe Drive

Dependencies

~675KB