5 releases (3 breaking)

| Version | Released |
|---|---|
| 0.6.0 (new) | Jan 11, 2025 |
| 0.5.0 | Apr 11, 2022 |
| 0.4.0 | Apr 6, 2022 |
| 0.2.1 | Dec 6, 2021 |
| 0.2.0 | Dec 6, 2021 |
compress_io
Convenience library for reading and writing compressed files/streams
The aim of `compress_io` is to make it simple for an application to support multiple compression formats with minimal effort from both the developer and the user: an application can accept uncompressed or compressed input in a range of formats, and neither the developer nor the user has to specify which formats have been used. `compress_io` does not provide the compression/decompression itself but uses external utilities such as gzip, bzip2 or zstd as read or write filters.
Overview
The main way to work with `compress_io` is via `CompressIo` (or `AsyncCompressIo` in the case of async code). A reader (implementing `Read`), buffered reader (implementing `BufRead`), writer or buffered writer (both implementing `Write`) can be generated from `CompressIo` (or `AsyncCompressIo`). By default readers and writers use `stdin` and `stdout`, but a file path can also be specified with `path`.

By default `compress_io` will detect the compression format of input files automatically from the initial contents of the file/stream and select an appropriate utility if one is available in the user's `$PATH`; the format of output files is chosen from the file extension. These automatic methods can be overridden with `ctype`.

`compress_io` will make use of parallel versions of compression utilities if available. By default the compression utilities are run with their default threading options, but this behaviour can be changed using `cthreads`.
Examples
use std::io::{self, BufRead, Write};
use compress_io::compress::CompressIo;

fn main() -> io::Result<()> {
    // Read from a (presumably) gzipped file `foo.gz` and write out to file `foo.xz`, which will
    // be compressed using [xz] (assuming both [gzip] and [xz] are in the user's `$PATH`).
    // In this example both read and write streams are buffered.
    let reader = CompressIo::new().path("foo.gz").bufreader()?;
    let mut writer = CompressIo::new().path("foo.xz").bufwriter()?;
    for s in reader.lines().map(|l| l.expect("Read error")) {
        writeln!(writer, "{}", s)?;
    }
    Ok(())
}
Decompression utilities can be specified by the user, or can be selected automatically based on an examination of the first few bytes of the input.
use compress_io::{
compress::CompressIo,
compress_type::CompressType,
};
// Open a reader from `stdin`, using the first bytes from the file to determine whether the
// file is compressed or not
let mut rd1 = CompressIo::new().reader()?;
// Open a buffered reader from `foo.bz2` using [bzip2] to decompress
let mut rd2 = CompressIo::new().path("foo.bz2").ctype(CompressType::Bzip2).bufreader()?;
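To make the automatic detection more concrete, the following is a minimal sketch of how a format can be guessed from the first few bytes of a stream, using the well-known magic numbers of gzip, bzip2, xz and zstd. It is an illustration only and not the crate's own implementation; the `SniffedFormat` enum and the `sniff_format` function are hypothetical names introduced for this example.

```rust
/// Hypothetical format tag used only in this sketch (not the crate's `CompressType`).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SniffedFormat {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Uncompressed,
}

/// Guess the compression format from the leading bytes of a stream.
/// Anything that does not match a known magic number is treated as uncompressed.
pub fn sniff_format(head: &[u8]) -> SniffedFormat {
    // Magic numbers that open each container format.
    const GZIP: &[u8] = &[0x1f, 0x8b];
    const BZIP2: &[u8] = b"BZh";
    const XZ: &[u8] = &[0xfd, b'7', b'z', b'X', b'Z', 0x00];
    const ZSTD: &[u8] = &[0x28, 0xb5, 0x2f, 0xfd];

    if head.starts_with(GZIP) {
        SniffedFormat::Gzip
    } else if head.starts_with(BZIP2) {
        SniffedFormat::Bzip2
    } else if head.starts_with(XZ) {
        SniffedFormat::Xz
    } else if head.starts_with(ZSTD) {
        SniffedFormat::Zstd
    } else {
        SniffedFormat::Uncompressed
    }
}
```

With a buffered reader, the leading bytes can typically be inspected without consuming them (for instance via `BufRead::fill_buf`), so the same stream can then be handed to the chosen decompression utility.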
Compression utilities can also either be selected explicitly or be set automatically based on the file name (so a file called `test.zst` would be compressed using the zstd utility). If the compression format is selected explicitly, the matching extension will be added to the filename unless the extension is already present or the `fix_path` option has been selected.
use compress_io::{
compress::CompressIo,
compress_type::CompressType,
};
// Open a compressed writer to `stdout`, using [zstd] to compress the stream
let mut wrt1 = CompressIo::new().ctype(CompressType::Zstd).writer()?;
// Open a compressed buffered writer to the file `foo.lzma` using lzma to compress
let mut wrt2 = CompressIo::new().path("foo").ctype(CompressType::Lzma).bufwriter()?;
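The extension rule described above (add the extension unless it is already there or `fix_path` is set) can be sketched with the standard library alone. This is an assumption about the behaviour as stated in the text, not the crate's code; `output_path` is a hypothetical helper, and appending rather than replacing the extension is a choice made for this sketch.

```rust
use std::path::{Path, PathBuf};

/// Return the path that would be written to: `ext` is appended unless the path
/// already carries that extension or the caller asked for the path to be kept
/// exactly as given (`fix_path`).
pub fn output_path(path: &Path, ext: &str, fix_path: bool) -> PathBuf {
    let already_present = path.extension().and_then(|e| e.to_str()) == Some(ext);
    if fix_path || already_present {
        return path.to_path_buf();
    }
    // Append instead of replacing, so `data.tar` becomes `data.tar.xz`.
    let mut name = path.as_os_str().to_os_string();
    name.push(".");
    name.push(ext);
    PathBuf::from(name)
}
```

Under these assumptions, `output_path(Path::new("foo"), "lzma", false)` yields `foo.lzma`, which matches the file name used in the example above.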
Several of the possible compression formats can be generated by multiple utilities, which allows an alternate utility to be used if the standard one is not available. For example, the standard utility for xz compression is the xz tool; however, zstd can also perform xz compression and will be substituted by the library if xz is not available.

Note that if bgzip compression is requested then only the bgzip utility will be used. Even though bgzip output is compatible with the gzip format and can be decoded by any utility that handles gzip, bgzip adds extra information during compression that other utilities do not generate.
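The fallback between utilities can be pictured as an ordered search over a list of candidate program names. The sketch below uses only the standard library and is not taken from the crate; `find_in` and `pick_utility` are hypothetical names, and for simplicity the search only checks that a file with the right name exists, not that it is executable.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Look for a regular file called `name` in each directory of `search_path`
/// (a `$PATH`-style list) and return the first match.
pub fn find_in(search_path: &OsStr, name: &str) -> Option<PathBuf> {
    env::split_paths(search_path)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Return the first candidate utility, in order of preference, that can be found.
pub fn pick_utility<'a>(search_path: &OsStr, candidates: &[&'a str]) -> Option<&'a str> {
    candidates
        .iter()
        .copied()
        .find(|name| find_in(search_path, name).is_some())
}
```

For xz output one might call `pick_utility(&path, &["xz", "zstd"])`, where `path` comes from `std::env::var_os("PATH")`; a format that must not fall back, such as bgzip, would simply be given a single-element candidate list.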
Some of the compression utilities are multi-threaded. If multiple utilities are available to perform a given compression type, preference will be given to multi-threaded versions. For example, if gzip compression is requested and the pigz utility is available in the current `$PATH`, then pigz will be used in favour of gzip. For compression the user can specify a preference for threading (where available) using `cthreads`.
use compress_io::{
compress::CompressIo,
compress_type::{CompressType, CompressThreads},
};
// Open a compressed buffered writer to `stdout` (no path is given), using [zstd] to
// compress the stream with 4 threads
let mut wrt = CompressIo::new().ctype(CompressType::Zstd)
    .cthreads(CompressThreads::Set(4)).bufwriter()?;
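Since the compression is performed by an external program, a thread count ultimately has to reach that program as a command-line option. The sketch below shows what such a translation could look like for a few common tools, using their documented thread flags (`-T` for zstd and xz, `-p` for pigz, `-@` for bgzip). How the crate itself passes these options is not stated here, so this is an assumption for illustration, and `thread_args` is a hypothetical helper.

```rust
/// Build the command-line arguments that ask `utility` to use `threads` threads.
/// Utilities not covered by this sketch get no extra arguments.
pub fn thread_args(utility: &str, threads: usize) -> Vec<String> {
    match utility {
        // zstd and xz accept the count attached to the flag, e.g. `-T4`.
        "zstd" | "xz" => vec![format!("-T{}", threads)],
        // pigz takes the count as a separate argument: `-p 4`.
        "pigz" => vec!["-p".to_string(), threads.to_string()],
        // bgzip likewise: `-@ 4`.
        "bgzip" => vec!["-@".to_string(), threads.to_string()],
        _ => Vec::new(),
    }
}
```

The resulting list could be passed to `std::process::Command::args` when the filter process is spawned.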
Usage
For usage with synchronous code only, add `compress_io` as a dependency in your `Cargo.toml` to use from crates.io:
[dependencies]
compress_io = "0.2"
For use with asynchronous code, the `async` feature should be enabled:
[dependencies]
compress_io = { version = "0.2", features = ["async"] }