1 unstable release: 0.1.0 (Nov 25, 2024)
liblrge
This is a Rust library for estimating genome size from long read overlaps. The library is used by the lrge command-line tool documented in the root of this repository.
See the documentation for example usage and API documentation.
liblrge is a Rust library that provides utilities for estimating genome size for a given set of reads.
You can find a command-line interface (CLI) tool that uses this library in the lrge crate.
Usage
The library provides two strategies for estimating genome size:
TwoSetStrategy
The two-set strategy uses two (random) sets of reads to estimate the genome size. The query set, which is generally smaller, is overlapped against a target set of reads. A genome size estimate is generated for each read in the query set, based on the number of overlaps and the average read length. The median of these estimates is taken as the final genome size estimate.
    use liblrge::{Estimate, TwoSetStrategy};
    use liblrge::twoset::{Builder, DEFAULT_TARGET_NUM_READS, DEFAULT_QUERY_NUM_READS};

    let input = "path/to/reads.fastq";
    let mut strategy = Builder::new()
        .target_num_reads(DEFAULT_TARGET_NUM_READS)
        .query_num_reads(DEFAULT_QUERY_NUM_READS)
        .threads(4)
        .build(input);

    let est_result = strategy.estimate(false, None, None).expect("Failed to generate estimate");
    let estimate = est_result.estimate;
    // do something with the estimate
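The final aggregation step described above, taking the median of the per-read estimates, can be sketched as follows. This is a minimal illustration and not liblrge's own code: the `median` function is a hypothetical helper, and averaging the two middle values for an even number of estimates is an assumption. The per-read estimates themselves are taken as given.

```rust
/// Hypothetical helper (not part of liblrge): median of per-read
/// genome size estimates. Returns `None` if there are no estimates.
fn median(estimates: &[f64]) -> Option<f64> {
    if estimates.is_empty() {
        return None;
    }
    let mut sorted = estimates.to_vec();
    // `total_cmp` is a total order over f64, so sorting cannot panic on NaN.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        // Odd count: the middle element is the median.
        Some(sorted[mid])
    } else {
        // Even count: average the two middle elements (an assumption here).
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}
```

Calling `median(&[4.0e6, 5.2e6, 4.8e6])` yields `Some(4.8e6)`. A median is less sensitive than a mean to a handful of extreme per-read estimates.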
AvaStrategy
The all-vs-all (ava) strategy takes a (random) set of reads and overlaps it against itself to estimate the genome size. A genome size estimate is generated for each read in the set, based on the number of overlaps and the average read length, with the read being assessed excluded. The median of these estimates is taken as the final genome size estimate.
    use liblrge::{Estimate, AvaStrategy};
    use liblrge::ava::{Builder, DEFAULT_AVA_NUM_READS};

    let input = "path/to/reads.fastq";
    let mut strategy = Builder::new()
        .num_reads(DEFAULT_AVA_NUM_READS)
        .threads(4)
        .build(input);

    let est_result = strategy.estimate(false, None, None).expect("Failed to generate estimate");
    let estimate = est_result.estimate;
    // do something with the estimate
Features
This library includes optional support for compressed file formats, controlled by feature flags. By default, the compression feature is enabled, which activates support for all included compression formats.
Available Features
- compression (default): Enables all available compression formats (gzip, zstd, bzip2, xz).
- gzip: Enables support for gzip-compressed files (.gz) using the flate2 crate.
- zstd: Enables support for zstd-compressed files (.zst) using the zstd crate.
- bzip2: Enables support for bzip2-compressed files (.bz2) using the bzip2 crate.
- xz: Enables support for xz-compressed files (.xz) using the liblzma crate.
Enabling and Disabling Features
By default, all compression features are enabled. However, you can selectively enable or disable them in your Cargo.toml to reduce dependencies or target specific compression formats.
To disable all compression features:
liblrge = { version = "0.1.0", default-features = false }
To enable only specific compression formats, list the desired features in Cargo.toml:
liblrge = { version = "0.1.0", default-features = false, features = ["gzip", "zstd"] }
In this example, only gzip (flate2) and zstd are enabled, so liblrge will support .gz and .zst files.
Compression Detection
The library uses magic bytes at the start of the file to detect its compression format before deciding how to read it. Supported formats include gzip, zstd, bzip2, and xz, with automatic decompression if the appropriate feature is enabled.
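Magic-byte detection of this kind can be sketched as below. This is an illustration under stated assumptions, not liblrge's actual implementation: the `Compression` enum and `detect_compression` function are hypothetical names. The byte signatures are the standard ones for each format.

```rust
/// Hypothetical enum (not liblrge's API) naming the formats listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Gzip,
    Zstd,
    Bzip2,
    Xz,
    Uncompressed,
}

/// Hypothetical helper: classify a file by the first bytes of its content.
/// Anything that matches no known signature is treated as uncompressed.
fn detect_compression(header: &[u8]) -> Compression {
    // Standard magic numbers for each format.
    const GZIP: &[u8] = &[0x1f, 0x8b];
    const ZSTD: &[u8] = &[0x28, 0xb5, 0x2f, 0xfd];
    const BZIP2: &[u8] = b"BZh";
    const XZ: &[u8] = &[0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00];

    if header.starts_with(GZIP) {
        Compression::Gzip
    } else if header.starts_with(ZSTD) {
        Compression::Zstd
    } else if header.starts_with(BZIP2) {
        Compression::Bzip2
    } else if header.starts_with(XZ) {
        Compression::Xz
    } else {
        Compression::Uncompressed
    }
}
```

In practice the caller would read the first six bytes of the input (the length of the longest signature, xz) and pass them to `detect_compression` before choosing a decoder. A header shorter than a signature simply fails to match it.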
Disabling logging
liblrge will output some logging information via the log crate. If you wish to suppress this logging, you can configure the logging level in your application. For example, using the env_logger crate you can do the following:
    use log::LevelFilter;

    let mut log_builder = env_logger::Builder::new();
    log_builder
        .filter(None, LevelFilter::Info)
        .filter_module("liblrge", LevelFilter::Off);
    log_builder.init();

    // Your application code here
This will set the global logging level to Info and disable all logging from the liblrge library.