47 releases

0.2.25 May 9, 2024
0.2.24 May 1, 2024
0.1.11 Sep 29, 2023
0.1.7 Aug 18, 2023
0.0.14 Sep 30, 2022


MIT license

515KB
11K SLoC


bedrs

bedtools-like functionality for interval sets in Rust

Summary

This is an interval library written in Rust. It takes advantage of the trait system, generics, monomorphization, and procedural macros to provide high-efficiency interval operations along with quality-of-life features for developers.

It is built around the Coordinates trait, which, once implemented on an arbitrary interval type, enables a wide range of genomic interval arithmetic.

It also introduces a new collection type, IntervalContainer, which holds a collection of Coordinates types and implements many set operations on them.

Interval arithmetic can be thought of as set-theoretic operations (such as intersection, union, difference, and complement) on intervals that carry chromosomes, strands, and other genomic markers.
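To make the set-theoretic view concrete, here is a minimal standalone sketch of three basic pairwise operations on intervals that share a chromosome. It does not use bedrs: the `Iv` type and its methods are hypothetical, and the sketch assumes half-open [start, end) coordinates and that book-ended intervals may be unioned, which may not match the library's own conventions.

```rust
/// Hypothetical genomic interval (not a bedrs type), half-open: [start, end) on `chr`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Iv {
    chr: u32,
    start: u32,
    end: u32,
}

impl Iv {
    fn new(chr: u32, start: u32, end: u32) -> Self {
        Iv { chr, start, end }
    }

    /// Intersection: defined only when both intervals share a chromosome and overlap.
    fn intersect(&self, other: &Iv) -> Option<Iv> {
        if self.chr != other.chr {
            return None;
        }
        let start = self.start.max(other.start);
        let end = self.end.min(other.end);
        if start < end {
            Some(Iv::new(self.chr, start, end))
        } else {
            None
        }
    }

    /// Union: a single interval only when the inputs overlap or touch (book-ended).
    fn union(&self, other: &Iv) -> Option<Iv> {
        if self.chr != other.chr || self.end < other.start || other.end < self.start {
            return None;
        }
        Some(Iv::new(self.chr, self.start.min(other.start), self.end.max(other.end)))
    }

    /// Difference: the parts of `self` not covered by `other` (zero, one, or two pieces).
    fn subtract(&self, other: &Iv) -> Vec<Iv> {
        match self.intersect(other) {
            None => vec![*self],
            Some(ov) => {
                let mut out = Vec::new();
                if self.start < ov.start {
                    out.push(Iv::new(self.chr, self.start, ov.start)); // left remainder
                }
                if ov.end < self.end {
                    out.push(Iv::new(self.chr, ov.end, self.end)); // right remainder
                }
                out
            }
        }
    }
}

fn main() {
    let a = Iv::new(1, 20, 40);
    let b = Iv::new(1, 30, 50);
    println!("{:?}", a.intersect(&b)); // Some(Iv { chr: 1, start: 30, end: 40 })
    println!("{:?}", a.union(&b)); // Some(Iv { chr: 1, start: 20, end: 50 })
    println!("{:?}", a.subtract(&b)); // [Iv { chr: 1, start: 20, end: 30 }]
}
```

Returning `Option` and `Vec` makes the "may be empty" and "may split in two" cases explicit in the types rather than relying on sentinel values.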

This library facilitates the development of these operations on arbitrary types and lets users tailor their structures to minimize computational overhead, while remaining a flexible library for general interval operations.

Usage

The main benefit of this library is that it is trait-based. You can define your own types, and as long as they implement the Coordinates trait, they can use all the other functions in the library.
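A minimal sketch of why the trait-based design pays off. It uses a hypothetical, heavily simplified `HasCoords` trait in place of the real Coordinates trait (whose actual signature is in the documentation): two unrelated user types implement it, and a single generic function works on both. Because the function takes generic parameters rather than trait objects, the compiler generates a specialized copy for each concrete type pair (monomorphization).

```rust
/// Hypothetical, simplified stand-in for a coordinates trait (NOT bedrs's real
/// Coordinates trait): only the accessors a generic overlap check needs.
trait HasCoords {
    fn chr(&self) -> u32;
    fn start(&self) -> u32;
    fn end(&self) -> u32;
}

/// A user-defined minimal interval.
struct Peak {
    chr: u32,
    start: u32,
    end: u32,
}

/// A different user type that carries extra payload alongside its coordinates.
struct Gene {
    chrom: u32,
    tx_start: u32,
    tx_end: u32,
    name: String,
}

impl HasCoords for Peak {
    fn chr(&self) -> u32 { self.chr }
    fn start(&self) -> u32 { self.start }
    fn end(&self) -> u32 { self.end }
}

impl HasCoords for Gene {
    fn chr(&self) -> u32 { self.chrom }
    fn start(&self) -> u32 { self.tx_start }
    fn end(&self) -> u32 { self.tx_end }
}

/// Generic over any two coordinate types (half-open intervals assumed).
/// Each concrete (A, B) pair gets its own compiled copy; there is no dynamic dispatch.
fn overlaps<A: HasCoords, B: HasCoords>(a: &A, b: &B) -> bool {
    a.chr() == b.chr() && a.start() < b.end() && b.start() < a.end()
}

fn main() {
    let peak = Peak { chr: 1, start: 100, end: 200 };
    let gene = Gene { chrom: 1, tx_start: 150, tx_end: 400, name: "GENE_A".to_string() };
    println!("{} overlaps peak: {}", gene.name, overlaps(&peak, &gene));
}
```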

For detailed usage and examples, please review the documentation.

Coordinates Trait

The library centers around the Coordinates trait.

This trait defines the minimal set of functions required for all set operations, such as getting an interval's chromosome ID, its start and end points, or its strand.

The trait can be implemented by hand or, if your fields follow the naming conventions used in the library (chr, start, end, strand), derived with #[derive(Coordinates)] on your custom interval type.

```rust
use bedrs::prelude::*;

// define a custom interval struct for testing
#[derive(Default, Coordinates)]
struct MyInterval {
    chr: usize,
    start: usize,
    end: usize,
}
```
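For types whose fields do not follow those conventions, a hand-written implementation just maps each accessor onto the right field. The sketch below illustrates the idea with a hypothetical `MinimalCoords` trait and `Strand` enum, not bedrs's actual definitions; the accessor signatures and the `Option` return for strand are assumptions made for illustration.

```rust
/// Hypothetical strand marker (illustrative only, not bedrs's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strand {
    Forward,
    Reverse,
}

/// Hypothetical minimal coordinates trait with the accessors described above:
/// chromosome, start, end, and an optional strand. Not bedrs's real trait.
trait MinimalCoords {
    fn chr(&self) -> &str;
    fn start(&self) -> u64;
    fn end(&self) -> u64;
    /// Unstranded types can rely on this default.
    fn strand(&self) -> Option<Strand> {
        None
    }
}

/// Field names deviate from the chr/start/end/strand convention,
/// so the accessors are mapped by hand.
struct Feature {
    seqname: String,
    left: u64,
    right: u64,
    on_minus: bool,
}

impl MinimalCoords for Feature {
    fn chr(&self) -> &str { &self.seqname }
    fn start(&self) -> u64 { self.left }
    fn end(&self) -> u64 { self.right }
    fn strand(&self) -> Option<Strand> {
        Some(if self.on_minus { Strand::Reverse } else { Strand::Forward })
    }
}

/// Generic helper that relies only on the trait accessors.
fn span_len<I: MinimalCoords>(iv: &I) -> u64 {
    iv.end() - iv.start()
}

fn main() {
    let f = Feature { seqname: "chr2".to_string(), left: 1_000, right: 1_250, on_minus: true };
    println!("{} {:?} len={}", f.chr(), f.strand(), span_len(&f));
}
```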

Interval Types

While you can create your own interval types, the library already ships plenty of 'batteries-included' types, such as Bed3, that you can use in your own libraries.

These pre-built interval types cover many use cases:

```rust
use bedrs::prelude::*;

// An interval on chromosome 1 and spanning base 20 <-> 40
let a = Bed3::new(1, 20, 40);

// An interval on chromosome 1 and spanning base 30 <-> 50
let b = Bed3::new(1, 30, 50);

// Find the intersecting interval of the two
// This returns an Option<Bed3> because they may not intersect.
let c = a.intersect(&b).unwrap();

assert_eq!(c.chr(), &1);
assert_eq!(c.start(), 30);
assert_eq!(c.end(), 40);
```

Interval Operations

Interval Set Operations

Set operations are performed using the methods of the IntervalContainer.

We can build an IntervalContainer easily on any collection of intervals:

```rust
use bedrs::prelude::*;

let set = IntervalContainer::new(vec![
    Bed3::new(1, 20, 30),
    Bed3::new(1, 30, 40),
    Bed3::new(1, 40, 50),
]);

assert_eq!(set.len(), 3);
```
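Conceptually, a container of intervals can answer queries faster when its contents are kept sorted. The following is a hypothetical `SortedSet` sketch, not the IntervalContainer implementation: it sorts by (chr, start, end) on construction and implements a simple find that scans forward and stops early. It assumes half-open coordinates; a real implementation would likely use binary search or an index rather than a linear scan.

```rust
/// Hypothetical interval record (half-open [start, end)); illustrative only.
/// Field order matters: the derived Ord compares chr, then start, then end.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Rec {
    chr: u32,
    start: u32,
    end: u32,
}

/// Hypothetical minimal container, NOT bedrs's IntervalContainer: it owns the
/// records and keeps them sorted so that queries can stop early.
struct SortedSet {
    records: Vec<Rec>,
}

impl SortedSet {
    fn new(mut records: Vec<Rec>) -> Self {
        records.sort();
        SortedSet { records }
    }

    fn len(&self) -> usize {
        self.records.len()
    }

    /// All records overlapping `query`. The scan stops at the first record on a later
    /// chromosome, or on the same chromosome starting at or after the query's end,
    /// because sorting guarantees no later record can overlap.
    fn find(&self, query: &Rec) -> Vec<Rec> {
        let mut hits = Vec::new();
        for r in &self.records {
            if r.chr < query.chr {
                continue;
            }
            if r.chr > query.chr || r.start >= query.end {
                break;
            }
            if r.end > query.start {
                hits.push(*r);
            }
        }
        hits
    }
}

fn main() {
    let set = SortedSet::new(vec![
        Rec { chr: 1, start: 40, end: 50 },
        Rec { chr: 1, start: 20, end: 30 },
        Rec { chr: 1, start: 30, end: 40 },
    ]);
    println!("{} records", set.len());
    println!("{:?}", set.find(&Rec { chr: 1, start: 25, end: 35 }));
}
```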

The IntervalContainer implements the following operations, among others; see its documentation for details on each and for all associated methods.

  • Bound
  • Closest
  • Complement
  • Find
  • Internal
  • Merge
  • Sample
  • Intersect
  • Segment
  • Subtract
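As an illustration of two of the operations above, here is a standalone sketch of merging intervals and of computing the gaps between merged intervals, which is one simple notion of a complement. The `Rec` type and the `merge` and `gaps` functions are hypothetical. The sketch assumes half-open coordinates and that book-ended intervals are merged. bedrs's own Merge and Complement may follow different conventions, for example on whether the complement extends to chromosome boundaries.

```rust
/// Hypothetical interval record (half-open [start, end)); illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Rec {
    chr: u32,
    start: u32,
    end: u32,
}

/// Merge overlapping or book-ended intervals on the same chromosome.
fn merge(mut recs: Vec<Rec>) -> Vec<Rec> {
    recs.sort(); // by chr, then start, then end
    let mut out: Vec<Rec> = Vec::new();
    for r in recs {
        if let Some(last) = out.last_mut() {
            if last.chr == r.chr && r.start <= last.end {
                // Extend the current run instead of starting a new interval.
                last.end = last.end.max(r.end);
                continue;
            }
        }
        out.push(r);
    }
    out
}

/// Gaps between consecutive merged intervals on each chromosome.
/// After merging, same-chromosome neighbours never touch, so every gap is non-empty.
fn gaps(recs: Vec<Rec>) -> Vec<Rec> {
    merge(recs)
        .windows(2)
        .filter(|w| w[0].chr == w[1].chr)
        .map(|w| Rec { chr: w[0].chr, start: w[0].end, end: w[1].start })
        .collect()
}

fn main() {
    let input = vec![
        Rec { chr: 1, start: 30, end: 40 },
        Rec { chr: 1, start: 20, end: 30 },
        Rec { chr: 1, start: 60, end: 70 },
    ];
    println!("merged: {:?}", merge(input.clone()));
    println!("gaps:   {:?}", gaps(input));
}
```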

Other Work

This library is heavily inspired by other interval libraries in Rust, which are listed below:

It was also motivated by the following interval toolkits in C++ and C, respectively:

Dependencies

~1.5–2.7MB
~48K SLoC