24 releases
0.7.0-alpha.3 | Jul 1, 2024 |
---|---|
0.7.0-alpha.2 | Jun 26, 2023 |
0.7.0-alpha.1 | Feb 28, 2023 |
0.5.6 | Sep 14, 2022 |
0.1.0 | Jul 21, 2021 |
#220 in Algorithms
1,326 downloads per month
Used in 2 crates
(via sigalign-impl)
84KB
2K
SLoC
LtFmIndex
LtFmIndex
is a Rust library for building and using a FM-index that contains a lookup table of the first k-mer of a pattern. This index can be used to (1) count the number of occurrences and (2) locate the positions of a pattern in an indexed text.
Usage
Add to dependency
To use this library, add lt_fm_index
to your Cargo.toml
:
[dependencies]
lt_fm_index = "0.7.0-alpha"
- About
fastbwt
features- This feature can accelerate the indexing, but needs
cmake
to buildlibdivsufsort
and cannot be built as WASM.
- This feature can accelerate the indexing, but needs
Example code
use lt_fm_index::LtFmIndex;
use lt_fm_index::blocks::Block2; // `Block2` can index 3 types of characters.
// (1) Define characters to use
let characters_by_index: &[&[u8]] = &[
&[b'A', b'a'], // 'A' and 'a' are treated as the same
&[b'C', b'c'], // 'C' and 'c' are treated as the same
&[b'G', b'g'], // 'G' and 'g' are treated as the same
];
// Alternatively, you can use this simpler syntax:
let characters_by_index: &[&[u8]] = &[
b"Aa", b"Cc", b"Gg"
];
// (2) Build index
let text = b"CTCCGTACACCTGTTTCGTATCGGAXXYYZZ".to_vec();
let lt_fm_index= LtFmIndex::<u32, Block2<u128>>::build(
text,
characters_by_index,
2,
4,
).unwrap();
// (3) Match with pattern
let pattern = b"TA";
// - count
let count = lt_fm_index.count(pattern);
assert_eq!(count, 2);
// - locate
let mut locations = lt_fm_index.locate(pattern);
locations.sort(); // The locations may not be in order.
assert_eq!(locations, vec![5,18]);
// All unindexed characters are treated as the same character.
// In the text, X, Y, and Z can match any other unindexed character
let mut locations = lt_fm_index.locate(b"UNDEF");
locations.sort();
// Using the b"XXXXX", b"YYYYY", or b"!@#$%" gives the same result.
assert_eq!(locations, vec![25,26]);
// (4) Save and load
let mut buffer = Vec::new();
lt_fm_index.save_to(&mut buffer).unwrap();
let loaded = LtFmIndex::load_from(&buffer[..]).unwrap();
assert_eq!(lt_fm_index, loaded);
Repository
https://github.com/baku4/lt-fm-index
API Doc
Reference
- Ferragina, P., et al. (2004). An Alphabet-Friendly FM-Index, Springer Berlin Heidelberg: 150-160.
- Anderson, T. and T. J. Wheeler (2021). An optimized FM-index library for nucleotide and amino acid search, Cold Spring Harbor Laboratory.
- Wang, Y., X. Li, D. Zang, G. Tan and N. Sun (2018). Accelerating FM-index Search for Genomic Data Processing, ACM.
- Yuta Mori.
libdivsufsort
Dependencies
~18MB
~308K SLoC