10 releases (6 breaking)
Version | Released |
---|---|
0.9.1 | Sep 2, 2024 |
0.9.0 | Sep 2, 2024 |
0.8.0 | Jun 28, 2024 |
0.7.3 | Jun 17, 2024 |
0.3.0 | Jun 25, 2023 |
Used in lightmotif-py
🎼🧬 lightmotif-tfmpvalue
A Rust port of the TFMPvalue algorithm for the lightmotif crate.
🗺️ Overview
TFMPvalue is an algorithm proposed by Touzet & Varré[1] for computing a p-value from a score obtained with a position weight matrix. It uses discretization to compute an approximation of the score distribution for the position weight matrix, iterating with growing levels of accuracy until convergence is reached. This approach outperforms dynamic-programming based methods such as LazyDistrib by Beckstette et al.[2].
lightmotif-tfmpvalue provides an implementation of the TFMPvalue algorithm to use with position weight matrices from the lightmotif crate.
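The discretization idea described above can be illustrated with a minimal, standard-library-only sketch. It is not the crate's implementation: the function names `discretize`, `score_distribution` and `approx_pvalue`, the 4-column (A, C, G, T) matrix layout, and the choice of rounding scores down are all assumptions made for illustration. The sketch rounds each score to a multiple of a granularity, builds the distribution of total scores position by position, and sums the probability of reaching a threshold. It performs a single pass at a fixed granularity and omits the iterative refinement and convergence test of the actual algorithm.

```rust
use std::collections::BTreeMap;

/// Round every score down to an integer number of `granularity` steps.
/// (Hypothetical helper; rows are positions, columns are A, C, G, T.)
fn discretize(matrix: &[[f64; 4]], granularity: f64) -> Vec<[i64; 4]> {
    matrix
        .iter()
        .map(|&row| row.map(|s| (s / granularity).floor() as i64))
        .collect()
}

/// Distribution of the total discretized score over all sequences,
/// assuming independent positions drawn from `background`.
fn score_distribution(rows: &[[i64; 4]], background: [f64; 4]) -> BTreeMap<i64, f64> {
    let mut dist: BTreeMap<i64, f64> = BTreeMap::from([(0, 1.0)]);
    for row in rows {
        let mut next = BTreeMap::new();
        for (&score, &prob) in &dist {
            for (s, b) in row.iter().zip(background.iter()) {
                *next.entry(score + s).or_insert(0.0) += prob * b;
            }
        }
        dist = next;
    }
    dist
}

/// Approximate probability of scoring at least `threshold`
/// at a single, fixed granularity.
fn approx_pvalue(
    matrix: &[[f64; 4]],
    background: [f64; 4],
    threshold: f64,
    granularity: f64,
) -> f64 {
    let rows = discretize(matrix, granularity);
    let dist = score_distribution(&rows, background);
    let cutoff = (threshold / granularity).floor() as i64;
    dist.range(cutoff..).map(|(_, p)| p).sum()
}
```

For a 3-column toy matrix in which exactly one symbol per position scores 1.0 and the others score -1.0, calling `approx_pvalue` with a uniform background, a threshold of 3.0 and a granularity of 0.5 gives 0.25³ = 0.015625, since only one of the 64 possible sequences reaches the maximum score.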
💡 Example
Use lightmotif to create a position-specific scoring matrix, then use the TFMPvalue algorithm to compute the exact p-value for a given score, or the score threshold for a given p-value:
extern crate lightmotif;
extern crate lightmotif_tfmpvalue;
use lightmotif::pwm::CountMatrix;
use lightmotif::abc::Dna;
use lightmotif::seq::EncodedSequence;
use lightmotif_tfmpvalue::TfmPvalue;
// Use a ScoringMatrix from `lightmotif`
let pssm = CountMatrix::<Dna>::from_sequences(&[
        EncodedSequence::encode("GTTGACCTTATCAAC").unwrap(),
        EncodedSequence::encode("GTTGATCCAGTCAAC").unwrap(),
    ])
    .unwrap()
    .to_freq(0.25)
    .to_scoring(None);
// Initialize the TFMPvalue algorithm for the given PSSM
// (the `pssm` reference must outlive `tfmp`).
let mut tfmp = TfmPvalue::new(&pssm);
// Compute the exact p-value for a given score
let pvalue = tfmp.pvalue(19.3);
assert_eq!(pvalue, 1.4901161193847656e-08);
// Compute the exact score for a given p-value
let score = tfmp.score(pvalue);
assert_eq!(score, 19.3);
Note that in the example above, the computation is not bounded, so for certain matrices the algorithm may require a large amount of memory to converge. Use the TfmPvalue::approximate_pvalue and TfmPvalue::approximate_score methods to obtain an iterator over the algorithm iterations, allowing you to stop at any time based on an external criterion such as total memory usage.
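The stop-early pattern can be sketched without the crate, using only the standard library. Everything below is hypothetical and does not reproduce the signatures or item types of `approximate_pvalue` or `approximate_score`: the `Step` and `Refine` types, the `bounded_pvalue` function, the uniform background, the halving of the granularity at each step, and the use of the number of distinct scores as a stand-in for memory usage are all assumptions. The point is only to show an iterator that yields one refinement per call, so that the caller decides when to stop.

```rust
use std::collections::BTreeMap;

/// Result of one refinement step (hypothetical type, not the crate's own).
#[derive(Debug, Clone, PartialEq)]
struct Step {
    granularity: f64,
    pvalue: f64,
    /// Number of distinct discretized scores held in memory.
    cells: usize,
}

/// Iterator yielding one step per granularity: 1.0, 0.5, 0.25, ...
struct Refine<'a> {
    matrix: &'a [[f64; 4]],
    threshold: f64,
    granularity: f64,
}

impl<'a> Iterator for Refine<'a> {
    type Item = Step;

    fn next(&mut self) -> Option<Step> {
        let g = self.granularity;
        // Score distribution at this granularity, uniform background assumed.
        let mut dist: BTreeMap<i64, f64> = BTreeMap::from([(0, 1.0)]);
        for row in self.matrix {
            let mut next = BTreeMap::new();
            for (&score, &prob) in &dist {
                for &s in row {
                    *next.entry(score + (s / g).floor() as i64).or_insert(0.0) += prob * 0.25;
                }
            }
            dist = next;
        }
        let cutoff = (self.threshold / g).floor() as i64;
        let pvalue: f64 = dist.range(cutoff..).map(|(_, p)| p).sum();
        self.granularity = g / 2.0;
        Some(Step { granularity: g, pvalue, cells: dist.len() })
    }
}

/// Return the finest step whose table still fits within `max_cells`,
/// or `None` if even the coarsest step exceeds the budget.
fn bounded_pvalue(matrix: &[[f64; 4]], threshold: f64, max_cells: usize) -> Option<Step> {
    Refine { matrix, threshold, granularity: 1.0 }
        .take(16) // hard cap on iterations, since the iterator never ends on its own
        .take_while(|step| step.cells <= max_cells)
        .last()
}
```

Because the budget check lives in the caller (`take_while`), the same iterator can be stopped on any other external condition, such as elapsed time, without changing the refinement code.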
💭 Feedback
⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
📋 Changelog
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
⚖️ License
This library is provided under the open-source GNU General Public License v3.0. The original TFMPvalue implementation was written by the BONSAI team of CRISTaL, Université de Lille and is available under the terms of the GNU General Public License v2.0.
This project is in no way affiliated with, sponsored, or otherwise endorsed by the original TFMPvalue authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
📚 References
- [1] Touzet, Hélène and Jean-Stéphane Varré. ‘Efficient and accurate P-value computation for Position Weight Matrices’. Algorithms for Molecular Biology 2, 1–12 (2007). doi:10.1186/1748-7188-2-15.
- [2] Beckstette, Michael, Robert Homann, and Robert Giegerich. ‘Fast index based algorithms and software for matching position specific scoring matrices’. BMC Bioinformatics 7, 389 (2006). doi:10.1186/1471-2105-7-389.