3 releases (breaking)

0.3.0 Mar 18, 2024
0.2.0 Oct 31, 2022
0.1.0 Oct 10, 2022

MIT/Apache

ffforf

ffforf is a Rust library and command-line tool for identifying open reading frames. It uses the jetscii crate for efficient searching of stop codons, needletail for fast FASTA parsing, and translates the ORFs into amino acid sequences.
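The idea of locating ORFs by their stop codons can be sketched as follows. This is a minimal illustration, not ffforf's actual implementation: it uses a plain codon-by-codon scan instead of jetscii, handles only the forward strand, and the names Span, is_stop, and stop_free_spans are hypothetical.

```rust
/// Half-open nucleotide range [start, end) of a stop-free stretch in one frame.
/// Hypothetical type for illustration; not part of ffforf.
#[derive(Debug, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// True if the three bases form a standard stop codon (TAA, TAG, TGA).
/// The caller must pass a slice of at least three bases.
fn is_stop(codon: &[u8]) -> bool {
    let up = [
        codon[0].to_ascii_uppercase(),
        codon[1].to_ascii_uppercase(),
        codon[2].to_ascii_uppercase(),
    ];
    matches!(&up, b"TAA" | b"TAG" | b"TGA")
}

/// Walk one forward reading frame (0, 1, or 2) codon by codon and collect
/// every stretch between stop codons that is at least `min_codons` long.
fn stop_free_spans(seq: &[u8], frame: usize, min_codons: usize) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut start = frame;
    let mut pos = frame;
    while pos + 3 <= seq.len() {
        if is_stop(&seq[pos..pos + 3]) {
            if pos > start && (pos - start) / 3 >= min_codons {
                spans.push(Span { start, end: pos });
            }
            // The next candidate ORF begins right after the stop codon.
            start = pos + 3;
        }
        pos += 3;
    }
    // A stretch that runs to the end of the sequence without hitting a stop.
    if pos > start && (pos - start) / 3 >= min_codons {
        spans.push(Span { start, end: pos });
    }
    spans
}
```

Calling stop_free_spans once per frame (0, 1, 2), and again on the reverse complement, would cover all six reading frames.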

Installation

To use ffforf as a library in your Rust project, add it to your Cargo.toml file:

[dependencies]
ffforf = "0.3.0"

To install the ffforf binary, run cargo install ffforf.

Running the binary

To run ffforf on a FASTA file and write the translated ORFs to a new file:

ffforf genome.fna > translated_sequences.faa

Output looks like:

>Chr18_rc_2_3557_3872
ISTNLCTFLCSDTEFTPRVTNAKDSDTFDGILTLNNRQKHAERIAYNRGAGSGIGGGRGPGRPPITEIPLEELLACEEPEAKAARTRRRGATLALTALGRYIFN

The header gives, in order, the landmark (contig/chromosome), whether the ORF is on the reverse complement (rc), the reading frame, and the start and end genomic coordinates.
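If you need those fields back in downstream code, a header like the one above can be split apart as sketched here. This is an illustration only: OrfHeader and parse_orf_header are hypothetical names, and because only a reverse-complement header is shown above, the handling of forward-strand headers (assumed to simply lack the rc token) is a guess.

```rust
/// Fields recovered from an ffforf-style FASTA header.
/// Hypothetical type for illustration; not part of ffforf.
#[derive(Debug, PartialEq)]
struct OrfHeader {
    landmark: String,
    reverse_complement: bool,
    frame: u8,
    start: u64,
    end: u64,
}

/// Parse a header such as ">Chr18_rc_2_3557_3872".
/// Returns None if the trailing numeric fields are missing or malformed.
fn parse_orf_header(header: &str) -> Option<OrfHeader> {
    let header = header.trim().trim_start_matches('>');
    // Split from the right so underscores inside the landmark name survive.
    let mut parts = header.rsplitn(4, '_');
    let end: u64 = parts.next()?.parse().ok()?;
    let start: u64 = parts.next()?.parse().ok()?;
    let frame: u8 = parts.next()?.parse().ok()?;
    let rest = parts.next()?;
    // ASSUMPTION: forward-strand headers omit the "rc" token entirely.
    let (landmark, reverse_complement) = match rest.strip_suffix("_rc") {
        Some(name) => (name, true),
        None => (rest, false),
    };
    Some(OrfHeader {
        landmark: landmark.to_string(),
        reverse_complement,
        frame,
        start,
        end,
    })
}
```

Parsing from the right keeps landmark names that themselves contain underscores (for example scaffold_12) intact.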

Note

  • ffforf does not fail gracefully yet. If more people use it, I will add more command-line arguments, help messages, etc.

  • The minimum ORF size is 50. For now it can only be changed by altering the source code; if you need it to be configurable, please open an issue and I'll fix it right away.

  • Unknown sequence is read through and translated as X rather than ending the ORF, for example TTFLYLNYIITXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXVIGSYKEHFSVSTRDKPHVTKGRKERCNGNRITYYVIQNNFPALPTVSILYSLFQTQMIGRKNFA
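The read-through behaviour in the last note can be illustrated with a small translation routine. This is a sketch under assumptions, not ffforf's own code: it uses the standard genetic code, maps any codon containing a non-ACGT base to X, and the names CODON_TABLE, base_index, translate_codon, and translate are hypothetical.

```rust
/// Standard genetic code, indexed by 16*b1 + 4*b2 + b3 with bases ordered T, C, A, G.
const CODON_TABLE: &[u8; 64] =
    b"FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/// Map a nucleotide to its position in the T, C, A, G ordering.
/// Anything else (N, gaps, other ambiguity codes) yields None.
fn base_index(base: u8) -> Option<usize> {
    match base.to_ascii_uppercase() {
        b'T' | b'U' => Some(0),
        b'C' => Some(1),
        b'A' => Some(2),
        b'G' => Some(3),
        _ => None,
    }
}

/// Translate one codon; codons with unknown bases become 'X' instead of
/// terminating translation, which is what lets an ORF read through them.
fn translate_codon(codon: &[u8]) -> u8 {
    if codon.len() != 3 {
        return b'X';
    }
    match (base_index(codon[0]), base_index(codon[1]), base_index(codon[2])) {
        (Some(a), Some(b), Some(c)) => CODON_TABLE[16 * a + 4 * b + c],
        _ => b'X',
    }
}

/// Translate a nucleotide sequence in frame 0; a trailing partial codon is dropped.
/// Stop codons are emitted as '*'.
fn translate(seq: &[u8]) -> String {
    seq.chunks_exact(3)
        .map(|codon| translate_codon(codon) as char)
        .collect()
}
```

Because an unknown codon is never treated as a stop, a long run of unknown bases shows up as a run of X inside a single ORF, as in the example above.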

Using the Library

To use ffforf in your Rust code, import the crate and call the find_all_orfs function:

// The crate is published as `ffforf` (see the Cargo.toml entry above).
use ffforf::{find_all_orfs, Orf, Strand};

fn main() {
    let sequence = b"ATGCTAGTAACTAGCGTAA";
    let min_orf_length = 5;
    let orfs = find_all_orfs(sequence, min_orf_length);

    for orf in orfs {
        println!(
            "ORF: Start: {}, End: {}, Strand: {}, Reading Frame: {}",
            orf.start, orf.end, orf.strand, orf.reading_frame
        );
    }
}

See src/bin/ffforf.rs for further usage examples.

License

This project is licensed under the MIT License. A copy can typically be found easily...
