Unify transcripts across different samples

2 releases

0.1.1 Jun 9, 2024
0.1.0 Jun 8, 2024

MIT license

tuni

The goal of tuni is to unify transcripts across different samples.

Overview

Transcript assembly tools can generate arbitrary transcript IDs, so the same transcript may be labelled with a different ID in each sample.

For example, given two samples sample_1.gtf and sample_2.gtf:

sample_1.gtf

chr1 test transcript 1 100 . + . transcript_id "A"; 
chr1 test exon 1 40 . + . transcript_id "A"; 
chr1 test exon 50 100 . + . transcript_id "A";
--snip-- 

sample_2.gtf

chr1 test transcript 1 100 . + . transcript_id "B"; 
chr1 test exon 1 40 . + . transcript_id "B"; 
chr1 test exon 50 100 . + . transcript_id "B";
--snip-- 

The transcript shown above is identical in both samples; however, its transcript_id differs between them ("A" vs "B").

tuni generates a .tuni.gtf/.tuni.gff file for each input .gtf/.gff. Each output file contains an additional attribute, tuni_id, holding a unified ID that is the same for identical transcripts across samples.
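
The README does not say how tuni writes the extra attribute into each record. The following is a minimal Rust sketch of the idea, assuming a record line is already in memory and its attribute column uses GTF-style key "value"; pairs. append_tuni_id is a hypothetical helper for illustration, not part of tuni's API.

```rust
/// Hypothetical helper (not from tuni's source): append a tuni_id attribute
/// to the attribute column of a single GTF record line.
fn append_tuni_id(line: &str, tuni_id: &str) -> String {
    // Drop trailing whitespace, e.g. the stray space after `transcript_id "A"; `.
    let trimmed = line.trim_end();
    // GTF attributes are `key "value";` pairs; terminate the last one if needed.
    let sep = if trimmed.ends_with(';') { " " } else { "; " };
    format!("{trimmed}{sep}tuni_id \"{tuni_id}\";")
}
```

Applied to the first exon of sample_1.gtf with the ID "tuni_0", this yields the same attribute column as the sample_1.tuni.gtf output shown next.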

sample_1.tuni.gtf

chr1 test transcript 1 100 . + . transcript_id "A"; tuni_id "tuni_0";
chr1 test exon 1 40 . + . transcript_id "A"; tuni_id "tuni_0";
chr1 test exon 50 100 . + . transcript_id "A"; tuni_id "tuni_0";
--snip-- 

sample_2.tuni.gtf

chr1 test transcript 1 100 . + . transcript_id "B"; tuni_id "tuni_0";
chr1 test exon 1 40 . + . transcript_id "B"; tuni_id "tuni_0";
chr1 test exon 50 100 . + . transcript_id "B"; tuni_id "tuni_0";
--snip-- 
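
The README does not document the exact rule tuni uses to decide that two transcripts are identical. The Rust sketch below assumes one plausible rule: two transcripts match when their chromosome, strand and complete exon chain are equal. Each previously unseen structure receives the next tuni_N ID in order of first appearance across all samples. TranscriptKey and unify are hypothetical names for illustration only.

```rust
use std::collections::HashMap;

/// A transcript reduced to the fields that define its structure.
/// Assumption: transcripts are identical when chromosome, strand and the
/// full exon chain match exactly (not confirmed as tuni's own rule).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct TranscriptKey {
    chrom: String,
    strand: char,
    exons: Vec<(u64, u64)>, // (start, end) pairs, kept sorted
}

impl TranscriptKey {
    fn new(chrom: &str, strand: char, mut exons: Vec<(u64, u64)>) -> Self {
        // Sort so the order in which exons appear in a file does not affect the key.
        exons.sort_unstable();
        TranscriptKey { chrom: chrom.to_string(), strand, exons }
    }
}

/// `samples[s]` lists (transcript_id, key) pairs for sample `s`.
/// Returns one map per sample from the original transcript_id to its tuni_id.
fn unify(samples: &[Vec<(String, TranscriptKey)>]) -> Vec<HashMap<String, String>> {
    // Shared across all samples: transcript structure -> unified ID.
    let mut registry: HashMap<TranscriptKey, String> = HashMap::new();
    let mut per_sample = Vec::with_capacity(samples.len());
    for sample in samples {
        let mut mapping = HashMap::new();
        for (transcript_id, key) in sample {
            // ID to assign if this structure has not been seen before.
            let next = registry.len();
            let tuni_id = registry
                .entry(key.clone())
                .or_insert_with(|| format!("tuni_{next}"))
                .clone();
            mapping.insert(transcript_id.clone(), tuni_id);
        }
        per_sample.push(mapping);
    }
    per_sample
}
```

With the two samples above, transcript "A" from sample_1.gtf and transcript "B" from sample_2.gtf both reduce to chr1, +, [(1, 40), (50, 100)], so under this assumed rule both map to tuni_0.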

Installation

Binary

Download the latest binary for Linux or macOS (ARM) from releases.

Cargo

Install Rust, then run:

cargo install tuni

Usage

Usage: tuni [OPTIONS] --gtf-gff-path <*.txt> --output-dir </output/dir/>

Options:
  -g, --gtf-gff-path <*.txt>       A text file containing GTF/GFF paths
  -o, --output-dir </output/dir/>  Directory where outputted GTF/GFFs will be stored
  -v, --verbose                    Print log messages
  -h, --help                       Print help
  -V, --version                    Print version
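
The help text does not specify the layout of the file passed to --gtf-gff-path. Assuming one GTF/GFF path per line, the Rust sketch below reads that list and derives each output name using the .tuni.gtf/.tuni.gff convention described earlier (sample_1.gtf becomes sample_1.tuni.gtf inside the output directory). parse_path_list and output_path are hypothetical helpers, not tuni's code.

```rust
use std::path::{Path, PathBuf};

/// Parse the contents of the --gtf-gff-path list file.
/// Assumption: one path per line; surrounding whitespace and blank lines are ignored.
fn parse_path_list(contents: &str) -> Vec<PathBuf> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(PathBuf::from)
        .collect()
}

/// Derive `<output_dir>/<stem>.tuni.<ext>` for one input file.
/// Returns None if the input has no usable file stem or extension.
fn output_path(input: &Path, output_dir: &Path) -> Option<PathBuf> {
    let stem = input.file_stem()?.to_str()?;
    let ext = input.extension()?.to_str()?;
    Some(output_dir.join(format!("{stem}.tuni.{ext}")))
}
```

Under that assumption, a paths.txt listing sample_1.gtf and sample_2.gtf on separate lines would be passed as: tuni --gtf-gff-path paths.txt --output-dir out/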

Note: currently, only version 2 .gff files are accepted by tuni.
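
Because only version 2 GFF files are accepted, it can help to screen inputs before a run. One rough heuristic, which tuni is not documented to use: the GFF3 specification requires a file to begin with a ##gff-version 3 pragma, while GFF2 files often carry no pragma at all. declared_gff_version is a hypothetical helper that reports the major version declared in a file's leading comment lines, if any.

```rust
/// Hypothetical pre-check (not part of tuni): return the major version from a
/// `##gff-version N` pragma in the leading `#` lines, or None if none is declared.
fn declared_gff_version(contents: &str) -> Option<u32> {
    contents
        .lines()
        // Only inspect the header block of comment/pragma lines.
        .take_while(|line| line.starts_with('#'))
        .find_map(|line| {
            let rest = line.strip_prefix("##gff-version")?;
            // The version may be written as "3" or "3.1.26"; keep the major number.
            rest.trim().split('.').next()?.parse::<u32>().ok()
        })
}
```

A result of Some(3) suggests a GFF3 file; None is inconclusive, since a GFF2 file is not required to declare its version.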
