1 unstable release: 0.1.1 (Aug 26, 2024)
Background
FORGe [Pritt2018] is a model and software tool for prioritising and filtering the variants to include in a pangenome reference. It scores each variant's "expected positive and negative impacts on alignment accuracy and computational overhead" based on population frequency, graph repetitiveness, and/or variant proximity. Variants are then ranked by these scores, and the top-ranked fraction is used to augment the reference genome.
The FORGe implementation is, by design, compatible with HISAT2 or Bowtie workflows and cannot be integrated out of the box into other graph construction workflows, such as PGGB or vg. It also requires the input file describing the variants to be in 1ksnp format, which is less widely used and less straightforward than VCF, so an extra step is needed to convert VCF to 1ksnp.
The final ranking file generated by rank.py is not in a standard format either, such as a sorted or filtered VCF file. This is where forgers comes into play, providing the logic needed to incorporate the FORGe model into broader workflows.
Introduction
This tool, named forgers (short for forge-rs), aims to apply the FORGe model to input VCF files and to support VCF manipulation operations based on FORGe ranking. One of the design decisions for forgers is to work seamlessly with tools such as bcftools, so that the user can pipe the VCF output of these tools to forgers, or vice versa, to build more complex variant filtration pipelines.
Usage
Currently, forgers supports two subcommands: filter and resolve.
Filter
Filter and/or annotate VCF records based on FORGe ranking
USAGE:
forgers filter [FLAGS] [OPTIONS] [input]
FLAGS:
-a, --annotate Annotate the filtered records with FORGe rank
-g, --gzip Gzip output, detected by file extension by default
-h, --help Prints help information
-V, --version Prints version information
-v, --verbose Enable verbose mode
OPTIONS:
-f, --forge-rank <forge-rank> FORGe rank file [default: ordered.txt]
-k, --info-key <info-key> Annotate key for INFO field [default: FORGE]
-o, --output <output> Output file, stdout if not specified [default: -]
-t, --top <top> Top fraction of records to keep, keeps all by default [default: 1.0]
ARGS:
<input> Input VCF file, stdin if not specified [default: -]
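The effect of --top, --annotate and --info-key can be illustrated with a minimal Python sketch. This is not the forgers implementation: the function name filter_vcf_lines, the ranks mapping (variant ID to rank, with 1 as the best rank), the lookup by the VCF ID column, and rounding the cutoff down are all assumptions made for illustration. The sketch also does not add the ##INFO header line that a valid annotated VCF would need.

```python
import math

def filter_vcf_lines(lines, ranks, top=1.0, info_key="FORGE", annotate=False):
    """Keep records ranked within the top fraction; pass header lines through.

    `ranks` maps a variant ID to its rank (1 = best). This mapping is a
    hypothetical stand-in for the FORGe rank file, whose format is not
    described here.
    """
    if not 0.0 <= top <= 1.0:
        raise ValueError("top must be within [0, 1]")
    # Assumption: the cutoff is a fraction of all ranked variants, rounded down.
    cutoff = math.floor(top * len(ranks))
    out = []
    for line in lines:
        if line.startswith("#"):
            out.append(line)  # meta-information and column header lines
            continue
        fields = line.split("\t")
        rank = ranks.get(fields[2])  # assumption: look up by the ID column
        if rank is None or rank > cutoff:
            continue  # unranked, or ranked below the cutoff
        if annotate:
            tag = f"{info_key}={rank}"
            # INFO is the 8th column; "." means the field is empty.
            fields[7] = tag if fields[7] == "." else f"{fields[7]};{tag}"
        out.append("\t".join(fields))
    return out

# Illustrative input: four records, of which the best-ranked half is kept.
demo_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\trs1\tA\tG\t.\tPASS\tAF=0.1",
    "1\t200\trs2\tC\tT\t.\tPASS\tAF=0.4",
    "1\t300\trs3\tG\tA\t.\tPASS\t.",
    "1\t400\trs4\tT\tC\t.\tPASS\tAF=0.2",
]
demo_ranks = {"rs1": 3, "rs2": 1, "rs3": 2, "rs4": 4}
demo_out = filter_vcf_lines(demo_lines, demo_ranks, top=0.5, annotate=True)
```

Records keep their input order in this sketch, so the output remains position-sorted if the input was.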
Resolve
Resolve overlapping variants based on FORGe ranking; i.e. when the variants in a cluster conflict with each other, remove the cluster and replace it with the higher-ranked variant. When phasing information is available, it is used to determine whether two overlapping variants co-occur in any sample.
USAGE:
forgers resolve [FLAGS] [OPTIONS] [input]
FLAGS:
-g, --gzip Gzip output, detected by file extension by default
-h, --help Prints help information
-V, --version Prints version information
-v, --verbose Enable verbose mode
OPTIONS:
-f, --forge-rank <forge-rank> FORGe rank file [default: ordered.txt]
-o, --output <output> Output file, stdout if not specified [default: -]
ARGS:
<input> Input VCF file, stdin if not specified [default: -]
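The core idea of resolving overlaps by rank can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the forgers algorithm: it treats every group of overlapping variants as conflicting, keeps only the best-ranked one, and ignores phasing entirely. The Variant type, the resolve_overlaps function, and the convention that rank 1 is best are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int   # 1-based start position, as in VCF
    ref: str
    alt: str
    rank: int  # assumption: 1 is the best rank

    @property
    def end(self):
        # Last reference position covered by the REF allele.
        return self.pos + len(self.ref) - 1

def resolve_overlaps(variants):
    """Keep the best-ranked variant from each cluster of overlapping variants.

    Simplification: phasing is ignored, so any overlap counts as a conflict.
    """
    ordered = sorted(variants, key=lambda v: (v.chrom, v.pos))
    kept = []
    cluster = []
    cluster_end = 0
    for v in ordered:
        if cluster and v.chrom == cluster[0].chrom and v.pos <= cluster_end:
            # Overlaps the current cluster: extend it.
            cluster.append(v)
            cluster_end = max(cluster_end, v.end)
        else:
            # Close the previous cluster and start a new one.
            if cluster:
                kept.append(min(cluster, key=lambda c: c.rank))
            cluster = [v]
            cluster_end = v.end
    if cluster:
        kept.append(min(cluster, key=lambda c: c.rank))
    return kept

# Illustrative input: a deletion spanning 100-103 overlaps a SNV at 102.
demo_variants = [
    Variant("chr1", 100, "ACGT", "A", rank=2),
    Variant("chr1", 102, "G", "T", rank=1),
    Variant("chr1", 200, "C", "T", rank=3),
    Variant("chr2", 100, "A", "C", rank=4),
]
demo_kept = resolve_overlaps(demo_variants)
```

In this example the deletion and the SNV form one cluster, and only the SNV (rank 1) survives; the other two variants do not overlap anything and are kept as they are.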