2 unstable releases
Version | Released |
---|---|
0.2.0 | Jun 24, 2024 |
0.1.0 | Nov 10, 2022 |
seqdupes
Removes duplicates from FASTA files. Supports filtering based on sequence content or header information.
Installation
Source
Download the source code and, from the project directory, run:
cargo install --path .
Usage
Run seqdupes to process FASTA files. You can choose whether duplicates are detected by sequence or by header.
Filtering by Sequence (default)
seqdupes -f path/to/sequence.fasta -j path/to/output.json > no_dupes.fas
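The README does not say exactly how records are compared in the default mode. The following is a minimal Rust sketch of sequence-based deduplication, assuming exact, case-sensitive matching on the sequence after line wrapping is removed, and assuming the first occurrence of each sequence is the one kept. `Record`, `parse_fasta` and `dedup_by_sequence` are hypothetical names for illustration, not part of seqdupes.

```rust
use std::collections::HashMap;

/// One FASTA record: the header line (without '>') and the sequence
/// with line wrapping removed. (Hypothetical type, not seqdupes' API.)
#[derive(Debug, Clone, PartialEq)]
struct Record {
    header: String,
    seq: String,
}

/// Minimal FASTA parser: a line starting with '>' opens a new record and
/// every following non-empty line is appended to that record's sequence.
fn parse_fasta(input: &str) -> Vec<Record> {
    let mut records = Vec::new();
    let mut current: Option<Record> = None;
    for line in input.lines() {
        let line = line.trim_end(); // also strips a trailing '\r'
        if let Some(header) = line.strip_prefix('>') {
            if let Some(rec) = current.take() {
                records.push(rec);
            }
            current = Some(Record { header: header.to_string(), seq: String::new() });
        } else if !line.is_empty() {
            if let Some(rec) = current.as_mut() {
                rec.seq.push_str(line);
            }
        }
    }
    if let Some(rec) = current {
        records.push(rec);
    }
    records
}

/// Keeps the first record for each distinct sequence (assumed policy).
/// Returns the kept records plus, for each kept header, the headers of the
/// records that were dropped as its duplicates.
fn dedup_by_sequence(records: Vec<Record>) -> (Vec<Record>, HashMap<String, Vec<String>>) {
    let mut first_seen: HashMap<String, String> = HashMap::new(); // sequence -> first header
    let mut duplicates: HashMap<String, Vec<String>> = HashMap::new();
    let mut kept = Vec::new();
    for rec in records {
        match first_seen.get(&rec.seq) {
            // Sequence already seen: remember this header under the first one.
            Some(first_header) => duplicates
                .entry(first_header.clone())
                .or_default()
                .push(rec.header),
            // New sequence: keep the record.
            None => {
                first_seen.insert(rec.seq.clone(), rec.header.clone());
                kept.push(rec);
            }
        }
    }
    (kept, duplicates)
}

fn main() {
    // Records "a" and "b" share the sequence ACGTAC ("a" is wrapped over two lines).
    let input = ">a\nACGT\nAC\n>b\nACGTAC\n>c\nTTTT\n";
    let (kept, dupes) = dedup_by_sequence(parse_fasta(input));
    for rec in &kept {
        println!(">{}\n{}", rec.header, rec.seq);
    }
    println!("{:?}", dupes);
}
```

Hashing whole sequences keeps the pass linear in input size, at the cost of holding every distinct sequence in memory.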
Filtering by Header
To detect duplicates by header instead of by sequence, use the --by-header flag.
seqdupes -f path/to/sequence.fasta -j path/to/output.json --by-header > no_dupes.fas
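Switching between the two modes amounts to changing the key used to decide whether two records are duplicates. A minimal Rust sketch, assuming records are already parsed into `(header, sequence)` pairs, that the whole header line is compared exactly, and that the first occurrence is kept; `dedup_by_key` is a hypothetical helper, not seqdupes' own code.

```rust
use std::collections::HashSet;

/// A parsed FASTA record as (header, sequence). (Assumed representation.)
type Rec = (String, String);

/// Keeps the first record for each distinct key and returns the kept and
/// dropped records. `key` decides what counts as a duplicate: the header
/// (as with --by-header) or the sequence (the default mode).
fn dedup_by_key<F>(records: Vec<Rec>, key: F) -> (Vec<Rec>, Vec<Rec>)
where
    F: Fn(&Rec) -> String,
{
    let mut seen = HashSet::new();
    let mut kept = Vec::new();
    let mut dropped = Vec::new();
    for rec in records {
        // HashSet::insert returns false when the key was already present.
        if seen.insert(key(&rec)) {
            kept.push(rec);
        } else {
            dropped.push(rec);
        }
    }
    (kept, dropped)
}

fn main() {
    // Two records share the header "s1"; two share the sequence ACGT.
    let records = vec![
        ("s1".to_string(), "ACGT".to_string()),
        ("s1".to_string(), "GGCC".to_string()),
        ("s2".to_string(), "ACGT".to_string()),
    ];
    let (by_header, _) = dedup_by_key(records.clone(), |(h, _)| h.clone());
    let (by_seq, _) = dedup_by_key(records, |(_, s)| s.clone());
    println!("by header:   {:?}", by_header);
    println!("by sequence: {:?}", by_seq);
}
```

The same input gives different results per mode: header matching drops the second "s1", while sequence matching drops "s2". If the real tool compares only the sequence ID (the first word of the header), the key closure would change accordingly; the README does not specify this.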
Arguments
Parameter | Default | Description |
---|---|---|
-f, --fasta | - | Path to the input FASTA file. |
-j, --json | - | Path of the JSON file that lists the detected duplicates. |
-b, --by-header | - | Detect duplicates by header instead of by sequence (optional). |
The tool writes the deduplicated FASTA records to stdout and a JSON file describing the duplicates to the path given with -j.
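A minimal Rust sketch of producing the two outputs, using only the standard library. The JSON layout here (each kept header mapped to the headers removed as its duplicates) is a hypothetical example; the README does not document the schema seqdupes actually writes. `write_fasta`, `json_escape` and `duplicates_json` are illustrative names.

```rust
use std::collections::BTreeMap;
use std::io::{self, Write};

/// Writes records as FASTA (one sequence line per record) to any writer,
/// e.g. stdout, so the result can be redirected with `> no_dupes.fas`.
fn write_fasta<W: Write>(out: &mut W, records: &[(String, String)]) -> io::Result<()> {
    for (header, seq) in records {
        writeln!(out, ">{}", header)?;
        writeln!(out, "{}", seq)?;
    }
    Ok(())
}

/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders a HYPOTHETICAL report: {"kept header": ["removed header", ...]}.
/// BTreeMap gives a stable, sorted key order in the output.
fn duplicates_json(dupes: &BTreeMap<String, Vec<String>>) -> String {
    let entries: Vec<String> = dupes
        .iter()
        .map(|(kept, removed)| {
            let list: Vec<String> = removed
                .iter()
                .map(|h| format!("\"{}\"", json_escape(h)))
                .collect();
            format!("\"{}\":[{}]", json_escape(kept), list.join(","))
        })
        .collect();
    format!("{{{}}}", entries.join(","))
}

fn main() -> io::Result<()> {
    let kept = vec![("a".to_string(), "ACGT".to_string())];
    let mut dupes = BTreeMap::new();
    dupes.insert("a".to_string(), vec!["b".to_string()]);

    // Deduplicated FASTA goes to stdout...
    write_fasta(&mut io::stdout().lock(), &kept)?;
    // ...and the report to a file (a temp path stands in for the -j argument).
    let path = std::env::temp_dir().join("seqdupes_example.json");
    std::fs::write(path, duplicates_json(&dupes))?;
    Ok(())
}
```

Writing FASTA to stdout keeps the tool composable with shell redirection and pipes, while the report goes to a separate file so it never mixes with the sequence output.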